Measuring cache hit rate with perf

2023-11-22 21:17:50

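Preface

From the earlier article 《缓存一致性》 (Cache Coherence), we know that whether data hits or misses in the cache has a large effect on program performance. The effect is even stronger in networking: if the packets to be processed are not in the cache, throughput drops sharply.
Previously we ran a program and saw directly how cache misses caused by cache coherence traffic slowed it down. Today we use perf to measure the actual cache miss ratio, which should give a more concrete picture of the cache and its effect on performance.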

Code example

The code is the same as before:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

#define COUNT 1000000000

struct _t {
    // long p1, p2, p3, p4, p5, p6, p7;
    long x;
    // long p9, p10, p11, p12, p13, p14, p15;
};

/* Without the padding above, a and b are small enough to share a cache line. */
struct _t a;
struct _t b;

/* Thread 1 writes only a.x. */
void *test_thread1(void *arg)
{
    for (long i = 0; i < COUNT; i++)
        a.x = i;
    return NULL;
}

/* Thread 2 writes only b.x. */
void *test_thread2(void *arg)
{
    for (long i = 0; i < COUNT; i++)
        b.x = i;
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t test1_thread_t;
    pthread_t test2_thread_t;

    if (pthread_create(&test1_thread_t, NULL, test_thread1, "test_1_thread") != 0) {
        printf("test1_thread_t create error\n");
        exit(1);
    }
    if (pthread_create(&test2_thread_t, NULL, test_thread2, "test_2_thread") != 0) {
        printf("test2_thread_t create error\n");
        exit(1);
    }

    pthread_join(test1_thread_t, NULL);
    pthread_join(test2_thread_t, NULL);

    return EXIT_SUCCESS;
}

Makefile


CC=/home/liyongjun/project/board/buildroot/OrangePiPC/host/bin/arm-linux-gcc
TARGET=cacheline_not_fill
# TARGET=cacheline_fill

all:
	${CC} ${TARGET}.c -g -O0 -o ${TARGET}.out -Wall -l pthread

clean:
	rm *.out

tftp:
	cp ${TARGET}.out ~/tftp

Run

# ./perf stat -e cache-references -e cache-misses ./cacheline_not_fill.out

 Performance counter stats for './cacheline_not_fill.out':

       12005744527      cache-references
         986698086      cache-misses              #    8.219 % of all cache refs

      15.095276549 seconds time elapsed

      29.822868000 seconds user
       0.000000000 seconds sys

# ./perf stat -e cache-references -e cache-misses ./cacheline_fill.out

 Performance counter stats for './cacheline_fill.out':

       12005381835      cache-references
             63555      cache-misses              #    0.001 % of all cache refs

      13.942023631 seconds time elapsed

      27.839129000 seconds user
       0.000000000 seconds sys

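Without cache line padding, cache-misses reach 8.219% and the run takes about 15.1 s;
with cache line padding, cache-misses are only 0.001% and the run takes about 13.9 s.
Run time drops by about 7.6% (15.10 s down to 13.94 s), a gain that would be significant if it showed up in network throughput.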

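Summary

A low cache hit rate can seriously hurt program performance, network throughput, and so on, so code should be written to avoid cache misses where possible. Applicable techniques include those covered in 《iCache && dCache》: laying out code sections by function, prefetching, and cache line alignment. perf can then be used to measure the actual cache hit rate.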
