Python Data Science Handbook (Notes, Part 1)

2023-11-24 08:44:55

All code is run in Jupyter Notebook.

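1. Line-by-line profiling with %lprun

Function-level profiling with %prun is very useful, but sometimes a line-by-line profile report is more convenient. First install the line_profiler package with pip, Python's package manager: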

 pip install line_profiler

Next, load the IPython extension provided by the line_profiler package:
In [1]:

%load_ext line_profiler

In [2]:

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

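Now the %lprun command can profile any function line by line. We need to tell it explicitly which functions to profile, using the -f option:
In [3]:

%lprun -f sum_of_lists sum_of_lists(1000000)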

Out [3]:

Timer unit: 1e-07 s

Total time: 0.0108332 s
File: <ipython-input-2-54cf9a3b9717>
Function: sum_of_lists at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def sum_of_lists(N):
     2         1         19.0     19.0      0.0      total = 0
     3         6        109.0     18.2      0.1      for i in range(5):
     4         5     104182.0  20836.4     96.2          L = [j ^ (j >> i) for j in range(N)]
     5         5       4015.0    803.0      3.7          total += sum(L)
     6         1          7.0      7.0      0.0      return total

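The report shows where the program spends its time: times are given in units of the timer (here 1e-07 s), and almost all of the time (96.2%) goes to the line that builds the list. With this information we can modify the code so that it does the same job more efficiently.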
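As a rough cross-check of a report like the one above, the same per-line picture can be approximated outside a notebook with the standard library alone. The sketch below is not how line_profiler works internally; it simply wraps the two loop-body lines in time.perf_counter calls. The name sum_of_lists_timed and the input size 5000 are illustrative assumptions, not part of the original notes.

```python
import time


def sum_of_lists_timed(N):
    """Same computation as sum_of_lists, with manual timers around the loop body."""
    total = 0
    build_time = 0.0  # seconds spent building the list
    sum_time = 0.0    # seconds spent summing the list
    for i in range(5):
        t0 = time.perf_counter()
        L = [j ^ (j >> i) for j in range(N)]
        t1 = time.perf_counter()
        total += sum(L)
        t2 = time.perf_counter()
        build_time += t1 - t0
        sum_time += t2 - t1
    return total, build_time, sum_time


total, build_time, sum_time = sum_of_lists_timed(5000)
elapsed = build_time + sum_time
if elapsed > 0:
    # Share of the measured time spent on each of the two lines
    print(f"build list: {100 * build_time / elapsed:.1f}%")
    print(f"sum list:   {100 * sum_time / elapsed:.1f}%")
```

Manual timers add their own overhead and only cover the lines you choose to wrap, so %lprun remains the more convenient tool inside a notebook.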

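2. Memory profiling with %memit and %mprun

Another aspect of profiling is the amount of memory an operation uses. First install the memory_profiler extension with pip: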

pip install memory_profiler

Then load the extension:
In [1]:

%load_ext memory_profiler

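The memory profiler extension provides two useful magic functions: %memit, which is the memory-measuring counterpart of %timeit, and %mprun, which is the memory-measuring counterpart of %lprun. %memit is simple to use:
In [2]: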

%memit sum_of_lists(1000000)

Out [2]:

peak memory: 125.06 MiB, increment: 73.29 MiB

We can see that memory use peaks at about 125 MiB while this function runs, an increment of about 73 MiB over the baseline.
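Outside IPython, a similar peak figure can be sketched with the standard library's tracemalloc module. This is a swapped-in technique, not what %memit does: tracemalloc only counts memory blocks allocated by Python while tracing is on, whereas %memit reports whole-process memory, so the two numbers should not be expected to match. The helper name peak_traced_memory and the input size are illustrative assumptions.

```python
import tracemalloc


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total


def peak_traced_memory(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


result, peak = peak_traced_memory(sum_of_lists, 100000)
print(f"peak traced memory: {peak / 2**20:.2f} MiB")
```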

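For a line-by-line description of memory use, we can use %mprun. It only works on functions defined in a separate module rather than in the notebook itself, so first use the %%file magic to create a simple module named mprun_demo.py. The module contains the sum_of_lists function with one addition, the del L statement, which makes the memory profiling results clearer:
In [3]:

%%file mprun_demo.py
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L
    return total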

Out [3]:

Writing mprun_demo.py
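The %%file magic is only available in IPython. In a plain Python script, the same "write a module, then import it" step can be sketched with pathlib and importlib. The helper name write_and_import and the use of a temporary directory are assumptions made for illustration; the module source is the one shown above.

```python
import importlib.util
import tempfile
from pathlib import Path

MODULE_SOURCE = '''\
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L
    return total
'''


def write_and_import(source, name="mprun_demo"):
    """Write source to <tempdir>/<name>.py and import it as a module."""
    module_path = Path(tempfile.mkdtemp()) / f"{name}.py"
    module_path.write_text(source, encoding="utf-8")
    spec = importlib.util.spec_from_file_location(name, module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file's top-level code
    return module, module_path


mprun_demo, module_path = write_and_import(MODULE_SOURCE)
print(module_path.name, mprun_demo.sum_of_lists(10))
```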

Now import the new version of the function and run the line-by-line memory profiler:
In [4]:

from mprun_demo import sum_of_lists 
%mprun -f sum_of_lists sum_of_lists(100000)

Out [4]:

Filename: F:\代码01Data\数据分析学习代码\Python数据科学手册\练习代码\mprun_demo.py

Line #    Mem usage    Increment   Line Contents
================================================
     1     81.0 MiB     81.0 MiB   def sum_of_lists(N):
     2     81.0 MiB      0.0 MiB       total = 0
     3     82.8 MiB      0.0 MiB       for i in range(5):
     4     83.3 MiB      0.6 MiB           L = [j ^ (j >> i) for j in range(N)]
     5     83.3 MiB      0.0 MiB           total += sum(L)
     6     82.8 MiB      0.0 MiB           del L
     7     82.8 MiB      0.0 MiB       return total
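In a report like this one, the line that builds the list L is where memory grows. One possible way to act on that, sketched below, is to feed sum a generator expression so the full list is never held in memory. The function name sum_of_lists_lazy and the helper traced_peak are hypothetical, and the comparison uses the standard library's tracemalloc rather than memory_profiler, so the absolute numbers will differ from %mprun output. The lazy version reduces peak memory; whether it is also faster should be measured rather than assumed.

```python
import tracemalloc


def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]  # whole list held in memory
        total += sum(L)
        del L
    return total


def sum_of_lists_lazy(N):
    total = 0
    for i in range(5):
        # Generator expression: values are produced one at a time
        total += sum(j ^ (j >> i) for j in range(N))
    return total


def traced_peak(func, N):
    """Return (result, peak traced bytes) for func(N)."""
    tracemalloc.start()
    try:
        result = func(N)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


list_result, list_peak = traced_peak(sum_of_lists, 100000)
lazy_result, lazy_peak = traced_peak(sum_of_lists_lazy, 100000)
print(f"list version peak: {list_peak / 2**20:.2f} MiB")
print(f"lazy version peak: {lazy_peak / 2**20:.2f} MiB")
```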

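The Increment column tells us how much each line affects the total memory budget: here, building the list L adds about 0.6 MiB, and memory drops back to 82.8 MiB after del L. The same approach can be applied to other code to see where its memory goes, which helps when improving an algorithm's efficiency.
To see how a magic function such as %memit is used, enter the following in Jupyter Notebook (? shows the documentation, ?? shows the source code):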

%memit?
# %memit??

The output is:

Docstring:
Measure memory usage of a Python statement

Usage, in line mode:
  %memit [-r<R>t<T>i<I>] statement

Usage, in cell mode:
  %%memit [-r<R>t<T>i<I>] setup_code
  code...
  code...

This function can be used both as a line and cell magic:

- In line mode you can measure a single-line statement (though multiple
  ones can be chained with using semicolons).

- In cell mode, the statement in the first line is used as setup code
  (executed but not measured) and the body of the cell is measured.
  The cell body has access to any variables created in the setup code.

Options:
-r<R>: repeat the loop iteration <R> times and take the best result.
Default: 1

-t<T>: timeout after <T> seconds. Default: None

-i<I>: Get time information at an interval of I times per second.
Defaults to 0.1 so that there is ten measurements per second.

-c: If present, add the memory usage of any children process to the report.

-o: If present, return a object containing memit run details

-q: If present, be quiet and do not output a result.

Examples
--------
::

  In [1]: %memit range(10000)
  peak memory: 21.42 MiB, increment: 0.41 MiB

  In [2]: %memit range(1000000)
  peak memory: 52.10 MiB, increment: 31.08 MiB

  In [3]: %%memit l=range(1000000)
     ...: len(l)
     ...:
  peak memory: 52.14 MiB, increment: 0.08 MiB
