Pandas Basics 1.1 | Python Study Notes

2023-12-08 03:42:18

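[Exercise 1] Given a dataset containing the script of the American TV series Game of Thrones, answer the following questions:
(a) How many characters appear in the whole dataset?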

import pandas as pd
import numpy as np

df = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Game_of_Thrones_Script.csv')
df.head()

	Release Date	Season	Episode	Episode Title	Name	Sentence
0	2011/4/17	Season 1	Episode 1	Winter is Coming	waymar royce	What do you expect? They're savages. One lot s...
1	2011/4/17	Season 1	Episode 1	Winter is Coming	will	I've never seen wildlings do a thing like this...
2	2011/4/17	Season 1	Episode 1	Winter is Coming	waymar royce	How close did you get?
3	2011/4/17	Season 1	Episode 1	Winter is Coming	will	Close as any man would.
4	2011/4/17	Season 1	Episode 1	Winter is Coming	gared	We should head back to the wall.

df.describe()

	Release Date	Season	Episode	Episode Title	Name	Sentence
count	23911	23911	23911	23911	23911	23911
unique	73	8	10	73	564	22300
top	2017/8/13	Season 2	Episode 5	Eastwatch	tyrion lannister	No.
freq	505	3914	3083	505	1760	103

df['Name'].nunique()  # number of unique values in the Name column
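As a minimal sketch of what nunique() counts, the toy frame below uses made-up rows (not the real script file) with three distinct names spread over five rows. The variable names toy, n_people and same_thing are only illustrative.

```python
import pandas as pd

# Toy data standing in for the script file (hypothetical rows)
toy = pd.DataFrame({
    'Name': ['will', 'gared', 'will', 'waymar royce', 'gared'],
    'Sentence': ['a', 'b', 'c', 'd', 'e'],
})

# Number of distinct values; missing values are excluded by default
n_people = toy['Name'].nunique()

# Equivalent here because the column contains no missing values
same_thing = len(toy['Name'].unique())

print(n_people, same_thing)
```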

(b) Counting by cell (that is, treating each cell as one sentence), who spoke the most lines?

df['Name'].value_counts()

tyrion lannister          1760
jon snow                  1133
daenerys targaryen        1048
cersei lannister          1005
jaime lannister            945
                          ... 
janos slunt                  1
steward of house stark       1
archmaester                  1
night watch stable boy       1
bryndel                      1
Name: Name, Length: 564, dtype: int64

df['Name'].value_counts().index[0]

'tyrion lannister'
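Taking index[0] works because value_counts() returns the counts sorted in descending order. The sketch below shows this on a small made-up Series, together with idxmax(), which returns the label of the largest count directly. The names in the Series are placeholders, not the real counts.

```python
import pandas as pd

# Hypothetical speaker column
names = pd.Series(['tyrion', 'jon', 'tyrion', 'cersei', 'tyrion', 'jon'])

counts = names.value_counts()      # counts per name, largest first
top_by_position = counts.index[0]  # first label after sorting
top_by_idxmax = counts.idxmax()    # label whose count is largest

print(top_by_position, top_by_idxmax)
```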

(c) Counting by words, who spoke the most words?

# apply(lambda x: len(x.split())) counts the whitespace-separated words in each sentence
df_words = df.assign(Words=df['Sentence'].apply(lambda x:len(x.split()))).sort_values(by='Name')
df_words.head()

	Release Date	Season	Episode	Episode Title	Name	Sentence	Words
276	2011/4/17	Season 1	Episode 1	Winter is Coming	a voice	It's Maester Luwin, my lord.	5
3012	2011/6/19	Season 1	Episode 10	Fire and Blood	addam marbrand	ls it true about Stannis and Renly?	7
3017	2011/6/19	Season 1	Episode 10	Fire and Blood	addam marbrand	Kevan Lannister	2
13610	2014/6/8	Season 4	Episode 9	The Watchers on the Wall	aemon	And what is it that couldn't wait until mornin...	10
13614	2014/6/8	Season 4	Episode 9	The Watchers on the Wall	aemon	Oh, no need. I know my way around this library...	48

df.assign(Words=df['Sentence'].apply(lambda x:len(x.split())))

	Release Date	Season	Episode	Episode Title	Name	Sentence	Words
0	2011/4/17	Season 1	Episode 1	Winter is Coming	waymar royce	What do you expect? They're savages. One lot s...	25
1	2011/4/17	Season 1	Episode 1	Winter is Coming	will	I've never seen wildlings do a thing like this...	21
2	2011/4/17	Season 1	Episode 1	Winter is Coming	waymar royce	How close did you get?	5
3	2011/4/17	Season 1	Episode 1	Winter is Coming	will	Close as any man would.	5
4	2011/4/17	Season 1	Episode 1	Winter is Coming	gared	We should head back to the wall.	7
...	...	...	...	...	...	...	...
23906	2019/5/19	Season 8	Episode 6	The Iron Throne	brienne	I think we can all agree that ships take prece...	12
23907	2019/5/19	Season 8	Episode 6	The Iron Throne	bronn	I think that's a very presumptuous statement.	7
23908	2019/5/19	Season 8	Episode 6	The Iron Throne	tyrion lannister	I once brought a jackass and a honeycomb into ...	11
23909	2019/5/19	Season 8	Episode 6	The Iron Throne	man	The Queen in the North!	5
23910	2019/5/19	Season 8	Episode 6	The Iron Throne	all	The Queen in the North! The Queen in the North...	25

23911 rows × 7 columns

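# Basic idea: the rows are already sorted by character name.
# Take will as an example: scan downward; while the name is still will, add the row's word count
# to the running total; when the name changes, restart the total from that row's word count.
# last starts as None so that the first row always starts a new total (L_count[-1] does not exist yet).
L_count = []
N_words = list(zip(df_words['Name'], df_words['Words']))
last = None
for name, words in N_words:
    L_count.append(L_count[-1] + words if name == last else words)
    last = name
df_words['Count'] = L_count
df_words.loc[df_words['Count'].idxmax(), 'Name']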

'tyrion lannister'
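The running-total loop can also be expressed as a grouped sum. The following is a minimal sketch of that alternative on made-up rows, not the author's method: it counts words per row with the .str accessor, sums them per name with groupby, and picks the largest total. The frame toy and its sentences are hypothetical.

```python
import pandas as pd

# Hypothetical rows shaped like the script data
toy = pd.DataFrame({
    'Name': ['will', 'gared', 'will', 'waymar royce', 'gared', 'will'],
    'Sentence': [
        'Close as any man would.',
        'We should head back to the wall.',
        'I saw them.',
        'How close did you get?',
        'They are dead.',
        'No.',
    ],
})

# Words per row: split on whitespace and count the pieces
words = toy['Sentence'].str.split().str.len()

# Total words per character, then the name with the largest total
totals = words.groupby(toy['Name']).sum()
top_speaker = totals.idxmax()

print(totals)
print(top_speaker)
```

No sorting is needed in this version, since groupby collects rows with the same name wherever they appear.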

[Exercise 2] Given a dataset of Kobe Bryant's shots, answer the following questions:
(a) Which combination of action_type and combined_shot_type is the most common?

df_1 = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Kobe_data.csv',index_col='shot_id')
df_1.head()

	action_type	combined_shot_type	game_event_id	game_id	lat	loc_x	loc_y	lon	minutes_remaining	period	...	shot_made_flag	shot_type	shot_zone_area	shot_zone_basic	shot_zone_range	team_id	team_name	game_date	matchup	opponent
shot_id
1	Jump Shot	Jump Shot	10	20000012	33.9723	167	72	-118.1028	10	1	...	NaN	2PT Field Goal	Right Side(R)	Mid-Range	16-24 ft.	1610612747	Los Angeles Lakers	2000/10/31	LAL @ POR	POR
2	Jump Shot	Jump Shot	12	20000012	34.0443	-157	0	-118.4268	10	1	...	0.0	2PT Field Goal	Left Side(L)	Mid-Range	8-16 ft.	1610612747	Los Angeles Lakers	2000/10/31	LAL @ POR	POR
3	Jump Shot	Jump Shot	35	20000012	33.9093	-101	135	-118.3708	7	1	...	1.0	2PT Field Goal	Left Side Center(LC)	Mid-Range	16-24 ft.	1610612747	Los Angeles Lakers	2000/10/31	LAL @ POR	POR
4	Jump Shot	Jump Shot	43	20000012	33.8693	138	175	-118.1318	6	1	...	0.0	2PT Field Goal	Right Side Center(RC)	Mid-Range	16-24 ft.	1610612747	Los Angeles Lakers	2000/10/31	LAL @ POR	POR
5	Driving Dunk Shot	Dunk	155	20000012	34.0443	0	0	-118.2698	6	2	...	1.0	2PT Field Goal	Center(C)	Restricted Area	Less Than 8 ft.	1610612747	Los Angeles Lakers	2000/10/31	LAL @ POR	POR

5 rows × 24 columns

gamegroup = list(zip(df_1['action_type'],df_1['combined_shot_type']))
gamegroup

[('Jump Shot', 'Jump Shot'),('Jump Shot', 'Jump Shot'),
...]

pd.Series(gamegroup).value_counts().index[0]

('Jump Shot', 'Jump Shot')
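An alternative that avoids building a Python list of tuples is to group on both columns and count the group sizes. The sketch below assumes a tiny made-up frame with the same two column names; the rows are illustrative only.

```python
import pandas as pd

# Hypothetical shots with the two columns of interest
shots = pd.DataFrame({
    'action_type': ['Jump Shot', 'Jump Shot', 'Driving Dunk Shot',
                    'Jump Shot', 'Layup Shot'],
    'combined_shot_type': ['Jump Shot', 'Jump Shot', 'Dunk',
                           'Jump Shot', 'Layup'],
})

# Rows per (action_type, combined_shot_type) combination
pair_counts = shots.groupby(['action_type', 'combined_shot_type']).size()

# idxmax on a two-level index returns the combination as a tuple
top_pair = pair_counts.idxmax()

print(top_pair)
```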

(b) Across all recorded game_id values, which opponent was faced most often?

df_1.iloc[:, [3, -1]]  # select the game_id column (position 3) and the opponent column (last)

	game_id	opponent
shot_id
1	20000012	POR
2	20000012	POR
3	20000012	POR
4	20000012	POR
5	20000012	POR
...	...	...
30693	49900088	IND
30694	49900088	IND
30695	49900088	IND
30696	49900088	IND
30697	49900088	IND

30697 rows × 2 columns

gamegroup_1 = list(zip(df_1['game_id'],df_1['opponent']))
pd.Series(list(list(zip(*(pd.Series(gamegroup_1).unique()).tolist()))[1])).value_counts().index[0]

'SAS'

zip(df_1['game_id'],df_1['opponent'])
list(zip(df_1['game_id'],df_1['opponent']))

[(20000012, 'POR'),(20000012, 'POR'),...]

# unique(): removes duplicates. It is a pd.Series method, not a list method, so the list is wrapped in a pd.Series first.
a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique()
a

array([(20000012, 'POR'), (20000019, 'UTA'), (20000047, 'VAN'), ...,(49900086, 'IND'), (49900087, 'IND'), (49900088, 'IND')],dtype=object)

a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique().tolist()
a  # no duplicate elements

[(20000012, 'POR'),(20000019, 'UTA'),(20000047, 'VAN'),(20000049, 'LAC'),(20000058, 'HOU'),...]

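# Unpacking: pull out all the first elements and all the second elements. The (game_id, opponent) pairs are already de-duplicated at this point.
# The result is two tuples: one with every first element, one with every second element.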
n = list(zip(*a))
n
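As a small illustration of the unpacking step, the sketch below applies zip(*...) to three of the pairs shown above. The names pairs, game_ids and opponents are only illustrative.

```python
# Three (game_id, opponent) pairs taken from the output above
pairs = [(20000012, 'POR'), (20000019, 'UTA'), (20000047, 'VAN')]

# zip(*pairs) is the same as zip(pairs[0], pairs[1], pairs[2]):
# it regroups the pairs into one tuple per position
game_ids, opponents = zip(*pairs)

print(game_ids)
print(opponents)
```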

pd.Series(n)  # convert to a Series

0    (20000012, 20000019, 20000047, 20000049, 20000...
1    (POR, UTA, VAN, LAC, HOU, SAS, HOU, DEN, SAC, ...
dtype: object
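The same question can be answered without zipping and unzipping tuples. The following is a minimal sketch on made-up data, not the author's approach: it keeps one row per (game_id, opponent) pair with drop_duplicates() and then counts opponents. The frame shots and its values are hypothetical.

```python
import pandas as pd

# Hypothetical shots: several rows per game
shots = pd.DataFrame({
    'game_id': [1, 1, 1, 1, 2, 2, 3, 4],
    'opponent': ['POR', 'POR', 'POR', 'POR', 'SAS', 'SAS', 'SAS', 'UTA'],
})

# One row per game, so each game is counted once regardless of shot count
games = shots[['game_id', 'opponent']].drop_duplicates()

# Games per opponent, largest first
games_per_opponent = games['opponent'].value_counts()
most_faced = games_per_opponent.index[0]

print(most_faced)
```

In this toy data POR has the most shots but SAS the most games, which is why the de-duplication step matters.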
