Pandas Basics 1.1 | Python Study Notes

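[Exercise 1] Given a dataset of the scripts of the TV series Game of Thrones, answer the following questions:
(a) How many characters appear in the whole dataset?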

import pandas as pd
import numpy as np
df = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Game_of_Thrones_Script.csv')
df.head()
   Release Date  Season    Episode    Episode Title     Name          Sentence
0  2011/4/17     Season 1  Episode 1  Winter is Coming  waymar royce  What do you expect? They're savages. One lot s...
1  2011/4/17     Season 1  Episode 1  Winter is Coming  will          I've never seen wildlings do a thing like this...
2  2011/4/17     Season 1  Episode 1  Winter is Coming  waymar royce  How close did you get?
3  2011/4/17     Season 1  Episode 1  Winter is Coming  will          Close as any man would.
4  2011/4/17     Season 1  Episode 1  Winter is Coming  gared         We should head back to the wall.
df.describe() 
        Release Date  Season    Episode    Episode Title  Name              Sentence
count   23911         23911     23911      23911          23911             23911
unique  73            8         10         73             564               22300
top     2017/8/13     Season 2  Episode 5  Eastwatch      tyrion lannister  No.
freq    505           3914      3083       505            1760              103
df['Name'].nunique()  # number of unique values in the Name column
564
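As a small cross-check on `nunique()`, the sketch below compares it with `len(unique())`. The toy column is invented for illustration and is not the real script data. The point it shows: `nunique()` skips missing values by default, while `unique()` keeps them, so the two can differ when a column has gaps.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'Name' column (one missing value)
toy = pd.DataFrame({'Name': ['will', 'gared', 'will', None, 'gared']})

n_unique = toy['Name'].nunique()                      # missing values are excluded by default
n_unique_with_na = toy['Name'].nunique(dropna=False)  # count the missing value as its own entry
n_from_unique = len(toy['Name'].unique())             # unique() keeps the missing value

print(n_unique, n_unique_with_na, n_from_unique)
```

If the real `Name` column has no missing values, all three numbers would coincide.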

(b) Counting by cell (that is, simply treating each cell as one line of dialogue), who spoke the most?

df['Name'].value_counts()
tyrion lannister          1760
jon snow                  1133
daenerys targaryen        1048
cersei lannister          1005
jaime lannister            945
                          ...
janos slunt                  1
steward of house stark       1
archmaester                  1
night watch stable boy       1
bryndel                      1
Name: Name, Length: 564, dtype: int64
df['Name'].value_counts().index[0]
'tyrion lannister'
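Taking `.index[0]` works because `value_counts()` sorts in descending order by default. A possible alternative that does not depend on the sort order is `idxmax()`. The sketch below uses an invented toy Series, not the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for df['Name']
toy = pd.Series(['tyrion', 'jon', 'tyrion', 'cersei', 'tyrion', 'jon'], name='Name')

counts = toy.value_counts()       # sorted by frequency, highest first
top_by_index = counts.index[0]    # relies on the descending sort
top_by_idxmax = counts.idxmax()   # label of the largest count, independent of order

print(top_by_index, top_by_idxmax)
```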

(c) Counting by words, who spoke the most words?

# apply(lambda x: len(x.split())) counts how many whitespace-separated words each sentence contains
df_words = df.assign(Words=df['Sentence'].apply(lambda x:len(x.split()))).sort_values(by='Name')
df_words.head()
       Release Date  Season    Episode     Episode Title             Name            Sentence                                            Words
276    2011/4/17     Season 1  Episode 1   Winter is Coming          a voice         It's Maester Luwin, my lord.                        5
3012   2011/6/19     Season 1  Episode 10  Fire and Blood            addam marbrand  ls it true about Stannis and Renly?                 7
3017   2011/6/19     Season 1  Episode 10  Fire and Blood            addam marbrand  Kevan Lannister                                     2
13610  2014/6/8      Season 4  Episode 9   The Watchers on the Wall  aemon           And what is it that couldn't wait until mornin...   10
13614  2014/6/8      Season 4  Episode 9   The Watchers on the Wall  aemon           Oh, no need. I know my way around this library...   48
df.assign(Words=df['Sentence'].apply(lambda x:len(x.split())))
       Release Date  Season    Episode    Episode Title     Name              Sentence                                            Words
0      2011/4/17     Season 1  Episode 1  Winter is Coming  waymar royce      What do you expect? They're savages. One lot s...   25
1      2011/4/17     Season 1  Episode 1  Winter is Coming  will              I've never seen wildlings do a thing like this...   21
2      2011/4/17     Season 1  Episode 1  Winter is Coming  waymar royce      How close did you get?                              5
3      2011/4/17     Season 1  Episode 1  Winter is Coming  will              Close as any man would.                             5
4      2011/4/17     Season 1  Episode 1  Winter is Coming  gared             We should head back to the wall.                    7
...    ...           ...       ...        ...               ...               ...                                                 ...
23906  2019/5/19     Season 8  Episode 6  The Iron Throne   brienne           I think we can all agree that ships take prece...   12
23907  2019/5/19     Season 8  Episode 6  The Iron Throne   bronn             I think that's a very presumptuous statement.       7
23908  2019/5/19     Season 8  Episode 6  The Iron Throne   tyrion lannister  I once brought a jackass and a honeycomb into ...   11
23909  2019/5/19     Season 8  Episode 6  The Iron Throne   man               The Queen in the North!                             5
23910  2019/5/19     Season 8  Episode 6  The Iron Throne   all               The Queen in the North! The Queen in the North...   25

23911 rows × 7 columns

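# Basic idea: sort the rows by character name first, then keep a running word total per character.
# Take "will" as an example: walking down the sorted rows, as long as the name is still "will",
# add that row's word count to the running total; when the name changes, restart the total.
# L_count[-1] is the running total so far. The first row has no predecessor, so last starts as None.
L_count = []
N_words = list(zip(df_words['Name'],df_words['Words']))
last = None
for name, words in N_words:
    # Same character as the previous row: extend the running total. Otherwise start a new one.
    L_count.append(L_count[-1] + words if name == last else words)
    last = name
df_words['Count'] = L_count
# The largest running total belongs to the character who spoke the most words.
df_words.loc[df_words['Count'].idxmax(), 'Name']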
'tyrion lannister'
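The running-total loop can also be expressed with `groupby`, which may be easier to read. The following is a minimal sketch on invented toy rows, not the author's solution. It assumes words are separated by whitespace, the same assumption as `x.split()` above.

```python
import pandas as pd

# Hypothetical toy rows with the same two columns used in the exercise
toy = pd.DataFrame({
    'Name': ['will', 'gared', 'will', 'waymar royce'],
    'Sentence': ['Close as any man would.',
                 'We should head back to the wall.',
                 'I saw them.',
                 'How close did you get?'],
})

# Words per sentence: split on whitespace, then take the length of each resulting list
words = toy['Sentence'].str.split().str.len()

# Total words per character, then the character with the largest total
total_words = words.groupby(toy['Name']).sum()
top_speaker = total_words.idxmax()

print(total_words)
print(top_speaker)
```

One design note: the `.str` accessor passes missing sentences through as NaN instead of raising, whereas `lambda x: len(x.split())` would fail on a NaN cell.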

[Exercise 2] Given a dataset of Kobe Bryant's shots, answer the following questions:
(a) Which combination of action_type and combined_shot_type occurs most often?

df_1 = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Kobe_data.csv',index_col='shot_id')
df_1.head()
shot_id  action_type        combined_shot_type  game_event_id  game_id   lat      loc_x  loc_y  lon        minutes_remaining  period  ...
1        Jump Shot          Jump Shot           10             20000012  33.9723  167    72     -118.1028  10                 1       ...
2        Jump Shot          Jump Shot           12             20000012  34.0443  -157   0      -118.4268  10                 1       ...
3        Jump Shot          Jump Shot           35             20000012  33.9093  -101   135    -118.3708  7                  1       ...
4        Jump Shot          Jump Shot           43             20000012  33.8693  138    175    -118.1318  6                  1       ...
5        Driving Dunk Shot  Dunk                155            20000012  34.0443  0      0      -118.2698  6                  2       ...

shot_id  ...  shot_made_flag  shot_type       shot_zone_area         shot_zone_basic  shot_zone_range  team_id     team_name           game_date   matchup    opponent
1        ...  NaN             2PT Field Goal  Right Side(R)          Mid-Range        16-24 ft.        1610612747  Los Angeles Lakers  2000/10/31  LAL @ POR  POR
2        ...  0.0             2PT Field Goal  Left Side(L)           Mid-Range        8-16 ft.         1610612747  Los Angeles Lakers  2000/10/31  LAL @ POR  POR
3        ...  1.0             2PT Field Goal  Left Side Center(LC)   Mid-Range        16-24 ft.        1610612747  Los Angeles Lakers  2000/10/31  LAL @ POR  POR
4        ...  0.0             2PT Field Goal  Right Side Center(RC)  Mid-Range        16-24 ft.        1610612747  Los Angeles Lakers  2000/10/31  LAL @ POR  POR
5        ...  1.0             2PT Field Goal  Center(C)              Restricted Area  Less Than 8 ft.  1610612747  Los Angeles Lakers  2000/10/31  LAL @ POR  POR

5 rows × 24 columns

gamegroup = list(zip(df_1['action_type'],df_1['combined_shot_type']))
gamegroup
[('Jump Shot', 'Jump Shot'),('Jump Shot', 'Jump Shot'),
...]
pd.Series(gamegroup).value_counts().index[0]
('Jump Shot', 'Jump Shot')
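Zipping the two columns into tuples works; grouping on both columns is another way to count combinations and avoids building a Python list. This is a sketch on invented toy rows that only borrows the two column names from the exercise.

```python
import pandas as pd

# Hypothetical toy rows using the exercise's two column names
toy = pd.DataFrame({
    'action_type': ['Jump Shot', 'Jump Shot', 'Driving Dunk Shot', 'Jump Shot'],
    'combined_shot_type': ['Jump Shot', 'Jump Shot', 'Dunk', 'Jump Shot'],
})

# Number of rows for each (action_type, combined_shot_type) pair
combo_counts = toy.groupby(['action_type', 'combined_shot_type']).size()

# idxmax on a MultiIndex Series returns the pair as a tuple
top_combo = combo_counts.idxmax()

print(combo_counts)
print(top_combo)
```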

(b) Across all recorded game_id values, which opponent was faced most often?

df_1.iloc[:,[3,-1]]  # select the column at position 3 (game_id) and the last column (opponent)
shot_id  game_id   opponent
1        20000012  POR
2        20000012  POR
3        20000012  POR
4        20000012  POR
5        20000012  POR
...      ...       ...
30693    49900088  IND
30694    49900088  IND
30695    49900088  IND
30696    49900088  IND
30697    49900088  IND

30697 rows × 2 columns

gamegroup_1 = list(zip(df_1['game_id'],df_1['opponent']))
pd.Series(list(list(zip(*(pd.Series(gamegroup_1).unique()).tolist()))[1])).value_counts().index[0]
'SAS'
zip(df_1['game_id'],df_1['opponent'])
list(zip(df_1['game_id'],df_1['opponent']))
[(20000012, 'POR'),(20000012, 'POR'),...]
# unique(): removes duplicates. A plain list has no unique method, so wrap the list in a pd.Series first.
a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique()
a
array([(20000012, 'POR'), (20000019, 'UTA'), (20000047, 'VAN'), ...,(49900086, 'IND'), (49900087, 'IND'), (49900088, 'IND')],dtype=object)
a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique().tolist()
a  # no duplicate elements
[(20000012, 'POR'),(20000019, 'UTA'),(20000047, 'VAN'),(20000049, 'LAC'),(20000058, 'HOU'),...]
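# Unpacking: zip(*a) pulls all the first elements together and all the second elements together.
# The (game_id, opponent) pairs are already deduplicated at this point, so each game appears once.
# The result is two tuples: the collection of game_ids and the collection of opponents.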
n = list(zip(*a))
n
pd.Series(n)  # convert to a Series
0    (20000012, 20000019, 20000047, 20000049, 20000...
1    (POR, UTA, VAN, LAC, HOU, SAS, HOU, DEN, SAC, ...
dtype: object
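To tie the steps together, the sketch below runs the zip / unique / unpack route end to end on invented toy rows and compares it with a `drop_duplicates` route that stays inside pandas. The toy values are hypothetical. Note that the toy data is built so that counting raw shot rows would give a tie between POR and SAS, while counting distinct games does not.

```python
import pandas as pd

# Hypothetical toy shots: several rows per game, as in the real data
toy = pd.DataFrame({
    'game_id': [1, 1, 1, 2, 3, 3, 4],
    'opponent': ['POR', 'POR', 'POR', 'SAS', 'SAS', 'SAS', 'UTA'],
})

# Route 1: zip the columns, deduplicate the pairs, unpack, then count opponents
pairs = pd.Series(list(zip(toy['game_id'], toy['opponent']))).unique().tolist()
game_ids, opponents = zip(*pairs)   # two tuples: all game_ids, all opponents
top_via_zip = pd.Series(opponents).value_counts().index[0]

# Route 2: keep one row per (game_id, opponent) pair, then count opponents
games = toy.drop_duplicates(subset=['game_id', 'opponent'])
top_via_dedup = games['opponent'].value_counts().idxmax()

print(top_via_zip, top_via_dedup)
```

Both routes count each game once, which is what the question asks for.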

