Pandas基础1.1|Python学习笔记
【练习一】 现有一份关于美剧《权力的游戏》剧本的数据集,请解决以下问题:
(a)在所有的数据中,一共出现了多少人物?
import pandas as pd
import numpy as np
df = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Game_of_Thrones_Script.csv')
df.head()
| Release Date | Season | Episode | Episode Title | Name | Sentence | |
|---|---|---|---|---|---|---|
| 0 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | waymar royce | What do you expect? They're savages. One lot s... |
| 1 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | will | I've never seen wildlings do a thing like this... |
| 2 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | waymar royce | How close did you get? |
| 3 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | will | Close as any man would. |
| 4 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | gared | We should head back to the wall. |
df.describe()
| Release Date | Season | Episode | Episode Title | Name | Sentence | |
|---|---|---|---|---|---|---|
| count | 23911 | 23911 | 23911 | 23911 | 23911 | 23911 |
| unique | 73 | 8 | 10 | 73 | 564 | 22300 |
| top | 2017/8/13 | Season 2 | Episode 5 | Eastwatch | tyrion lannister | No. |
| freq | 505 | 3914 | 3083 | 505 | 1760 | 103 |
df['Name'].nunique() #显示Name有多少个唯一值
564
(b)以单元格计数(即简单把一个单元格视作一句),谁说了最多的话?
df['Name'].value_counts()
tyrion lannister 1760
jon snow 1133
daenerys targaryen 1048
cersei lannister 1005
jaime lannister 945...
janos slunt 1
steward of house stark 1
archmaester 1
night watch stable boy 1
bryndel 1
Name: Name, Length: 564, dtype: int64
df['Name'].value_counts().index[0]
'tyrion lannister'
(c)以单词计数,谁说了最多的单词?
#apply(lambda x:len(x.split())) apply函数看每个句子里有多少个单词
df_words = df.assign(Words=df['Sentence'].apply(lambda x:len(x.split()))).sort_values(by='Name')
df_words.head()
| Release Date | Season | Episode | Episode Title | Name | Sentence | Words | |
|---|---|---|---|---|---|---|---|
| 276 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | a voice | It's Maester Luwin, my lord. | 5 |
| 3012 | 2011/6/19 | Season 1 | Episode 10 | Fire and Blood | addam marbrand | ls it true about Stannis and Renly? | 7 |
| 3017 | 2011/6/19 | Season 1 | Episode 10 | Fire and Blood | addam marbrand | Kevan Lannister | 2 |
| 13610 | 2014/6/8 | Season 4 | Episode 9 | The Watchers on the Wall | aemon | And what is it that couldn't wait until mornin... | 10 |
| 13614 | 2014/6/8 | Season 4 | Episode 9 | The Watchers on the Wall | aemon | Oh, no need. I know my way around this library... | 48 |
df.assign(Words=df['Sentence'].apply(lambda x:len(x.split())))
| Release Date | Season | Episode | Episode Title | Name | Sentence | Words | |
|---|---|---|---|---|---|---|---|
| 0 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | waymar royce | What do you expect? They're savages. One lot s... | 25 |
| 1 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | will | I've never seen wildlings do a thing like this... | 21 |
| 2 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | waymar royce | How close did you get? | 5 |
| 3 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | will | Close as any man would. | 5 |
| 4 | 2011/4/17 | Season 1 | Episode 1 | Winter is Coming | gared | We should head back to the wall. | 7 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 23906 | 2019/5/19 | Season 8 | Episode 6 | The Iron Throne | brienne | I think we can all agree that ships take prece... | 12 |
| 23907 | 2019/5/19 | Season 8 | Episode 6 | The Iron Throne | bronn | I think that's a very presumptuous statement. | 7 |
| 23908 | 2019/5/19 | Season 8 | Episode 6 | The Iron Throne | tyrion lannister | I once brought a jackass and a honeycomb into ... | 11 |
| 23909 | 2019/5/19 | Season 8 | Episode 6 | The Iron Throne | man | The Queen in the North! | 5 |
| 23910 | 2019/5/19 | Season 8 | Episode 6 | The Iron Throne | all | The Queen in the North! The Queen in the North... | 25 |
23911 rows × 7 columns
#基本思路:先对人进行排序
#以will为例。向下统计,若遇到will则将对应的words数加上去,若不是则跳过。
#L_count[-1] 第一个元素特殊处理
L_count = []
N_words = list(zip(df_words['Name'],df_words['Words']))
for i in N_words:if i == N_words[0]:L_count.append(i[1])last = i[0]else:L_count.append(L_count[-1]+i[1] if i[0]==last else i[1])last = i[0]
df_words['Count']=L_count
df_words['Name'][df_words['Count'].idxmax()]
'tyrion lannister'
【练习二】现有一份关于科比的投篮数据集,请解决如下问题:
(a)哪种action_type和combined_shot_type的组合是最多的?
df_1 = pd.read_csv('C:/Users/PuLinYue/Desktop/joyful-pandas/data/Kobe_data.csv',index_col='shot_id')
df_1.head()
| action_type | combined_shot_type | game_event_id | game_id | lat | loc_x | loc_y | lon | minutes_remaining | period | ... | shot_made_flag | shot_type | shot_zone_area | shot_zone_basic | shot_zone_range | team_id | team_name | game_date | matchup | opponent | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| shot_id | |||||||||||||||||||||
| 1 | Jump Shot | Jump Shot | 10 | 20000012 | 33.9723 | 167 | 72 | -118.1028 | 10 | 1 | ... | NaN | 2PT Field Goal | Right Side(R) | Mid-Range | 16-24 ft. | 1610612747 | Los Angeles Lakers | 2000/10/31 | LAL @ POR | POR |
| 2 | Jump Shot | Jump Shot | 12 | 20000012 | 34.0443 | -157 | 0 | -118.4268 | 10 | 1 | ... | 0.0 | 2PT Field Goal | Left Side(L) | Mid-Range | 8-16 ft. | 1610612747 | Los Angeles Lakers | 2000/10/31 | LAL @ POR | POR |
| 3 | Jump Shot | Jump Shot | 35 | 20000012 | 33.9093 | -101 | 135 | -118.3708 | 7 | 1 | ... | 1.0 | 2PT Field Goal | Left Side Center(LC) | Mid-Range | 16-24 ft. | 1610612747 | Los Angeles Lakers | 2000/10/31 | LAL @ POR | POR |
| 4 | Jump Shot | Jump Shot | 43 | 20000012 | 33.8693 | 138 | 175 | -118.1318 | 6 | 1 | ... | 0.0 | 2PT Field Goal | Right Side Center(RC) | Mid-Range | 16-24 ft. | 1610612747 | Los Angeles Lakers | 2000/10/31 | LAL @ POR | POR |
| 5 | Driving Dunk Shot | Dunk | 155 | 20000012 | 34.0443 | 0 | 0 | -118.2698 | 6 | 2 | ... | 1.0 | 2PT Field Goal | Center(C) | Restricted Area | Less Than 8 ft. | 1610612747 | Los Angeles Lakers | 2000/10/31 | LAL @ POR | POR |
5 rows × 24 columns
gamegroup = list(zip(df_1['action_type'],df_1['combined_shot_type']))
gamegroup
[('Jump Shot', 'Jump Shot'),('Jump Shot', 'Jump Shot'),
...]
pd.Series(gamegroup).value_counts().index[0]
('Jump Shot', 'Jump Shot')
(b)在所有被记录的game_id中,遭遇到最多的opponent是一个支?
df_1.iloc[:,[3,-1]]#取最后三列
| game_id | opponent | |
|---|---|---|
| shot_id | ||
| 1 | 20000012 | POR |
| 2 | 20000012 | POR |
| 3 | 20000012 | POR |
| 4 | 20000012 | POR |
| 5 | 20000012 | POR |
| ... | ... | ... |
| 30693 | 49900088 | IND |
| 30694 | 49900088 | IND |
| 30695 | 49900088 | IND |
| 30696 | 49900088 | IND |
| 30697 | 49900088 | IND |
30697 rows × 2 columns
gamegroup_1 = list(zip(df_1['game_id'],df_1['opponent']))
pd.Series(list(list(zip(*(pd.Series(gamegroup_1).unique()).tolist()))[1])).value_counts().index[0]
'SAS'
zip(df_1['game_id'],df_1['opponent'])
list(zip(df_1['game_id'],df_1['opponent']))
[(20000012, 'POR'),(20000012, 'POR'),...]
#unique函数:去重。不可用在list上,需要用在pd.series上。
a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique()
a
array([(20000012, 'POR'), (20000019, 'UTA'), (20000047, 'VAN'), ...,(49900086, 'IND'), (49900087, 'IND'), (49900088, 'IND')],dtype=object)
a = pd.Series(list(zip(df_1['game_id'],df_1['opponent']))).unique().tolist()
a #无重复元素
[(20000012, 'POR'),(20000019, 'UTA'),(20000047, 'VAN'),(20000049, 'LAC'),(20000058, 'HOU'),...]
#解包——所有的第一个元素拿出来,第二元素拿出来。此时没有重复元素。
#取出两个包:第一个元素的集合、第二个元素的集合。
n = list(zip(*a))
n
pd.Series(n) #变成序列
0 (20000012, 20000019, 20000047, 20000049, 20000...
1 (POR, UTA, VAN, LAC, HOU, SAS, HOU, DEN, SAC, ...
dtype: object
本文来自互联网用户投稿,文章观点仅代表作者本人,不代表本站立场,不承担相关法律责任。如若转载,请注明出处。 如若内容造成侵权/违法违规/事实不符,请点击【内容举报】进行投诉反馈!
