Python Crawler Series: Scraping the Douban Music Chart!

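Today a junior classmate came to me for help. She is taking an elective course on web crawlers, and the teacher gave the class code that scrapes the Douban movie chart and asked them to adapt it. She tried to change it to scrape the music chart instead, the code broke, and she couldn't fix it herself, so she came to find me, Little Potato~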

The teacher's original code for scraping the Douban movie chart is attached at the end.

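Below is the complete code after my changes. It ran correctly when I wrote this; if it no longer scrapes anything, check the publication date of this post, because Douban may have changed its page markup since then!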

from urllib import request
from chardet import detect
from bs4 import BeautifulSoup as bs
import re


def getSoup(url):
    # Download the page and decode it with the encoding chardet detects
    with request.urlopen(url) as fp:
        byt = fp.read()
        det = detect(byt)
        return bs(byt.decode(det['encoding']), 'lxml')


def getData(soup):
    # Collect [title, score, star rating] for every album on the page
    data = []
    div = soup.find('div', attrs={'class': 'grid-16-8 clearfix'})
    for width in div.findAll('table', {'width': '100%%'}):
        small_data = []
        small_data.append(width.find('a', {'class': 'nbg'})['title'])
        small_data.append(width.find('span', {'class': 'rating_nums'}).string)
        # The two digits after class="allstar" encode the stars, e.g. 45 means 4.5
        star_index = re.search('class="allstar', str(width)).span()[1]
        small_data.append(int(str(width)[star_index: star_index + 2]) / 10)
        data.append(small_data)
    return data


def nextUrl(soup):
    # Find the "next page" link (its text starts with 后页); None on the last page
    a = soup.find('a', text=re.compile("^后页"))
    if a:
        return a.attrs['href']
    else:
        return None


if __name__ == '__main__':
    url = "https://music.douban.com/top250"
    soup = getSoup(url)
    print(getData(soup))
    nt = nextUrl(soup)
    while nt:
        # The href is passed to getSoup as-is
        soup = getSoup(nt)
        print(getData(soup))
        nt = nextUrl(soup)
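The star-rating step in getData is the least obvious part of the script: it finds where `class="allstar` ends, takes the next two characters, and divides by 10. Here is a minimal standalone sketch of the same idea using a capture group instead of string slicing. The `SAMPLE_ROW` markup and the `star_rating` helper are made up for illustration; the real page may look different.

```python
import re

# Hypothetical fragment of one chart entry; the real Douban markup may differ.
SAMPLE_ROW = (
    '<div class="star clearfix">'
    '<span class="allstar45"></span>'
    '<span class="rating_nums">9.1</span>'
    '</div>'
)

# Capture the digits that follow class="allstar
ALLSTAR_RE = re.compile(r'class="allstar(\d+)"')


def star_rating(html):
    """Return the stars encoded in an allstarNN class, or None if there is none."""
    match = ALLSTAR_RE.search(html)
    if match is None:
        return None
    return int(match.group(1)) / 10


print(star_rating(SAMPLE_ROW))               # 4.5
print(star_rating('<span>no stars</span>'))  # None
```

Returning None when nothing matches avoids the AttributeError that `re.search(...).span()` raises when an entry has no allstar class.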

This part is the teacher's code for scraping the Douban movie chart:

from urllib import request
from chardet import detect
from bs4 import BeautifulSoup as bs
import re


def getSoup(url):
    # Download the page and decode it with the encoding chardet detects
    with request.urlopen(url) as fp:
        byt = fp.read()
        det = detect(byt)
        return bs(byt.decode(det['encoding']), 'lxml')


def getData(soup):
    # Collect [titles, score, one-line quote] for every movie on the page
    data = []
    ol = soup.find('ol', attrs={'class': 'grid_view'})
    for li in ol.findAll('li'):
        tep = []
        titles = []
        for span in li.findAll('span'):
            if span.has_attr('class'):
                if span.attrs['class'][0] == 'title':
                    titles.append(span.string.strip())
                elif span.attrs['class'][0] == 'rating_num':
                    tep.append(span.string.strip())
                elif span.attrs['class'][0] == 'inq':
                    tep.append(span.string.strip())
        # Put the list of titles first
        tep.insert(0, titles)
        data.append(tep)
    return data


def nextUrl(soup):
    # Find the "next page" link (its text starts with 后页); None on the last page
    a = soup.find('a', text=re.compile("^后页"))
    if a:
        return a.attrs['href']
    else:
        return None


if __name__ == '__main__':
    url = "https://movie.douban.com/top250"
    soup = getSoup(url)
    print(getData(soup))
    nt = nextUrl(soup)
    while nt:
        # Here the href is appended to the base URL
        soup = getSoup(url + nt)
        print(getData(soup))
        nt = nextUrl(soup)
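One difference between the two scripts is easy to miss. The movie version calls getSoup(url + nt), while the music version calls getSoup(nt) directly, so mixing the two up is a likely cause of a broken adaptation. A possible way to avoid caring about the difference is `urllib.parse.urljoin`, sketched below. The sample href values are assumptions inferred from how each script uses the link, not copied from the live pages, and `resolve_next` is a hypothetical helper.

```python
from urllib.parse import urljoin


def resolve_next(current_url, href):
    """Turn a next-page href (relative or absolute) into a full URL."""
    if not href:
        return None
    return urljoin(current_url, href)


MOVIE = "https://movie.douban.com/top250"
MUSIC = "https://music.douban.com/top250"

# Assumed relative href, as the movie script's url + nt suggests
print(resolve_next(MOVIE, "?start=25&filter="))
# https://movie.douban.com/top250?start=25&filter=

# Assumed absolute href, as the music script's getSoup(nt) suggests
print(resolve_next(MUSIC, "https://music.douban.com/top250?start=25"))
# https://music.douban.com/top250?start=25
```

With a helper like this, the main loop could call getSoup(resolve_next(url, nt)) in both scripts.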

 

