Hands-on scraping with requests + beautifulsoup4
A movie site's mobile pages show ratings for each title but offer no way to sort by them. To find the high-scoring movies, I wrote a crawler that downloads every title and rating and writes them to a file, which I then sort in Excel.
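Before the full script, here is a minimal sketch of the extraction step with BeautifulSoup. The HTML snippet is an assumed shape of one list entry, inferred from the CSS selectors used in the script below; the site's actual markup may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical list markup: a 2-character rank plus a dot before each
# title, and a trailing unit character after each score.
html = '''
<ul>
  <li><h3>01.Movie A</h3><p>9.1分</p></li>
  <li><h3>02.Movie B</h3><p>8.4分</p></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')
names = [h.text[3:] for h in soup.select('li h3')]   # drop the 3-char rank prefix
scores = [p.text[:-1] for p in soup.select('li p')]  # drop the trailing unit char
```

The slicing (`[3:]` and `[:-1]`) is what turns the raw tag text into clean title/score pairs; the same indices appear in the full script.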
#!/usr/bin/python3
# -*- coding:utf-8 -*-
"""Crawl movie titles and ratings from the mobile pages of dyaihao.com."""
# __author__ = c08762
import time
import requests
from bs4 import BeautifulSoup

names = []
scores = []
headers = {'User-Agent': 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 5_1_1 like Mac OS X; en) AppleWebKit/534.46.0 (KHTML, like Gecko) CriOS/19.0.1084.60 Mobile/9B206 Safari/7534.48.3'}
root_url = 'http://www.dyaihao.com/type/5.html'
i = 1
print('Fetching %s' % root_url)
resp = requests.get(root_url, headers=headers, timeout=15)
while resp.status_code == 200:
    print('Pausing 5 seconds after each page\n')
    time.sleep(5)
    resp.encoding = 'utf-8'
    soup = BeautifulSoup(resp.text, 'lxml')
    # soup.select returns a list; extract the movie titles
    h3s = soup.select('li h3')
    for h in h3s:
        # h.text is a str; drop the 3-character rank prefix
        th = h.text
        names.append(th[3:])
    # extract the ratings, dropping the trailing unit character
    ps = soup.select('li p')
    for p in ps:
        tp = p.text
        scores.append(tp[:-1])
    # is there a next page?
    next_p = soup.find('a', class_="btn btn-primary btn-block")
    if next_p is None:
        print('Crawl finished, writing results to file...')
        name_score = dict(zip(names, scores))
        with open('/home/c08762/sample.txt', 'w') as file_object:
            for k, v in name_score.items():
                file_object.write('%s,%s\n' % (k, v))
        print('File written. Done.')
        break
    else:
        # there is a next page: assemble its URL and follow it
        build_url = 'http://www.dyaihao.com' + next_p['href']
        i += 1
        if i % 20 == 0:
            print('\nAnti-scraping measure: pausing 30 seconds\n')
            time.sleep(30)
        print('Fetching %s' % build_url)
        resp = requests.get(build_url, headers=headers, timeout=60)
else:
    # while/else: runs only when the loop exits without break,
    # i.e. a request came back with a non-200 status
    print('Error opening a page')
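Since the whole point is to see the high-rated titles, the Excel step could also be skipped by sorting in Python before writing the file. A minimal sketch, assuming every score string parses as a plain number like '8.2' (a hypothetical helper, not part of the script above):

```python
def sort_by_score(name_score):
    """Return (name, score) pairs sorted best-first.

    name_score maps title -> score string; scores that do not
    parse as floats sink to the bottom instead of raising.
    """
    def key(item):
        try:
            return float(item[1])
        except ValueError:
            return float('-inf')
    return sorted(name_score.items(), key=key, reverse=True)

pairs = sort_by_score({'Movie A': '7.9', 'Movie B': '9.1', 'Movie C': '6.4'})
# pairs[0] is now the highest-rated title
```

Writing `pairs` out line by line then gives a file that is already ordered, with no spreadsheet needed.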
To do: implement a daily incremental email notification.
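For that to-do item, the stdlib smtplib/email modules are enough to build and send the report. A sketch of the message-building half; the addresses and SMTP server below are placeholders, not real configuration:

```python
import smtplib
from email.mime.text import MIMEText

def build_report(new_items):
    """Build a plain-text mail listing newly crawled (name, score) pairs."""
    body = '\n'.join('%s,%s' % (name, score) for name, score in new_items)
    msg = MIMEText(body, 'plain', 'utf-8')
    msg['Subject'] = 'Daily new high-score movies'
    msg['From'] = 'crawler@example.com'   # placeholder address
    msg['To'] = 'me@example.com'          # placeholder address
    return msg

def send_report(msg):
    # Placeholder SMTP server; swap in a real host and credentials.
    with smtplib.SMTP('smtp.example.com', 25) as server:
        server.send_message(msg)
```

The "incremental" part would additionally need yesterday's name set persisted somewhere (e.g. the existing output file) so only genuinely new titles are mailed.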
