Python: scraping the next page — my crawler won't follow the next page, what should I do?


# -*- coding: UTF-8 -*-
import scrapy

from hoho.items import HohoItem


class tongSpider(scrapy.Spider):
    name = 'guwen'
    start_urls = ['http://www.shicifuns.com/v2/wenyan/list']

    def parse(self, response):
        papers = response.xpath('//div[@class="css_content"]/div'
                                '/div[@class="css_body_left"]'
                                '/div[@class="every_day"]/ul')
        for paper in papers:
            for p in paper.xpath('li'):
                name = p.xpath('a/div/div[@class="poem_title"]/span/text()').extract()[0]
                url = p.xpath('a/@href').extract()[0]
                content = p.xpath('a/div/div[@class="poem_content"]/text()').extract()[0].strip('\r\n ')
                # The info block contains two span.dynasty elements:
                # the first is the author, the second the rating (pinfen).
                info = p.xpath('a/div/div[@class="poem_info"]/span[@class="dynasty"]/text()').extract()
                author = info[0]
                pinfen = info[1]
                yield HohoItem(name=name,
                               url=response.urljoin(url),
                               content=content,
                               author=author,
                               pinfen=pinfen)

        # Follow the "next page" link, if the pagination bar has one.
        next_page = response.xpath("//div[@class='css_content']/div"
                                   "/div[@class='css_body_left']"
                                   "/div[@class='pagination']/ul/li"
                                   "/a[@class='next page focus']/@href").extract()
        if next_page:
            yield scrapy.Request(url=response.urljoin(next_page[0]),
                                 callback=self.parse)
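A likely reason the spider stops at page 1 is the pagination XPath: `a[@class='next page focus']` only matches when the `class` attribute is *exactly* that string, in that order. If the site renders the link as, say, `class="next page"` on some pages, the exact match returns nothing and no next-page request is ever yielded. The sketch below (stdlib only, with a hypothetical HTML fragment standing in for the real page) shows the failure of an exact attribute match and a token-based check that mirrors the XPath idiom `contains(concat(' ', normalize-space(@class), ' '), ' next ')`:

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination fragment: note the class is "next page",
# not "next page focus".
html = ('<div><ul><li>'
        '<a class="next page" href="/v2/wenyan/list?page=2">next</a>'
        '</li></ul></div>')
root = ET.fromstring(html)

# Exact attribute match, like a[@class='next page focus'] in XPath:
# succeeds only on an identical attribute string, so it finds nothing here.
exact = root.findall('.//a[@class="next page focus"]')

def has_class(el, cls):
    # Match one CSS class token regardless of order or extra tokens.
    return cls in (el.get('class') or '').split()

links = [a.get('href') for a in root.iter('a') if has_class(a, 'next')]
```

In the spider itself, the equivalent fix is a selector such as `response.xpath("//div[@class='pagination']//a[contains(concat(' ', normalize-space(@class), ' '), ' next ')]/@href")`, which keeps matching even when the `focus` token comes and goes.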

