Scrapy | python - 宁波妈妈网 (Ningbo Mama forum) - Powered by phpwind

Original post, posted: 2019-03-23

Scrapy

http://quotes.toscrape.com/tag/humor/page/1/

https://scrapy-chs.readthedocs.io/zh_CN/0.24/topics/contracts.html

https://www.jianshu.com/p/df9c0d1e9087

北斗星 (administrator)

Reply 1, posted: 2019-03-27

https://www.cnblogs.com/qiyeboy/p/5428240.html
https://blog.csdn.net/ljm_9615/article/details/76715696
https://www.jianshu.com/p/09e29b0a4b29

https://scrapy-chs.readthedocs.io/zh_CN/0.24/topics/api.html

https://www.cnblogs.com/qmfsun/p/6207918.html

https://blog.csdn.net/lishk314/article/details/83539622

北斗星 (administrator)

Reply 2, posted: 2019-03-27

import re

# `response` is assumed to be the Scrapy response passed to the spider's parse callback.

# Title
article_title = response.xpath('//div[@class="entry-header"]/h1/text()').extract()[0]
print(article_title)

# Publication time: strip whitespace and the '·' separator
article_time = response.xpath('//p[@class="entry-meta-hide-on-mobile"]/text()').extract()[0].strip().replace(
    '·', '').strip()
print(article_time)

# Number of upvotes
article_praise = response.xpath('//span[contains(@class,"vote-post-up")]/h10/text()').extract()[0]
print(article_praise)

# Number of bookmarks
bookmark = response.xpath('//span[contains(@class,"bookmark-btn")]/text()').extract()[0]
# Extract the bookmark count with a regex. re.search with \d+ captures the whole
# number; the earlier pattern '.*(\d+).*' was greedy and kept only the last digit.
match_bookmark = re.search(r'(\d+)', bookmark)
if match_bookmark:
    article_bookmark = match_bookmark.group(1)
    print(article_bookmark)

# Number of comments
comments = response.xpath('//a[@href="#article-comment"]/text()').extract()[0]
match_comments = re.search(r'(\d+)', comments)
if match_comments:
    article_comments = match_comments.group(1)
    print(article_comments)

# Article body (raw HTML of the entry div)
article_contents = response.xpath('//div[@class="entry"]').extract()[0]

# Article tags
tag_list = response.xpath('//p[@class="entry-meta-hide-on-mobile"]/a/text()').extract()

# Drop the comment-count link (text ending in "评论", i.e. "comments") from the tags
tag_list = [element for element in tag_list if not element.strip().endswith("评论")]
tags = ','.join(tag_list)
print(tags)
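---------------------
Author: sy_y
Source: CSDN
Original: https://blog.csdn.net/s740556472/article/details/81023624
Copyright notice: This is an original article by the blogger. Please include a link to the original post when reposting.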
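
The bookmark and comment counts in the snippet above are pulled out of link text with a regular expression. Here is a minimal sketch of that one step, runnable without Scrapy. The helper name `first_int` and the sample strings are hypothetical, made up for illustration. It also shows why a pattern with a leading greedy `.*`, such as `.*(\d+).*`, captures only the last digit of a multi-digit number.

```python
import re


def first_int(text, default=0):
    """Return the first run of digits in text as an int, or default if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else default


# Hypothetical sample of the kind of text a bookmark button might hold.
sample = " 12 收藏"

# The greedy '.*' swallows as much as it can, leaving only the final digit for (\d+).
greedy = re.match(r".*(\d+).*", sample)
print(greedy.group(1))

# re.search with \d+ takes the whole number.
print(first_int(sample))

# No digits at all: fall back to the default instead of raising.
print(first_int(" 收藏"))
```

Returning an int with a default also avoids the case where the count variable is never assigned because the text contained no number.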

北斗星 (administrator)

Reply 3, posted: 2019-03-28

https://www.jianshu.com/p/09e29b0a4b29

北斗星 (administrator)

Reply 4, posted: 2019-03-29

https://www.cnblogs.com/wzjbg/p/6507564.html

北斗星 (administrator)

Reply 5, posted: 2019-03-29

https://blog.csdn.net/harry5508/article/details/86486777

https://stackoverflow.com/questions/8372703/how-can-i-use-different-pipelines-for-different-spiders-in-a-single-scrapy-proje/34647090#34647090

https://www.jianshu.com/p/df9c0d1e9087
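
The Stack Overflow link in this reply asks how to use different pipelines for different spiders in a single Scrapy project. One possible approach is sketched below. It is an assumption for illustration, not necessarily what the linked answer recommends: a pipeline that checks `spider.name` and passes other spiders' items through untouched. The class name, spider names, and `handled_by` key are all hypothetical. Scrapy pipelines are plain Python classes, so the sketch runs without Scrapy installed.

```python
from types import SimpleNamespace


class QuotesOnlyPipeline:
    """Hypothetical pipeline that only touches items from selected spiders."""

    # Names of the spiders this pipeline applies to (illustrative values).
    allowed_spiders = {"quotes"}

    def process_item(self, item, spider):
        # Scrapy calls process_item(item, spider) on each enabled pipeline;
        # returning the item unchanged hands it on to the next pipeline.
        if spider.name not in self.allowed_spiders:
            return item
        tagged = dict(item)
        tagged["handled_by"] = type(self).__name__
        return tagged


# Stand-ins for real spiders: only the .name attribute is used here.
pipeline = QuotesOnlyPipeline()
print(pipeline.process_item({"text": "hi"}, SimpleNamespace(name="quotes")))
print(pipeline.process_item({"text": "hi"}, SimpleNamespace(name="articles")))
```

An alternative that needs no check inside the pipeline is to set `ITEM_PIPELINES` in each spider's `custom_settings` class attribute, so every spider enables only the pipelines it needs.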
