Simulating a Login

Preparing a site

A practice site built specifically for web scraping (example.webscraping.com, used in the code below)

FormRequest

Scrapy provides the FormRequest class specifically for building requests that submit form data.

In [1]: fd = {'email':'keeep@webscraping.com','password':'135799'}

In [2]: from scrapy.http import FormRequest

In [3]: request = FormRequest.from_response(response,formdata=fd)

In [4]: fetch(request)

In [5]: view(response)

The browser then displays the page you see after logging in.
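Under the hood, FormRequest.from_response parses the first form on the login page, collects its fields (including hidden ones such as CSRF tokens), and then overlays the formdata you pass in. A minimal sketch of that step, using only the standard library so it runs without Scrapy (the HTML fragment and the _formkey field are made up for illustration):

```python
# Sketch of what FormRequest.from_response does with form fields:
# gather every named <input> with its default value, then merge in
# the user-supplied formdata on top.
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            a = dict(attrs)
            if 'name' in a:
                # Hidden inputs keep their preset value; visible ones
                # default to an empty string.
                self.fields[a['name']] = a.get('value', '')


html = '''
<form action="/places/default/user/login" method="post">
  <input type="hidden" name="_formkey" value="0a1b2c"/>
  <input type="text" name="email"/>
  <input type="password" name="password"/>
</form>
'''

parser = FormFieldParser()
parser.feed(html)

# Overlay the credentials onto the collected defaults, just as
# from_response merges formdata into the parsed form.
formdata = {**parser.fields,
            'email': 'liushuo@webscraping.com',
            'password': '12345678'}
print(formdata)
# {'_formkey': '0a1b2c', 'email': ..., 'password': ...}
```

This is why from_response is preferred over building a FormRequest by hand: hidden tokens like _formkey are carried over automatically.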

Implementing it in a spider

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import FormRequest
from ..items import LoginItem


class LoginsSpider(scrapy.Spider):
    name = 'logins'
    allowed_domains = ['example.webscraping.com']
    start_urls = ['http://example.webscraping.com/places/default/user/profile']

    def parse(self, response):
        exam = LoginItem()
        exam['values'] = response.xpath('//td[@class="w2p_fw"]/text()').extract()
        yield exam

    login_url = "http://example.webscraping.com/places/default/user/login"

    def start_requests(self):
        # Note: a plain scrapy.Request here, not FormRequest --
        # we only fetch the login page at this point.
        yield scrapy.Request(self.login_url, callback=self.login)

    def login(self, response):
        fd = {'email': 'liushuo@webscraping.com', 'password': '12345678'}
        yield FormRequest.from_response(response, formdata=fd,
                                        callback=self.parse_login)

    def parse_login(self, response):
        # Only crawl start_urls once the welcome text confirms login.
        if 'Welcome Liu' in response.text:
            yield from super().start_requests()

Logging in with Cookies

Install the third-party browsercookie library

pip install browsercookie

What it does

browsercookie's chrome and firefox functions each return the corresponding browser's CookieJar object; iterating over it gives you access to each individual Cookie object.
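A sketch of how you might flatten that CookieJar into a dict that Scrapy's Request(cookies=...) accepts. To keep the example runnable without a real browser profile, it builds a http.cookiejar.CookieJar by hand (the 'session' cookie is a made-up stand-in for whatever your browser stored):

```python
# browsercookie.chrome() / .firefox() return a http.cookiejar.CookieJar;
# iterating it yields Cookie objects with .name / .value attributes.
from http.cookiejar import Cookie, CookieJar


def jar_to_dict(jar):
    """Flatten a CookieJar into {name: value} pairs."""
    return {cookie.name: cookie.value for cookie in jar}


# Stand-in jar so the sketch runs without browsercookie installed.
jar = CookieJar()
jar.set_cookie(Cookie(
    version=0, name='session', value='abc123',
    port=None, port_specified=False,
    domain='example.webscraping.com', domain_specified=True,
    domain_initial_dot=False, path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}, rfc2109=False,
))

print(jar_to_dict(jar))  # {'session': 'abc123'}

# With browsercookie installed you would do something like:
#   import browsercookie
#   cookies = jar_to_dict(browsercookie.chrome())
#   yield scrapy.Request(url, cookies=cookies)
```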

CookiesMiddleware

To be continued.
