Python Web Scraping: Using a Tunnel Proxy with Selenium
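1. Because of how tunnel proxies work (the exit IP typically changes between requests), Selenium with a tunnel proxy is generally suited to one-off visits made for a specific purpose. It is not suited to long-lived sessions that must keep state, and even less to flows that involve page redirects.
2. Selenium with a tunnel proxy cannot run in headless mode (do not add the headless argument). The proxy credentials are supplied by a Chrome extension, and Chrome's classic headless mode does not load extensions.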
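The one-off pattern from note 1 can be sketched as a helper that opens a fresh browser for a single page and always closes it afterwards. This is only a minimal illustration and not part of the original script: `fetch_once`, `BrowserLike`, and `make_browser` are hypothetical names, and `make_browser` is assumed to be any zero-argument callable that returns a configured Selenium Chrome instance.

```python
from typing import Callable, Protocol


class BrowserLike(Protocol):
    """Minimal subset of the Selenium WebDriver interface used here."""

    page_source: str

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


def fetch_once(url: str, make_browser: Callable[[], BrowserLike]) -> str:
    """Open a fresh browser, load one page, return its HTML, and always quit."""
    browser = make_browser()
    try:
        browser.get(url)
        return browser.page_source
    finally:
        # Quit even if the page load fails, so no session outlives the visit.
        browser.quit()
```

Passing the browser in as a factory keeps each visit isolated: every URL gets a new session, so no cookies or open connections are carried from one visit to the next.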
Code:
from selenium.webdriver import Chrome, ChromeOptions
import zipfile
import string


def create_proxyauth_extension(proxy_host, proxy_port, proxy_username, proxy_password,
                               scheme='http', plugin_path=None):
    # This configuration does not need to be changed; it can be used as-is.
    if plugin_path is None:
        plugin_path = 'chrome_proxyauth_plugin.zip'

    # "<all_urls>" lets the extension answer proxy auth challenges for every URL.
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version": "22.0.0"
    }
    """

    # Route all traffic through the tunnel and reply to its auth challenge.
    background_js = string.Template("""
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "${scheme}",
                host: "${host}",
                port: parseInt(${port})
            },
            bypassList: ["foobar.com"]
        }
    };

    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

    function callbackFn(details) {
        return {
            authCredentials: {
                username: "${username}",
                password: "${password}"
            }
        };
    }

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    );
    """).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )

    # Package both files into a zip that Chrome can load as an extension.
    with zipfile.ZipFile(plugin_path, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    return plugin_path


if __name__ == '__main__':
    url = "https://www.baidu.com"
    proxyauth_plugin_path = create_proxyauth_extension(
        proxy_host="xxx.xxx.com",  # tunnel host
        proxy_port="",  # tunnel port
        proxy_username="",  # tunnel account name
        proxy_password=""  # tunnel password
    )
    print(proxyauth_plugin_path)

    cOption = ChromeOptions()
    # cOption.add_argument('--disable-gpu')
    # cOption.add_argument('--no-sandbox')
    # cOption.add_argument('--headless')
    cOption.add_extension(proxyauth_plugin_path)
    browser = Chrome(options=cOption)
    browser.get(url)
    print(browser.page_source)
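Before launching Chrome, it can help to confirm that the generated archive is well formed, since a broken extension tends to fail silently and leave the browser without proxy credentials. The following is a rough sketch using only the standard library; `inspect_plugin` and `check_plugin` are hypothetical helper names that do not appear in the original script, and the set of permissions checked is an assumption about what the proxy extension needs.

```python
import json
import zipfile


def inspect_plugin(plugin_path):
    """Return (manifest dict, background.js text) read from the extension zip."""
    with zipfile.ZipFile(plugin_path) as zp:
        manifest = json.loads(zp.read("manifest.json").decode("utf-8"))
        background_js = zp.read("background.js").decode("utf-8")
    return manifest, background_js


def check_plugin(plugin_path, expected_host):
    """Raise ValueError if the archive looks incomplete or misconfigured."""
    manifest, background_js = inspect_plugin(plugin_path)
    # Assumed minimum: proxy control plus blocking webRequest for auth replies.
    for permission in ("proxy", "webRequest", "webRequestBlocking"):
        if permission not in manifest.get("permissions", []):
            raise ValueError(f"missing permission: {permission}")
    if expected_host not in background_js:
        raise ValueError("proxy host not found in background.js")
```

With the script above, `check_plugin('chrome_proxyauth_plugin.zip', 'xxx.xxx.com')` could be called right after `create_proxyauth_extension` returns and before `cOption.add_extension(...)`.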
