Published: 2025-12-10 19:45:58
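Scraping free proxy IPs from Xici Proxy (西刺代理) [easy to understand]

Background
Some of my other scraping projects needed proxies, so this script scrapes a batch of proxy IPs into a text file, picks one at random, and runs a simple check on it. That makes it easy to call from my other code.

Environment
Win10, Python 3.6, PyCharm

The code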
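import random
import time

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36'
}

# "路径" is a placeholder meaning "path"; replace it with your own directory.
IP_FILE = r'路径\get_xici_ip.txt'


def xici_ip(page):
    """Scrape the first `page` pages and append each ip:port to IP_FILE."""
    url_part = "http://www.xicidaili.com/wn/"  # Xici proxy list; this is the domestic HTTPS section
    # Open the file once instead of reopening it for every row.
    with open(IP_FILE, 'a', encoding='utf-8') as f:
        for num_page in range(1, page + 1):
            url = url_part + str(num_page)  # build the URL of the page to scrape
            r = requests.get(url, headers=headers, timeout=10)
            if r.status_code != 200:
                continue
            soup = BeautifulSoup(r.text, 'lxml')
            trs = soup.find_all('tr')
            for tr in trs[1:]:  # skip the table header row
                tds = tr.find_all('td')
                if len(tds) < 3:
                    continue  # not a data row
                ip_item = tds[1].text.strip() + ':' + tds[2].text.strip()
                # print('Scraped from page ' + str(num_page) + ': ' + ip_item)
                f.write(ip_item + '\n')
            # time.sleep(1)
    return 'Saved successfully'


def get_ip():
    """Return one random line (ip:port plus newline) from IP_FILE."""
    with open(IP_FILE, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return random.choice(lines)


def check_ip():
    """Return a proxies dict if a randomly chosen proxy answers, otherwise None."""
    address = get_ip().strip()
    # requests looks proxies up by lowercase URL scheme, so an 'HTTPS' key is never used.
    # Both schemes are mapped so the plain-HTTP test URL below really goes through the proxy.
    # Assumption: the proxy accepts plain HTTP connections, hence the 'http://' prefix.
    proxies = {'http': 'http://' + address, 'https': 'http://' + address}
    try:
        r = requests.get('http://httpbin.org/ip', headers=headers, proxies=proxies, timeout=10)
        if r.status_code == 200:
            return proxies
    except Exception as e:
        print(e)
    return None


def main():
    xici_ip(1)  # scrape the first page; one page holds 100 entries
    return check_ip()  # check_ip() already handles its own exceptions


if __name__ == '__main__':
    main()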
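Since check_ip() tests only one randomly chosen proxy and gives back nothing when that one fails, code that calls it will probably want to try a few candidates before giving up. Below is a minimal sketch of that idea, not part of the original script. The names build_proxies, pick_working_proxy, is_alive and max_tries are all hypothetical, and the 'http://' prefix assumes the proxy accepts plain HTTP connections. The checker is passed in as a function, so the selection logic can be exercised without any network access.

```python
import random


def build_proxies(line):
    """Turn one 'ip:port' line from the text file into a requests-style proxies dict."""
    address = line.strip()
    # requests looks proxies up by lowercase URL scheme, so the keys are lowercase.
    # Assumption: the proxy itself is reached over plain HTTP.
    return {'http': 'http://' + address, 'https': 'http://' + address}


def pick_working_proxy(lines, is_alive, max_tries=5):
    """Try up to max_tries random candidates; return the first proxies dict
    for which is_alive(proxies) is true, or None if none of them passes."""
    candidates = [line for line in lines if line.strip()]  # drop blank lines
    random.shuffle(candidates)  # shuffles the copy, not the caller's list
    for line in candidates[:max_tries]:
        proxies = build_proxies(line)
        if is_alive(proxies):
            return proxies
    return None


if __name__ == '__main__':
    # Offline demo with a fake checker that accepts only one made-up address.
    sample = ['1.1.1.1:80\n', '2.2.2.2:8080\n', '\n']
    print(pick_working_proxy(sample, lambda p: p['http'].endswith('2.2.2.2:8080')))
```

In real use, is_alive would wrap a requests.get call with proxies=proxies and a timeout, returning True only on a 200 response, and lines would come from f.readlines() on the saved text file.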
If you are passing by and spot any problems, please don't hesitate to point them out.