python+ beautifulsoup 解析采集内容

安装:pip  install beautifulsoup4

安装成功

模拟登录并获取html代码内容


其中cookie添加部分比较麻烦折腾了几十分钟

写入cookie:

cookie = driver.get_cookies()


jsonCookies = json.dumps(cookie)
with open('qqhomepage.json', 'w') as f:
f.write(jsonCookies)
str=''
with open('qqhomepage.json','r',encoding='utf-8') as f:
    listCookies=json.loads(f.read())

for item in listCookies:
    driver.add_cookie({
        "domain":".xxx.cn",
        "name":item["name"],
        "value":item["value"],
        "path":'/',
        "expires":None
    })
    print(item["name"])
    print(item["value"])

相关文档 https://www.crummy.com/software/BeautifulSoup/bs4/doc/

解析区间内的所有图片:

html = driver.page_source

soup = BeautifulSoup(html,'html.parser')
soup.prettify()
htmlrr = soup.select('#ChapterContent10576198 img');


print(htmlrr)