Hands-on practice: CNKI guidelines
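## 1. Analyze the catalog

Each line of `url.txt` holds three tab-separated fields: the target folder, the guideline name, and the download URL. For the selected line numbers, the script turns the download URL into the guideline's detail-page URL, opens that page in headless Chrome, waits a random 10 to 99 seconds, and then requests the download URL.

```python
import random
import time

import requests
from selenium import webdriver

path = r'C:\Users\zmc\项目\TEST\知网知识库\9-指南梳理\url.txt'
with open(path, 'r', encoding='utf-8') as f:
    n = 0
    for i in f:
        n += 1
        if n in range(114, 115):  # only process the selected line numbers
            j = i.replace('\n', '').split('\t')
            folder = j[0]  # renamed from `dir`, which shadows a built-in
            name = j[1]
            url = j[2]
            url2 = url.replace('download/GetDownLoad?fname', 'zndetail/index?query')
            # The cookie has to be refreshed frequently, and there is a daily quota.
            # Entries whose IDs start with WWZN cannot be crawled directly.
            headers = {
                'User-Agent': '',
                'Cookie': '',
            }
            session = requests.Session()
            # Run Chrome silently in the background
            option = webdriver.ChromeOptions()
            option.add_argument('--headless')
            # Open the detail page in the simulated browser
            # (`chrome_options=` was removed in Selenium 4; `options=` works in 3 and 4)
            driver = webdriver.Chrome(options=option)
            driver.get(url2)
            time.sleep(random.randrange(10, 100))
            driver.quit()  # quit() also ends the chromedriver process
            res = session.get(url=url, headers=headers)
```

## 2. Save the attachments

The second script repeats the same steps for the entries in `urldown183.txt` and writes each response to `C:\临床指南\<folder>\<name>.pdf`, creating the folder first if it does not exist.

```python
import os
import random
import time

import requests
from selenium import webdriver

# Quick test with a single download URL:
# headers = {
#     'User-Agent': '',
#     'Cookie': '',
# }
# urldown = 'https://lczl.cnki.net/download/GetDownLoad?fname=SHZZ202003002&tname=CCGL&year=2020'
# session = requests.Session()
# res = session.get(url=urldown, headers=headers)
# with open('abc.pdf', 'wb') as code:
#     code.write(res.content)

path = r'C:\Users\zmc\项目\TEST\知网知识库\9-指南梳理\urldown183.txt'
with open(path, 'r', encoding='utf-8') as f:
    n = 0
    for i in f:
        n += 1
        if n in range(114, 115):  # only process the selected line numbers
            j = i.replace('\n', '').split('\t')
            folder = j[0]
            name = j[1]
            url = j[2]
            url2 = url.replace('download/GetDownLoad?fname', 'zndetail/index?query')
            # The cookie has to be refreshed frequently, and there is a daily quota.
            # Entries whose IDs start with WWZN cannot be crawled directly.
            headers = {
                'User-Agent': '',
                'Cookie': '',
            }
            try:  # a few pages raise errors even when opened normally
                session = requests.Session()
                # Run Chrome silently in the background
                option = webdriver.ChromeOptions()
                option.add_argument('--headless')
                # Open the detail page in the simulated browser
                driver = webdriver.Chrome(options=option)
                try:
                    driver.get(url2)
                    time.sleep(random.randrange(10, 100))
                finally:
                    driver.quit()  # close the browser even if the page fails
                res = session.get(url=url, headers=headers)
                road = r'C:\临床指南\{}'.format(folder)
                if os.path.isdir(road):
                    print(n, 'exists', name)
                else:
                    print(n, 'new', name)
                    os.mkdir(road)
                with open(os.path.join(road, '{}.pdf'.format(name)), 'wb') as code:
                    code.write(res.content)
            except Exception as e:  # log and skip instead of silently passing
                print(n, 'failed', name, e)
```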
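The per-line bookkeeping shared by both scripts (picking line numbers, splitting the three tab-separated fields, rewriting the download URL into the detail-page URL, and building the save path) can be factored into small pure functions. This is a minimal sketch under stated assumptions, not the author's code: the helper names `select_lines`, `parse_line`, and `pdf_path` are hypothetical, and replacing characters that Windows forbids in file names is an extra cleanup step the original scripts do not perform.

```python
import os
import re
from itertools import islice

# Fragments swapped by the scripts to turn a download URL into a detail-page URL
DOWNLOAD_PART = 'download/GetDownLoad?fname'
DETAIL_PART = 'zndetail/index?query'

# Characters Windows does not allow in file or folder names (assumed cleanup step)
ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|]')


def select_lines(lines, start, stop):
    """Yield (n, line) for 1-based line numbers n with start <= n < stop.

    Hypothetical helper: same selection as `n in range(start, stop)`,
    but it stops reading once `stop` is reached.
    """
    # islice uses 0-based positions, so both bounds are shifted by one
    for n, line in enumerate(islice(lines, start - 1, stop - 1), start=start):
        yield n, line


def parse_line(line):
    """Split 'folder<TAB>name<TAB>url' and derive the detail-page URL (hypothetical helper)."""
    folder, name, url = line.rstrip('\r\n').split('\t')[:3]
    detail_url = url.replace(DOWNLOAD_PART, DETAIL_PART)
    return folder, name, url, detail_url


def pdf_path(root, folder, name):
    """Build root/folder/name.pdf with illegal characters replaced by '_' (hypothetical helper).

    Pure function: it does not create the folder on disk.
    """
    safe_folder = ILLEGAL_CHARS.sub('_', folder)
    safe_name = ILLEGAL_CHARS.sub('_', name)
    return os.path.join(root, safe_folder, safe_name + '.pdf')
```

Inside the loop, `for n, line in select_lines(f, 114, 115):` selects the same line as the original `n in range(114, 115)` check, and calling `os.makedirs(os.path.dirname(p), exist_ok=True)` before writing replaces the separate `isdir`/`mkdir` branches for a path `p` returned by `pdf_path`.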
大诚
August 3, 2022, 10:40