[Python Web Scraping for Absolute Beginners] Scraping Baidu Images, Easy Enough for a Child to Learn

 3 years ago
source link: https://blog.csdn.net/weixin_57171554/article/details/115904755

First, a look at the result:
[Image]
The required imports:

import re
import requests
import os

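Because the crawler makes network requests, it needs the third-party requests package in addition to the standard-library re and os modules. If requests is not installed, install it yourself (pip install requests).

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36'}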

The full request, where name is the search keyword and i is the page index:

url = 'https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + name + '&pn=' + str(i*30)
result = requests.get(url, headers=headers)
dowmloadPic(result.content.decode(), name, i)
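
The query string can also be assembled with urllib.parse.urlencode, which percent-encodes non-ASCII keywords instead of relying on string concatenation. This is only a minimal sketch, assuming the same flip endpoint and parameters as the request above; build_search_url and page_size are hypothetical names introduced for illustration, and the default of 30 simply mirrors the i*30 step used in the article.

```python
from urllib.parse import urlencode

# Same endpoint as in the article's request
BASE_URL = 'https://image.baidu.com/search/flip'


def build_search_url(keyword, page, page_size=30):
    """Build the search URL for one result page (hypothetical helper).

    page_size=30 is an assumption that mirrors the i*30 offset in the article.
    """
    params = {
        'tn': 'baiduimage',
        'ie': 'utf-8',
        'word': keyword,           # urlencode percent-encodes non-ASCII text
        'pn': page * page_size,    # offset of the first result on this page
    }
    return BASE_URL + '?' + urlencode(params)


print(build_search_url('cat', 2))
```

The returned string can be passed straight to requests.get(url, headers=headers).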

Once we have the HTML, a regular expression pulls out the image addresses:

 pic_url = re.findall('"objURL":"(.*?)",',html,re.S)
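
To see what that pattern captures, here is a small sketch that runs it on a hand-written string. The sample_html value is made up for illustration and is not real Baidu output; extract_image_urls is a hypothetical wrapper name. Only the regular expression itself comes from the article.

```python
import re

# The pattern from the article: lazily capture everything between
# "objURL":" and the next ", sequence
OBJ_URL_PATTERN = re.compile(r'"objURL":"(.*?)",', re.S)


def extract_image_urls(html):
    """Return every objURL value found in the page text (hypothetical helper)."""
    return OBJ_URL_PATTERN.findall(html)


# Made-up fragment that only imitates the "objURL" fields
sample_html = (
    '{"objURL":"http://example.com/a.jpg","fromURL":"x"},'
    '{"objURL":"http://example.com/b.png","fromURL":"y"}'
)

urls = extract_image_urls(sample_html)
print(urls)
```

Because the group is lazy (.*?), each match stops at the first closing quote followed by a comma, so neighbouring fields are not swallowed into one URL.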

Finally, write the bytes of each fetched image to a file:

with open(dir, 'wb') as fp:
    fp.write(pic.content)
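
The saving step can be wrapped in a small function that also creates the target folder when it does not exist yet. This is a sketch under assumptions: save_image is a hypothetical helper that is not part of the original script, and the demo writes a few dummy bytes to a temporary folder rather than a real image to D:\image.

```python
import os
import tempfile


def save_image(content, folder, filename):
    """Write raw bytes to folder/filename and return the full path."""
    # exist_ok=True makes this safe to call for every image
    os.makedirs(folder, exist_ok=True)
    file_path = os.path.join(folder, filename)
    # The with block closes the file even if the write fails
    with open(file_path, 'wb') as fp:
        fp.write(content)
    return file_path


# Demo with dummy bytes in a temporary folder (not a real image)
demo_folder = os.path.join(tempfile.mkdtemp(), 'image')
saved_path = save_image(b'fake image bytes', demo_folder, 'cat_0.jpg')
print(saved_path)
```

In the crawler, content would be pic.content from the image request.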

The complete code:

#!/usr/bin/python
# -*- coding: UTF-8 -*-
import re
import requests
import os

# Folder where the downloaded images are saved
SAVE_DIR = r'D:\image'


def dowmloadPic(html, keyword, i):
    # Extract every image address from the "objURL" fields of the result page
    pic_url = re.findall('"objURL":"(.*?)",', html, re.S)

    # Number the images by page, assuming up to 60 images per page
    abc = i * 60
    print('Found images for keyword: ' + keyword + ', starting download...')
    # Create the target folder once, before any file is written
    os.makedirs(SAVE_DIR, exist_ok=True)
    for each in pic_url:
        print('Downloading image ' + str(abc) + ', URL: ' + str(each))
        try:
            pic = requests.get(each, timeout=10)
        except requests.exceptions.RequestException:
            # Covers connection errors and timeouts alike
            print('[Error] This image could not be downloaded')
            continue

        dir = os.path.join(SAVE_DIR, keyword + '_' + str(abc) + '.jpg')
        with open(dir, 'wb') as fp:
            fp.write(pic.content)
        abc += 1


if __name__ == '__main__':
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36'}
    name = input('Enter the keyword for the images to download: ')
    x = input('How many pages to crawl? (n pages, about 60 images each): ')

    for i in range(int(x)):
        url = 'https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=' + name + '&pn=' + str(i*30)
        result = requests.get(url, headers=headers)
        dowmloadPic(result.content.decode(), name, i)
    print('Download complete')

If you want to learn web scraping, feel free to get in touch with me.
QQ: 2316773638

