How does Scrapy store images?

In settings.py, find ITEM_PIPELINES and add the following entry:

'scrapy.pipelines.images.ImagesPipeline': 301,

Settings configuration:

Image storage path:

IMAGES_STORE = 'your path'

Image expiration, in days:

IMAGES_EXPIRES = 30

Thumbnail sizes (fixed values):

IMAGES_THUMBS = {
    'small': (50, 50),
    'big': (270, 270),
}
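For orientation, here is a rough sketch of where files should end up under IMAGES_STORE with the thumbnail settings above. It assumes the default ImagesPipeline naming: the SHA-1 hex digest of the image URL as the file name, full-size images under full/, and thumbnails under thumbs/<name>/. The URL and the helper name expected_paths are made up for illustration, and a pipeline subclass that overrides file_path would behave differently.

```python
import hashlib

IMAGES_THUMBS = {
    'small': (50, 50),
    'big': (270, 270),
}


def expected_paths(url, thumbs=IMAGES_THUMBS):
    """Return the relative paths (under IMAGES_STORE) expected for one image URL.

    Assumption: default ImagesPipeline naming, not a customised file_path().
    """
    guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
    paths = {'full': f'full/{guid}.jpg'}
    for name in thumbs:
        paths[name] = f'thumbs/{name}/{guid}.jpg'
    return paths


# Hypothetical URL, for illustration only
paths = expected_paths('https://example.com/pic/1.png')
for label, path in paths.items():
    print(label, path)
```

The extension is .jpg even for a .png source because the pipeline converts downloaded images to JPEG before saving them.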

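Example:

# Configure the image pipeline
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
IMAGES_STORE = os.path.join(BASE_DIR, 'images')

How the path is built:

# __file__ is the path of the current script
# os.path.abspath(__file__) returns the absolute path of that script
# os.path.dirname(path) returns the directory that contains path,
# so calling it twice walks up to the parent of the script's directory
# os.path.join(BASE_DIR, 'images') appends an images folder to BASE_DIR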
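To make the two dirname calls concrete, the sketch below walks through the same steps on a fixed path. The path /home/user/myproject/myproject/settings.py is a hypothetical project layout, and posixpath is used instead of os.path only so the result is the same on every platform.

```python
import posixpath

# Hypothetical location of settings.py inside a Scrapy project
settings_file = '/home/user/myproject/myproject/settings.py'

package_dir = posixpath.dirname(settings_file)  # directory containing settings.py
base_dir = posixpath.dirname(package_dir)       # one level up: the project root
images_store = posixpath.join(base_dir, 'images')

print(package_dir)   # /home/user/myproject/myproject
print(base_dir)      # /home/user/myproject
print(images_store)  # /home/user/myproject/images
```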

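In the spider, set how the image URLs are extracted:

item['image_urls'] = "extraction expression"
# item['image_urls'] = response.css(".pic img::attr(src)").extract()
item['images'] = []  # leave empty; after the download, the pipeline records where each file was saved locally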

When ImagesPipeline downloads images, it reads the image_urls field of the item. image_urls should be an iterable, typically a list or tuple.
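Two slips are easy to make here: assigning a single string (which Python would iterate character by character) and passing relative src values. The following is a minimal sketch of a helper that guards against both. The name normalize_image_urls and the example URLs are hypothetical; inside a spider callback, response.urljoin can do the same joining.

```python
from urllib.parse import urljoin


def normalize_image_urls(page_url, extracted):
    """Return a list of absolute image URLs.

    Accepts a single string or any iterable of strings.
    """
    if isinstance(extracted, str):
        extracted = [extracted]
    # Resolve each src against the page URL and skip empty values
    return [urljoin(page_url, src) for src in extracted if src]


urls = normalize_image_urls(
    'https://example.com/gallery/index.html',
    ['/static/a.jpg', 'b.jpg', 'https://cdn.example.com/c.jpg'],
)
print(urls)
```

The result can then be assigned directly to item['image_urls'].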

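If the site applies anti-scraping measures to its images, uncomment the following in the settings and fill in the referer:

# DEFAULT_REQUEST_HEADERS = {
#     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#     'Accept-Language': 'en',
#     'referer': 'set this yourself',
# }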

Storing items in MongoDB, sample code:

 
import pymongo
from itemadapter import ItemAdapter
 
class MongoPipeline:
 
    collection_name = 'scrapy_items'
 
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
 
    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )
 
    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]
 
    def close_spider(self, spider):
        self.client.close()
 
    def process_item(self, item, spider):
        self.db[self.collection_name].insert_one(ItemAdapter(item).asdict())
        return item
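
The pipeline above reads MONGO_URI and MONGO_DATABASE from the settings, so it needs matching entries there. A possible settings sketch follows; the module path myproject.pipelines.MongoPipeline and the local connection string are assumptions to adjust for your own project.

```python
# settings.py (sketch)
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 301,
    # Hypothetical module path: adjust to where MongoPipeline lives in your project
    'myproject.pipelines.MongoPipeline': 400,
}

# Read by MongoPipeline.from_crawler()
MONGO_URI = 'mongodb://localhost:27017'  # assumption: local MongoDB on the default port
MONGO_DATABASE = 'items'
```

Pipelines with lower numbers run first, so with these values the image pipeline fills item['images'] before the item is written to MongoDB.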

For more details, see the official documentation: https://docs.scrapy.org/en/latest/topics/item-pipeline.html#take-screenshot-of-item
