How does Scrapy store images?

 3 years ago
source link: https://cuiqingcai.com/9634.html

Following the official documentation:

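In settings.py, find ITEM_PIPELINES and add the images pipeline (ImagesPipeline requires the Pillow library):

ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 301,
}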
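Scrapy runs the enabled item pipelines in ascending order of these integer values (customarily in the 0 to 1000 range), so a lower number runs earlier. The plain-Python sketch below only imitates that ordering; the `myproject.pipelines.MongoPipeline` path and its value 400 are hypothetical, chosen so that a storage pipeline would run after the images have been downloaded.

```python
# A minimal sketch of how pipeline priorities determine execution order.
# The dict mirrors an ITEM_PIPELINES entry in settings.py; the MongoPipeline
# path and its value 400 are hypothetical examples.
ITEM_PIPELINES = {
    "myproject.pipelines.MongoPipeline": 400,
    "scrapy.pipelines.images.ImagesPipeline": 301,
}


def pipeline_order(pipelines: dict[str, int]) -> list[str]:
    """Return pipeline paths sorted by priority value, lowest (earliest) first."""
    return [path for path, _ in sorted(pipelines.items(), key=lambda kv: kv[1])]


for position, path in enumerate(pipeline_order(ITEM_PIPELINES), start=1):
    print(position, path)
```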

Settings:

Image storage path (the directory where downloaded images are saved):

IMAGES_STORE = "your path"

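Image expiration, in days (images downloaded more recently than this are not downloaded again; the default is 90):

IMAGES_EXPIRES = 30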
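The expiration rule can be illustrated with a small age check. This is a simplified, hypothetical sketch rather than Scrapy's own code; it assumes `last_modified` is the Unix timestamp of the previously stored image and treats anything older than `IMAGES_EXPIRES` days as due for a fresh download.

```python
import time
from typing import Optional

IMAGES_EXPIRES = 30  # days, matching the setting above

SECONDS_PER_DAY = 24 * 60 * 60


def needs_redownload(last_modified: float, expires_days: float = IMAGES_EXPIRES,
                     now: Optional[float] = None) -> bool:
    """Hypothetical helper: True if a stored image is older than the expiry window."""
    if now is None:
        now = time.time()
    age_days = (now - last_modified) / SECONDS_PER_DAY
    return age_days > expires_days


now = 1_700_000_000
print(needs_redownload(now - 10 * SECONDS_PER_DAY, now=now))  # False: 10 days old, kept
print(needs_redownload(now - 45 * SECONDS_PER_DAY, now=now))  # True: 45 days old, fetched again
```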

Thumbnails (each key names a thumbnail size; each value is a (width, height) pair in pixels):

IMAGES_THUMBS = {
    'small': (50, 50),
    'big': (270, 270),
}
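Per the Scrapy documentation, the images pipeline converts images to JPEG and names each file after the SHA-1 hash of its URL: the full-size image goes under `full/` and each thumbnail under `thumbs/<size name>/`, all relative to IMAGES_STORE. The sketch below reproduces that default naming with `hashlib`; the example URL is made up, and a project can change the layout by overriding the pipeline's `file_path()` method.

```python
import hashlib
import posixpath

IMAGES_THUMBS = {
    "small": (50, 50),
    "big": (270, 270),
}


def image_paths(url: str, thumbs: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Sketch of the default paths (relative to IMAGES_STORE) for one image URL."""
    # The image id is the SHA-1 hex digest of the URL; files are saved as .jpg
    image_id = hashlib.sha1(url.encode("utf-8")).hexdigest()
    paths = {"full": posixpath.join("full", f"{image_id}.jpg")}
    for size_name in thumbs:
        paths[size_name] = posixpath.join("thumbs", size_name, f"{image_id}.jpg")
    return paths


for name, path in image_paths("https://example.com/pic/1.png", IMAGES_THUMBS).items():
    print(name, path)
```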

Example:

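import os

# Configure the images pipeline
# __file__ is the path of this settings.py file;
# os.path.abspath(__file__) makes that path absolute,
# and each os.path.dirname() call goes up one directory level,
# so BASE_DIR is the project root (the folder containing scrapy.cfg in the default layout)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Save images to an "images" folder inside BASE_DIR
IMAGES_STORE = os.path.join(BASE_DIR, 'images')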
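To see what the two `os.path.dirname()` calls produce, the sketch below applies them to a made-up settings.py path that assumes the usual `scrapy startproject` layout. `posixpath` is used only so the result is identical on every OS; in settings.py itself you would keep `os.path` as above.

```python
import posixpath

# Hypothetical absolute path of settings.py, i.e. what os.path.abspath(__file__) might return
settings_file = "/home/user/myproject/myproject/settings.py"

package_dir = posixpath.dirname(settings_file)  # "/home/user/myproject/myproject"
BASE_DIR = posixpath.dirname(package_dir)       # "/home/user/myproject" (project root)
IMAGES_STORE = posixpath.join(BASE_DIR, "images")

print(IMAGES_STORE)  # /home/user/myproject/images
```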

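In the spider, set how image_urls is extracted:

item['image_urls'] = ...  # your extraction expression; it must produce a list of URLs
# e.g. item['image_urls'] = response.css(".pic img::attr(src)").getall()
item['images'] = []  # leave empty: after downloading, the pipeline fills this with the locally saved file info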
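The item therefore needs both fields declared. Scrapy also accepts dataclass objects as items, so one possible sketch of such an item is shown below; the class name `ImageItem` and the sample URL are hypothetical. With a dataclass you assign attributes (`item.image_urls = ...`) instead of using `item['image_urls']`, unless you wrap it in `itemadapter.ItemAdapter`.

```python
from dataclasses import dataclass, field


@dataclass
class ImageItem:
    # URLs for ImagesPipeline to download
    image_urls: list[str] = field(default_factory=list)
    # Left empty by the spider; ImagesPipeline fills it with one dict per downloaded image
    images: list[dict] = field(default_factory=list)


item = ImageItem(image_urls=["https://example.com/pic/1.jpg"])
print(item)
```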

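When downloading images with ImagesPipeline, the item must have an image_urls field (spelled image_urls, not images_urls). Its value should be an iterable of URLs, typically a list or tuple.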
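Two pitfalls are worth guarding against here: a bare string is also iterable, so a single URL stored as a string would be iterated character by character, and relative `src` values need to be made absolute. The hypothetical helper below handles both with the standard library; inside a spider, `response.urljoin()` performs the same joining against the page URL.

```python
from collections.abc import Iterable
from typing import Union
from urllib.parse import urljoin


def normalize_image_urls(value: Union[str, Iterable[str]], base_url: str) -> list[str]:
    """Hypothetical helper: always return a list of absolute image URLs."""
    if isinstance(value, str):
        value = [value]  # wrap a single URL so it is not iterated character by character
    return [urljoin(base_url, src) for src in value]


page = "https://example.com/gallery/index.html"
print(normalize_image_urls("/img/a.jpg", page))
print(normalize_image_urls(("b.jpg", "https://cdn.example.com/c.jpg"), page))
```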

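If the site uses anti-scraping measures on its images (for example, rejecting requests that lack a Referer header), uncomment and adjust DEFAULT_REQUEST_HEADERS in settings.py:

# DEFAULT_REQUEST_HEADERS = {
#     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#     'Accept-Language': 'en',
#     'Referer': 'configure this yourself',
# }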
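Scrapy applies DEFAULT_REQUEST_HEADERS through its DefaultHeadersMiddleware, which only fills in headers that a request has not already set, so a header set on an individual request takes precedence. The sketch below imitates that merge with plain dicts; the Referer URLs are placeholders, and real Scrapy headers are case-insensitive, which this sketch ignores.

```python
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
    "Referer": "https://example.com/",  # placeholder: typically the page that embeds the images
}


def apply_default_headers(request_headers: dict[str, str]) -> dict[str, str]:
    """Sketch of the merge: defaults are added only where the request has no value."""
    merged = dict(request_headers)
    for name, value in DEFAULT_REQUEST_HEADERS.items():
        merged.setdefault(name, value)
    return merged


print(apply_default_headers({})["Referer"])
print(apply_default_headers({"Referer": "https://example.com/gallery/"})["Referer"])
```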

Example code for storing items in MongoDB:

 
import pymongo
from itemadapter import ItemAdapter
 
class MongoPipeline:
 
    collection_name = 'scrapy_items'
 
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db
 
    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )
 
    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]
 
    def close_spider(self, spider):
        self.client.close()
 
    def process_item(self, item, spider):
        self.db[self.collection_name].insert_one(ItemAdapter(item).asdict())
        return item
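The pipeline above relies on Scrapy creating it via `from_crawler()`, then calling `open_spider()` when the crawl starts, `process_item()` for every item, and `close_spider()` at the end. To try that flow without a MongoDB server, the hypothetical sketch below replaces pymongo with an in-memory dict, and the calls at the bottom simulate the order in which Scrapy would invoke the hooks; the item is a plain dict, which Scrapy also accepts.

```python
class InMemoryPipeline:
    """Hypothetical stand-in for MongoPipeline that stores items in a dict of lists."""

    collection_name = "scrapy_items"

    def open_spider(self, spider):
        # Plays the role of MongoClient(mongo_uri)[mongo_db]
        self.db = {}

    def close_spider(self, spider):
        # Nothing to release here; the real pipeline calls self.client.close()
        pass

    def process_item(self, item, spider):
        # dict(item) plays the role of ItemAdapter(item).asdict() for a dict item
        self.db.setdefault(self.collection_name, []).append(dict(item))
        return item


pipeline = InMemoryPipeline()
pipeline.open_spider(spider=None)
pipeline.process_item({"image_urls": ["https://example.com/pic/1.jpg"], "images": []}, spider=None)
pipeline.close_spider(spider=None)
print(pipeline.db)
```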
 

For more details, see the official documentation: https://docs.scrapy.org/en/latest/topics/item-pipeline.html#take-screenshot-of-item
