How does Scrapy store images?
source link: https://cuiqingcai.com/9634.html
In settings.py, find ITEM_PIPELINES and add the following entry to it:
    ITEM_PIPELINES = {
        'scrapy.pipelines.images.ImagesPipeline': 301,
    }
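Settings configuration:
Image storage path:
    IMAGES_STORE = "your path"
Number of days a stored image is kept before it is treated as expired and downloaded again:
    IMAGES_EXPIRES = 30
Thumbnail sizes (fixed values):
    IMAGES_THUMBS = {'small': (50, 50), 'big': (270, 270)}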
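To make the effect of IMAGES_STORE and IMAGES_THUMBS concrete, here is a minimal standard-library sketch of where the files end up. It assumes the default naming of ImagesPipeline, which to my understanding stores each image as JPEG under full/ and each thumbnail under thumbs/<name>/, using the SHA-1 hash of the image URL as the file name. The helper names and the example URL are hypothetical; they are not part of Scrapy.

```python
import hashlib


def full_image_path(url: str) -> str:
    """Path of the full-size image, relative to IMAGES_STORE (assumed default naming)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"full/{digest}.jpg"


def thumb_image_path(url: str, thumb_id: str) -> str:
    """Path of one thumbnail, relative to IMAGES_STORE (assumed default naming)."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return f"thumbs/{thumb_id}/{digest}.jpg"


# Hypothetical image URL, used only for illustration.
url = "https://example.com/pic/cat.png"

# One full-size file plus one file per key of IMAGES_THUMBS.
paths = [full_image_path(url)]
paths += [thumb_image_path(url, name) for name in ("small", "big")]

for path in paths:
    print(path)
```

Because the file name depends only on the URL, the same image requested twice maps to the same file on disk.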
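Example:
    # Configure the image pipeline parameters
    import os
    BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    IMAGES_STORE = os.path.join(BASE_DIR, 'images')
Here __file__ is the path of the current script, os.path.abspath(__file__) turns it into an absolute path, and each os.path.dirname() call moves up one directory level. IMAGES_STORE = os.path.join(BASE_DIR, 'images') then points to an images folder under BASE_DIR.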
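The path arithmetic can be checked step by step outside a Scrapy project. The sketch below applies the same calls to a made-up settings file location; the directory layout (myproject/myproject/settings.py) and the function name images_store_for are assumptions for illustration only.

```python
import os


def images_store_for(settings_file: str) -> str:
    """Repeat the BASE_DIR / IMAGES_STORE computation for a given settings file path."""
    absolute = os.path.abspath(settings_file)   # absolute path of the script
    package_dir = os.path.dirname(absolute)     # directory that contains the script
    base_dir = os.path.dirname(package_dir)     # one level up: the project root
    return os.path.join(base_dir, "images")


# Hypothetical location of settings.py inside a project.
settings_file = os.path.join(os.sep, "home", "user", "myproject", "myproject", "settings.py")

images_store = images_store_for(settings_file)
print(images_store)
```

With this layout the images folder sits next to the inner package directory, not inside it.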
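Set up the extraction of image_urls in the spider:
    item['image_urls'] = "extraction expression"
    # e.g. item['image_urls'] = response.css(".pic img::attr(src)").extract()
    item['images'] = []  # leave this empty; after the download the pipeline fills it with the local file locations
When downloading images with ImagesPipeline, the item must have an image_urls field, and image_urls should be an iterable of URLs, typically a list or tuple.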
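Two common mistakes with image_urls are assigning a single string instead of a list, and passing relative src values that cannot be requested as they are. The following is a minimal sketch using only the standard library; the helper name to_image_urls and the example URLs are hypothetical. Inside a spider, response.urljoin() can do the same job.

```python
from urllib.parse import urljoin


def to_image_urls(page_url, srcs):
    """Return a list of absolute image URLs, skipping empty src values."""
    return [urljoin(page_url, src.strip()) for src in srcs if src and src.strip()]


# Hypothetical page URL and extracted src attributes.
page_url = "https://example.com/gallery/page1.html"
srcs = ["/static/a.jpg", "b.png", "https://cdn.example.com/c.gif", ""]

image_urls = to_image_urls(page_url, srcs)
for image_url in image_urls:
    print(image_url)
```

The result is always a list, so it can be assigned to item['image_urls'] directly even when the page has only one image.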
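If the site applies anti-scraping measures to its images, uncomment the following block in settings.py and fill in the referer:
    # DEFAULT_REQUEST_HEADERS = {
    #     'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    #     'Accept-Language': 'en',
    #     'referer': 'set this yourself',
    # }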
To store items in MongoDB, sample code:
    import pymongo
    from itemadapter import ItemAdapter

    class MongoPipeline:
        collection_name = 'scrapy_items'

        def __init__(self, mongo_uri, mongo_db):
            self.mongo_uri = mongo_uri
            self.mongo_db = mongo_db

        @classmethod
        def from_crawler(cls, crawler):
            # Read the connection parameters from the project settings
            return cls(
                mongo_uri=crawler.settings.get('MONGO_URI'),
                mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
            )

        def open_spider(self, spider):
            # Open the connection once, when the spider starts
            self.client = pymongo.MongoClient(self.mongo_uri)
            self.db = self.client[self.mongo_db]

        def close_spider(self, spider):
            self.client.close()

        def process_item(self, item, spider):
            # Insert the item as a plain dict, then pass it on to the next pipeline
            self.db[self.collection_name].insert_one(ItemAdapter(item).asdict())
            return item
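The pipeline above reads MONGO_URI and MONGO_DATABASE from the settings, so it also has to be enabled there. A possible settings.py fragment is sketched below; the module path myproject.pipelines.MongoPipeline, the priority 400, and the local MongoDB URI are assumptions and need to be adapted to the actual project.

```python
# settings.py (fragment)

ITEM_PIPELINES = {
    # Lower numbers run first, so images are downloaded before the item is saved.
    'scrapy.pipelines.images.ImagesPipeline': 301,
    # Hypothetical module path; use the location of MongoPipeline in your project.
    'myproject.pipelines.MongoPipeline': 400,
}

# Assumed local MongoDB instance on the default port.
MONGO_URI = 'mongodb://localhost:27017'
# Database name; 'items' matches the fallback used in from_crawler().
MONGO_DATABASE = 'items'
```

Running the image pipeline first means the images field is already filled in when the item reaches MongoDB.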
For more details, see the official documentation: https://docs.scrapy.org/en/latest/topics/item-pipeline.html#take-screenshot-of-item