How to download scrapy images to a dynamic folder based on
source link: https://www.codesd.com/item/how-to-download-scrapy-images-to-a-dynamic-folder-based-on.html
I'm trying to override the default path full/hash.jpg with <dynamic>/hash.jpg. I've tried the approach from "How to download scrapy images in a dynamic folder", using the following code:
def item_completed(self, results, item, info):
    for result in [x for ok, x in results if ok]:
        path = result['path']
        # here we create the session-path where the files should be in the end
        # you'll have to change this path creation depending on your needs
        slug = slugify(item['category'])
        target_path = os.path.join(slug, os.path.basename(path))
        # try to move the file and raise exception if not possible
        if not os.rename(path, target_path):
            raise DropItem("Could not move image to target folder")
    if self.IMAGES_RESULT_FIELD in item.fields:
        item[self.IMAGES_RESULT_FIELD] = [x for ok, x in results if ok]
    return item
but I get:
Traceback (most recent call last):
  File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 839, in _cbDeferred
    self.callback(self.resultList)
  File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 382, in callback
    self._startRunCallbacks(result)
  File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/user/Projects/sepid/scraper/scraper/pipelines.py", line 44, in item_completed
    if not os.rename(path, target_path):
exceptions.OSError: [Errno 2] No such file or directory
I don't know what's wrong. Is there any other way to change the path? Thanks.
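A note on the traceback: the snippet has three details that are each consistent with this error. The 'path' value in each result is relative to the IMAGES_STORE directory, not to the current working directory. os.rename does not create a missing target directory. And os.rename returns None on success, so the "if not os.rename(...)" check would raise DropItem even when the move works. The following is a minimal sketch of the move step in isolation, using only the standard library. The names move_into_folder and store_dir are hypothetical, and a temporary directory stands in for IMAGES_STORE.

```python
import os
import shutil
import tempfile


def move_into_folder(store_dir, relative_path, folder):
    """Move store_dir/relative_path into store_dir/folder and return the new relative path.

    store_dir stands in for IMAGES_STORE (an assumption for this sketch).
    """
    source = os.path.join(store_dir, relative_path)
    target_dir = os.path.join(store_dir, folder)
    # The target directory must exist before the move; create it if needed.
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(relative_path))
    # shutil.move raises an exception on failure, so no return value check is needed.
    shutil.move(source, target)
    return os.path.relpath(target, store_dir)


# Demo with a temporary directory and a fake image file.
store = tempfile.mkdtemp()
os.makedirs(os.path.join(store, "full"))
with open(os.path.join(store, "full", "abc123.jpg"), "wb") as fh:
    fh.write(b"fake image bytes")

new_path = move_into_folder(store, os.path.join("full", "abc123.jpg"), "my-category")
print(new_path)
```

In a real pipeline the first argument would be the configured image store directory, and a failed move would surface as an exception rather than a falsy return value.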
I created a pipeline that inherits from ImagesPipeline, overrode its file_path method, and used it in place of the standard ImagesPipeline:
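import hashlib
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.python import to_bytes

class StoreImgPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # YEAR is not defined in this snippet; presumably a constant set elsewhere in the project
        image_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()
        return 'realty-sc/%s/%s/%s/%s.jpg' % (YEAR, image_guid[:2], image_guid[2:4], image_guid)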
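To get the <dynamic>/hash.jpg layout from the question, the same file_path override could build the folder name from a per-item value instead of a constant. Below is a rough sketch of just the path-building part, written with the standard library only so it can be tried without Scrapy. The helper names slugify_folder and dynamic_image_path are hypothetical, and passing the category through request.meta is an assumption about how the item data would reach file_path.

```python
import hashlib
import re


def slugify_folder(value):
    """Turn an arbitrary label into a safe folder name (simple stand-in for slugify)."""
    value = re.sub(r"[^\w\s-]", "", value.strip().lower())
    return re.sub(r"[\s_-]+", "-", value).strip("-") or "uncategorized"


def dynamic_image_path(url, category):
    """Return '<slug>/<sha1-of-url>.jpg', relative to the image store."""
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "%s/%s.jpg" % (slugify_folder(category), image_guid)


# Inside an ImagesPipeline subclass, this could be wired up roughly as follows
# (assumes the item has 'image_urls' and 'category' fields):
#
#   def get_media_requests(self, item, info):
#       for url in item['image_urls']:
#           yield scrapy.Request(url, meta={'category': item['category']})
#
#   def file_path(self, request, response=None, info=None):
#       return dynamic_image_path(request.url, request.meta['category'])

example_path = dynamic_image_path("http://example.com/a.jpg", "Living Room")
print(example_path)
```

Because the path is decided before the file is written, no rename step in item_completed is needed with this approach.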