How to download scrapy images to a dynamic folder based on

 2 years ago
source link: https://www.codesd.com/item/how-to-download-scrapy-images-to-a-dynamic-folder-based-on.html

I'm trying to override the default path full/hash.jpg with <dynamic>/hash.jpg. I tried the approach from "How to download scrapy images in a dynamic folder", using the following code:

def item_completed(self, results, item, info):

    for result in [x for ok, x in results if ok]:
        path = result['path']
        # here we create the session-path where the files should be in the end
        # you'll have to change this path creation depending on your needs
        slug = slugify(item['category'])
        target_path = os.path.join(slug, os.path.basename(path))

        # try to move the file and raise exception if not possible
        if not os.rename(path, target_path):
            raise DropItem("Could not move image to target folder")

    if self.IMAGES_RESULT_FIELD in item.fields:
        item[self.IMAGES_RESULT_FIELD] = [x for ok, x in results if ok]
    return item

but I get:

Traceback (most recent call last):
    File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
    File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 839, in _cbDeferred
    self.callback(self.resultList)
    File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 382, in callback
    self._startRunCallbacks(result)
    File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
    self._runCallbacks()
    --- <exception caught here> ---
    File "/home/user/.venv/sepid/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
    File "/home/user/Projects/sepid/scraper/scraper/pipelines.py", line 44, in item_completed
    if not os.rename(path, target_path):
    exceptions.OSError: [Errno 2] No such file or directory

I don't know what's wrong. Is there any other way to change the path? Thanks.
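
A likely reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the 'path' value in each result is relative to the IMAGES_STORE setting, so os.rename looks for full/hash.jpg under the current working directory and fails with Errno 2 unless the two happen to coincide. The target folder also has to exist before the move. Separately, os.rename returns None on success, so `if not os.rename(...)` would raise DropItem even when the move works. The helper below is hypothetical (the name move_image and the store_root parameter are not from the original post) and assumes Python 3; store_root stands in for the IMAGES_STORE value.

```python
import os
import shutil


def move_image(store_root, relative_path, folder):
    """Move a downloaded image into store_root/folder.

    store_root stands in for the IMAGES_STORE setting (assumption);
    relative_path is the 'path' value of a pipeline result,
    e.g. 'full/<hash>.jpg'. Returns the new store-relative path.
    """
    # Resolve the source against the image store, not the working directory.
    source = os.path.join(store_root, relative_path)

    # The destination folder must exist before the file can be moved.
    target_dir = os.path.join(store_root, folder)
    os.makedirs(target_dir, exist_ok=True)

    new_relative = os.path.join(folder, os.path.basename(relative_path))

    # shutil.move raises an exception on failure, so there is no
    # return value to test.
    shutil.move(source, os.path.join(store_root, new_relative))
    return new_relative
```

Inside item_completed, each successful result could then be passed through this helper with the slugified category as the folder, and a failed move caught as OSError and turned into DropItem.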


I created a pipeline that inherits from ImagesPipeline, overrode its file_path method, and used it in place of the standard ImagesPipeline. (YEAR is not defined in this snippet; it has to be set elsewhere in the module.)

import hashlib
from scrapy.pipelines.images import ImagesPipeline
from scrapy.utils.python import to_bytes

class StoreImgPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        image_guid = hashlib.sha1(to_bytes(request.url)).hexdigest()
        return 'realty-sc/%s/%s/%s/%s.jpg' % (YEAR, image_guid[:2], image_guid[2:4], image_guid)
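
The path above is fixed apart from the hash. To get the per-item folder the question asks for, the same idea can be extended so the folder name comes from the item's category. The sketch below keeps the path logic in a plain function so it can be tried without Scrapy; dynamic_image_path and this small slugify are hypothetical helpers, not part of Scrapy, and the 'category' field is taken from the question's code.

```python
import hashlib
import re


def slugify(text):
    """Lowercase text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def dynamic_image_path(url, category):
    """Build '<category-slug>/<sha1-of-url>.jpg', relative to IMAGES_STORE."""
    # Same hashing scheme as the pipeline above: SHA-1 of the request URL.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "%s/%s.jpg" % (slugify(category), image_guid)
```

In a pipeline, file_path could return dynamic_image_path(request.url, item['category']). This assumes Scrapy 2.4 or later, where file_path receives the item as a keyword-only item argument; on older versions the category would have to be attached to request.meta in get_media_requests and read back from the request.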

