
scrapyd drop-in replacement, scrapy clustering solution written in Go

 4 years ago
source link: https://www.tuicool.com/articles/7R7ruez

scrapyd-go

A drop-in replacement for scrapyd that is easier to scale and distribute across any number of commodity machines with no hassle. Each scrapyd-go instance is a stateless microservice. All instances must be connected to the same redis server, which serves as a centralized registry, so each instance sees what the others see.

Why

scrapyd isn't bad, but it is very stateful, which makes it hard to deploy in a distributed environment like k8s. I also wanted to add more features, so I started this project as a drop-in replacement for scrapyd, written on a modern and scalable stack: Go for the RESTful server and redis as the centralized registry.

TODOs

  • schedule.json
  • cancel.json
  • addversion.json
  • listprojects.json
  • listversions.json
  • listspiders.json
  • delproject.json
  • delversion.json
  • listjobs.json
  • daemonstatus.json
  • logs/{jobid} (new): real-time output of the job log
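
Since scrapyd-go aims to be a drop-in replacement, a client written for scrapyd's schedule.json should work unchanged. The sketch below builds such a request without sending it. It assumes the endpoint is implemented and accepts the same form-encoded `project` and `spider` fields as scrapyd; the project name `demo`, the spider name `example` and the helper `build_schedule_request` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, **spider_args):
    """Build (but do not send) a POST request for schedule.json."""
    # scrapyd expects form-encoded fields; extra keyword arguments are
    # passed through as spider arguments.
    fields = {"project": project, "spider": spider, **spider_args}
    body = urlencode(fields).encode("ascii")
    url = base_url.rstrip("/") + "/schedule.json"
    return Request(url, data=body, method="POST")


if __name__ == "__main__":
    # Hypothetical project and spider names, default listen port.
    req = build_schedule_request("http://localhost:6800", "demo", "example")
    print(req.get_method(), req.full_url, req.data)
```

To actually schedule the job, pass the returned request to `urllib.request.urlopen` while an instance is listening on that address.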

Configurations

scrapyd-go is configured entirely through command-line flags:

  -dir string
        the directory to use for local caching (default ".scrapyd-go")
  -listen string
        the address to bind to (default ":6800")
  -max2keep int
        the maximum jobs/logs to keep in memory (default 1000000)
  -poll int
        time in millisecond between each poll operation from queue(s) (default 10)
  -python string
        the python binary to use (default "python3")
  -redis string
        the redis server address (default "redis://:somepass@localhost:6379/1")
  -sync int
        time in seconds between each sync operation (default 15)
  -workers int
        the maximum workers count (default cpu-cores-count)
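
The `-redis` value packs the password, host, port and database number into one URL. The sketch below splits the default value into its parts using the conventional reading of a `redis://` URL. Whether scrapyd-go parses it exactly this way is an assumption, and `parse_redis_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a redis:// URL into host, port, password and db number."""
    parts = urlparse(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got %r" % url)
    # The path component ("/1") carries the database number.
    db = parts.path.lstrip("/")
    return {
        "host": parts.hostname,
        "port": parts.port or 6379,   # 6379 is the standard redis port
        "password": parts.password,   # None when the URL has no password
        "db": int(db) if db else 0,
    }


if __name__ == "__main__":
    print(parse_redis_url("redis://:somepass@localhost:6379/1"))
```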

Installation

  • binary : go to the releases page and download the release for your OS
  • docker : $ docker pull alash3al/scrapyd-go
  • source : $ go get github.com/alash3al/scrapyd-go

Running

  • binary : $ ./scrapyd_bin_file -redis redis://localhost:6379/1
  • docker : $ docker run --link SomeRedisServerContainer -p 6800:6800 alash3al/scrapyd-go -redis redis://SomeRedisServerContainer:6379/1
  • source : $ scrapyd-go -redis redis://localhost:6379/1
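
When an instance is launched from a script rather than a shell, the same command line can be assembled programmatically. This is a minimal sketch that only builds the argument list from the flags listed under Configurations; `build_command` is a hypothetical helper and the flag values are examples.

```python
def build_command(binary, **flags):
    """Turn keyword arguments into a "-name value" argument list."""
    argv = [binary]
    for name, value in flags.items():
        # Every documented flag takes a string or int value,
        # so each one is emitted as a "-name value" pair.
        argv += ["-" + name, str(value)]
    return argv


if __name__ == "__main__":
    argv = build_command(
        "scrapyd-go",
        redis="redis://localhost:6379/1",
        listen=":6800",
        workers=4,
    )
    print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` once the binary is on the PATH.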

Contributing

  • Fork the repo
  • Create a feature branch
  • Push your changes
  • Create a pull request

License

Apache License v2.0
