

GitHub - alash3al/scraply: Scraply a simple dom scraper to fetch information fro...
source link: https://github.com/alash3al/scraply?v=3.0.0

Simple Scraping Tool
Scraply is a very simple HTML scraping tool: if you know CSS and jQuery, you can use it!
Overview
You can use scraply within your stack via the CLI or over HTTP.
# here is the CLI usage
# extract the title and the description from the scraply GitHub repo page
$ scraply extract \
-u "https://github.com/alash3al/scraply" \
-x title="select('title').text()" \
-x description="select('meta[name=description]').attr('content')"
# same thing, but with a custom user agent
$ scraply extract \
-u "https://github.com/alash3al/scraply" \
-ua "OptionalCustomUserAgent" \
-x title="select('title').text()" \
-x description="select('meta[name=description]').attr('content')"
# same thing, but asking scraply to return the response body for debugging purposes
$ scraply extract \
--return-body \
-u "https://github.com/alash3al/scraply" \
-x title="select('title').text()" \
-x description="select('meta[name=description]').attr('content')"
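If the same extractors are reused from scripts, the multi-line commands above can be wrapped in a small bash function that collects the flags in an array, so each expression reaches scraply as a single argument whatever quotes or spaces it contains. This is a minimal sketch, assuming bash and the -u / -x flags shown above; the function name scraply_extract and the SCRAPLY_BIN override are hypothetical, not part of scraply.

```shell
#!/usr/bin/env bash

# scraply_extract URL NAME=EXPRESSION [NAME=EXPRESSION ...]   (hypothetical wrapper)
# Builds the `scraply extract` arguments in an array, so each expression stays a
# single argument no matter which quotes or spaces it contains.
# SCRAPLY_BIN (hypothetical) overrides the binary; SCRAPLY_BIN=echo gives a dry run.
scraply_extract() {
  if (( $# < 2 )); then
    echo "usage: scraply_extract URL NAME=EXPRESSION..." >&2
    return 2
  fi
  local url=$1
  shift
  local -a args=(extract -u "$url")
  local pair
  for pair in "$@"; do
    args+=(-x "$pair")   # one -x flag per extracted field
  done
  "${SCRAPLY_BIN:-scraply}" "${args[@]}"
}
```

Prefixing a call with SCRAPLY_BIN=echo prints the assembled command line instead of running it, which is a cheap way to check the quoting before hitting a real page.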
For HTTP usage, run the HTTP server, then use any HTTP client to interact with it.
# running the http server
# by default it listens on ":8010", which is equivalent to "0.0.0.0:8010"
# for more information execute `$ scraply help`
$ scraply serve
# then, in another shell, run the following curl command
$ curl http://localhost:8010/extract \
-H "Content-Type: application/json" \
-s \
-d '{"url": "https://github.com/alash3al/scraply", "extractors": {"title": "$(\"title\").text()"}, "return_body": false, "user_agent": "CustomUserAgent"}'
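Hand-escaping expressions inside the single-quoted JSON body, as the curl example does with \"title\", becomes error-prone once expressions carry quotes of their own. Below is a minimal bash sketch that builds the request body from NAME=EXPRESSION pairs, assuming only the four fields shown in that example (url, extractors, return_body, user_agent). The helpers json_str and build_extract_payload are hypothetical, return_body is fixed to false, and the shape of the server's response is not covered.

```shell
#!/usr/bin/env bash

# json_str STRING: print STRING as a JSON string literal.
# Minimal escaper: handles ", \, newline, tab and carriage return only;
# other control characters are passed through unescaped.
json_str() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      '"')   out+='\"' ;;
      '\')   out+='\\' ;;
      $'\n') out+='\n' ;;
      $'\t') out+='\t' ;;
      $'\r') out+='\r' ;;
      *)     out+=$c ;;
    esac
  done
  printf '"%s"' "$out"
}

# build_extract_payload URL USER_AGENT NAME=EXPRESSION...   (hypothetical helper)
# Prints a JSON body with the fields used in the curl example; return_body is fixed to false.
build_extract_payload() {
  local url=$1 ua=$2 extractors='' sep='' pair
  shift 2
  for pair in "$@"; do
    # Split at the FIRST "=", so selectors such as meta[name=description] stay intact.
    extractors+="$sep$(json_str "${pair%%=*}"):$(json_str "${pair#*=}")"
    sep=','
  done
  printf '{"url":%s,"extractors":{%s},"return_body":false,"user_agent":%s}\n' \
    "$(json_str "$url")" "$extractors" "$(json_str "$ua")"
}

# Example (needs `scraply serve` running on the default address):
#   payload=$(build_extract_payload "https://github.com/alash3al/scraply" "CustomUserAgent" \
#     'title=$("title").text()')
#   curl -s http://localhost:8010/extract -H "Content-Type: application/json" -d "$payload"
```

The character-by-character loop is slow for long inputs but keeps the escaping rules explicit; where jq is installed, printf '%s' "$s" | jq -Rs . is a sturdier way to encode a single string.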
Download?
You can go to the releases page and pick the latest version, or you can run it with Docker:
$ docker run --rm -it ghcr.io/alash3al/scraply scraply help
Contribution?
Of course you can contribute. How?
- clone the repo
- create your fix/feature branch
- create a pull request
Nothing else. Enjoy!
About
I'm Mohamed Al Ashaal, a software engineer :)