5 Tips To Create A More Reliable Web Crawler
source link: https://www.tuicool.com/articles/QF7biiZ
When I am crawling websites, getting blocked by a website is probably the most annoying thing that can happen. To be really good at web crawling, you need more than the ability to write XPath or CSS selectors quickly; how you design your crawlers also matters a lot, especially in the long run.
During the first year of my website-crawling journey, I focused mostly on how to scrape a website. Being able to scrape the data, then clean and organise it, was already enough to make my day. Only after crawling more websites did I realise that there are 4 elements that matter most in building a great web crawler.
Speed of the crawler
Are you able to scrape the data within the limited time you have?
Completeness of the data scraped
Did you manage to scrape all the data you wanted?
Accuracy of the data scraped
How can you ensure the data scraped is accurate?
Scalability of the web crawler
Could you scale the web crawler if the number of websites increases?
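To make the first three questions concrete, here is a minimal sketch of how a single crawl run could be scored. It is an illustration, not a method from the original article: the names `CrawlReport`, `summarize_crawl` and `is_valid`, the record fields (`url`, `title`, `price`) and the validation rule are all assumptions made for the example. Scalability is left out, since it cannot be judged from one run.

```python
from dataclasses import dataclass


@dataclass
class CrawlReport:
    """Summary of one crawl run (all field names are hypothetical)."""
    pages_per_second: float  # speed
    completeness: float      # share of expected URLs that were scraped
    accuracy: float          # share of scraped records that pass validation


def is_valid(record: dict) -> bool:
    # Assumed rule for this sketch: a record needs a non-empty title
    # and a non-negative numeric price. Real checks depend on the site.
    title = record.get("title")
    price = record.get("price")
    return bool(title) and isinstance(price, (int, float)) and price >= 0


def summarize_crawl(expected_urls, records, elapsed_seconds) -> CrawlReport:
    expected = set(expected_urls)
    scraped = {r["url"] for r in records}

    # Speed: records scraped per second of wall-clock time.
    speed = len(records) / elapsed_seconds if elapsed_seconds > 0 else 0.0
    # Completeness: how many of the URLs we wanted were actually scraped.
    completeness = len(scraped & expected) / len(expected) if expected else 1.0
    # Accuracy: how many scraped records pass the validation rule.
    accuracy = sum(is_valid(r) for r in records) / len(records) if records else 0.0

    return CrawlReport(speed, completeness, accuracy)
```

Tracking these numbers from one run to the next is one way to notice when a site change or a block has quietly reduced what the crawler brings back.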