A Facebook crawler was making 7M requests per day to my stupid website

source link: https://coding.napolux.com/a-facebook-crawler-was-making-7m-requests-per-day-to-my-stupid-website/

June 08, 2020

I own a little website I use for some SEO experiments. Of course there’s some content and a Facebook sharing button for every post.

The website is so small that it runs on a "single controller" PHP app plus a 400 KB SQLite database, yet it can generate thousands of different pages.

Everything is hosted (together with a bunch of other websites) on a cheap DigitalOcean machine, with a free Cloudflare plan for some caching. One of those websites has some alerting, and it started to alert me about being down.

After some investigation I found the problem: the Facebook crawler.

That crawler was making more than 7M requests per day (with a peak of 300 requests per second) to that website.
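
To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It only uses the two figures quoted above (7M requests per day, 300 requests per second at peak); the variable names are made up for illustration.

```python
# Figures taken from the post; everything else is derived from them.
REQUESTS_PER_DAY = 7_000_000
PEAK_RPS = 300
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average request rate if the traffic were spread evenly over the day.
average_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY

# How much higher the reported peak is than that average.
peak_to_average = PEAK_RPS / average_rps

print(f"average: {average_rps:.0f} req/s")               # average: 81 req/s
print(f"peak is {peak_to_average:.1f}x the average")     # peak is 3.7x the average
```

So even the average rate works out to roughly 81 requests per second, all day long, from a single crawler.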

Their documentation was no help in figuring out how to block the bot.

  • og:ttl -> ignored
  • robots.txt -> ignored
  • HTTP 429 -> ignored
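
For readers who have not met these three signals, here is a minimal sketch of what each one looks like. It is an illustration, not the configuration used on the site: the `facebookexternalhit` user-agent token, the TTL value, and the `Retry-After` value are all assumptions, and the helper names are hypothetical.

```python
# Assumption: the crawler identifies itself with this user-agent token.
CRAWLER_TOKEN = "facebookexternalhit"


def robots_txt() -> str:
    """robots.txt body asking the crawler not to fetch anything."""
    return f"User-agent: {CRAWLER_TOKEN}\nDisallow: /\n"


def og_ttl_meta(seconds: int) -> str:
    """Open Graph tag hinting how long to wait before re-scraping a page."""
    return f'<meta property="og:ttl" content="{seconds}" />'


def too_many_requests(retry_after_seconds: int):
    """Status line, headers, and body for an HTTP 429 response."""
    status = "429 Too Many Requests"
    headers = [
        ("Retry-After", str(retry_after_seconds)),
        ("Content-Type", "text/plain"),
    ]
    return status, headers, b"Too Many Requests\n"


print(robots_txt())
print(og_ttl_meta(7 * 24 * 60 * 60))  # illustrative value: one week
print(too_many_requests(3600)[0])
```

All three are polite requests: they only work if the client on the other side chooses to honor them.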

I had to block the user agent using Cloudflare rules.
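
The Cloudflare rule itself is not shown in this post. As a rough illustration of the same idea, here is a sketch of user-agent blocking done at the origin with a small Python WSGI middleware. This is not what runs on the site (the site is PHP and the block lives at Cloudflare); the `facebookexternalhit` token and all function names are assumptions made for the example.

```python
# Assumption: substrings that identify the crawler's user agent.
BLOCKED_UA_TOKENS = ("facebookexternalhit",)


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match against the blocked tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


def block_crawlers(app):
    """Wrap a WSGI app and answer 403 before the app does any work."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


app = block_crawlers(hello_app)
```

Blocking at the CDN edge is the better option when it is available, because the requests never reach the machine. An origin-side check like this one still receives every request, it just answers each one cheaply.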

If there’s someone working on that crawler reading this, please stop ignoring basic Internet netiquette about crawlers.

Next time you could hit someone on AWS. And then they’ll probably ask you to pay the bill 😉

Edit: looks like I’m on the front page of Hacker News.

For those wondering, here is an IP from a request; it’s for sure a Facebook IP.

If you want to share a comment or report an issue with this post, please send me an email to [email protected]

About me

I'm Francesco, a programmer. Some people call me Napo or Napolux. This is my blog about coding. You can find all the small projects I work on in my spare time here.

Please, read the license: CC BY-NC-SA. Made with WordPress.
