How to block AI Crawler Bots using robots.txt file

Cynicus Rex@lemmy.ml · 4 months ago

asudox@lemmy.world · 4 months ago

Block? Nope, robots.txt does not block the bots. It’s just a text file that says: “Hey robot X, please do not crawl my website. Thanks :>”
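To make the "it's only a polite request" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `GPTBot` user agent are illustrative assumptions, not taken from the linked article. The check only tells a crawler what the site owner asked for; nothing stops a crawler from skipping the check entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks one AI crawler to stay away, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what robots.txt *requests* for this user agent and URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler calls something like this before fetching.
# A badly behaved one simply never asks, and the server cannot tell the difference
# from robots.txt alone.
```

The enforcement has to happen somewhere else (web server, firewall, CDN), which is what the replies below get into.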

ɐɥO@lemmy.ohaa.xyz · 4 months ago

I disallow a page in my robots.txt and IP-ban everyone who goes there. That's pretty effective.
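A rough sketch of that trap idea, assuming the request handling can be reduced to a function that sees the client IP and the requested path. `TRAP_PATH`, `banned`, and `handle_request` are hypothetical names; the commenter's actual setup is not described.

```python
# Hypothetical trap path: it would be listed under "Disallow:" in robots.txt
# and never linked from any page a human would see.
TRAP_PATH = "/do-not-crawl/"

# In-memory ban list for illustration; a real setup would more likely
# feed a firewall or a persistent store.
banned: set[str] = set()


def handle_request(client_ip: str, path: str) -> int:
    """Return an HTTP status code for the request."""
    if client_ip in banned:
        return 403
    if path.startswith(TRAP_PATH):
        # Only clients that ignore robots.txt should ever end up here.
        banned.add(client_ip)
        return 403
    return 200
```

Once an address has touched the trap, every later request from it is refused, including requests for normal pages.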

JackbyDev@programming.dev · 4 months ago

Did you ban it in your humans.txt too?

bountygiver [any]@lemmy.ml · edit-2 4 months ago

humans typically don’t visit [website]/fdfjsidfjsidojfi43j435345 when there’s no button that links to it

Avatar_of_Self@lemmy.world · 4 months ago

I used to do this on one of my sites that was moderately popular in the 00’s. I had a link hidden via JavaScript, so a user couldn’t click it (unless they disabled JavaScript and clicked it), though it was hidden pretty well for that case too.

IPs that hit the link would be put into a log, and my script would add the /24 subnet containing each one to my firewall. I allowed specific IP ranges for some search engines.

Anyway, it caught a lot of bots. I really just wanted to stop automated attacks and spambots on the web front.

I also had a honeypot port that basically did the same thing. If you sent packets to it, your /24 was added to the firewall for a week or so. I think I just used netcat to write to yet another log and wrote a script to add those /24s to iptables.

I did it because I had so much noise in my logs from bad traffic and spambots; it was pretty crazy.
