• asudox@lemmy.world
    4 months ago

    Block? Nope, robots.txt does not block the bots. It’s just a text file that says: “Hey robot X, please do not crawl my website. Thanks :>”
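To illustrate the point that robots.txt is only a request, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names, and the `is_allowed` helper are made up for illustration. Note that the check runs inside the crawler: a bot that never asks is never stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the bot name and path are invented.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what a *polite* crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler calls this before fetching; nothing on the
    # server enforces the answer.
    print(is_allowed("BadBot", "https://example.com/index.html"))
    print(is_allowed("GoodBot", "https://example.com/index.html"))
```

The server plays no part in this decision, which is why the file cannot block anything by itself.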

    • ɐɥO@lemmy.ohaa.xyz
      4 months ago

      I disallow a page in my robots.txt and IP-ban everyone who goes there. That's pretty effective.
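The trap described above could look roughly like the following sketch. It is not the commenter's actual setup: the trap path, the `TrapGate` class, and the in-memory ban set are all assumptions, and a real deployment would more likely sit in the web server or firewall.

```python
# Hypothetical path that robots.txt lists under "Disallow:" and that no
# page links to, so only crawlers ignoring robots.txt should request it.
TRAP_PATH = "/do-not-crawl/"


class TrapGate:
    """Ban any client IP that requests the disallowed trap path."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.banned: set[str] = set()

    def handle(self, ip: str, path: str) -> int:
        """Return an HTTP-style status code for this request."""
        if ip in self.banned:
            return 403  # already banned
        if path.startswith(self.trap_path):
            self.banned.add(ip)  # first hit on the trap: ban from now on
            return 403
        return 200
```

For example, a client that requests `/do-not-crawl/x` once gets 403 for every later request, while clients that never touch the trap are unaffected.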

        • bountygiver [any]@lemmy.ml
          4 months ago

          Humans typically don’t visit [website]/fdfjsidfjsidojfi43j435345 when there’s no button that links to it.

          • Avatar_of_Self@lemmy.world
            4 months ago

            I used to do this on one of my sites that was moderately popular in the 2000s. I hid a link with JavaScript so that users couldn’t click it unless they had JavaScript disabled, and even then it was hidden fairly well.

            Hits on that link went into a log, and my script added the /24 subnet containing each logged IP to my firewall. I allowlisted specific IP ranges for some search engines.
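The step from a logged IP to a blocked /24 can be sketched with Python's `ipaddress` module. This is an illustration under assumptions, not the original script: it handles IPv4 only, the allowlisted range is a documentation range standing in for real search-engine ranges, and the function names are hypothetical. The iptables command is only built, never executed.

```python
import ipaddress
from typing import List, Optional

# Stand-in for allowlisted search-engine ranges (RFC 5737 documentation range).
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def subnet_to_block(ip_str: str) -> Optional[ipaddress.IPv4Network]:
    """Return the /24 containing ip_str, or None if the IP is allowlisted."""
    ip = ipaddress.ip_address(ip_str)
    if any(ip in net for net in ALLOWED_NETWORKS):
        return None
    # strict=False masks off the host bits instead of raising ValueError.
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def iptables_drop_command(net: ipaddress.IPv4Network) -> List[str]:
    """Build (but do not run) an iptables rule dropping traffic from net."""
    return ["iptables", "-A", "INPUT", "-s", str(net), "-j", "DROP"]
```

Blocking a whole /24 trades precision for coverage: it catches bots that rotate through neighbouring addresses, at the cost of also blocking up to 255 other hosts, which is why the allowlist is checked first.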

            Anyway, it caught a lot of bots. I really just wanted to stop automated attacks and spambots on the web front.

            I also had a honeypot port that did basically the same thing. If you sent packets to it, your /24 was added to the firewall for a week or so. I think I just used netcat to write to yet another log and wrote a script to add those /24s to iptables.
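The "for a week or so" part implies that bans expire. A minimal sketch of that bookkeeping follows, assuming IPv4 addresses, a seven-day ban, and an in-memory table. The `TimedBanList` class is hypothetical, and the netcat listener and iptables calls from the original setup are left out. Timestamps are passed in explicitly so the logic is easy to test.

```python
import ipaddress
from typing import Dict, List

BAN_SECONDS = 7 * 24 * 3600  # assumed ban length: one week


class TimedBanList:
    """Track banned /24 subnets together with their expiry times."""

    def __init__(self, ban_seconds: int = BAN_SECONDS) -> None:
        self.ban_seconds = ban_seconds
        self._expires: Dict[ipaddress.IPv4Network, float] = {}

    def ban(self, ip_str: str, now: float) -> ipaddress.IPv4Network:
        """Ban the /24 containing ip_str, starting at time `now`."""
        net = ipaddress.ip_network(f"{ip_str}/24", strict=False)
        self._expires[net] = now + self.ban_seconds
        return net

    def is_banned(self, ip_str: str, now: float) -> bool:
        """True if ip_str falls in a subnet whose ban has not expired."""
        ip = ipaddress.ip_address(ip_str)
        return any(ip in net and now < exp for net, exp in self._expires.items())

    def purge(self, now: float) -> List[ipaddress.IPv4Network]:
        """Drop expired bans and return them (e.g. to remove firewall rules)."""
        expired = [net for net, exp in self._expires.items() if exp <= now]
        for net in expired:
            del self._expires[net]
        return expired
```

In a real setup, a periodic job would call `purge` with the current time and delete the matching firewall rules.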

            I did it because I had so much noise in my logs and so many spambots; it was pretty crazy.