Too Many Secrets


Block bad bots from your site

Jul 10th, 2008 by Richard

Have you ever wondered how the email address listed on your web site gets onto those spam lists, or how copies of your web site content end up in places where you don’t want them?

Often this happens because bots and crawlers are spidering your web site data for nefarious reasons.

You can stop a lot of this by listing the bad bots and crawlers in a robots.txt file on your web site.

I’ve managed to block most of the bad bots, as you can see from the chart shown in this post. Only the bots I want to visit, like Google, Yahoo, Bloglines, etc., are spidering my site.

When a well-behaved bot or crawler comes to your site, it normally checks first to see whether you have a robots.txt file and then whether it is listed in that file. If it is listed with a Disallow rule, the bot will go away and not spider your site.
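To make that check concrete, here is a minimal sketch of what a compliant bot does, using Python's standard `urllib.robotparser` module. The bot names `BadBot` and `EmailHarvester`, the sample file contents, and the example.com URL are made up for illustration; they are not taken from my actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. "BadBot" and "EmailHarvester" are
# invented names; a real list would use the bots' actual user-agent tokens.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: EmailHarvester
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/some-post/"
    for agent in ("BadBot", "EmailHarvester", "Googlebot"):
        print(agent, is_allowed(agent, page))
```

The two named bots are disallowed from the whole site with `Disallow: /`, while the empty `Disallow:` under `User-agent: *` leaves everything open to everyone else. Keep in mind this only models a bot that chooses to honor the file.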

I block about 100 bots from this site. Have a look at my robots.txt for the full list and feel free to copy my list of bad bots and use it for yourself.

To find out what else you can do with your robots.txt file, please check out the official Web Robots Page.

NOTE: Be sure to put your robots.txt file in your web space root, because that is where the bots look for it. If you place the file anywhere else, they will not read or follow it.
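As a rough illustration of why the location matters, the sketch below shows how a crawler could work out where to look: it keeps the scheme and host of whatever page it is visiting and replaces everything else with `/robots.txt`. The function name `robots_txt_url` and the example.com address are my own placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; discard the page's path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/blog/2008/07/some-post/?page=2"))
```

However deep the page is, the result always points at the root of the host, so a robots.txt saved in a subdirectory is never requested.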


Posted in Web Hosting Tips

2 Responses to “Block bad bots from your site”

  1. on 11 Jul 2008 at 5:16 pm, Frank Michlick (DomainNameNews.com)

    Great advice Richard and also great work on the list of bots in your robots.txt.
    Since “bad bots” or even “evil bots” that steal content will ignore your robots.txt file and often pretend to be a normal web browser, you may want to consider also blocking the bad bots directly in your Apache configuration, either by host name, IP address or User Agent.

  2. on 11 Oct 2008 at 12:17 pm, Which SEO BOTS to block to save bandwidth? - vBulletin SEO Forums

    [...] is an interesting resource with a list of suggestions for bots to block: Block bad bots from your site __________________ Joe Ward / Crawlability Inc. vBSEO 3.2.0 Launched - Maximum Overdrive for [...]
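The suggestion in the first comment, refusing bad bots on the server by User Agent, can be sketched as follows. This is not Apache configuration; it is the same idea written as a small Python WSGI wrapper, and the blocked substrings, `block_bad_bots`, `demo_app`, and `status_for` are all hypothetical names chosen for the example. A bot that fakes a browser User Agent will still get through, so treat this as one layer rather than a complete answer.

```python
# Hypothetical substrings to look for in the User-Agent header (lowercase).
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "emailharvester")


def block_bad_bots(app):
    """Wrap a WSGI app so that listed user agents receive 403 Forbidden."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in BLOCKED_AGENT_SUBSTRINGS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Anyone else is passed through to the real application.
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real site: always answers 200 OK."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent: str) -> str:
    """Return the status line the wrapped demo app sends to user_agent."""
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    body = block_bad_bots(demo_app)({"HTTP_USER_AGENT": user_agent}, start_response)
    list(body)  # consume the response body, as a WSGI server would
    return seen[0]


if __name__ == "__main__":
    for agent in ("BadBot/1.0", "Mozilla/5.0"):
        print(agent, "->", status_for(agent))
```

Unlike robots.txt, this check runs on the server for every request, so it does not depend on the bot cooperating.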


Too Many Secrets © 2008 All Rights Reserved.