Block unwanted crawlers wasting your resources


Looking at your server logs can reveal a huge amount of crawler traffic that is just eating away at your server resources.
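
If you want a quick way to see which crawlers are hitting you hardest, here is a rough Python sketch that tallies requests per user agent from an access log. The log path and the "combined" log format are assumptions, so adjust both to match your own server.

#!/usr/bin/env python3
# Rough sketch: count requests per user agent in an Apache-style access log.
# Assumes the "combined" format, where the user agent is the last quoted field.
import re
from collections import Counter

LOG_PATH = "/var/log/apache2/access.log"  # assumption - point this at your own log

agent_counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        quoted = re.findall(r'"([^"]*)"', line)
        if quoted:
            agent_counts[quoted[-1]] += 1  # last quoted field is the user agent

# Show the ten busiest user agents so the unwanted crawlers stand out.
for agent, hits in agent_counts.most_common(10):
    print(f"{hits:8d}  {agent}")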

To tighten things up, here are two helpful snippets of code that will stop them dead in their tracks.

Firstly, open the .htaccess file found in the root of your domain.

Add the following for robots/crawlers that don’t obey the robots.txt disallow rules.

Just under the line

## enable rewrites

add the following:

Options +FollowSymLinks
RewriteEngine on
## Added by citricmedia.co.uk to prevent unrequested crawling
#Commercial Crawlers
RewriteCond %{HTTP_USER_AGENT} BLEXBot|MJ12bot|TwengaBot|008|AhrefsBot|SemrushBot|WotBox [OR]
#Foreign Local Search Engines
RewriteCond %{HTTP_USER_AGENT} YandexBot|Sosospider|Baiduspider|ZumBot
# Still allow requests for robots.txt itself
RewriteCond %{REQUEST_URI} !robots.txt
# Return 403 Forbidden to anything that matches
RewriteRule .* - [F,L]
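
Once that’s in place, you can check it’s working without waiting for the bots to come back. This is just a quick sketch using Python’s standard library, and www.example.com is a placeholder for your own domain: request a page while pretending to be one of the blocked crawlers and you should get a 403 back.

# Quick check: request the site with a blocked user agent and expect a 403.
import urllib.error
import urllib.request

URL = "https://www.example.com/"  # placeholder - use your own domain
req = urllib.request.Request(URL, headers={"User-Agent": "MJ12bot"})

try:
    with urllib.request.urlopen(req) as resp:
        print("Not blocked - got HTTP", resp.status)
except urllib.error.HTTPError as err:
    # 403 Forbidden here means the rewrite rule is doing its job.
    print("Blocked - got HTTP", err.code)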

Next, in your robots.txt file, add the following for the crawlers that do obey the disallow rules:

User-agent: BLEXBot
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: TwengaBot
Disallow: /
User-agent: 008
Disallow: /
User-agent: AhrefsBot
Disallow: /
User-agent: SemrushBot
Disallow: /
User-agent: WotBox
Disallow: /
User-agent: Sosospider
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: ZumBot
Disallow: /
User-agent: YandexBot
Disallow: /
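
As a quick sanity check on the robots.txt side, Python’s built-in robot parser will tell you whether a given user agent is allowed to fetch a URL. Again, www.example.com is just a placeholder for your own domain.

# Sketch: ask Python's robot parser which bots are disallowed by your robots.txt.
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # placeholder - use your own domain
rp.read()

for bot in ("MJ12bot", "AhrefsBot", "Googlebot"):
    allowed = rp.can_fetch(bot, "https://www.example.com/")
    print(bot, "is", "allowed" if allowed else "disallowed")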

If you don’t have a robots.txt file, just create one called robots.txt and upload it to the root of your domain via FTP.

That’s it, now you can get your bandwidth back. Add as many as you need, and please let us know if you find others that are hurting you so we can update this post for everyone else.