Miscellaneous ⇒ Search Engines :: Archives ⇒ Blocking bots :: Archived ⇒ Community Forums ⇒ CPG Dragonfly™ CMS

Archived ⇒ Blocking bots


Is there any way of blocking bots from roaming a Dragonfly (DF) site?

I am still getting heaps of problems from my host about using too many CPU resources. However, I've come to realise that the peaks are occurring when most of my users are NOT online.

Basically, around 95% of my users are based in Scotland. Some of the peaks are occurring at 2am BST, i.e. when most of my users are OFFline!! I can only put this down to bots and harvesters.

Can I either block them or reduce their access to the site? Quite frankly, I don't care if my search-engine rating goes down as a result.

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Unix / Apache 2.2.6 / MySQL5.0.45 / PHP5.2.5 / CPGNuke9.2.1


In your root Dragonfly folder, find the robots.txt file.

Edit it so that it now reads:

User-agent: *
Disallow: /

That will ban all bots from your site :)
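
For anyone who wants to check how a well-behaved crawler would read that file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The bot name `ExampleBot`, the `example.com` URL and the helper `is_allowed` are placeholders made up for illustration, and the check only models bots that actually honour robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant bot may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and example.com are hypothetical names used only for the demo.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.php"))  # prints False
```

An empty `Disallow:` line has the opposite meaning (everything allowed), so the trailing `/` matters.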

Norbie

www.norbiesworld.co.uk

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux / Apache Custom Version / 4.0.26-standard (client: 5.0.15) / 4.4.4 / 9.1.1


that's awesome. i might do that. thanx norbie

i'm just that sweet

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Windows XP | P4HT 3.4| 1.5GB RAM | 256 Vid Card PCIX | Apache2.0.52 | MySQL 1.4.8 | PHP 4.3.10 | 9.03


You could also put:

User-agent: *
Crawl-delay: 20

instead, which would tell all bots to pause for 20 seconds before indexing the next page.

EDIT: If you use this, make sure you keep what was already in your robots.txt file, and just add these lines onto the end of the file.
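
A hedged note on where the lines go: Crawl-delay is an extension rather than part of the original robots.txt convention, so some crawlers ignore it, and parsers differ on what they do with a second `User-agent: *` group appended to a file. The sketch below uses Python's `urllib.robotparser` (`crawl_delay` needs Python 3.6 or later) to compare the two layouts. The `Disallow: /admin/` rule, the bot name and the helper `crawl_delay_for` are invented for the example and are not taken from anyone's real file.

```python
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay a compliant bot would read, or None if it sees none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# Hypothetical existing rule, with Crawl-delay added inside the same group.
MERGED = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 20
"""

# Same hypothetical rule, with Crawl-delay appended as a second "*" group.
APPENDED = """\
User-agent: *
Disallow: /admin/

User-agent: *
Crawl-delay: 20
"""

print(crawl_delay_for(MERGED, "ExampleBot"))    # prints 20
print(crawl_delay_for(APPENDED, "ExampleBot"))  # prints None: this parser keeps only the first "*" group
```

Other parsers may merge the two groups, but putting the Crawl-delay line directly under the existing `User-agent: *` line avoids depending on that.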

Norbie


Last edited by norbie on Sun Oct 09, 2005 9:27 pm; edited 1 time in total


All of that assumes bots behave as they are supposed to and check robots.txt. That assumption will be wrong as often as it is correct, if not more often.

NEMINI.org, NEMINI.us, NEMINI.info, NYMINI.org

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
1.3.34 (Unix)/4.1.18-standard/4.4.2 /9.1.0.8 CVS


Not all bots obey robots.txt. How are you going to fix that problem?

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Apache/1.3.34 (Unix)/4.0.25-standard/4.4.1/CVS


This website is currently blocking this kind of traffic; a solution for this issue will be available within a few weeks.

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Fedora 25 / Apache 2.4.27 / MariaDB 10.1.26 / PHP 7.1.10 / Mercurial


xfsunoles wrote:
Not all bots obey robots.txt. How are you going to fix that problem?


It's better than nothing.
I was providing the help that I could.

Norbie



... and this help is gratefully accepted, norbie. Thanks!!

I'll be very keen to see what this new solution brings, but in the meantime I shall try adding the noted text to the robots file.



This won't help against viruses and third-party bots. If you want to address them, you need a firewall with an IDS or a bit of code tampering...
I'm currently testing what effect it has on them to replace lines like this:
if (!defined('CPG_NUKE')) { exit; }

with this one:
if (!defined('CPG_NUKE')) { header("Location: http://127.0.0.1:22/"); exit; }

Or to be "user unfriendly" use port 135 instead :D

Wise man - believes half of what he reads. Genius - knows which half to believe.

-- Benny Hill

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Slackware 9.1 Current/Apache 2.05/MySQL 4.1.12/PHP 5.04/Dragonfly 9.0.6.1


xfsunoles wrote:
Not all bots obey robots.txt. How are you going to fix that problem?


Use a bot trap to ban those robots that disobey robots.txt.

It has its downside, though. I once caught the Google WAP Proxy with it. :lol:

[edit] Forgot to mention, I use the first method (PHP), so I don't have to fiddle with .htaccess. [end edit]
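
The general idea of a bot trap can be sketched roughly as follows. This is not the PHP script referred to above, just the bare logic written in Python under some assumptions: a path such as `/bot-trap/` (a made-up name) is listed under `Disallow` in robots.txt and linked invisibly from pages, so only clients that ignore robots.txt ever request it; any client that does is banned by IP address. The `BotTrap` class, the status codes chosen and the allowlist are all illustrative; the allowlist is there because of the kind of false positive mentioned above.

```python
from dataclasses import dataclass, field

TRAP_PREFIX = "/bot-trap/"  # hypothetical path, also listed as Disallow in robots.txt


@dataclass
class BotTrap:
    """In-memory sketch: ban any IP that requests the trap path."""
    trap_prefix: str = TRAP_PREFIX
    allowlist: frozenset = frozenset()        # IPs that must never be banned
    banned: set = field(default_factory=set)  # IPs caught so far

    def handle(self, ip: str, path: str) -> int:
        """Return the HTTP status this request would get."""
        if ip in self.banned:
            return 403  # already caught earlier
        if path.startswith(self.trap_prefix):
            if ip not in self.allowlist:
                self.banned.add(ip)  # disobeyed robots.txt: ban from now on
            return 403
        return 200  # normal page


# Addresses below come from documentation-only ranges; they are placeholders.
trap = BotTrap(allowlist=frozenset({"192.0.2.10"}))
print(trap.handle("198.51.100.7", "/index.php"))  # prints 200
print(trap.handle("198.51.100.7", "/bot-trap/"))  # prints 403 (and bans the IP)
print(trap.handle("198.51.100.7", "/index.php"))  # prints 403
```

A real trap would keep the ban list in a file or database rather than in memory, and would probably expire bans after a while.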

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux 2.4.24 / Apache 1.3.29 (Unix) / MySQL 4.0.18-standard / PHP 4.3.3/ CVS

All times are UTC