
Archived ⇒ Bots browsing my cpgnucalendar all the way to 2099


I have many bots that keep wasting my precious bandwidth by crawling all over my cpgnucalendar

my cpg site is in a subdomain

www.domain.com/subdomain/index.php

In the subdomain, I already prevent bots from going into the modules directory, and yet in the who where block I keep seeing bots and crawlers in the cpgnucalendar..


How do I stop them?

some are msn and yahoo bots...
at times i do get google bots going there as well.

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
9.0.6.1


using robots.txt

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
9.0.6.1


They can't get to the modules directory anyway. Try this:

Disallow: /CPGNuCalendar*
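
One thing worth checking before waiting on the bots: crawlers match each Disallow value as a prefix of the URL path counted from the root of the host, and they only request robots.txt from the root of the host. With the site living under www.domain.com/subdomain/, a rule that starts with /CPGNuCalendar would not cover URLs that start with /subdomain/. Below is a minimal sketch for testing rules offline with Python's standard urllib.robotparser. The two calendar URL shapes and the file=index parameter are assumptions about how the calendar links look, not taken from the actual site. The trailing * is left out because the standard-library parser does not implement wildcard extensions, and prefix matching makes it unnecessary here.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The rule roughly as posted in this thread (without the trailing wildcard).
RULES_AS_POSTED = """\
User-agent: *
Disallow: /CPGNuCalendar
"""

# The same idea, with the /subdomain/ prefix included in each rule.
RULES_WITH_PREFIX = """\
User-agent: *
Disallow: /subdomain/CPGNuCalendar
Disallow: /subdomain/index.php?name=CPGNuCalendar
"""

# Hypothetical calendar URLs (assumed shapes, not copied from the real site).
CALENDAR_URLS = [
    "http://www.domain.com/subdomain/index.php?name=CPGNuCalendar&file=index",
    "http://www.domain.com/subdomain/CPGNuCalendar/",
]

if __name__ == "__main__":
    for url in CALENDAR_URLS:
        print(url)
        print("  rule as posted  :", is_allowed(RULES_AS_POSTED, url))
        print("  rule with prefix:", is_allowed(RULES_WITH_PREFIX, url))
```

If the rule as posted reports the calendar URLs as allowed while the prefixed version blocks them, the path in robots.txt is the likely culprit rather than the bots ignoring the file.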

Diagon Alley - Top Design

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux/1.3.37/4.1.21-standard/4.4.4/9.1.1


Not helping..
Yahoo/Inktomi bots are still in my cpgnucalendar, even with robots.txt updated as above.

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
9.0.6.1


I don't think they read it every time they visit; I think they cache it for a while.

Diagon Alley - Top Design

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux/1.3.37/4.1.21-standard/4.4.4/9.1.1


The subdomain stats in my webhost's cPanel say that these bots take up 9-10 MB of bandwidth each...

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
9.0.6.1


Until the bots have digested your new robots.txt, a solution could be to set the module to members only...?

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux/Apache/5.0.24/5/9.1 CVS


One thing I've noticed is that bots love calendars...more than anything else I've ever seen. I have no idea why. Maybe bots are designed to look for the most current information so when they see a calendar they chew through the info like it was candy.

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux/1.3.33/4.4/4.3.11


setting calendar module to registered members only..

pity...

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
9.0.6.1


farbird wrote
setting calendar module to registered members only..

pity...


You can still have the blocks show some info, only the links won't work now.

But a better solution would be this:


<META NAME="ROBOTS" CONTENT="NOINDEX">

You can read more about the noindex and nofollow meta tags on our webmaster pages.


I'm too busy right now to dig in CPGNuCalendar's code, but shouldn't this be done by setting Dragonfly's $METATAGS['robots'] in parts of the CPGNuCalendar code just before require_once('header.php');?
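
Once the tag is set, it may be worth confirming that the calendar pages really send it. Here is a small sketch using Python's standard html.parser that lists the directives found in any robots meta tag. The sample HTML is made up for illustration and is not actual Dragonfly output.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up page source for illustration only.
SAMPLE_PAGE = (
    '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
    "</head><body>calendar</body></html>"
)

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

Keep in mind that a crawler has to fetch a page before it can see this tag, so on its own the tag affects indexing rather than the bandwidth used by the fetch.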

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux/Apache/5.0.24/5/9.1 CVS


Even with the block and module restricted to registered members only, the bots still crawl all over the calendar module...

exasperated....

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
9.0.6.1


They're still following old links; be patient. If they're using 10 MB of bandwidth, that's not so bad...

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux/1.3.33 (Unix)/4.0.22-standard/4.3.9/DF 9.x


ok..
will wait out..

darn the bots.

hehe

can't live with them, can't live without them

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
9.0.6.1


They're just crawling a message that tells them they have to be registered to view that page. They love to do the same thing with, for example, Private Messages until you disallow that too.

Diagon Alley - Top Design

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux/1.3.37/4.1.21-standard/4.4.4/9.1.1


I'm getting slammed also. I've also tried everything.

A bug in the code is worth two in the documentation.

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Ubuntu Linux 8.04.1/Apache 2.28/MySQL 5.0.51/ PHP 5/DF 9.2.x
