10.1: Bots and crawlers
As you have all noticed, thanks to LEO more and more bots are hitting, and sometimes overloading, your website and ours. Most of them are good, but some are very evil. To help you tell which bots can be trusted and which cannot, we've started this page. You may edit it and add more details about bots and crawlers.
NOTE: To disallow a bot, use the User-agent specified for that bot below and add a Disallow line directly below the User-agent line.
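The NOTE above can be sketched in Python as a small generator for the robots.txt rules, checked with the standard library's urllib.robotparser. This is only a minimal sketch: it assumes the bot names listed on this page are the exact User-agent tokens the bots send, and that you want to ban them from the whole site (Disallow: /). The helper names build_robots_txt and is_allowed, and the /index.php test path, are made up for this example.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(banned_agents):
    """Return robots.txt text that bans each listed User-agent from the whole site.

    Assumption: every name in banned_agents is the exact token the bot
    sends in its User-agent header.
    """
    blocks = []
    for agent in banned_agents:
        # One record per bot: the User-agent line, then the Disallow line below it.
        blocks.append(f"User-agent: {agent}\nDisallow: /\n")
    # Records are separated by a blank line.
    return "\n".join(blocks)


def is_allowed(robots_txt, agent, url="/index.php"):
    """Check with the standard-library parser whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Two names taken from the BAD bots list on this page.
robots_txt = build_robots_txt(["LinkWalker", "WebReaper"])
print(robots_txt)
```

Keep in mind that robots.txt is only a request. Well-behaved crawlers honour it, but an evil bot can simply ignore it, so the check above only tells you what the rules say, not what a bot will actually do.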
ia_archiver: Alexa's web crawler
Scooter: AltaVista's very friendly crawler. It will never exceed 1% of your server resources, thanks to a nice algorithm that measures how long it takes to fetch a page and waits 100 times that long before it fetches the next page.
To ban a bot in robots.txt, use its name from this list as the User-agent value.
Teoma: Ask Jeeves' web crawler
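The pacing rule described for Scooter in the list above can be sketched in a few lines of Python. This is not AltaVista's actual code, only an illustration of the rule as stated on this page: wait 100 times as long as the last fetch took. The function names polite_delay and load_share are made up for this example.

```python
def polite_delay(fetch_seconds, factor=100):
    """Seconds to wait before the next request: fetch time multiplied by factor."""
    return fetch_seconds * factor


def load_share(fetch_seconds, factor=100):
    """Fraction of wall-clock time the crawler keeps the server busy."""
    delay = polite_delay(fetch_seconds, factor)
    return fetch_seconds / (fetch_seconds + delay)


# A page that took half a second to fetch leads to a 50 second pause.
print(polite_delay(0.5))
# The busy share is 1/101 whatever the fetch time is, so it stays just under 1%.
print(load_share(0.5))
```

The nice part of this rule is that it adapts by itself: when your server is slow, fetches take longer, so the crawler backs off further.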
More to come, such as:
IBP, ccubee, FAST MetaWeb Crawler, NutchCVS, Findexa Crawler, Vagabondo, W3C-checklink, Wget, Openbot, noxtrumbot, Minuteman, btbot, Java, 1Noonbot, genieknows, YahooFeedSeeker, VoilaBot, w3search, RPT-HTTPClient, MJ12bot, BDFetch, aipbot, Filangy, Baiduspider, appie, Bilbo, Yahoo-MMCrawler, Pogodak, etc.
And of course the BAD bots, such as:
LinkWalker, WebReaper, Schmozilla, OmniExplorer, Picture Finder, etc.
Created: Saturday, June 25, 2005 (15:50:30) by DJMaze
Updated: Saturday, August 26, 2006 (05:07:46) by alva