Baidu (180.15.x.x) and its Fictitious urls


Had my site down this morning. Totally blank pages, and emails to admin every few seconds! Corrupted session table. All repaired now.

But the user at the time was Baidu. I have Baidu's spider (202.x.x.x) blocked in robots.txt, and Baidu are very clear that they respect robots.txt. And they do, sort of ... 202.x doesn't visit, but they send 180.15.x.x instead. Lots of them, which makes obeying the crawl delay pretty moot, and none of them heed robots.txt. So I've blocked 180.15.0.0 - 180.15.255.255 and will wait to see what IP it uses next.

I don't know for sure that this was the cause of the session table issue - it was probably provoked by something else happening as well. But the url they were using!!! No way is this a search engine crawler.

..../Pro_News/sid=7/Your_Account=/profile=****/newlang=swedish.html

The profile username I've blanked out as it was a valid member's name on my site. Pro_News/sid=7 is valid. But the rest of it is taken from at least 1, possibly 2, different urls, and then merged!?! What's that about? Anyone have any ideas?

The next in line after my DF went into a loop trying to delete a session entry was even more bizarre:

..../Pro_News/sid=4/news=/pro_news=/credits=/newlang=english/

If you feed Pro_News a garbled url which at least starts out correctly, it'll return you a blank page. I thought it best not to try to guess the user's intent just in case they were fishing. Correct? But perhaps I should send a 404? How would I do that in DF/PHP?
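
As a rough illustration of the "don't guess, just 404" idea, here is a minimal Python sketch of the decision logic only. It is not Dragonfly or Pro_News code: the function name and the ALLOWED_KEYS set are hypothetical, and the real list of parameters Pro_News accepts would have to come from the module itself. In PHP the status itself can be set with http_response_code(404) (PHP 5.4 and later) before any output is sent.

```python
# Hypothetical allowlist: the real set of Pro_News parameters is an assumption here.
ALLOWED_KEYS = {"sid", "newlang"}


def status_for_path(path):
    """Return 200 for a well-formed Pro_News path, 404 for anything garbled."""
    trimmed = path.strip("/")
    # Drop the optional ".html" suffix used by the search-friendly urls.
    if trimmed.endswith(".html"):
        trimmed = trimmed[: -len(".html")]
    segments = trimmed.split("/")
    module, params = segments[0], segments[1:]
    if module != "Pro_News":
        return 404
    for segment in params:
        key, sep, value = segment.partition("=")
        # Reject segments with no "=", an unknown key, or an empty value.
        if not sep or key not in ALLOWED_KEYS or not value:
            return 404
    return 200


if __name__ == "__main__":
    for url in (
        "/Pro_News/sid=7.html",
        "/Pro_News/sid=7/Your_Account=/profile=****/newlang=swedish.html",
        "/Pro_News/sid=4/news=/pro_news=/credits=/newlang=english/",
    ):
        print(status_for_path(url), url)
```

Both of the merged urls quoted in this post fail the check because they carry keys outside the allowlist and empty values, so nothing has to be guessed about what the client meant.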

Pro_News CM™ - Content Management for Dragonfly CMS™

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Linux / 1.3.39 - 2.4.9 / 5.5.42 - 5.6.16 / 5.4.37 - 5.5.11 / 9.4


<base href=""> could be the issue.

Say you are at: /Pro_News/sid=7/example.html
If baidu ignored the base href and there's a link to "Your_Account=/profile=****/newlang=swedish.html"
It will map that to /Pro_News/sid=7/Your_Account=/profile=****/newlang=swedish.html
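
The mapping described above can be reproduced with Python's standard urllib.parse.urljoin, which applies the usual relative-reference rules. This is only a sketch: the host example.com is a placeholder, and the assumption that the base href points at the site root is mine.

```python
from urllib.parse import urljoin

# Placeholder host; the page and link paths are the ones from this thread.
PAGE_URL = "http://example.com/Pro_News/sid=7/example.html"
BASE_HREF = "http://example.com/"  # assumed value of <base href="">
LINK = "Your_Account=/profile=****/newlang=swedish.html"

# A crawler that honours <base href> resolves the link against the base.
with_base = urljoin(BASE_HREF, LINK)

# A crawler that ignores <base href> resolves it against the page url instead.
without_base = urljoin(PAGE_URL, LINK)

if __name__ == "__main__":
    print(with_base)
    print(without_base)
```

The second result is the merged url from the first post, which is what a client ignoring the base href would end up requesting.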

However, I have noticed that the Baidu crawler just messes up links on many websites (including those without a base href).

Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Fedora 25 / Apache 2.4.27 / MariaDB 10.1.26 / PHP 7.1.10 / Mercurial


Thanks for that. Your explanation makes sense, and could happen - except perhaps for the username in the 2nd part (most DF links are to the profile number, not the name).

But my second example appears to be 3 or 4 urls concatenated. So I think it's just screwed up as you say. Or it's trying for weaknesses present in php-nuke or similar???

Oh, and Baidu did come back! So I've now added 123.125.70.108 plus 123.125.71.0 - 123.125.71.255
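
For anyone keeping a similar block list, the ranges mentioned in this thread can be turned into CIDR networks and checked with Python's standard ipaddress module. A minimal sketch, assuming the list is maintained by hand; the names BLOCKED_RANGES, to_networks and is_blocked are made up for illustration and have nothing to do with how DF stores its bans.

```python
import ipaddress

# Ranges quoted in this thread, as (first address, last address) pairs.
BLOCKED_RANGES = [
    ("180.15.0.0", "180.15.255.255"),
    ("123.125.70.108", "123.125.70.108"),
    ("123.125.71.0", "123.125.71.255"),
]


def to_networks(ranges):
    """Convert first/last address pairs into a list of CIDR networks."""
    networks = []
    for first, last in ranges:
        networks.extend(
            ipaddress.summarize_address_range(
                ipaddress.ip_address(first), ipaddress.ip_address(last)
            )
        )
    return networks


BLOCKED_NETWORKS = to_networks(BLOCKED_RANGES)


def is_blocked(ip):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    print([str(net) for net in BLOCKED_NETWORKS])
    print(is_blocked("180.15.3.4"), is_blocked("123.125.70.107"))
```

Written as CIDR, the three entries are 180.15.0.0/16, 123.125.70.108/32 and 123.125.71.0/24, which is the form most firewall and .htaccess rules expect.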

