Had my site down this morning. Totally blank pages, and emails to admin every few seconds! Corrupted session table. All repaired now.
But the user at the time was Baidu. I have Baidu's spider (202.x.x.x) blocked in robots.txt, and Baidu are very clear that they respect robots.txt. And they do, sort of: 202.x doesn't visit, but they send 180.15.x.x instead. Lots of them, which makes obeying the crawl delay pretty moot, and none of them heed robots.txt. So I've blocked 126.96.36.199 - 188.8.131.52 and will wait to see what IP it uses next.
I don't know for sure that this was the cause of the session table issue - it was probably provoked by something else happening as well. But the URL they were using!!! No way is this a search engine crawler.
The profile username I've blanked out as it was a valid member's name on my site. Pro_News/sid=7 is valid. But the rest of it is taken from at least 1, possibly 2, different URLs, and then merged!?! What's that about? Anyone have any ideas?
The next in line after my DF went into a loop trying to delete a session entry was even more bizarre:
If you feed Pro_News a garbled URL which at least starts out correctly, it'll return you a blank page. I thought it best not to try to guess the user's intent just in case they were fishing. Correct? But perhaps I should send a 404? How would I do that in DF/PHP?
Say you are at: /Pro_News/sid=7/example.html
If Baidu ignores the base href and there's a link to "Your_Account=/profile=****/newlang=swedish.html"
It will map that to /Pro_News/sid=7/Your_Account=/profile=****/newlang=swedish.html
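To make that mapping concrete, here is a rough Python sketch of how a crawler would resolve the same relative link with and without honouring the base href. The host example.com and the helper name resolve are made up for illustration, and this only mimics standard relative-URL resolution; it is not Baidu's actual code.

```python
from urllib.parse import urljoin

# Hypothetical values: example.com stands in for the real site.
PAGE = "http://example.com/Pro_News/sid=7/example.html"
LINK = "Your_Account=/profile=****/newlang=swedish.html"
BASE_HREF = "http://example.com/"


def resolve(link, page_url, base_href=None):
    """Resolve a relative link the way a crawler would.

    If base_href is given (the crawler honours <base href>), the link is
    resolved against it; otherwise it is resolved against the page URL.
    """
    return urljoin(base_href or page_url, link)


# Crawler that honours the base href: the link lands at the site root.
print(resolve(LINK, PAGE, BASE_HREF))

# Crawler that ignores the base href: the link is glued onto the
# directory of the current page, giving the merged URL described above.
print(resolve(LINK, PAGE))
```

The second call is the one that produces the /Pro_News/sid=7/Your_Account=/... style of URL.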
However, I noticed the Baidu crawler just messes up links on many websites (including those without a base href).
Server specs (Server OS / Apache / MySQL / PHP / DragonflyCMS):
Fedora 25 / Apache 2.4.27 / MariaDB 10.1.26 / PHP 7.1.10 / Mercurial