I know I've been here before, but I'm hoping to get a thorough and detailed analysis, hopefully leading to a permanent solution. This is also a different server than I had in the past, better CPU etc. I'll try to make this as detailed and thorough as possible, but if I can provide any more info, let me know.
Here is the situation: Most of the time, the server load (viewing top) is reasonable -- somewhere between 0.8 and 2.5. Sites are responsive, things are good. But periodically, at random times, the load will shoot up to 6, 10, 20, I even saw 74 once. Tonight it has been spiking above 5 quite a lot. Usually I will restart Apache from WHM and it will calm down. More often than not, the wait-states will be very high, near 100%, and this is when the server is basically unresponsive. Tonight I simply put my big DF site into maintenance mode and the load dropped below 0.5 within a minute or so. Ten minutes later, I turned it back on, and quite soon the load was above 3.
The site: My site is www.mlparena.com, a fairly popular site dedicated to My Little Ponies. I administer it for my wife. You wouldn't think it would be too popular, but it seems to be. I'm running DF 18.104.22.168. I plan to upgrade, but unless someone says it's much faster, I'm not exactly keen to open that can of worms, especially considering the number of tweaks, additional modules, etc. that it has.
The extras: I have integrated Menalto Gallery 2.02, via the dragonflied module. The database for this is about 66MB at this point. Sometimes I turn off this module but I don't notice a significant reduction in the load. I would like to run IPTracker, but haven't had it on in a year for fear it was increasing the MySQL load. I also have the following modules up and running:
forums centre block
most online users individual
Another issue that concerns me is the size of my database. I've never culled old posts and it goes back to 2003. The main db for the site is 910MB. In the past when I've tried doing maintenance, it has never completed; it just quits after a while with the old posts still there. Maybe this isn't a big deal, maybe it's huge, I have no idea.
We also have a lot of forums, and a lot of ranks, including many unique ranks for individuals. I've often wondered if doing so added to the load in some way.
The server. I'm using a Planet single Xeon 2GB ram server.
1 Intel \ 2.0 GHz 1333FSB - Woodcrest \ Xeon 5130 (Dual Core)
2 Generic \ 1024 MB \ DDR2 667 FB DIMM
1 Dell \ 9G Drive Controller - SAS/SATA \ SAS 5/i
1 Western Digital \ 250GB:SATA2:7200RPM \ WD2500JS
WHM, cpanel, redhat, the rest of my server specs are in my sig.
I had configserver go over it right after I got it. They did their cpanel special. I also asked them to tune my.cnf. This is what that looks like today:
You should also add a crawl-delay (5) to robots.txt - Yahoo, and others that obey crawl-delay, can destroy you without it. Just blocking them in htaccess can do you more harm than good.
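As a rough sketch of the crawl-delay suggestion: the robots.txt text below is an assumption (one catch-all rule with a 5 second delay, nothing else), and the Python around it only uses the standard library parser to confirm the directive is read the way you expect. crawl_delay_for is a made-up helper name, and "Slurp" is just an example user agent string.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: a catch-all rule asking crawlers to wait
# 5 seconds between requests. Adjust the rules for your own site.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "Slurp"))
```

This only checks that the file parses as intended; whether a given crawler honours the delay is up to the crawler.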
However, there are numerous wannabes and spammers that are blocked, and others that you can block in the security system (9.1.1) that DJ refers to.
Google ignores crawl-delay, but they do state you can contact them for a reduced crawl rate. Beware of this, as their idea of a reduced rate can be "switched off", and you end up with no indexing. Like MS, Google does its own thing, regardless of what users want - come to think of it, Google takes the lead these days - MS has improved.
I have noticed that I get a number of emails from the server saying that this or that IP was blocked temporarily for too many connections. Usually there are around 320 connections on one IP. The few IPs I have checked have been users on my DF site.
I always attributed this to the server being slow, so maybe the user was trying to access the site repeatedly. But maybe something else is happening?
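One way to check those per-IP numbers yourself is to tally the remote addresses in the output of netstat -ntu. The sketch below assumes the usual Linux column layout (remote address in the fifth column) and IPv4 addresses; the sample text and the connections_per_ip name are invented for illustration, not taken from the server in question.

```python
from collections import Counter

# Invented sample in the usual `netstat -ntu` layout (documentation IPs only).
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           198.51.100.7:51234      ESTABLISHED
tcp        0      0 192.0.2.10:80           198.51.100.7:51235      TIME_WAIT
tcp        0      0 192.0.2.10:80           203.0.113.9:40000       ESTABLISHED
"""


def connections_per_ip(netstat_output: str) -> Counter:
    """Count connections per remote IP in netstat-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner and header lines; keep only tcp/udp rows.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        # Fifth column is "ip:port"; drop the port.
        remote_ip = fields[4].rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts


if __name__ == "__main__":
    for ip, n in connections_per_ip(SAMPLE).most_common(10):
        print(n, ip)
```

If one address regularly shows hundreds of connections, looking at their states (ESTABLISHED versus TIME_WAIT and so on) may help tell a slow server from a misbehaving client.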
Is it normal to have so many sleeping processes? Seems like 200 to 250 is the norm for me.
This will monitor mysql for a short while and then give you suggestions for improving performance.
As an example, on my server it made the surprising but successful recommendation to turn off query caching - something I had previously been increasing to take advantage of more memory. In our case the writes to the DB were frequent enough that caching was actually reducing performance.
Another very helpful figure that monyog produces, and one that would have saved me a lot of pain in the past, is 'worst case memory usage'. If that is more than the RAM you have (less some for Apache - approx 10kB per thread) then it would explain your spikes problem, for instance. I had this problem because I was allowing too many connections: the server started using swap space, so it responded more slowly, more connections queued, and so on until it died.
That said, you don't need a tool to see that if all your DB connections are in use then you will use way more RAM than you actually have - your key cache (a global mysql cache) plus 500 * the dedicated per-thread buffers gives well over 4GB, just for MySQL.
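To make that arithmetic concrete, here is a minimal sketch of the worst-case estimate: global buffers plus per-connection buffers times the connection limit. Every number in it is a placeholder I made up (the actual my.cnf values are not shown in this thread), and both function names are hypothetical; plug in your own figures.

```python
def worst_case_mysql_memory_mb(global_buffers_mb: int,
                               per_thread_buffers_mb: int,
                               max_connections: int) -> int:
    """Global caches plus per-thread buffers for every allowed connection."""
    return global_buffers_mb + per_thread_buffers_mb * max_connections


def max_safe_connections(ram_mb: int,
                         reserved_mb: int,
                         global_buffers_mb: int,
                         per_thread_buffers_mb: int) -> int:
    """Largest connection limit whose worst case still fits in RAM."""
    available_mb = ram_mb - reserved_mb - global_buffers_mb
    return max(available_mb // per_thread_buffers_mb, 0)


if __name__ == "__main__":
    # Placeholder figures, NOT taken from the poster's my.cnf:
    key_cache_mb = 256      # assumed global key cache
    per_thread_mb = 8       # assumed sum of sort/read/join buffers per thread
    ram_mb = 2048           # the 2GB box described above
    reserved_mb = 512       # assumed headroom for Apache and the OS

    print(worst_case_mysql_memory_mb(key_cache_mb, per_thread_mb, 500))
    print(max_safe_connections(ram_mb, reserved_mb, key_cache_mb, per_thread_mb))
```

With these made-up numbers, 500 connections would need about 4.2GB in the worst case, while something around 160 would fit in 2GB. The real limit depends entirely on the buffer sizes actually configured.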
I think it is very likely that you can stop your spikes just by reducing the number of MySQL connections you allow. Then go from there with monyog and see how you get on.
I'm back from my vacation (when you take your family, it's hardly a vacation!) and ready to dive in. Unfortunately, I feel I have more questions than answers when it comes to this subject. I'm just a little glad to know I'm not the only one.
Like marcus81, I also have a number of sites on my server, but more like 100, not 200. And also like him, my own site, a cpgnuke site, is by far the heaviest hitter. I am anxious to upgrade to 9.1 but haven't had the energy or time to undertake it.
I liked what hqarrse had to say. I'd like to try MONyog, and I probably have to reduce my number of connections as well. I think mine is set to 800! What happens after the max number is reached, from the user's perspective? I installed MONyog but so far can't get it to connect to mysql. But I did get it to connect to SSH, whatever that's worth. If you have any suggestions I sure would appreciate any help on that.
This also brings up another issue. I have a security package installed by config server. One thing it does is block users temporarily if they have too many connections. This happens maybe once a day and happened just now. I checked their IP and sure enough, they are a member of my cpgnuke site. It claimed they had 340 connections open. Basically the email looked like this: