I know I've been here before, but I'm hoping to get a thorough and detailed analysis, hopefully leading to a permanent solution. This is also a different server than I had in the past, better CPU etc. I'll try to make this as detailed and thorough as possible, but if I can provide any more info, let me know.
Here is the situation: Most of the time, the server load (viewing top) is reasonable -- somewhere between 0.8 and 2.5. Sites are responsive, things are good. But periodically, at random times, the load will shoot up to 6, 10, 20, I even saw 74 once. Tonight it has been spiking above 5 quite a lot. Usually I will restart Apache from WHM and it will calm down. More often than not, the wait-states will be very high, near 100%, and this is when the server is basically unresponsive. Tonight I simply put my big DF site into maintenance mode and the load dropped below 0.5 within a minute or so. Ten minutes later, I turned it back on, and quite soon the load was above 3.
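In case it helps anyone correlate the spikes with the wait-states, here is a minimal Python sketch for logging the 1-minute load average next to the iowait percentage. It assumes a Linux box where /proc/stat and /proc/loadavg are readable and where the fifth number on the aggregate "cpu" line is iowait (true on 2.6 kernels as far as I know). The function names are my own, not from any tool.

```python
import time

def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]

def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    # Only the first eight counters (user .. steal) are summed, so guest
    # time on newer kernels is not counted twice.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait

def read_cpu_line(path="/proc/stat"):
    with open(path) as handle:
        return handle.readline()

def read_load1(path="/proc/loadavg"):
    with open(path) as handle:
        return float(handle.read().split()[0])

def sample(interval=5.0):
    """Return (1-minute load, iowait percent) measured over `interval` seconds."""
    before = cpu_times(read_cpu_line())
    time.sleep(interval)
    after = cpu_times(read_cpu_line())
    return read_load1(), iowait_percent(before, after)
```

Calling sample() once a minute from cron and appending the result to a file with a timestamp should give something to line up against the Apache and MySQL logs. If iowait climbs together with the load, that would point at the disk rather than the CPU.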
The site: My site is www.mlparena.com, a fairly popular site dedicated to My Little Ponies. I administer it for my wife. You wouldn't think it would be too popular, but it seems to be. I'm running DF 9.0.6.1. I plan to upgrade, but unless someone says it's much faster, I'm not exactly keen to open that can of worms, especially considering the amount of tweaks, additional modules, etc that it has.
The extras: I have integrated Menalto Gallery 2.02, via the dragonflied module. The database for this is about 66MB at this point. Sometimes I turn off this module but I don't notice a significant reduction in the load. I would like to run IPTracker, but haven't had it on in a year for fear it was increasing the MySQL load. I also have the following modules up and running:
dfmaps
daily babe
flash games
forums centre block
most posts
most online users individual
reviews
shoutblock
treasury module
Another issue that concerns me is the size of my database. I've never culled old posts, and it goes back to 2003. The main db for the site is 910MB. In the past when I've tried doing maintenance, it has never completed; it just quits after a while with the old posts still there. Maybe this isn't a big deal, maybe it's huge, I have no idea.
We also have a lot of forums, and a lot of ranks, including many unique ranks for individuals. I've often wondered if doing so added to the load in some way.
The server: I'm using a Planet single-Xeon server with 2GB of RAM.
1 Intel \ 2.0 GHz 1333FSB - Woodcrest \ Xeon 5130 (Dual Core)
2 Generic \ 1024 MB \ DDR2 667 FB DIMM
1 Dell \ 9G Drive Controller - SAS/SATA \ SAS 5/i
1 Western Digital \ 250GB:SATA2:7200RPM \ WD2500JS
It runs WHM, cPanel, and Red Hat; the rest of my server specs are in my sig.
I had configserver go over it right after I got it. They did their cpanel special. I also asked them to tune my.cnf. This is what that looks like today:
Code:
[client]
port
socket
[mysqld]
port
socket
skip-locking
key_buffer = 256M
max_allowed_packet = 1M
table_cache = 256
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 16M
thread_concurrency = 2
max_connections = 500
skip-innodb
log-slow-queries = /var/log/mysql-slow.log # <- I just looked, this log file is now 65MB!
long_query_time = 5
server-id = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
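For what it's worth, here is a rough back-of-the-envelope check of those numbers against the 2GB of RAM, written as a small Python sketch. It uses the common rule of thumb that the global buffers (key_buffer, query_cache_size) are shared, while sort_buffer_size, read_buffer_size and read_rnd_buffer_size can each be allocated per connection. This is only an upper bound under that assumption: the per-connection buffers are allocated on demand, and thread stacks, temporary tables and Apache/PHP are ignored entirely.

```python
# All values in megabytes, copied from the [mysqld] section above.
GLOBAL_BUFFERS_MB = {
    "key_buffer": 256,
    "query_cache_size": 16,
}
PER_CONNECTION_BUFFERS_MB = {
    "sort_buffer_size": 1,
    "read_buffer_size": 1,
    "read_rnd_buffer_size": 4,
}
MAX_CONNECTIONS = 500
PHYSICAL_RAM_MB = 2 * 1024  # two 1024 MB DIMMs

def worst_case_mb(global_buffers, per_connection, max_connections):
    """Rule-of-thumb ceiling: shared buffers plus every connection's buffers."""
    shared = sum(global_buffers.values())
    per_conn = sum(per_connection.values())
    return shared + per_conn * max_connections

total = worst_case_mb(GLOBAL_BUFFERS_MB, PER_CONNECTION_BUFFERS_MB, MAX_CONNECTIONS)
print("worst case: %d MB of %d MB RAM" % (total, PHYSICAL_RAM_MB))
# prints: worst case: 3272 MB of 2048 MB RAM
```

So if all 500 connections were ever busy at once, MySQL alone could in theory ask for more memory than the machine has, which would push it into swap. Whether that actually happens here I can't say; it depends on how many connections are really in use during a spike.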
Backups are turned off (unfortunately!). Stats are off on most sites, including this one.
My current theory is that maybe I am getting slammed by bots. This would explain the random timing. Either that or something is going on with the server that I am not aware of.
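One way to test the bot theory would be to tally requests per user agent from the Apache access log around the time of a spike. Below is a minimal Python sketch of that idea. It assumes the log is in Apache's "combined" format, where the user agent is the last quoted field on each line; the sample lines, IP addresses and user-agent strings are made up for illustration.

```python
import re
from collections import Counter

# In the combined log format each line ends with: "referer" "user-agent"
USER_AGENT_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')

def count_user_agents(lines):
    """Count requests per user-agent string in combined-format log lines."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up sample lines, only to show the expected input shape.
SAMPLE_LINES = [
    '192.0.2.10 - - [01/Jan/2008:21:00:01 -0600] "GET /index.php HTTP/1.1" 200 5120 "-" "ExampleSpider/1.0"',
    '192.0.2.10 - - [01/Jan/2008:21:00:02 -0600] "GET /forums.html HTTP/1.1" 200 8192 "-" "ExampleSpider/1.0"',
    '192.0.2.77 - - [01/Jan/2008:21:00:03 -0600] "GET /index.php HTTP/1.1" 200 5120 "http://www.mlparena.com/" "Mozilla/5.0"',
]

for agent, hits in count_user_agents(SAMPLE_LINES).most_common(10):
    print(hits, agent)
```

Against a real log, the same function can be fed an open file object instead of SAMPLE_LINES. If one or two crawlers account for most of the hits during a spike, that would support the theory; if the traffic looks like ordinary browsers, the cause is probably elsewhere.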
Here is my robots.txt file:
User-agent: *
Disallow: notmyrealadmin.php
Disallow: error.php
Disallow: /admin/
Disallow: /blocks/
Disallow: /cache/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /modules/
Disallow: /themes/
User-agent: Baidu
Disallow: /
User-agent: *alexa*
Disallow: /
User-agent: Googlebot-Image
Disallow: /
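To see how a parser actually reads these rules, the file can be fed to Python's standard urllib.robotparser module. This is only a sketch: real crawlers each have their own parser and may interpret the file differently, "ExampleBot" is a made-up user agent, and the paths under /modules/ and /images/ are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: notmyrealadmin.php
Disallow: error.php
Disallow: /admin/
Disallow: /blocks/
Disallow: /cache/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /modules/
Disallow: /themes/
User-agent: Baidu
Disallow: /
User-agent: *alexa*
Disallow: /
User-agent: Googlebot-Image
Disallow: /
"""

def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

BASE = "http://www.mlparena.com"
parser = build_parser(ROBOTS_TXT)

# (user agent, path) pairs to check; the paths are illustrative only.
CHECKS = [
    ("ExampleBot", "/index.php"),
    ("ExampleBot", "/modules/Forums/"),
    ("ExampleBot", "/notmyrealadmin.php"),
    ("Googlebot-Image", "/images/pony.gif"),
]

for agent, path in CHECKS:
    allowed = parser.can_fetch(agent, BASE + path)
    print(agent, path, "allowed" if allowed else "blocked")
```

This module matches rules by prefix against the URL path, so the two entries without a leading slash do not seem to block anything for it. A robots.txt file is also only a request; a badly behaved bot can ignore it completely, so it would not rule out the bot theory either way.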
I am including some screenshots of top. These were all taken tonight. I don't claim to understand them much.
www.mlparena.com/top2.gif
www.mlparena.com/top3.gif
www.mlparena.com/top4.gif
www.mlparena.com/top5.gif
www.mlparena.com/top6.gif
www.mlparena.com/top7.gif
www.mlparena.com/top8.gif
www.mlparena.com/top9.gif
www.mlparena.com/top10.gif
top9 is the nice status after I turned off the pony site for 10 minutes, then turned it back on. top10 was an hour later.
I also get an email if the server load is high for 5 minutes or more. Here's one from tonight. Since it's longer than what I see in top, it shows more processes.
www.mlparena.com/highload.txt
It's been particularly bad tonight.
I am pretty desperate to get this sorted out. Besides this site, I have a few paying clients who don't appreciate the server being unreachable ever, much less this often.
I am also hoping I can find the solution here because everyone else simply says DF is to blame and I should look for a different script. If I should pay for a particular service, I will.
I don't know if there is going to be a magic bullet, or a series of minor things that add up, but I'm sure hoping this can be fixed.