
Detecting and Resolving LAMP Stack Problems – Scheduled Downtime


In the last installment of my current consulting saga, Detecting and Resolving LAMP Stack Performance Problems, we talked about a Drupal site that was being brought offline every few hours due to poor tuning of the LAMP stack. With the default settings, a site doesn't take much traffic before it falls flat on its face.

After triaging and addressing the main issues based on the logs, we were left with two more problems. The first was Drupal's inability to perform well in an environment where it had to rebuild every page from source for every page view. This is well documented in the Drupal community; there are many pages in the documentation area of Drupal that deal with caching and performance optimization. The second was MySQL performance and the long table lock/scan times we were seeing on some queries that could not be optimized any further.

We scheduled a two-hour downtime with the customer to install some tools. Our checklist was installing memcached and PHP-APC. I also wanted to take the time to back up the MySQL database and run a good CHECK TABLE on each of the MyISAM tables. (Yes, I know. MyISAM. More on that later.)
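
For reference, here's a sketch of what that pre-flight pass can look like. The database name, credentials, and paths below are placeholders rather than the client's actual setup, and mysqlcheck is just a convenient wrapper around CHECK TABLE / OPTIMIZE TABLE:

    # Full logical backup before touching anything (database name is hypothetical)
    mysqldump -u root -p --opt --databases drupal > /backup/pre_downtime_drupal.sql

    # Check every table for corruption (wraps CHECK TABLE)
    mysqlcheck -u root -p --check drupal

    # Optionally defragment and re-sort MyISAM indexes (wraps OPTIMIZE TABLE)
    mysqlcheck -u root -p --optimize drupal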

Side note: I would typically prefer XCache, which in my mind is superior to APC because I have an easier time working with it and prefer its management interface and tuning parameters. However, APC was available as a binary package for the platform we were on, and XCache was not. To make things faster and easier, we chose APC. Despite the endless debate about which is superior, both are usable and both work. I have not run into problems using APC on an 8-core system, despite the oft-reported-but-never-proven flock() issues.

APC was fast to install and required minimal tuning, and it produced a noticeable performance improvement. However, the number of deadlocked Apache threads (and the total number of Apache threads) went up, and the other Apache errors that dealt with clients timing out did not stop.
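
If you're wondering what "minimal tuning" means in practice, it's a handful of php.ini directives. The sketch below is illustrative only: the file location assumes a Red Hat-style /etc/php.d layout, and the values are starting points rather than what we shipped (older APC builds read apc.shm_size as a bare number of megabytes):

    # Append illustrative APC settings; the exact file depends on the distro's PHP packaging
    cat >> /etc/php.d/apc.ini <<'EOF'
    extension=apc.so
    apc.shm_size=128   ; megabytes on older APC builds; newer ones accept "128M"
    apc.stat=1         ; keep stat() checks on so code updates are picked up
    apc.ttl=3600       ; allow stale entries to be reclaimed under memory pressure
    EOF

    # Restart Apache so mod_php picks up the new settings
    apachectl graceful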

We installed the Drupal memcache module along with the appropriate PECL extension. We configured two pools, both using up to 1 GB of RAM (which we had to spare on the web server). The 'hot' pool would mostly handle cached pages for non-logged-in users, and the other would handle some higher-volume caching for logged-in users, as well as some internal/custom functionality to go along with specialized RSS feed parsing. (Side note: we found that the Cache and Cacherouter plugins did not work as expected. Rather than waste downtime troubleshooting them, we used what worked.)
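
For the curious, two pools like that boil down to two memcached daemons, which the Drupal memcache module then maps its cache bins onto in settings.php. The ports, bind address, user, and memory caps below are illustrative:

    # "Hot" pool, mostly cached pages for anonymous users, capped at 1 GB
    memcached -d -m 1024 -p 11211 -l 127.0.0.1 -u nobody

    # Second pool for logged-in users and the custom RSS/feed caching
    memcached -d -m 1024 -p 11212 -l 127.0.0.1 -u nobody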

Again, we saw a huge performance boost. We needed to do some tuning (changing certain cache settings and analyzing performance), but that was essentially everything we could find to do on the single-server web side of things.

While we're on the topic of Drupal: don't forget that Drupal has a 'cron' script (cron.php) that should be getting called remotely. It's sort of a poor man's cron solution, but it works. It was causing our load to spike every 20 minutes, and we occasionally disabled it during testing to be sure we understood its effects.
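
The usual fix is to drive cron.php from the system crontab instead of relying on page views; the URL and schedule below are placeholders:

    # Hit Drupal's cron.php once an hour from the web server's crontab
    0 * * * * wget -O /dev/null -q -t 1 http://www.example.com/cron.php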

The next beast to tackle was the database. As previously mentioned, it was on MyISAM tables. Obviously, this isn't ideal. We found that node lookups, statistics lookups, and searches were taking up a disproportionate amount of server time because they were both locking tables and scanning large numbers of rows. The weirdest part was that we were seeing some full table scans in the slow query log (e.g. 3 million rows scanned), but a later EXPLAIN statement couldn't replicate the performance recorded in the slow query log.

We batted around adding indexes. The issue was that Drupal's search and node tables are frequently altered, which means the indexes fragment quickly. And really, what was taking the time was the size of the table we were dealing with: the table wouldn't fit in memory, so MySQL was copying it to an on-disk temporary table and then doing a filesort.
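
You can confirm that kind of spill-to-disk behavior from the server's own counters, roughly like this (credentials are placeholders):

    # How often temporary tables spill to disk, and how much sorting hits disk
    mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%'"
    mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Sort_merge_passes'"

    # In-memory temp tables are capped by the smaller of these two variables
    mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'tmp_table_size'"
    mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size'"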

Running CHECK TABLE and an OPTIMIZE TABLE pass did the trick to re-sort the indexes and 'defrag' the files, but the benefits only lasted so long.

What we ended up doing was taking the database down, dumping everything out to a SQL file, and re-importing everything as InnoDB. Make sure that innodb_file_per_table is enabled, or you might end up with some unexpectedly large files; this depends on your architecture and filesystem. Remember that InnoDB tablespace files currently cannot shrink. (Also: you can do the table changes online, but it's really not recommended. It takes a long time, especially when some of your tables are larger than 1 GB.) Don't forget to set innodb_buffer_pool_size appropriately.
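
In outline, a conversion like that can be scripted as below. This is a sketch, not the exact commands we ran: the database name and file paths are placeholders, and the naive sed only works if the string ENGINE=MyISAM never appears inside your actual data, so inspect the dump before trusting it:

    # 1. Set innodb_file_per_table and a sensible innodb_buffer_pool_size in my.cnf,
    #    then restart mysqld so the settings are active before the import.

    # 2. Dump the data, rewriting the storage engine in the CREATE TABLE statements
    mysqldump -u root -p --opt drupal > drupal_myisam.sql
    sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' drupal_myisam.sql > drupal_innodb.sql

    # 3. Re-import into a fresh database
    mysql -u root -p -e "DROP DATABASE drupal; CREATE DATABASE drupal"
    mysql -u root -p drupal < drupal_innodb.sql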

The change to InnoDB, the implementation of caching at both the PHP opcode level and the built-page level, and the careful tuning of Apache and MySQL parameters led to stability for this client.

There were some further problems, but they were with an unrelated product that was causing a nightly load spike on the database machine. Tomorrow night I'll cover the cleanup work: NFS IOPS vs. local disk, binary logging and the lack of backups in the original configuration, and building some redundancy into the system so that it can tolerate faults more smoothly.

