Popularity Out of Control
This blog has been semi-offline for at least a few hours. Sorry about that.
Through an interesting (at least to me) series of events, I discovered that my custom “popularity” hack/plugin was out of control.
It started earlier today when one of the RAID discs on this server gave a SMART error. I’ve never been able to really judge whether SMART errors are “for real” or not. Sometimes I’ve had a disc with a serious SMART error that, after another test, ran fine for another year or two without incident.
In any case, I wasn’t so worried, since this system is running RAID-5, so a single disc failing, even catastrophically, doesn’t result in any data loss or downtime.
I decided to take the disc out of the RAID and run some SMART tests on it. I marked it as failed with mdadm (mdadm --fail) and then removed it. Strangely, the disc first passed a short test with no problems, but then ceased to be recognized as having SMART capabilities at all.
I’ve had that problem before and never figured it out—the SMART capabilities seem to come back eventually. Not knowing what best to do, I just put the drive back into the RAID to see what would happen.
Since the disc had already been marked as failed, the reconstruction started from scratch. Although I’ve got a pretty good (2 GHz) CPU and plenty (2 GB) of RAM, kblockd and md0_raid5 are using up most of the CPU cycles and everything has slowed down significantly. Reconstruction seems to be proceeding fine, just slowly: it will take 24 to 36 hours at the current rate. Presumably it would be much faster in single-user mode, but I can’t take this server offline.
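For anyone who wants to keep an eye on a rebuild like this, here is a rough Python sketch that pulls the progress figures out of the recovery line the kernel prints in /proc/mdstat. The function name, the regular expression, and the sample line are my own illustration rather than output from this server, and the exact line format may differ between kernel versions.

```python
import re

# Matches the progress line shown during a rebuild, e.g.
#   [=>....]  recovery =  8.5% (24900000/292945152) finish=1800.0min speed=2480K/sec
# The pattern is an assumption based on the usual /proc/mdstat layout.
RECOVERY_RE = re.compile(
    r"(?P<kind>recovery|resync)\s*=\s*(?P<pct>[\d.]+)%\s*"
    r"\((?P<done>\d+)/(?P<total>\d+)\)\s*"
    r"finish=(?P<finish>[\d.]+)min\s*speed=(?P<speed>\d+)K/sec"
)


def parse_rebuild_progress(mdstat_text):
    """Return rebuild progress as a dict, or None if no rebuild line is found."""
    m = RECOVERY_RE.search(mdstat_text)
    if m is None:
        return None
    return {
        "kind": m["kind"],
        "percent": float(m["pct"]),
        "blocks_done": int(m["done"]),
        "blocks_total": int(m["total"]),
        "hours_left": float(m["finish"]) / 60.0,  # kernel reports minutes
        "speed_k_per_sec": int(m["speed"]),
    }


# Made-up sample line, for illustration only.
SAMPLE = (
    "      [=>...................]  recovery =  8.5% "
    "(24900000/292945152) finish=1800.0min speed=2480K/sec"
)

if __name__ == "__main__":
    print(parse_rebuild_progress(SAMPLE))
```

On a live system you would pass in the contents of /proc/mdstat instead of the sample string. The "finish" figure is only the kernel's estimate at the current speed, so it moves around as the load on the server changes.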
In the meantime, I noticed that my blog had stopped responding entirely, while other blosxom blogs served from this same server were fine (maybe with a little more latency than usual). So I started going through the differences between my blosxom installation and everyone else’s to see what could possibly be slowing things down so much that the blog timed out.
Eventually I identified the culprit: my custom-made “popularity” plugin, which reports the most popular entries on this blog and the number of hits they have received. I hacked it together several years ago. I think at the time I just wanted to see if it would work, with the plan to come back later and fix it. I guess I forgot the “fix it” part.
My popularity plugin creates a log file that it reads in each time the blog is accessed, and then appends to that file. Over the years, that file has grown to 17 megabytes. Although this is a huge waste of system resources, I didn’t notice it when the system was running at full speed. With the reduced performance from the RAID reconstruction, however, this meant that my blog never finished loading at all.
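The obvious repair is to store one counter per entry instead of one log line per hit, so the file grows with the number of entries rather than the number of visits. Here is a minimal sketch of that idea in Python. It is not the actual plugin (blosxom plugins are written in Perl), and the file format and the names load_counts, record_hit, and most_popular are made up for illustration.

```python
import json
import os
import tempfile


def load_counts(path):
    """Return the {entry: hits} mapping stored at path, or {} if the file is missing."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def record_hit(path, entry):
    """Add one hit for an entry and rewrite the (small) counts file."""
    counts = load_counts(path)
    counts[entry] = counts.get(entry, 0) + 1
    # Write to a temporary file in the same directory, then rename it over
    # the old file so a reader never sees a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(counts, f)
    os.replace(tmp_path, path)
    return counts[entry]


def most_popular(path, n=5):
    """Return the n most-hit entries as (entry, hits), ties broken by name."""
    counts = load_counts(path)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

This sketch does no locking, so two simultaneous requests could lose a hit. For a popularity list on a personal blog that is probably an acceptable trade, but it is an assumption worth stating.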
If you made it this far, you certainly deserve some sort of geek admin award. Congratulations. I deserve some sort of stupid admin award, myself.
P.S. Apparently I must have forgotten to check my horoscope today. The laundry machine flooded the basement when I didn’t check the sink into which the laundry machine empties. I should probably avoid sharp objects for the rest of the day.
Chung-chieh Shan Jan 28
My sympathies.