On Vacation In Michigan, RAID problems

I’ve been in Michigan for a few weeks now with my wife and her family and will be here another week. Internet access is intermittent at best, so I haven’t been (and won’t be) blogging much.

The technical highlight so far has been trying to troubleshoot problems with the RAID on bostoncoop.net over a cell phone while at the county fair, surrounded by pigs and alpacas.

Speaking of RAID problems: can anyone suggest why more than half of our 200 GB drives would fail in various ways within a year of installation? They are from different manufacturers (WDC and Maxtor) and have failed differently, and some are giving SMART errors only days after installation. Almost all of the other equipment is new as well. Most commonly the failure shows up as kernel DMA errors, which as best I can tell don’t really point to any particular cause. We suspect temperature problems: is 50-60 degrees Celsius enough to be a serious problem?

In particular, I’d appreciate any suggestions as to how to narrow the problem down to hardware vs. software, hard drives vs. controller(s) vs. motherboard vs. memory, and so forth.

4 comments

  1. Anonymous Jan 28

    Bad power (supply)?

  2. Arcterex Jan 28

    Only thing I can think of is that the drives are too hot. Maybe try getting some of those HD cooler fans and see if that helps.

  3. Baptiste Jan 28

    I don’t think that kind of temperature would break a hard drive; mine run hot and I haven’t had problems (for the moment :)).
    My guess is that the culprit is the quality of the hard drives. During a server installation for a customer, I had to replace an IBM SCSI hard drive four times because the RAID software detected a problem when I was creating the RAID. It was really a pain and I lost 3 days.

  4. Spotteddog Jan 28

    I’d look at power supply issues. Flaky, random weirdness frequently points to a large ripple or other defect in the power supply.
