[RDD] Redundant Hard Drive/Backup

Cowboy curt at cwf1.com
Thu Jan 9 07:21:57 EST 2014

On Wednesday 08 January 2014 06:42:36 pm Alan Smith wrote:
> So now its down to rsync via chron, or software raid...

 Now that I understand, I'd recommend you do what I do,
 or something along these lines......
 Software RAID-1, with a twist.
 Each physical disk has 3 partitions.
 1 swap, 1 very basic minimal system, 1 everything else.

 This way, disk reads can be faster ( the RAID driver spreads reads
 across both disks ) but disk writes are slower. ( must write the same
 data twice )
 Swap is striped, so swap is faster. ( if you're short on RAM )
 Either disk can function completely as a non-RAID stand-alone device,
 and/or can be accessed directly without a RAID driver in the kernel,
 in the event it becomes necessary. ( some day, some way, it will )

 The basic system is one very small RAID-1 partition. ( about 200 MB )
 /boot, /bin, /sbin, /etc, /lib, a number of mount points, and not much else.
 Things that essentially never change.
 That partition is also backed up on a bootable CD.
 Everything else is a RAID-1 partition that houses files that
 can, do, or might change with some frequency.
 That md partition is rsync'd to another machine as a backup.
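
 For illustration only, the rsync step could be wrapped in a few lines
 of Python and run from cron. This is a rough sketch, not my actual
 script: the source path and the "backuphost" destination are made-up
 placeholders, and it assumes rsync is installed on both machines with
 ssh access between them.

```python
import subprocess


def rsync_command(src, dest):
    """Build an rsync invocation that mirrors src onto dest."""
    # -a (archive) keeps permissions, ownership, times, and symlinks;
    # --delete removes files from the backup that are gone from the source.
    return ["rsync", "-a", "--delete", src, dest]


def run_backup(src="/home/", dest="backuphost:/srv/backup/home/"):
    """Run the copy; both default paths are hypothetical placeholders."""
    # rsync exits with status 0 on success and non-zero on any error.
    return subprocess.run(rsync_command(src, dest)).returncode
```

 The trailing slash on the source tells rsync to copy the contents of
 the directory rather than the directory itself.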

 RAID will not help you if the motherboard fails, for instance.

 Periodically ( daily, weekly ) removing one of the physical disks,
 putting it on a shelf, and replacing it with another good disk,
 does leave you with a backup on the shelf.
 rsync can make that backup for you, but requires another machine on the
 network to house it.

 The kernel maintains /proc/mdstat, which shows the current health
 of the RAID md devices and is updated dynamically.
 A cron job compares a recorded copy of mdstat to the dynamic
 /proc/mdstat file every 5 minutes or so.
 If they are not identical, something has changed, and I want to
 know very quickly. The system starts screaming for attention,
 sending e-mails, flashing the screen, beeping the speaker....
 but keeps on running off of the non-failed device.
 If the disks were purchased and installed at the same time, you've
 got about a week to deal with it.
 Disks manufactured at the same time, with the same run time on them, 
 tend to fail within about 8 days of each other on average.
 Maybe longer, but I wouldn't count on it.
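
 The mdstat comparison described above can be sketched in a few lines
 of Python. Treat this as a minimal example under assumptions, not the
 exact job I run: the location of the recorded copy ( /root/mdstat.good )
 and the function names are hypothetical, and the alerting is reduced
 to printing, since cron normally mails whatever a job prints.

```python
from pathlib import Path

LIVE = Path("/proc/mdstat")
BASELINE = Path("/root/mdstat.good")  # hypothetical: wherever the known-good copy was saved


def mdstat_changed(baseline=BASELINE, live=LIVE):
    """Return True if the live RAID status differs from the recorded copy."""
    return Path(baseline).read_text() != Path(live).read_text()


def main(baseline=BASELINE, live=LIVE):
    """Print an alert and return 1 on any difference, otherwise return 0."""
    if mdstat_changed(baseline, live):
        # Printing is enough to raise an alert when cron mails job output.
        print("RAID status changed:")
        print(Path(live).read_text())
        return 1
    return 0
```

 The recorded copy would be made once, while the arrays are healthy,
 by copying /proc/mdstat to the baseline path. Saved as a script that
 calls main(), this could be run from a crontab entry every 5 minutes.
 Note that a rebuild in progress also changes /proc/mdstat, so the
 check will keep firing until the array is clean again.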

 That way, I can replace the failed drive, re-boot and let the system
 rebuild the RAID with about 5 minutes total off-line down time.
 Rebuilding a 2 terabyte RAID takes hours, but the system is up
 and running while it happens, with a little planning.

 If there is a spare disk in the machine, the RAID driver can swap
 out the failed disk all by itself, but that's a little advanced.



Never eat more than you can lift.
		-- Miss Piggy
