[RDD] Redundant Hard Drive/Backup
curt at cwf1.com
Thu Jan 9 07:21:57 EST 2014
On Wednesday 08 January 2014 06:42:36 pm Alan Smith wrote:
> So now it's down to rsync via cron, or software raid...
Now that I understand, I'd recommend you do what I do,
or something along these lines:
Software RAID-1, with a twist.
Each physical disk has 3 partitions:
one for swap, one for a very basic minimal system, one for everything else.
This way, disk reads are faster ( the RAID driver spreads reads
across both disks ), but disk writes are slower ( the same data
must be written twice ).
Swap is striped across the two swap partitions, so swap is faster
( if you're short on RAM ).
Either disk can function completely as a non-RAID stand-alone device,
and/or can be accessed directly without a RAID driver in the kernel,
in the event it becomes necessary. ( some day, some way, it will )
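If it helps, here's a rough sketch of setting that layout up. I'm
assuming two disks at /dev/sda and /dev/sdb, already partitioned
identically ( sdX1 swap, sdX2 the small system, sdX3 everything
else ); the device names and numbering are mine, not gospel:

#!/usr/bin/env python3
# Rough sketch, not gospel: build the two RAID-1 mirrors described
# above with mdadm. Assumes /dev/sda and /dev/sdb are already
# partitioned identically: sdX1 = swap, sdX2 = small system,
# sdX3 = everything else.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Small mirror for the rarely-changing base system.
run(["mdadm", "--create", "/dev/md0", "--level=1",
     "--raid-devices=2", "/dev/sda2", "/dev/sdb2"])

# Big mirror for everything that changes.
run(["mdadm", "--create", "/dev/md1", "--level=1",
     "--raid-devices=2", "/dev/sda3", "/dev/sdb3"])

# Swap is NOT mirrored: two plain swap partitions at equal priority,
# which the kernel stripes across.
for part in ("/dev/sda1", "/dev/sdb1"):
    run(["mkswap", part])
    run(["swapon", "-p", "1", part])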
The basic system is one very small RAID-1 partition ( about 200 MB ):
/boot, /bin, /sbin, /etc, /lib, a number of mount points, and not much else.
Things that essentially never change.
That partition is also backed up on a bootable CD.
Everything else is a RAID-1 partition that houses files that
can, do, or might change with some frequency.
That md partition is rsync'd to another machine as a backup.
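The rsync leg is a short cron job; here's roughly what it looks
like as a script. "backuphost" and both paths are placeholders,
adjust to taste:

#!/usr/bin/env python3
# Rough sketch: nightly rsync of the big md partition to another box.
# "backuphost" and both paths are placeholders, not anything standard.
import subprocess, sys

SRC = "/home/"                         # mount point of the big md partition
DEST = "backuphost:/backups/thisbox/"

# -a preserves everything, --delete makes the copy a true mirror.
result = subprocess.run(["rsync", "-a", "--delete", SRC, DEST])
if result.returncode != 0:
    sys.exit("rsync backup FAILED")

Run it from root's crontab once a night and you've got the
off-machine copy.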
RAID IS **NOT** A BACKUP !!
RAID will not help you if the motherboard fails, for instance.
Periodically ( daily, weekly ) removing one of the physical disks,
putting it on a shelf, and replacing it with another good disk,
does leave you with a backup on the shelf.
rsync can do this for you, but requires another machine on the
network to house that backup.
The kernel maintains /proc/mdstat, a constantly updated report of
the current health of the RAID md devices.
A cron job compares a recorded copy of mdstat to the live
/proc/mdstat file every 5 minutes or so.
If they are not identical, something has changed, and I want to
know very quickly. The system starts screaming for attention,
sending e-mails, flashing the screen, beeping the speaker....
but keeps on running off of the non-failed device.
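A minimal sketch of that watchdog, assuming a saved known-good copy
at /root/mdstat.good and a local MTA for the e-mail; the file name
and the alert method are my choices, nothing canonical:

#!/usr/bin/env python3
# Rough sketch: cron this every 5 minutes. Compares the live
# /proc/mdstat to a known-good snapshot and screams if they differ.
# /root/mdstat.good and the addresses are my assumptions.
import smtplib, sys
from email.message import EmailMessage

GOOD = "/root/mdstat.good"    # save a copy while the array is healthy

current = open("/proc/mdstat").read()
try:
    good = open(GOOD).read()
except FileNotFoundError:
    sys.exit("no baseline; cp /proc/mdstat " + GOOD + " while healthy")

if current != good:
    msg = EmailMessage()
    msg["Subject"] = "RAID state changed -- check /proc/mdstat NOW"
    msg["From"] = "root@localhost"
    msg["To"] = "root@localhost"
    msg.set_content(current)
    smtplib.SMTP("localhost").send_message(msg)  # assumes a local MTA
    print("\a", end="", file=sys.stderr)         # beep the console

It keeps firing every 5 minutes until the array is healthy again
( or you refresh the snapshot ), which is exactly what you want.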
If the disks were purchased and installed at the same time, you've
got about a week to deal with it.
Disks manufactured at the same time, with the same run time on them,
tend to fail within about 8 days of each other on average.
Maybe longer, but I wouldn't count on it.
That way, I can replace the failed drive, reboot, and let the system
rebuild the RAID with about 5 minutes of total off-line downtime.
Rebuilding a 2 terabyte RAID takes hours, but the system is up
and running while it happens, with a little planning.
If there is a spare disk in the machine, the RAID driver can swap
out the failed disk all by itself, but that's a little advanced.
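For reference, the replacement dance with mdadm goes roughly like
this; the device names are assumptions carried over from the sketch
above:

#!/usr/bin/env python3
# Rough sketch: swap a failed member out of the big mirror and let
# md rebuild onto its replacement. Device names are assumptions.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Mark the dying member failed and pull it from the array...
run(["mdadm", "/dev/md1", "--fail", "/dev/sdb3"])
run(["mdadm", "/dev/md1", "--remove", "/dev/sdb3"])

# ...physically swap the disk, partition it like its mate, then:
run(["mdadm", "/dev/md1", "--add", "/dev/sdb3"])
# The rebuild runs in the background; the system stays up meanwhile.
# Adding an extra partition to a healthy array leaves it as a hot
# spare, and md rebuilds onto it by itself when a member fails.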
Never eat more than you can lift.
-- Miss Piggy