People delete files by accident, and the work is lost unless you have a system that takes periodic snapshots from which files can be recovered. Many “snapshot” systems cost a lot of money, something an SME never has enough of. So I wrote my own little utility, with the help of a “rotate” utility I found on the Internet, using one of the best features Linux has: hard links.
The premise was an automatic utility that backs up all or specified files onto another (external) disk drive at periodic intervals, preferably a drive that is not part of the normal system. For example, I have a RAID controller with all disks BUT the /snapshot disk attached; the /snapshot disk is attached to the motherboard's disk controller. You could also use a NAS drive as the snapshot drive and mount it locally (via NFS etc.). That way, when someone deletes a file, it can be recovered either by the system administrator or by the user (this is explained in Part 2).
Automatic Execution of the system
The snapshot system is run using /etc/crontab.
Below you can see that I back up files in two different ways: “dynamic” (frequently changing files) and “static” (infrequently changing files). The frequently changing files I back up more often, while the infrequently changing files are only included in the daily (total system) backup (to account for changes made by yum, for example), plus the entire system weekly and monthly.
# snapshot
37 19 * * 1-5 root nice -n 18 /usr/local/backup/snapshot_backup -bt hourly -fs static
37 7,10,13,16 * * 1-5 root nice -n 18 /usr/local/backup/snapshot_backup -bt hourly -fs dynamic
37 21 * * 1-5 root /usr/local/backup/snapshot_backup -bt daily -fs static
37 22 * * 5 root /usr/local/backup/snapshot_backup -bt weekly -fs static
37 1 1 * * root /usr/local/backup/snapshot_backup -bt monthly -fs static
…
What one of my backed-up trees looks like
What I wanted to achieve:
- short spans between the hourly backups,
- daily backups that keep the last hourly backup,
- weekly backups that keep the last daily backup,
- monthly backups that keep the last weekly backup.
So here is what one of my trees looks like:
[root /usr/local/backup] >ls -la /snapshot/var/
total 144
drwxr-xr-x 19 root root 4096 Jun 16 10:42 .
drwxr-xr-x 21 root root 4096 Apr 24 19:46 ..
drwxr-xr-x 35 root root 4096 Jun 12 16:44 daily.1
drwxr-xr-x 35 root root 4096 Jun 11 16:44 daily.2
drwxr-xr-x 35 root root 4096 Jun 10 16:44 daily.3
drwxr-xr-x 35 root root 4096 Jun  9 16:44 daily.4
drwxr-xr-x 35 root root 4096 Jun  8 16:43 daily.5
drwxr-xr-x 35 root root 4096 Jun 16 10:44 hourly.0
drwxr-xr-x 35 root root 4096 Jun 16 07:44 hourly.1
drwxr-xr-x 35 root root 4096 Jun 15 19:39 hourly.2
drwxr-xr-x 35 root root 4096 Jun 15 16:43 hourly.3
drwxr-xr-x 35 root root 4096 Jun 15 13:43 hourly.4
drwxr-xr-x 35 root root 4096 Jun 15 10:43 hourly.5
drwxr-xr-x 35 root root 4096 Jun 15 07:45 hourly.6
drwxr-xr-x 36 root root 4096 May  1 16:44 monthly.1
drwxr-xr-x 35 root root 4096 Jun  5 16:44 weekly.1
drwxr-xr-x 35 root root 4096 May 29 16:44 weekly.2
drwxr-xr-x 36 root root 4096 May 22 16:44 weekly.3
drwxr-xr-x 36 root root 4096 May 15 16:44 weekly.4
Hardlinks, an explanation
When people look at files, they usually think of the file’s name as being the file itself, but really the name is a hard link to the file’s inode, which is where the data lives.
A given file can also have more than one hard link. Look at a directory, for example: it has at least two hard links, the directory name itself and . (for when you’re inside it). It also has one hard link from each of its sub-directories (the .. entry inside each one).
Using the stat utility you can find a lot of information about a file, including how many hard links it has:
stat filename
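If only the link count is of interest, stat can print it on its own. The following is a minimal sketch, assuming GNU coreutils (the -c option and the %h, %i and %n format sequences are GNU-specific; BSD and macOS stat use a different syntax). The file name demo.txt is just a placeholder:

```shell
#!/bin/sh
# Assumes GNU coreutils stat; demo.txt is a placeholder name.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo "some data" > demo.txt

# %h = number of hard links, %i = inode number, %n = file name
stat -c 'links=%h inode=%i name=%n' demo.txt

# Capture the link count on its own, e.g. for use in a script
links=$(stat -c %h demo.txt)
echo "demo.txt has $links hard link(s)"
```

A freshly created regular file has exactly one hard link: its own name.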
Now assume we have a file “a” and we make a hard link to it called “b”:
ln a b
“a” and “b” are two names for the same file – you can verify by looking at the inodes (the inode number will be different on your machine):
ls -i a
232177 a
ls -i b
232177 b
However, “b” takes up almost no extra space: a hard link is just another directory entry pointing at the same inode, so the contents of the file are not stored a second time. The exact size of a directory entry depends on the filesystem, but it is always far less than a complete copy of the file.
Here are the main properties of hard links:
- The contents of the file are only stored once, so you don’t use twice the space.
- If you change a, you’re changing b, and vice-versa.
- If you change the permissions or ownership of a, you’re changing those of b as well, and vice-versa.
- If you overwrite a by copying a third file on top of it, you will also overwrite b.
- If you do not want b to be overwritten, tell cp to unlink the destination before copying by running cp with the --remove-destination flag.
- Removing one hard link will not remove the other hard link, nor will it remove the contents of the file. The contents of the file will ONLY be removed when the count of hard links reaches 0 (zero). This is heavily used in the utility.
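The properties above are easy to check for yourself. The following is a small sketch, assuming GNU coreutils for stat -c; it works in a throwaway temporary directory and uses the names a and b from the text:

```shell
#!/bin/sh
# Assumes GNU coreutils (stat -c); runs in a temporary directory.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

echo "first version" > a
ln a b                      # b is a second name for the same inode

inode_a=$(stat -c %i a)
inode_b=$(stat -c %i b)
echo "inodes: a=$inode_a b=$inode_b"
echo "link count: $(stat -c %h a)"

# Writing through one name is visible through the other
echo "second version" > a
content_b=$(cat b)
echo "b now contains: $content_b"

# Removing one name leaves the data reachable through the other
rm a
links_left=$(stat -c %h b)
echo "after rm a: b has $links_left link(s) and contains: $(cat b)"
```

Both names report the same inode number, the edit made through a shows up in b, and after rm a the link count of b drops back to 1 while its contents remain.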
RSYNC
Rsync is a well-known piece of GPL’d software, originally written by Andrew Tridgell and Paul Mackerras, both from Canberra, Australia. It is installed by default on most (if not all) Linux variants these days; if it is missing, check your distro’s packages and install it.
Assume you have a directory called source. To back it up into the directory destination, you use:
rsync -a source/ destination/
which copies the entire tree from source to destination: every file, including permissions, date/time stamps etc.
Now if you wait for a few hours and do the same command again
rsync -a source/ destination/
rsync first compares the source directory with the destination directory (by default using file size and modification time) and copies ONLY the files that have changed, reducing the workload on the system.
Continued in Part 2.