CDOT Wiki β

OSTEP Infrastructure

7,453 bytes added, 13:26, 7 August 2013
= Backup System =
 
== Introduction ==
Here is a copy of the script: [place link to zenit wiki]
 
Our previous backup solution at CDOT was a little archaic and hard to expand. I decided to write a new backup method that runs from a single computer and backs up our entire infrastructure. As I write this, the script is not finished, but it works and is usable as a replacement for our previous system. A warning: this method of backing up across systems is not very secure, because it requires giving some users NOPASSWD sudo access to some or all programs. I am looking for a way around this and would appreciate any input on the matter.
 
== Goals ==
There were a few goals that were kept in mind with this script as it was developed:
- Script resides on a single computer (complete)
- Do not run multiple backups using the same hard drive (complete)
- Check space requirements before performing a backup on source and destination (in progress)
- Emails out daily reports on success or failure (not complete)
- Logs all information to /var/log/smart-bk/ (complete)
- Easy(ish) to add a new backup schedule (complete)
- Can view all backups that are currently running (complete)
- Can view all the backups in the queue to run (complete)
- Can view all the schedules that are added (complete)
- Keeps a record of all previously run backups (not complete)
- Website to view status of currently running backups (not complete)
 
Not all of these goals have been completed yet, but I intend to finish them. For now, I am documenting how the script currently works, what it is missing, and what my next steps will be.
 
== Scheduler System ==
The scheduler system is the main part of the script and uses a list of fields. Using this list of fields, the script can determine exactly when the backup should take place, if it has already run for the day, if it is still running, and all the details surrounding the specified backup. A person or bash script will add backups they would like to be performed to a schedule using specific parameters. A schedule looks like this:
<pre>
----------------------------------------------------------------------------------------------------
id|day|time|type|source host|dest host|source dir|dest dir|source user|dest user
----------------------------------------------------------------------------------------------------
1|06|11:00|archive|japan|bahamas|/etc/|/data/backup/japan/etc/|backup|backup
</pre>
 
Field details and explanations:
<pre>
id - This is just a unique field identifier.
day - This is the day the backup was last run. It is used to check whether the schedule is expired (in the past) or has already completed.
time - This is the time at which the backup will start. This allows you to order different schedules to happen earlier or later in the day.
type - This is the type of backup. Currently there are 3.
- archive backup wraps the directory specified in a tar archive and compresses it with bzip2. Uses options: tar -cpjvf
- rsync is a very simple rsync that preserves most things. Uses options: rsync -aHAXEvz
- dbdump backup, this is specifically a koji db backup currently. Uses options: pg_dump koji
source_host - This host is the target for backup. The files are backed up from here.
dest_host - This host is your backup storage location. All files backed up will go here.
source_dir - This directory correlates to source_host. This is the directory that is backed up.
dest_dir - This directory correlates to dest_host. This is where the backup is stored.
source_user - User to use on the source host.
dest_user - User to use on the dest host.
</pre>
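
To make the field list concrete, here is a minimal sketch of how one pipe-delimited schedule row could be parsed and checked for whether it should run. This is not taken from sbk itself: the names <code>Schedule</code>, <code>parse_schedule</code>, and <code>is_due</code> are hypothetical, and the sketch assumes that "day" is the zero-padded day of the month of the last run and that "time" is a zero-padded 24-hour HH:MM string.

```python
from dataclasses import dataclass
from datetime import datetime

# Field order as shown in the schedule listing above.
FIELDS = ("id", "day", "time", "type", "source_host", "dest_host",
          "source_dir", "dest_dir", "source_user", "dest_user")


@dataclass
class Schedule:
    id: int
    day: str
    time: str
    type: str
    source_host: str
    dest_host: str
    source_dir: str
    dest_dir: str
    source_user: str
    dest_user: str


def parse_schedule(line: str) -> Schedule:
    """Turn one 'a|b|c|...' row into a Schedule (hypothetical helper)."""
    values = line.strip().split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    record["id"] = int(record["id"])
    return Schedule(**record)


def is_due(schedule: Schedule, now: datetime) -> bool:
    """Assumption: a schedule is due when it has not yet run today
    (day differs from today's day of month) and its start time has passed."""
    already_ran_today = schedule.day == now.strftime("%d")
    # Zero-padded HH:MM strings compare correctly as plain text.
    start_reached = now.strftime("%H:%M") >= schedule.time
    return not already_ran_today and start_reached
```

With the example row above, <code>is_due</code> would be true at 11:30 on the 7th, and false at any time on the 6th or before 11:00 on the 7th. The real script may well use a different rule.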
 
 
== Database Structure ==
All data for this script is stored inside a sqlite3 db. The database file used should be called "schedules.db".
 
Create a new sqlite3 database:
<pre>
sqlite3 schedules.db
</pre>
 
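Run the following SQL statements to create the required tables within the database. They are shown here as printed by the sqlite3 .schema command, so the first line is that command rather than a statement to enter: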
<pre>
sqlite&gt; .schema
CREATE TABLE Queue(scheduleid INTEGER, queuetime TEXT, FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Running(scheduleid INTEGER, starttime TEXT, FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Schedule(id INTEGER PRIMARY KEY, day TEXT, time TEXT, type TEXT, source_host TEXT, dest_host TEXT, source_dir TEXT, dest_dir TEXT, source_user TEXT, dest_user TEXT);
</pre>
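
The same tables can also be created from Python with the standard sqlite3 module. This is a sketch, not part of sbk: the function name <code>create_database</code> is hypothetical, and <code>IF NOT EXISTS</code> is added here so the function can be run more than once without error.

```python
import sqlite3

# Same three tables as in the schema above; Schedule is listed first
# only for readability.
SCHEMA = """
CREATE TABLE IF NOT EXISTS Schedule(
    id INTEGER PRIMARY KEY, day TEXT, time TEXT, type TEXT,
    source_host TEXT, dest_host TEXT, source_dir TEXT, dest_dir TEXT,
    source_user TEXT, dest_user TEXT);
CREATE TABLE IF NOT EXISTS Queue(
    scheduleid INTEGER, queuetime TEXT,
    FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE IF NOT EXISTS Running(
    scheduleid INTEGER, starttime TEXT,
    FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
"""


def create_database(path: str = "schedules.db") -> sqlite3.Connection:
    """Create (or open) the database file and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```

Passing <code>":memory:"</code> as the path creates a throwaway database, which is handy for trying things out without touching the real schedules.db.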
 
== How To Use sbk ==
sbk is a Python script whose command-line options select which functions the script will perform.
 
=== Help ===
Use the help option for help. This will show a description of the program, all available options, and a description of each option.
<pre>
sbk -h
or
sbk --help
</pre>
 
=== View Schedule Information ===
sbk allows users to view all the schedules that have been created, all the current schedules in the queue, and all scheduled backups currently in progress.
<pre>
sbk -s
or
sbk --show
</pre>
 
=== Adding New Schedules/Backups ===
The backup user is able to add new backup schedules, which will be run at their designated times. Unfortunately, I couldn't find a way to add schedules without specifying a lot of options. However, this is all done through the command line, so it is easy to automate with a script.
 
Add a new schedule:
<pre>
[backup@bahamas ~]$ sbk --add --time=11:00 --backup-type=archive --source-host=japan --dest-host=bahamas --source-dir=/etc/ --dest-dir=/data/backup/japan/etc/ --source-user=backup --dest-user=backup
</pre>
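
Since adding a schedule takes many options, a small wrapper can build the command for you. The sketch below only assembles the argument list using the flags shown in the example above; the function name <code>build_add_command</code> and its default user names are hypothetical.

```python
import shlex


def build_add_command(time, backup_type, source_host, dest_host,
                      source_dir, dest_dir,
                      source_user="backup", dest_user="backup"):
    """Return the 'sbk --add' argument list for one schedule.

    Only flags that appear in the example above are used; the default
    user names are an assumption for illustration.
    """
    return [
        "sbk", "--add",
        f"--time={time}",
        f"--backup-type={backup_type}",
        f"--source-host={source_host}",
        f"--dest-host={dest_host}",
        f"--source-dir={source_dir}",
        f"--dest-dir={dest_dir}",
        f"--source-user={source_user}",
        f"--dest-user={dest_user}",
    ]


cmd = build_add_command("11:00", "archive", "japan", "bahamas",
                        "/etc/", "/data/backup/japan/etc/")
# shlex.join (Python 3.8+) renders the list as a shell-safe string.
print(shlex.join(cmd))
```

To actually add the schedule, the list could be passed to <code>subprocess.run(cmd, check=True)</code> on the machine where sbk is installed. Building a list rather than a single string avoids quoting problems with unusual directory names.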
 
=== Removing Schedules/Backups ===
In order to remove a schedule, a "sid" must be specified. This is simply the "id" of the schedule, which is unique to schedules. You can get this "id" by running "sbk -s".
 
Remove a schedule:
<pre>
[backup@bahamas ~]$ sbk --remove --sid=1
</pre>
 
=== Start the Backups ===
Start intelligently queuing schedules and initiate the backups. This command is a good candidate for a crontab entry or some other automated trigger (see Multiple Instances).
 
<pre>
sbk -q
or
sbk --queue
</pre>
 
==== Multiple Instances ====
The script currently runs only one backup at a time, but you can run multiple instances of "sbk --queue". Multiple instances work together and increase the number of backups running at the same time (as long as there are multiple backups to run), which in turn should reduce the time needed to complete all backups. Running multiple instances should be safe: an instance will not harm backups that are already running, will not start a backup that is already running, and will not interfere with running backups in any way.

My plan for the future of this script is to let it spawn its own child processes so that a single run can perform multiple backups. Even then, it will still need to be run several times during the day, because each run only starts backups whose time has passed. For example, if you run it at 11:00 and a backup has a time of 13:00, that backup will not start during the 11:00 run; it has to wait for a later run, once its time has passed and its day field is expired.
 
So to get the most speed out of your backups and finish all schedules as fast as possible, run this script from the crontab at a high frequency. How high is really up to you: there is hardly any harm in running it too often, as it will simply exit if there is nothing to do. Whether you run the script once a day or once every 10 minutes, it will get the job done.
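
This page does not describe how instances avoid picking up the same backup. One way it could be done with the Queue and Running tables is to move a row from one to the other inside a single write transaction. The sketch below is an assumption, not sbk's actual code; the name <code>claim_next</code> is hypothetical, and the connection is expected to be opened with <code>isolation_level=None</code> so that the transaction is controlled explicitly.

```python
import sqlite3
from datetime import datetime


def claim_next(conn: sqlite3.Connection):
    """Move the oldest queued schedule into Running and return its id.

    Returns None when the queue is empty. BEGIN IMMEDIATE takes the
    database write lock up front, so two instances calling this at the
    same time cannot both claim the same schedule.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT scheduleid FROM Queue "
            "WHERE scheduleid NOT IN (SELECT scheduleid FROM Running) "
            "ORDER BY queuetime LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        schedule_id = row[0]
        started = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        conn.execute("DELETE FROM Queue WHERE scheduleid = ?", (schedule_id,))
        conn.execute(
            "INSERT INTO Running(scheduleid, starttime) VALUES (?, ?)",
            (schedule_id, started),
        )
        conn.execute("COMMIT")
        return schedule_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

An instance would call <code>claim_next</code> in a loop, run the backup for each id it gets back, remove the row from Running when the backup finishes, and exit once the function returns None. That matches the behaviour described above, where an instance simply ends if there is nothing to do.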