Difference between revisions of "Fedora ARM Secondary Architecture/ARM Fedora Backup Management"


Latest revision as of 18:20, 16 May 2012

This is a draft only!
It is still under construction and content may change. Do not rely on this information.

Seneca CDOT ARM Project: Management Server Backup and Recovery plan

Contact info:

  • Phone: X33463
  • Email: fedora-arm at senecacollege dot ca

Scope

This document is intended for those who are familiar with administering the ARM Project's management servers. Knowledge of the CDOT ARM standard operating procedures may lead to a better understanding of this document.

Purpose

The purpose of this document is to describe the process for the backup and synchronization of critical data and to provide examples of some possible disaster recovery scenarios.

Introduction

The Fedora ARM Project is a highly active R & D project. Because of the rapid change, the project required some specific backup strategies to be implemented on top of regular backup strategies. In the event of an emergency, this document should provide enough information to re-create a similar management platform that will keep the project running.

Background

Like all Fedora projects, the ARM Project has a small yet active global community. The build farm is located in Ontario, Canada. Infrastructure support for the build farm is provided by the Seneca Centre for Development of Open Technology (CDOT) at Seneca College in Toronto. The number of available ARM builders in the farm can vary. The builders are not backed up because they are easily recreated from available images.

As for the management servers, five major ones are critical to this operation: Australia (Koji builders' temporary work storage), Chile (the complete ARM repository), Hongkong (Koji hub and web interface), Iraq (central backup management and storage) and Ireland (Koji database). For a complete list of servers and the services they offer, please consult the relevant documents.

Description

The backup system is split into two custom scripts. The first, blaze, creates a local copy of the configurations and backs up the repository to removable drives. The second, wildfire, synchronizes those local copies to different locations over the network. Each server fetches its copies of the scripts from the central backup management and storage server (iraq) via scp, then runs blaze first and wildfire second.

Backup Script Source

On any server with backup enabled, the scripts are located at:

/usr/local/bin/

Backup Directory Structure

The backups (both local copies and network-synced copies) are stored inside /archive/blaze-backup (/archive/repo-backup for the repository sync), in each server's respective directory. Example:

[root@australia ~]# tree /archive
/archive
├── blaze-backup
│   ├── australia
│   │   ├── australia-cron-Feb-27-2012.tar.bz2
│   │   ├── australia-etc-Feb-27-2012.tar.bz2
│   │   └── australia-userlocalbin-Feb-27-2012.tar.bz2
│   ├── chile
│   │   ├── chile-cron-Feb-27-2012.tar.bz2
│   │   ├── chile-etc-Feb-27-2012.tar.bz2
│   │   └── chile-userlocalbin-Feb-27-2012.tar.bz2
│   └── ireland
│       ├── ireland-cron-Feb-27-2012.tar.bz2
│       ├── ireland-etc-Feb-27-2012.tar.bz2
│       ├── ireland-postgresql-Mar-27-2012.sql.bz2
│       └── ireland-userlocalbin-Feb-27-2012.tar.bz2
└── lost+found
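
The file names in the listing above follow a host-type-date pattern. As a minimal sketch of that pattern only (archive_name is a hypothetical helper and is not taken from blaze; the choice of extension per type is inferred from the listing), such a name could be assembled like this:

```shell
# archive_name HOST KIND [STAMP]
# Prints a name of the form <host>-<kind>-<Mon>-<DD>-<YYYY>.<ext>,
# matching the files shown in the tree listing.
# Hypothetical helper for illustration; not part of blaze.
archive_name() {
    host=$1                                  # e.g. australia
    kind=$2                                  # e.g. etc, cron, userlocalbin, postgresql
    stamp=${3:-$(LC_ALL=C date +%b-%d-%Y)}   # defaults to today, e.g. Feb-27-2012
    case $kind in
        postgresql) ext=sql.bz2 ;;           # database dumps are compressed SQL
        *)          ext=tar.bz2 ;;           # everything else is a compressed tarball
    esac
    printf '%s-%s-%s.%s\n' "$host" "$kind" "$stamp" "$ext"
}
```

Passing the date stamp explicitly makes it easy to work out the name of an older archive when looking for a file to restore.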

Space Consideration and Allocation

For backing up the configurations, keeping copies of them for the previous 30 days and allocating room for possible synced copies from other servers, the minimum space requirement is 10GB. Accommodating a synced copy of the repository could require up to 2TB.

Data Retention

  • Removable drives will keep data up to 3 days
  • Network synced drives/servers will keep data up to 30 days
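
How blaze and wildfire enforce these retention periods is not shown in this document. A minimal sketch of one common approach, assuming retention is based on file modification time (prune_old is a hypothetical helper name), would be:

```shell
# prune_old DIR DAYS
# Deletes regular files under DIR whose modification time is more than
# DAYS full days in the past. Hypothetical helper; the real scripts may
# apply retention differently.
prune_old() {
    dir=$1
    days=$2
    [ -d "$dir" ] || return 1
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# Example (commented out so nothing is deleted by accident):
# prune_old /archive/blaze-backup 30
```

Under that assumption, a removable drive would be pruned with a value of 3 and a network-synced location with a value of 30.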

Initial Setup

This applies to any new server that will be backed up (this process could eventually be scripted).

  • Create a user named "backup".
  • Give that user certificate access through ssh (to and from) the other servers in the backup group. Tutorials are available at http://zenit.senecac.on.ca/wiki/index.php/OPS235_Lab_7
  • Also give the root account of each server certificate access to the "backup" user of the other servers, so that root can log in to them remotely as "backup".
  • Add a new LVM logical volume called archive that will be mounted as /archive on startup (add an entry to /etc/fstab). An LVM tutorial is available at http://zenit.senecac.on.ca/wiki/index.php/OPS235_Lab_4
  • Create the following directories and give "backup" write permission to them:
/archive/blaze-backup
/archive/repo-backup
/var/log/blaze-backup
  • Create the following file, give "backup" write permission to it, and add the hostname of the admin server (iraq) to it (please see the offered services section for other options):
/etc/sysconfig/blaze
  • On the admin host, add the new server's hostname to the buddyhost section in blaze and wildfire with a random sync peer (optional; please see the offered services section for more options).
  • Add the following entry to /etc/aliases (use proper email format):
blaze-wildfire: fedora-arm at senecacollege dot ca
  • Schedule the backup as root (crontab entry). It will look similar to this:
20 2 * * * ADMHOST=$(cat /etc/sysconfig/blaze); BKU=backup; export ADMHOST BKU; scp $BKU@$ADMHOST:/usr/local/bin/blaze \
/usr/local/bin/; chown -R $BKU:$BKU /usr/local/bin/blaze; chmod 755 /usr/local/bin/blaze; /usr/local/bin/blaze
  • If the server is going to be a reposync host, repeat the first three steps for a user account called apache and give it access to and from the repohost.
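
The directory and file steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: prepare_backup_host and its PREFIX argument are hypothetical (the prefix exists so the steps can be rehearsed in a scratch directory), and user creation, ssh access, LVM setup, /etc/aliases and the crontab entry are deliberately left out.

```shell
# prepare_backup_host PREFIX ADMHOST
# Creates the backup directories and the admin-host file listed above.
# PREFIX is "" on a real host, or a scratch directory for a rehearsal.
# Hypothetical helper; not part of blaze or wildfire.
prepare_backup_host() {
    prefix=$1
    admhost=$2          # admin host name, e.g. iraq
    owner=backup

    mkdir -p "$prefix/archive/blaze-backup" \
             "$prefix/archive/repo-backup" \
             "$prefix/var/log/blaze-backup" \
             "$prefix/etc/sysconfig" || return 1

    # The file holds only the admin server's hostname.
    printf '%s\n' "$admhost" > "$prefix/etc/sysconfig/blaze" || return 1

    # Ownership can only be handed over by root, and only once the
    # "backup" user exists; otherwise this step is skipped.
    if [ "$(id -u)" -eq 0 ] && id "$owner" >/dev/null 2>&1; then
        chown "$owner" "$prefix/archive/blaze-backup" \
                       "$prefix/archive/repo-backup" \
                       "$prefix/var/log/blaze-backup" \
                       "$prefix/etc/sysconfig/blaze"
    fi
}

# Rehearsal in a scratch directory (commented out):
# prepare_backup_host /tmp/rehearsal iraq
```

Giving ownership to "backup" is one way to satisfy the write-permission requirement; a group-based permission scheme would work as well.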

Scheduling

  • In Chile blaze runs at 01:35 AM UTC
  • In Hongkong blaze runs at 02:05 AM UTC
  • In Australia blaze runs at 02:20 AM UTC
  • In Ireland blaze runs at 03:35 AM UTC
  • In Scotland blaze runs at 12:50 AM EDT
  • In Romania blaze runs at 01:50 AM EDT
  • In Iraq blaze runs at 08:30 AM EDT
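
Because the schedule mixes UTC and EDT, it can help to convert the EDT entries before comparing run order. A small sketch, assuming GNU date is available and that times are written in 24-hour form (to_utc is a hypothetical helper; the calendar date inside it is arbitrary and does not affect the result because the offset is given explicitly):

```shell
# to_utc HH:MM OFFSET
# Prints the UTC equivalent of a local time at the given UTC offset.
# EDT is UTC-4, so OFFSET is -0400. Requires GNU date (-d option).
to_utc() {
    date -u -d "2012-05-16 $1 $2" +%H:%M
}

to_utc 00:50 -0400   # Scotland, 12:50 AM EDT
to_utc 01:50 -0400   # Romania, 01:50 AM EDT
to_utc 08:30 -0400   # Iraq, 08:30 AM EDT
```

Converted this way, the three EDT entries all fall after the 03:35 AM UTC run on Ireland.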

Offered/Backedup Servers/Services

  • Please add the service type to the appropriate row in the "Define the names of the remote machines and services" section of each script:
    • PostgreSQL Database Server (DBHOST): possibly one (Ireland)
    • Repository (REPOHOST): possibly one (Chile)
    • Server with backup-to-removable-drive capability (RBDHOST): Chile, Ireland or Scotland
    • Central Sync Server (SYNCHOST): Iraq, Scotland and Romania
    • Repository Backup (COPYREPOHOST): Scotland and Romania
    • Sync Peer (BUDDYHOST): all servers
    • Custom Log and Mail Server (AGGREGHOST): Iraq
    • Administrator Host (ADMHOST): Iraq, for now
    • All related servers under the blaze wildfire system (ALLSYS): Australia, Chile, Hongkong, Iraq, Ireland, Scotland and Romania
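
The scripts themselves are not reproduced in this document, so the following is only a guess at what such a definitions section might look like. The variable names come from the list above; the space-separated lowercase host lists and the has_role helper are assumptions made for illustration, and BUDDYHOST is omitted because its peer mapping is not described here.

```shell
# Define the names of the remote machines and services
# (hypothetical layout; the real section in blaze/wildfire may differ)
DBHOST="ireland"
REPOHOST="chile"
RBDHOST="chile ireland scotland"
SYNCHOST="iraq scotland romania"
COPYREPOHOST="scotland romania"
AGGREGHOST="iraq"
ADMHOST="iraq"
ALLSYS="australia chile hongkong iraq ireland scotland romania"

# has_role HOST LIST
# Succeeds when HOST appears as a whole word in the space-separated LIST.
has_role() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: only hosts listed in COPYREPOHOST keep a repository copy.
if has_role "scotland" "$COPYREPOHOST"; then
    echo "scotland keeps a copy of the repository"
fi
```

Matching on whole words avoids a short host name accidentally matching part of a longer one.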

Repo Sync

  • Repo sync process is done via rsync
  • Repo sync via LVM snapshot is under consideration
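
The exact rsync invocation is not given in this document. As a hedged sketch, assuming the sync is pulled by the copy host as the apache user, from /mnt/koji/ on the repohost into /archive/repo-backup/ (paths taken from the backup layout table in the Appendix; the direction and the flags are assumptions), the command line might be built like this:

```shell
# repo_sync_cmd REPOHOST
# Prints (does not run) an rsync command line for pulling the repository.
# -a preserves permissions, times and links; --delete removes files on the
# copy that no longer exist on the repohost. Hypothetical helper.
repo_sync_cmd() {
    printf 'rsync -a --delete apache@%s:/mnt/koji/ /archive/repo-backup/\n' "$1"
}

repo_sync_cmd chile
```

Printing the command first gives a chance to review it, or to add rsync's --dry-run option, before anything is transferred or deleted.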

Backup to Removable Drives

  • Removable drives (2 x 1TB or 1 x 2TB) can be attached to the repo server or to any server that has a copy of the repo
  • Those servers can back up automatically overnight as long as the drives are labeled cdot-backup-nnn (where nnn is in the range 001..999)
  • Multiple removable drives should be used, and one set of drives containing the full repo copy should always be offsite (daily rotation schedule)
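
The label convention above can be checked before a drive goes into rotation. A minimal sketch (is_backup_label is a hypothetical helper; how blaze actually detects the drives is not described here):

```shell
# is_backup_label LABEL
# Succeeds when LABEL has the form cdot-backup-nnn with nnn in 001..999.
# Hypothetical helper; not part of blaze.
is_backup_label() {
    case $1 in
        cdot-backup-000)             return 1 ;;   # 000 is outside 001..999
        cdot-backup-[0-9][0-9][0-9]) return 0 ;;
        *)                           return 1 ;;
    esac
}

# Labels of attached drives can be listed with, for example:
#   lsblk -o NAME,LABEL
```

A drive whose label fails this check would presumably be ignored by the overnight backup, so it is worth verifying after formatting a new drive.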

Offsite Backup

  • Offsite backup over the network is still under consideration

Restore from Backups

This process is not automated yet. If an older copy of the PostgreSQL dump or of any script or configuration file is required, copy the relevant compressed archive from /archive/blaze-backup/[hostname]/ to a new location, decompress it there and copy out the desired file. Please follow the backup layout table provided in the Appendix. A PostgreSQL restore is done via the following command (after decompressing):

pg_restore
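
For the configuration, cron and script archives, pulling a single file out of a .tar.bz2 could look like the sketch below. The helper name restore_file is hypothetical, and the member path inside the archive depends on how blaze created it, so list the archive contents first.

```shell
# restore_file ARCHIVE MEMBER DEST
# Extracts one member of a .tar.bz2 archive into the directory DEST,
# leaving the live system untouched. Hypothetical helper.
restore_file() {
    archive=$1
    member=$2      # path exactly as shown by: tar -tjf ARCHIVE
    dest=$3
    mkdir -p "$dest" || return 1
    tar -xjf "$archive" -C "$dest" "$member"
}

# Example (commented out; the member path is a placeholder):
# tar -tjf /archive/blaze-backup/australia/australia-etc-Feb-27-2012.tar.bz2
# restore_file /archive/blaze-backup/australia/australia-etc-Feb-27-2012.tar.bz2 etc/fstab /tmp/restore
```

For the database, note that pg_restore only reads archives written by pg_dump in one of its non-plain formats. If the decompressed dump turns out to be a plain SQL script, which the .sql.bz2 extension may suggest, it would be loaded with psql instead.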

Conclusion

ARM is still a developing technology, and the expectation is that the Fedora ARM Project at Seneca CDOT will be highly active for the next few years. Depending on the level of activity, the backup strategy in place may become out of date in a matter of months, or it may need to be supplemented with other solutions.

Appendix

Admin Configuration

This file specifies which server is the admin host. It is available on each host at

/etc/sysconfig/blaze

Backup Layout

Source Server  Backup Source            Type of Data      LAN Targets                               Backup Location

Hongkong       /etc/                    Configurations    Iraq and Scotland                         /archive/blaze-backup/hongkong/
               /var/spool/cron/         Cronjobs
               /usr/local/bin/          Scripts
               /var/lib/bcfg2/          BCFG2 DB

Ireland        /etc/                    Configurations    Iraq and Chile                            /archive/blaze-backup/ireland/
               /var/spool/cron/         Cronjobs
               pg_dump                  PostgreSQL DB
               /usr/local/bin/          Scripts

Australia      /etc/                    Configurations    Iraq and Hongkong                         /archive/blaze-backup/australia/
               /var/spool/cron/         Cronjobs
               /usr/local/bin/          Scripts

Chile          /etc/                    Configurations    Iraq and Australia                        /archive/blaze-backup/chile/
               /var/spool/cron/         Cronjobs          (Repo Sync in Scotland and Romania only)  /archive/repo-backup/
               /mnt/koji/               Repo
               /usr/local/bin/          Scripts

Iraq           /etc/                    Configurations    Chile                                     /archive/blaze-backup/iraq/
               /var/spool/cron/         Cronjobs
               /usr/local/bin/          Scripts

Scotland       /etc/                    Configurations    Iraq and Ireland                          /archive/blaze-backup/scotland/
               /var/spool/cron/         Cronjobs
               /usr/local/bin/          Scripts
               /var/lib/etherpad-lite   Etherpad Lite DB

Romania        /etc/                    Configurations    Australia, Iraq and Scotland              /archive/blaze-backup/romania/
               /var/spool/cron/         Cronjobs
               /usr/local/bin/          Scripts

Backup Location Capacity

hongkong
Filesystem                    Type    Size    Mounted on
/dev/mapper/hk-archive        ext4    5.0G    /archive

iraq
Filesystem                    Type    Size    Mounted on
/dev/mapper/iraq-archive      ext4    50.0G   /archive

ireland
Filesystem                    Type    Size    Mounted on
/dev/mapper/ireland-archive   ext4    1.2T    /archive

australia
Filesystem                    Type    Size    Mounted on
/dev/mapper/australia-archive ext4    9.9G    /archive

chile
Filesystem                    Type    Size    Mounted on
/dev/mapper/chile-archive     ext4    9.9G    /archive

scotland
Filesystem                    Type    Size    Mounted on
/dev/mapper/scotland-archive  ext4    2.2T    /archive

romania
Filesystem                    Type    Size    Mounted on
/dev/mapper/yoda-archive      ext4    2.0T    /archive

Log Files

Available on all servers running blaze

/var/log/blaze-backup.log
/var/log/wildfire-sync.log

Available on central/aggregator/admin server

/var/log/blaze-wildfire.log
