Revision as of 14:46, 22 April 2014

OSTEP Infrastructure SOP

This page documents all OSTEP infrastructure. It explains the standard operating procedures for all tools, configuration, and programs.

Pidora Repos

The Pidora Repositories are hosted at: http://pidora.ca/pidora/

Symbolic Link Setup

Mash Repos

~/pidora-rsync/mash/pidora-18-latest/
├── mash.log
├── pidora-18
│   ├── armhfp
│   └── source
├── pidora-18-rpfr-updates
│   ├── armhfp
│   └── SRPMS
├── pidora-18-rpfr-updates-testing
│   ├── armhfp
│   └── SRPMS
├── pidora-18-updates
│   ├── armhfp
│   └── SRPMS
└── pidora-18-updates-testing
    ├── armhfp
    └── SRPMS

Pidora main repo sym links

~/public_html/pidora/releases/18/packages/
├── armhfp
│   ├── debug -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18/armhfp/debug
│   └── os -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18/armhfp/os
└── source
    └── SRPMS -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18/source/SRPMS

Pidora updates repo sym links

~/public_html/pidora/
├── rpfr-updates
│   ├── 18
│   │   ├── armhfp -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-rpfr-updates/armhfp/
│   │   └── SRPMS -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-rpfr-updates/SRPMS/
│   └── testing
│       └── 18
│           ├── armhfp -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-rpfr-updates-testing/armhfp/
│           └── SRPMS -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-rpfr-updates-testing/SRPMS/
└── updates
    ├── 18
    │   ├── armhfp -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-updates/armhfp/
    │   └── SRPMS -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-updates/SRPMS/
    └── testing
        └── 18
            ├── armhfp -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-updates-testing/armhfp/
            └── SRPMS -> ~/pidora-rsync/mash/pidora-18-latest/pidora-18-updates-testing/SRPMS/
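
The link layout above could be recreated with a short script. The following is a minimal Python sketch, not the procedure actually used on the server. The names plan_links and create_links are hypothetical, and the paths are copied from the trees above.

```python
import os

MASH = "pidora-rsync/mash/pidora-18-latest"
WEB = "public_html/pidora"

# Link location under WEB -> target under MASH, copied from the trees above.
LINKS = {
    "releases/18/packages/armhfp/debug": "pidora-18/armhfp/debug",
    "releases/18/packages/armhfp/os": "pidora-18/armhfp/os",
    "releases/18/packages/source/SRPMS": "pidora-18/source/SRPMS",
}
# The update repos all follow one pattern: <repo>[/testing]/18/<arch>.
for repo in ("rpfr-updates", "updates"):
    for suffix, subdir in (("", ""), ("-testing", "testing/")):
        for arch in ("armhfp", "SRPMS"):
            LINKS[f"{repo}/{subdir}18/{arch}"] = f"pidora-18-{repo}{suffix}/{arch}"


def plan_links(home):
    """Return (link, target) pairs as paths below the given home directory."""
    return [(os.path.join(home, WEB, link), os.path.join(home, MASH, target))
            for link, target in sorted(LINKS.items())]


def create_links(home):
    """Create each planned symlink, replacing a link that already exists."""
    for link, target in plan_links(home):
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if os.path.islink(link):
            os.remove(link)
        os.symlink(target, link)
```

Calling create_links(os.path.expanduser("~")) as the user that owns the repositories would build the whole tree. plan_links can be used on its own to review the links first.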

Ansible Builder Configuration Management

Details About Ansible

Ansible allows for remotely managing the configuration of all builders in a simple and efficient way. Ansible works by running a playbook, which is a way to organize and run plays. A play is a set of ansible "commands" or "modules" that are applied to each builder. These modules can copy files, change permissions, modify files, run commands, run scripts, and much more.

host = japan
directory = /etc/ansible
hosts file = /etc/ansible/ansible_hosts
ansible config = /etc/ansible/ansible.cfg
playbook = /etc/ansible/install_builders.yml
plays = /etc/ansible/builders_tasks/
builders files = /etc/ansible/builders/

How To Use Ansible

Log in to japan as root

ssh japan

Change to the ansible directory

cd /etc/ansible

Check the status of all hosts connected to ansible
- The word builders in the command below specifies an ansible group

ansible -m ping builders

Copy over all configurations required and start the koji service

ansible-playbook install_builders.yml --verbose
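
The two commands above can be chained so that the playbook only runs when the ping step succeeds. The following is a rough Python sketch of that idea. The function configure_builders and its run parameter are hypothetical, and it assumes that ansible exits with a non-zero status when a host cannot be reached.

```python
import subprocess

# The two commands from the steps above, as argument lists.
PING = ["ansible", "-m", "ping", "builders"]
PLAYBOOK = ["ansible-playbook", "install_builders.yml", "--verbose"]


def configure_builders(run=subprocess.run, cwd="/etc/ansible"):
    """Ping the builders group, then run the playbook only if the ping passed.

    `run` is injectable so the flow can be exercised without ansible installed.
    """
    if run(PING, cwd=cwd).returncode != 0:
        return False  # at least one builder did not answer; skip the playbook
    return run(PLAYBOOK, cwd=cwd).returncode == 0
```

Run as root on japan, configure_builders() returns True only when both commands exit with status 0.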

Change Builder Configurations

The best way to edit a play in ansible is to find the ansible module that is needed and read about it. Ansible modules are well documented, and there are enough of them that most tasks are covered by an existing module.

The ansible modules can be found here: Ansible Modules

All builder plays can be found inside /etc/ansible/builders_plays/ on japan.
Make sure that if a new play is created, it is added into the playbook at /etc/ansible/install_builders.yml on japan

How To Set Up A New Builder

Before adding a builder to ansible, there are a few things that need to be completed.

Network

Add a hostname to the /etc/hosts file on japan
Add a hostname to the /etc/ansible/builders/config_files/hosts file on japan

If it uses DHCP, link the hostname to a host entry in /etc/dhcp/dhcpd.conf by specifying the MAC address and hostname

or

If the builder has a changing MAC address and can't use DHCP, get access to the builder and set the IP address manually

ifconfig <interface> <ipaddr> netmask 255.255.255.0 up
route add default gw 192.168.1.254

Services

Since ansible needs to gain access to the builder, a few services need to be changed on the builder first.
NetworkManager - If DHCP has already been set up on japan, start this service; if the builder uses a static address, stop it instead

systemctl start NetworkManager

or

systemctl stop NetworkManager

sshd - Start this service

systemctl start sshd

firewalld - Stop this service

systemctl stop firewalld

selinux - Set selinux to permissive mode for now, as it interferes with ansible ssh

setenforce 0

SSHD

Copy the file /etc/ansible/builders/config_files/authorized_keys from japan to the builder
- This file contains public keys for users and ansible

scp /etc/ansible/builders/config_files/authorized_keys root@builder:

Log in to the builder

ssh root@builder

Set up ssh and authorized keys

mkdir -p .ssh
mv authorized_keys .ssh/
chmod 700 .ssh/
chmod 600 .ssh/authorized_keys

Ansible should now have access to this builder

Ansible Groups

The following ansible groups are used to change the type of configuration that each builder receives. Once each builder has been added to the groups they should be in, run ansible and each group will get treated slightly differently, configuring all builders.

Group Structure

The following structure shows the parent groups and their child groups.

builders
- builders_default
  - trimslices
  - arndales
  - cubies
  - specials
- builders_nfs
- builders_swap
  - trimslices
- builders_staticip
  - arndales

The child groups link back to a list of hostnames.

trimslices
- tri-1-1
- tri-1-2
- tri-1-3
- tri-1-4
cubies
- cub-2-1
- cub-2-2
arndales
- arn-3-1
- arn-3-2
specials
- arm-4-1
- arm-4-2
- arm-4-3
- arm-4-4
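
To see which hosts a parent group ends up containing, the two lists above can be combined. The following is a small Python sketch that mirrors them. It is only an illustration of the structure, not ansible's own inventory code, and the function hosts_in is hypothetical.

```python
# Parent groups and their child groups, copied from the structure above.
PARENTS = {
    "builders": ["builders_default", "builders_nfs",
                 "builders_swap", "builders_staticip"],
    "builders_default": ["trimslices", "arndales", "cubies", "specials"],
    "builders_nfs": [],
    "builders_swap": ["trimslices"],
    "builders_staticip": ["arndales"],
}

# Child groups and the hostnames they link back to.
HOSTS = {
    "trimslices": ["tri-1-1", "tri-1-2", "tri-1-3", "tri-1-4"],
    "cubies": ["cub-2-1", "cub-2-2"],
    "arndales": ["arn-3-1", "arn-3-2"],
    "specials": ["arm-4-1", "arm-4-2", "arm-4-3", "arm-4-4"],
}


def hosts_in(group):
    """Return the sorted, de-duplicated hostnames reachable from a group."""
    if group in HOSTS:
        return sorted(HOSTS[group])
    found = set()
    for child in PARENTS.get(group, []):
        found.update(hosts_in(child))
    return sorted(found)
```

For example, hosts_in("builders_swap") lists only the trimslices, while hosts_in("builders") lists every builder once even though some belong to several groups.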

builders_default

This group is the default group for all builders. All builders should be in this group.

builders_nfs

This group is used for nfs configuration. This was previously used on older builders that did not have hard drives and required more building space and speed.

builders_swap

This group will allow for ansible to generate a 4GB swap file on the builders and turn that swap file on. This is primarily used for builders that require more swap than is set up on their swap partitions.

builders_staticip

This group should be used for all builders that require static IP addresses. It will set up the custom IP address based on the resolved hostname inside the /etc/ansible/ansible_hosts file.

Backup System

Introduction

Here is a copy of the script: Smart-Backup-Source

Here at CDOT, our previous backup solution was a little archaic and hard to expand on. I decided to make a new method of backup that can be run from a single computer and back up our entire infrastructure. As I'm writing this, the script is not in a finished state; however, it works and is usable as a replacement for our previous system. A warning: this method of backup across systems is not very secure, since it requires you to give some users nopasswd sudo access to some or all programs. I am looking for a way around this, and would appreciate any input on this matter.

The script is to be run on the computer: bahamas
The script is to be run with the user: backup

Goals

There were a few goals that were kept in mind with this script as it was developed:

- Script resides on a single computer (complete)
- Do not run multiple backups using the same hard drive (complete)
- Check space requirements before performing a backup on source and destination (in progress)
- Emails out daily reports on success or failure (not complete)
- Logs all information to /var/log/smart-bk/ (complete)
- Easy(ish) to add a new backup schedule (complete)
- Can view all backups that are currently running (complete)
- Can view all the backups in the queue to run (complete)
- Can view all the schedules that are added (complete)
- Keeps a record of all previously run backups (not complete)
- Website to view status of currently running backups (not complete)

At this time, not all of these goals have been completed, but I would like them to be sooner or later. Right now I'm setting up a little documentation on how it currently works, what it's missing, and what my next steps will be.

Scheduler System

The scheduler system is the main part of the script and uses a list of fields. Using this list of fields, the script can determine exactly when the backup should take place, if it has already run for the day, if it is still running, and all the details surrounding the specified backup. A person or bash script will add backups they would like to be performed to a schedule using specific parameters. A schedule looks like this:

----------------------------------------------------------------------------------------------------
id|day|time|type|source_host|dest_host|source_dir|dest_dir|source_user|dest_user
----------------------------------------------------------------------------------------------------
1|06|11:00|archive|japan|bahamas|/etc/|/data/backup/japan/etc/|backup|backup

Field details and explanations:

id - This is the unique identifier of the schedule.
day - This is the day the backup was last run. This is used to check if the schedule is expired (in the past) or has already completed.
time - This is the time at which the backup will start. This allows you to order different schedules to happen earlier or later in the day.
type - This is the type of backup. Currently there are 3:
     - archive backup wraps the directory specified in a tar archive and compresses it with bzip2. Uses options: tar -cpjvf
     - rsync is a very simple rsync that preserves most things. Uses options: rsync -aHAXEvz
     - dbdump backup, this is specifically a koji db backup currently. Uses options: pg_dump koji
source_host - This host is the target for backup. You want the files backed up from here.
dest_host - This host is your backup storage location. All files backed up will go here.
source_dir - This directory correlates to source_host. This is the directory that is backed up.
dest_dir - This directory correlates to dest_host. This is where the backup is stored.
source_user - User to use on the source host.
dest_user - User to use on the dest host.
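
As an illustration of the fields above, the following Python sketch splits one schedule row into named fields and looks up the options quoted for its type. It is not taken from sbk. The names FIELDS, TYPE_COMMANDS and parse_schedule are hypothetical, and the pipe-delimited layout is assumed from the example row.

```python
# Field names in the order they appear in the schedule listing above.
FIELDS = ["id", "day", "time", "type", "source_host", "dest_host",
          "source_dir", "dest_dir", "source_user", "dest_user"]

# Options quoted in the field descriptions above, keyed by backup type.
TYPE_COMMANDS = {
    "archive": "tar -cpjvf",
    "rsync": "rsync -aHAXEvz",
    "dbdump": "pg_dump koji",
}


def parse_schedule(line):
    """Split one pipe-delimited schedule row into a dict keyed by field name."""
    values = line.strip().split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = dict(zip(FIELDS, values))
    row["id"] = int(row["id"])  # the id is numeric; the other fields stay text
    return row


example = parse_schedule(
    "1|06|11:00|archive|japan|bahamas|/etc/|/data/backup/japan/etc/|backup|backup")
```

Here example["type"] is "archive", so TYPE_COMMANDS[example["type"]] gives the tar options listed for archive backups.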

Database Structure

All data for this script is stored inside a sqlite3 db. The database file used should be called "schedules.db".

Create a new sqlite3 database:

sqlite3 schedules.db

Run the following SQL statements to create the proper tables within the database (they are shown here as the output of .schema):

sqlite> .schema
CREATE TABLE Logs(scheduleid INTEGER, status TEXT, errors TEXT, start_date TEXT, start_time TEXT, end_date TEXT, end_time TEXT);
CREATE TABLE Queue(scheduleid INTEGER, queuetime TEXT, FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Running(scheduleid INTEGER, starttime TEXT, FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Schedule(id INTEGER PRIMARY KEY, day TEXT, time TEXT, type TEXT, source_host TEXT, dest_host TEXT, source_dir TEXT, dest_dir TEXT, source_user TEXT, dest_user TEXT, desc TEXT);
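
The same tables can also be created from Python with the standard sqlite3 module. This is a minimal sketch rather than the code sbk uses. The helpers create_database and add_schedule are hypothetical, the example row reuses the schedule shown earlier, and the desc column is quoted here only because DESC is also an SQL keyword.

```python
import sqlite3

# The statements from the .schema output above, with Schedule created first.
SCHEMA = """
CREATE TABLE Schedule(id INTEGER PRIMARY KEY, day TEXT, time TEXT, type TEXT,
    source_host TEXT, dest_host TEXT, source_dir TEXT, dest_dir TEXT,
    source_user TEXT, dest_user TEXT, "desc" TEXT);
CREATE TABLE Queue(scheduleid INTEGER, queuetime TEXT,
    FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Running(scheduleid INTEGER, starttime TEXT,
    FOREIGN KEY(scheduleid) REFERENCES Schedule(id));
CREATE TABLE Logs(scheduleid INTEGER, status TEXT, errors TEXT,
    start_date TEXT, start_time TEXT, end_date TEXT, end_time TEXT);
"""

# day, time, type, hosts, dirs, users, desc (the id is assigned by sqlite).
EXAMPLE = ("06", "11:00", "archive", "japan", "bahamas", "/etc/",
           "/data/backup/japan/etc/", "backup", "backup",
           "archive of japan /etc-> bahamas")


def create_database(path=":memory:"):
    """Open the database (in memory by default) and create the four tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_schedule(conn, row):
    """Insert one schedule given as a 10-tuple and return its new id."""
    cur = conn.execute(
        'INSERT INTO Schedule(day, time, type, source_host, dest_host, '
        'source_dir, dest_dir, source_user, dest_user, "desc") '
        'VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)', row)
    conn.commit()
    return cur.lastrowid
```

Passing "schedules.db" as the path instead of the default creates the file on disk.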

How To Use sbk

sbk is a Python script that allows you to specify options which change what functions the script will perform.

Help

Use the help option for help. This will show a description of the program, all available options, and a description of each option.

sbk -h
or 
sbk --help

View Schedule Information

sbk allows users to view all the schedules that have been created, all the current schedules in the queue, and all scheduled backups currently in progress.

sbk -s
or
sbk --show

Adding New Schedules/Backups

The backup user is able to add new backup schedules. These schedules will be run at their designated times. Unfortunately, I couldn't find a way to add schedules without specifying a lot of options. However, this is all done through the command line, so you can easily write a bash script to automate it.

Add a new schedule:

[backup@bahamas ~]$ sbk --add --time "11:00" --backup-type archive --source-host japan --dest-host bahamas --source-dir /etc/ --dest-dir /data/backup/japan/etc/ --source-user backup --dest-user backup --desc "archive of japan /etc-> bahamas"
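
Since adding a schedule takes many options, the command line can be generated instead of typed. The following Python sketch builds the argument list for the command above. The helper sbk_add_argv and its default values are hypothetical, and only the flags shown in the example are used.

```python
import shlex


def sbk_add_argv(time, backup_type, source_host, dest_host, source_dir,
                 dest_dir, source_user="backup", dest_user="backup", desc=""):
    """Build the argument list for one `sbk --add` call."""
    return ["sbk", "--add",
            "--time", time,
            "--backup-type", backup_type,
            "--source-host", source_host,
            "--dest-host", dest_host,
            "--source-dir", source_dir,
            "--dest-dir", dest_dir,
            "--source-user", source_user,
            "--dest-user", dest_user,
            "--desc", desc]


argv = sbk_add_argv("11:00", "archive", "japan", "bahamas", "/etc/",
                    "/data/backup/japan/etc/",
                    desc="archive of japan /etc-> bahamas")
command_line = shlex.join(argv)  # shell-quoted form, safe to paste or log
```

The list can be passed to subprocess.run(argv) as the backup user on bahamas, or command_line can be written into a shell script.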

Removing Schedules/Backups

In order to remove a schedule, a "sid" must be specified. This is simply the "id" of the schedule, which is unique to schedules. You can get this "id" by running "sbk -s".

Remove a schedule:

[backup@bahamas ~]$ sbk --remove --sid=1

Start the Backups

Start intelligently queuing schedules and initiate the backups. This command is good to run from a crontab or in some other automated way (see Multiple Instances).

sbk -q
or
sbk --queue

Multiple Instances

The way the script currently runs, it only runs one backup at a time; however, you can run multiple instances of "sbk --queue". If you run multiple instances, they will work together and increase the number of backups running at the same time (as long as there are multiple backups!), which in turn should reduce the time needed to complete all backups. Running multiple instances should be safe and will not harm backups that are already running. It also will not run backups that are already running, or interfere with them in any way. My plan for the future of this script is to allow it to spawn its own child processes, which will perform multiple backups as well. HOWEVER, it still needs to be run multiple times during the day because it will only start backups whose time has expired (e.g. if you run it at 11:00 and a backup has a time of 13:00, the schedule at 13:00 will not start on that run; it has to wait for a later run, once its time has passed and its day field is expired).

So to get the most speed out of your backups, and finish all schedules as fast as possible, run this script from the crontab at a high frequency. What counts as a high frequency is really up to you; there is hardly any harm in running it too often, as it will just end itself if there is nothing to do. Whether you run the script once a day or once every 10 minutes, the script will get the job done.
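
The rule for when a schedule starts can be sketched roughly as follows. This is a guess at the logic, not the code from sbk. It assumes that the day field holds the two-digit day of the month of the last run (as in the example schedule), that the time field is a 24-hour HH:MM value, and that a schedule is due once its day differs from today and its time has passed. The function name is_due is hypothetical.

```python
from datetime import datetime


def is_due(day, time, now):
    """Return True if a schedule should be queued at `now` (assumed rule)."""
    already_ran_today = day == now.strftime("%d")
    start = datetime.strptime(time, "%H:%M").time()
    return not already_ran_today and now.time() >= start


# A run at 11:00 skips the 13:00 schedule; a run after 13:00 picks it up.
morning = datetime(2014, 4, 7, 11, 0)
afternoon = datetime(2014, 4, 7, 13, 10)
skipped = is_due("06", "13:00", morning)
picked_up = is_due("06", "13:00", afternoon)
```

Under these assumptions, running the script every few minutes from the crontab means a schedule starts within a few minutes of its time field.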

Advanced sbk Options

Most of these options are just for me to debug while making the script, however some of them may be useful for managing backups. Note: when using the "--sid=<id>" please replace <id> with a number.

Remove a single schedule from the queue.

sbk --remove-queue --sid=<id>

Remove a single schedule from running (this will NOT currently end the backup that is running).

sbk --remove-run --sid=<id>

Expire a schedule so that it appears as though it has not run today. This is useful if you want to force a backup to run a second time.

sbk --expire --sid=<id>

Add a schedule to the queue. This is similar to expire in function, except you don't have to wait for the time field to expire. The next time "sbk --queue" is run, it will run this backup.

sbk --add-queue --sid=<id>

Logging

[UPDATE] A new logging table (Logs) has been added to schedules.db. This allows specific logging events to be saved with dates and success or failure, for more accurate logging.

I could not figure out the format for the logging. Too many options. I went with a procedure where it makes a new log file each time the program is run. This could be a problem if you run the script too frequently, since it will make so many log files. I think the best idea would be to log to a single file, or to log into the sqlite3 database. I have not had time to change this yet.

Logging directory:


/var/log/smart-bk/

Log file name format:

smart-bk-yyyy-MM-dd-hh-mm.log
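
For reference, a name in this format can be produced with strftime. The following is a small Python sketch, assuming that hh in the pattern means the hour on a 24-hour clock. The function log_path is hypothetical.

```python
from datetime import datetime

LOG_DIR = "/var/log/smart-bk/"


def log_path(now):
    """Return the log file path for a run starting at `now`."""
    # Assumes yyyy-MM-dd-hh-mm maps to %Y-%m-%d-%H-%M (24-hour clock).
    return LOG_DIR + now.strftime("smart-bk-%Y-%m-%d-%H-%M.log")
```

Because the name only has minute resolution, two runs started within the same minute would share one file name.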

Backup Host Configuration

One of the main drawbacks to using this script is that it requires a bit of configuration on all computers using the backup system.

- The backup user must be created on all computers.
- The backup user must be able to ssh without a password from any computer to any other computer as the backup user.
- The backup user must have sudo access with the nopasswd option for the rsync and tar programs. (Security risk! Giving rsync sudo access allows the backup user to modify any file.) Add the following with visudo:
  backup ALL=(ALL) NOPASSWD: /usr/bin/rsync, /bin/tar
- The root user must be able to ssh to all backup users from any computer. (This is annoying; trying to find a way around this.)
- Add custom users such as koji so that they can ssh without a password to all backup users, and give root access to the koji user in the same way.
- WARNING: make sure you disable the passwords on all these backup accounts, so that nobody can log in and get access to root without a private key.
- IMPORTANT: using visudo on each machine you would like to connect to, add the following line; otherwise sudo will complain about not having a tty:
  Defaults:backup !requiretty

This list of configurations that need to be done on each computer is annoying and could be done better. I am currently looking for ways to change it. After these configurations are made, you can use this host in any backup schedule.
