OPS635-lab-nagios
Revision as of 21:06, 12 January 2020 by Peter.callaghan
OPS635 Lab 1: Nagios Installation and Configuration
Overview
In an enterprise environment, a production server must be staged before deployment. Any upgrade to the production servers must be tested in a testing environment and signed off by the change manager(s) before it is deployed to production. In this lab, you will install and configure the Nagios monitoring framework on a VM in your testing environment before deploying it to the production environment. You will use many of the common object definitions encountered in a typical Nagios installation.
Investigation 1: Minimal Nagios Resources
Clone your existing VM. Call the new VM nagios.<yourdomain>.ops, and provide it a static address of your choice.
- Add the necessary records for this machine to your DNS server.
- Install and configure Nagios on this machine.
- Configure your Nagios to also use any definitions you include in a file called lab1.cfg.
- Using the lab1.cfg file, create definitions to get your nagios installation to monitor the following hosts/services:
- Create a host definition to make the nagios machine monitor itself (using a non-loopback address). It should use the check_ping command every ten minutes to make sure it is active.
- Create a service definition to make the Nagios machine monitor whether it can connect to its own ssh service (using the non-loopback address). It should use the pre-written check_ssh command every 30 minutes, re-checking every 10 minutes if the initial check fails.
- Create a timeperiod definition, and set it to only include the days and times you are in OPS635. Modify the definitions in lab1.cfg to only run during this time.
- Make sure the web service running on your Nagios machine is accessible from your host machine.
- Access the nagios web console and confirm that these checks are working before continuing.
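The definitions described above could be sketched in lab1.cfg roughly as follows. This is a minimal sketch, not a reference solution. It assumes the sample templates.cfg and commands.cfg shipped with Nagios are still loaded (for the linux-server and generic-service templates and the check_ping and check_ssh commands), that interval_length keeps its default of 60 so intervals are counted in minutes, and that the configuration lives under /etc/nagios. The address, the timeperiod name ops635-class, the class times, and the max_check_attempts values are hypothetical placeholders.

```cfg
# In nagios.cfg (path assumed; it depends on how Nagios was installed):
#   cfg_file=/etc/nagios/objects/lab1.cfg

# Timeperiod limited to class hours (days and times are placeholders).
define timeperiod {
    timeperiod_name         ops635-class
    alias                   OPS635 class hours
    monday                  13:30-15:15
    thursday                09:50-11:35
}

# The Nagios machine monitoring itself over its non-loopback address.
define host {
    use                     linux-server
    host_name               nagios.<yourdomain>.ops   ; substitute your real domain
    alias                   Nagios server
    address                 192.168.100.10            ; hypothetical static address
    check_command           check_ping!100.0,20%!500.0,60%
    check_interval          10                        ; every ten minutes
    max_check_attempts      3                         ; assumed, the lab does not specify
    check_period            ops635-class
}

# SSH check against the same machine.
define service {
    use                     generic-service
    host_name               nagios.<yourdomain>.ops
    service_description     SSH
    check_command           check_ssh
    check_interval          30                        ; initial check every 30 minutes
    retry_interval          10                        ; re-check every 10 minutes after a failure
    max_check_attempts      3                         ; assumed, the lab does not specify
    check_period            ops635-class
}
```

Before restarting the service, the configuration can be validated with `nagios -v` followed by the path to your nagios.cfg.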
Investigation 2: Nagios Notifications
- Turn flap detection off for the checks you created in investigation 1.
- Modify the lab1.cfg file to include a contact named after yourself, using your email address in your domain. Set its notification periods to use the same timeperiod you created in investigation 1.
- Create a second contact called senioradmin, using the email account for root@<yourdomain>.ops.
- Set the notification interval for the host and service you created in Investigation 1 to five minutes. This is unreasonably short for most installations, but in this lab we want to receive multiple notifications in a short time so that we can be sure they are working.
- If either of these checks goes into a hard-fail state, Nagios should now send you an email. Note that you will have to configure the email server on your host to accept email for your domain.
- Manipulate your machine to cause these checks to fail (e.g. set your firewall to block ssh traffic), and make sure you receive the email before continuing.
- Fix your machine so the checks are passing again.
- Add a hostescalation and a serviceescalation so that if you don't fix the issue before you are notified three times, the notification will instead be sent to the senior admin.
- Cause the checks to fail again, and wait for the notification to be sent to root.
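One way the notification pieces above might look in lab1.cfg is sketched here. It rests on several assumptions: the generic-contact template from the sample templates.cfg is available (it supplies the notify-by-email commands), the timeperiod from Investigation 1 is named ops635-class, the ssh service is described as SSH, and yourname is a placeholder contact name. first_notification 4 is chosen so that the first three notifications go to you and later ones go to senioradmin; when an escalation applies, Nagios notifies only the contacts listed in it.

```cfg
# Directives added to the existing host and service definitions:
#   flap_detection_enabled  0
#   notification_interval   5
#   contacts                yourname

define contact {
    use                             generic-contact
    contact_name                    yourname                   ; placeholder
    alias                           Your Name
    email                           yourname@<yourdomain>.ops  ; substitute your real domain
    host_notification_period        ops635-class
    service_notification_period     ops635-class
}

define contact {
    use                             generic-contact
    contact_name                    senioradmin
    alias                           Senior administrator
    email                           root@<yourdomain>.ops
}

# From the fourth notification onward, notify senioradmin instead.
define hostescalation {
    host_name               nagios.<yourdomain>.ops
    contacts                senioradmin
    first_notification      4
    last_notification       0          ; 0 means keep escalating until recovery
    notification_interval   5
}

define serviceescalation {
    host_name               nagios.<yourdomain>.ops
    service_description     SSH
    contacts                senioradmin
    first_notification      4
    last_notification       0
    notification_interval   5
}
```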
Investigation 3: Nagios Custom Commands
- Create a script plugin called check_apache that will use systemctl to check the state of your httpd service. If the service is running, return 0. If it is inactive, return 1. If it is failed, return 2. For any other result return 3.
- Create a command definition called check_apache_status that will call the check_apache plugin.
- Create a new service definition that will use the new command to check the status of your apache service every two minutes, going into a hard-fail state on the third failed check.
- Create an event handler script to restart apache if it is inactive. Use the nagios macros to make sure it only tries to restart apache on the second failed check (that is, before it goes into a hard-fail state).
- Add notifications similar to those for your other checks (you should be notified if the service goes into a hard-fail state, and the senior admin should be notified if you don't fix it).
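A plugin like check_apache can be written in any scripting language. The sketch below uses Python and relies on `systemctl is-active`, which prints the unit state (active, inactive, failed, and so on). The function names are illustrative assumptions, not part of the lab, and the final exit call is left as a comment so the functions can be imported and tested without ending the interpreter.

```python
#!/usr/bin/env python3
"""check_apache sketch: report the httpd unit state as a Nagios plugin exit code."""
import subprocess

# Exit codes requested by the lab: running, inactive, failed.
STATE_TO_EXIT = {"active": 0, "inactive": 1, "failed": 2}
UNKNOWN = 3  # any other result


def exit_code_for(state):
    """Map a `systemctl is-active` state string to a plugin exit code."""
    return STATE_TO_EXIT.get(state.strip(), UNKNOWN)


def query_state(unit="httpd"):
    """Return the state printed by systemctl, or 'unknown' if it cannot be run."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        return "unknown"
    return result.stdout.strip() or "unknown"


def main():
    """Print one status line for the web console and return the exit code."""
    state = query_state()
    print("httpd is {}".format(state))
    return exit_code_for(state)

# When installed as the plugin file, finish the script with:
#     import sys
#     sys.exit(main())
```

Nagios reads the exit code as the check result (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and shows the printed line as the status text.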
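The command and service definitions for this investigation might be wired together as in the fragment below. It is a sketch under assumptions: $USER1$ points at the plugin directory (as set in the sample resource.cfg), the generic-service template is loaded, and the names restart_apache, the eventhandlers subdirectory, the service description Apache, and the retry_interval value are hypothetical choices.

```cfg
define command {
    command_name    check_apache_status
    command_line    $USER1$/check_apache
}

# Event handler command; the macros tell the script which check attempt this is.
define command {
    command_name    restart_apache
    command_line    $USER1$/eventhandlers/restart_apache $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}

define service {
    use                     generic-service
    host_name               nagios.<yourdomain>.ops   ; substitute your real domain
    service_description     Apache
    check_command           check_apache_status
    check_interval          2                         ; every two minutes
    retry_interval          2                         ; assumed, the lab does not specify
    max_check_attempts      3                         ; hard-fail on the third failed check
    event_handler           restart_apache
}
```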
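The event handler's decision logic could look like the following Python sketch. It assumes Nagios passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as three arguments in that order, that an inactive httpd is reported by the plugin as exit code 1 (which Nagios records as WARNING), and that a sudoers rule (not shown) allows the nagios user to restart httpd. The function names are illustrative only.

```python
#!/usr/bin/env python3
"""Event handler sketch: restart httpd on the second failed (soft) check."""
import subprocess


def should_restart(service_state, state_type, attempt):
    """True only for the second soft WARNING result (httpd reported inactive)."""
    return service_state == "WARNING" and state_type == "SOFT" and attempt == 2


def handle(args):
    """args holds the service state, the state type and the attempt number."""
    if len(args) != 3 or not args[2].isdigit():
        return 1  # unexpected arguments: do nothing
    state, state_type, attempt = args[0], args[1], int(args[2])
    if should_restart(state, state_type, attempt):
        # Assumes sudo is configured to let the nagios user run this command.
        subprocess.run(["sudo", "systemctl", "restart", "httpd"], check=False)
    return 0

# When installed as the handler script, finish with:
#     import sys
#     sys.exit(handle(sys.argv[1:]))
```

With max_check_attempts set to 3, the first two failures are soft states and the third is hard, so acting only on soft attempt 2 gives the restart one chance to fix the problem before notifications are sent.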
Investigation 4: Nagios Remote Commands
- Under Construction
- Clone your existing VM again. Call the new VM nagiosclone.<yourdomain>.ops, provide it a static address of your choice, and add it to your DNS server.
- Install NRPE on nagiosclone.
Submission
Demonstrate your script working on a newly installed VM, and upload it to Blackboard.