OPS435 Python Assignment 1 2017 - 3
Assignment 1 - Parsing a log file
Weight: 15% of the overall grade.
Due Date: Ask your professor for exact date.
Late penalty: 10% per school day. Note that the assignment must be completed satisfactorily in order to pass the course, no matter what grade you get.
Overview
Often, system administrators need to analyze log files. This can be done using a paginator such as less when your system has just been set up and/or you're the only user. On a production system it is not unusual to have thousands of legitimate users per month accessing the server's services, plus thousands more bots looking for unpatched vulnerabilities, brute-forcing username/password pairs, or just downloading every available file on your web server.
In this assignment you will create a program that will help you, as an Apache server administrator, to answer questions about the status, load, and security of your web server. You will not need to set up a web server for this assignment, though you're welcome to use the one you've set up in OPS335 as a practice machine.
Instructions
You are encouraged to start early and work on the parts you're comfortable with so that you don't have too much work piled up before the deadline.
- For example: you only need basic programming skills to create the menus and implement menu navigation. Do that now, so that later you don't have to worry about it.
- You may use any coding style you choose - but make sure it is consistent throughout the program. Coding style is important - not least because I have to read it. "Style" includes adding a comment at the top of every function explaining what it does. You won't lose marks for too many comments; you will lose marks for too few.
- If your assignment doesn't work or has major problems - you will be asked to resubmit it, which will likely cost you more than a late penalty.
Name and Parameters
Your Python3 program will be named check_apache_log_yourmysenecaid.py and it will accept the following parameters:
- --default or -d as an optional first argument, followed by:
- filename, or:
- filename1 filename2 filename3 etc... - any number of filenames from 1 to as many as the command-line supports.
If the first argument is --default or -d, then you should not print a menu and instead execute the "How many total requests (Code 200)" option.
If there are no arguments - you should print an error and exit.
If there are other arguments - assume that they are names of files containing Apache logs. If there is more than one file - append their contents to each other with the first one at the top and the last one at the bottom. Read the file contents in that order into your own data structures.
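As a rough illustration of the argument rules above, the following sketch separates the optional flag from the filenames and reads the files in order. The function names parse_args and read_logs are hypothetical, and reading with errors="replace" is an assumption made so that stray bytes in a log do not stop the program; neither is required by the assignment.

```python
def parse_args(args):
    """Return (use_default, filenames) for the arguments after the program name."""
    # The flag only counts when it is the first argument.
    use_default = bool(args) and args[0] in ("--default", "-d")
    filenames = list(args[1:]) if use_default else list(args)
    return use_default, filenames


def read_logs(filenames):
    """Read every file in the order given and return all lines as one list."""
    lines = []
    for name in filenames:
        # errors="replace" is an assumption: it keeps odd bytes from raising.
        with open(name, errors="replace") as log_file:
            lines.extend(log_file.read().splitlines())
    return lines
```

A caller would pass sys.argv[1:] to parse_args, then print an error and exit if the returned filename list is empty. How to treat --default with no filenames is not specified above; treating it as an error is one reasonable reading.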
Header
Your program must be a single source file, and at the top of that file it will contain the following true statement as a comment (replace Andrew Smith with your own name):
OPS435 Assignment 1 - Fall 2017
check_apache_log_yourmysenecaid.py
Author: Andrew Smith
The source code in this file (check_apache_log_yourmysenecaid.py) is original work written
by Andrew Smith and has not been copied from any other source including any
person, textbook, or online resource. I have not shared this work with anyone
or anything except for submission for grading. I understand that the
Academic Honesty Policy is not a joke and violators will be punished.
Main Menu
Your program will be primarily menu-based instead of parameter-driven. That means the user will ask the program to do something after the program is already running, which is different from a typical command-line tool. When the program starts, it will present the user with the following menu:
Reading log files... done.
Apache Log Analyser - Main Menu
===============================
1) Successful Requests
2) Failed Requests
q) Quit
Make sure the line of equal signs is not hard-coded. You must be able to quickly change the title and not have to update a string with some number of extra or fewer equal signs. You might want to make a function to display this line, and use that function for the other menus as well.
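One possible way to avoid hard-coding the underline is to build it from the length of the title. This is only a sketch; the names format_title and print_title are hypothetical.

```python
def format_title(title, underline_char="="):
    """Return the title followed by an underline of the same length."""
    return title + "\n" + underline_char * len(title)


def print_title(title):
    """Print a menu title with its underline."""
    print(format_title(title))
```

Calling print_title("Apache Log Analyser - Main Menu") prints the title and a row of equal signs that matches its length, so changing the title string needs no other update.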
The "Reading log files" message must display only once when your program starts, not every time the menu is displayed. You may find it easier to code this functionality after you're done writing the code for the menu itself.
Both option 1 and 2 will display a new menu.
The q option is self-explanatory.
Apache Log Analyser - Successful Requests Menu
==============================================
1) How many total requests (Code 200)
2) How many requests from Seneca (IPs starting with 142.204)
3) How many requests for isomaster-1.3.13.tar.bz2
q) Return to Main Menu
Each line in the log file is in the following Apache log format:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
One example line is:
109.86.167.47 - - [29/Aug/2017:10:22:49 -0400] "GET /isomaster/releases/isomaster-1.3.13.tar.bz2 HTTP/1.1" 200 245085 "http://littlesvr.ca/isomaster/download/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0"
You should use the Python re module (see its documentation); you will probably want to use match() and group(). You may use the following regular expression to extract the components from each line:
([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*
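To show how match() and group() fit together with this expression, here is a minimal sketch applied to the example line above. The function name parse_line and the dictionary keys are hypothetical; only the regular expression comes from the assignment.

```python
import re

# The regular expression given above, compiled once for reuse.
LOG_PATTERN = re.compile(r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) ((\d+)|-).*')


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group(1),           # client address
        "time": match.group(2),         # timestamp inside the brackets
        "request": match.group(3),      # e.g. "GET /path HTTP/1.1"
        "status": int(match.group(4)),  # HTTP status code
        "size": match.group(5),         # bytes sent, or "-"
    }


sample = ('109.86.167.47 - - [29/Aug/2017:10:22:49 -0400] '
          '"GET /isomaster/releases/isomaster-1.3.13.tar.bz2 HTTP/1.1" '
          '200 245085 "http://littlesvr.ca/isomaster/download/" '
          '"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) '
          'Gecko/20100101 Firefox/55.0"')
record = parse_line(sample)
print(record["ip"], record["status"])
```

Note that the expression expects the two fields after the address to be literal dashes, so a line that does not fit that shape returns None here; deciding what to do with such lines is left to you.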
The questions are self-explanatory; provide answers formatted as you see fit.
Apache Log Analyser - Failed Requests Menu
==========================================
1) How many total failed requests (Codes 404, 400, 500, 403, 405, 408, 416)
2) How many invalid requests for wp-login.php
3) List the filenames for failed requests for files in /apng/assembler/data
q) Return to Main Menu
Again, the questions are self-explanatory once you understand the Apache combined log format.
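Questions like these usually reduce to filtering the parsed lines by status code or by requested path. The sketch below assumes each parsed line is stored as a dictionary with "status" and "request" keys; that layout, and the names count_status and request_path, are hypothetical and only one of many possible designs.

```python
# Status codes listed in option 1 of the Failed Requests menu.
FAILED_CODES = {404, 400, 500, 403, 405, 408, 416}


def count_status(records, codes):
    """Count the records whose status code is in the given set."""
    return sum(1 for record in records if record["status"] in codes)


def request_path(request):
    """Return the path from a request such as 'GET /a/b.txt HTTP/1.1'."""
    parts = request.split()
    # Malformed requests may have fewer than two parts.
    return parts[1] if len(parts) >= 2 else None
```

With these helpers, count_status(records, {200}) would answer the first Successful Requests question, and request_path gives the string to compare against a directory prefix or a filename.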
Sample Log Files
Here is a collection of log files you may use for testing your program: ops435-asg1-logs.tar.bz2
Submission
After testing your program - submit the .py file via Blackboard.