Difference between revisions of "OPS435 Python/assignment 2 fall2022"

From CDOT Wiki
(import from 435)
 
(fixed a type)
 
(9 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
=Assignment 2 - Usage Report=
 
=Assignment 2 - Usage Report=
 
'''Weight:''' 15% of the overall grade
 
'''Weight:''' 15% of the overall grade
 
'''What you need''' A github account with a private repository named ops435-a2
 
  
 
'''Due Date:''' Please follow the three stages of submission schedule:
 
'''Due Date:''' Please follow the three stages of submission schedule:
* Complete the detail algorithm for this assignment script by March 20, 2020 on github,
+
* Complete the requirements for the first milestone and push to GitHub by November 25, 2022 at 11:59 PM,
* Complete the your Python script coding and upload to your vm in myvmlab by March 26, 2020 on github
+
* Complete the requirements for the second milestone and push to GitHub by December 2, 2022 at 11:59 PM,
* Complete the testing and debugging by April 3 on the vm in myvmlab, and also submit the detail algorithm file (algorithm.txt), test results and the python script (a1_[seneca_id].py)to blackboard.
+
* Complete your Python script and push to GitHub by December 9, 2022 at 11:59 PM, and
 
+
* Copy your Python script into a Word document and submit to Blackboard by December 9, 2022 at 11:59 PM.
'''Late Penalty:''' 20% per school day, and note that this assignment must be completed satisfactorily in order to pass the course even if you get zero mark for this assignment.
 
  
 
==Overview==
 
==Overview==
Line 27: Line 24:
 
asmith  pts/11      10.40.105.130    Tue Feb 13 14:07:43 2018 - Tue Feb 13 16:07:43 2018  (02:00)
 
asmith  pts/11      10.40.105.130    Tue Feb 13 14:07:43 2018 - Tue Feb 13 16:07:43 2018  (02:00)
 
</pre>
 
</pre>
It is always desirable to have a daily, weekly, or monthly usage reports by user or by remote host based on the above information.
+
It is always desirable to have daily or monthly usage reports by user or by remote host based on the above information.
  
 
== Tasks for this assignment ==
 
== Tasks for this assignment ==
In this assignment, your should preform the following activities:
+
In this assignment, you will create a script that can generate usage reports based on output from the last command or from a file in a similar format. You will use usage_data_file for testing, but the script should also be tested on some other Linux machines and on Matrix.
# Complete a detail algorithm for producing daily, weekly, and monthly usage reports by user or by remote host based on the information stored in any given files generated from the 'last' command, either from a text file or real-time.  
+
 
# Once you have complete the detail algorithm, you should then <b>design the structure of your python script</b> by identifying the appropriate python objects, functions and modules to be used for each task in your algorithm and the main control logic. Make sure to identify the followings:
+
Depending on the options selected, your script should list users or remote IP addresses, either overall or limited to a specific date. It should also generate daily or weekly usage reports for specific users or remote hosts.
## input data,
 
## computation tasks, and
 
## outputs.
 
# implement your computational solution using a single python script. You can use any built-in functions and functions from the python modules list in the "Allowed Python Modules" section below to implement your solution.  
 
# Test and review your working python code to see whether you can improve the interface of each function to facilitate better code re-use (this process is called <b>refactoring</b>).
 
  
 
== Allowed Python Modules ==
 
== Allowed Python Modules ==
 
* the <b>os, sys</b> modules
 
* the <b>os, sys</b> modules
 
* the <b>argparse</b> module
 
* the <b>argparse</b> module
* The <b>time</b> module  
+
* The <b>datetime</b> module
* The <b>ur_funcs</b> module (provide by your teacher on github)
+
* The <b>subprocess</b> module
** [https://docs.python.org/3/howto/argparse.html Argparse Tutorial] - should read this first.
+
 
** [https://docs.python.org/3/library/argparse.html Argparse API reference information page]
+
=== Argparse ===
 +
Argparse is a much, much better way of dealing with command line arguments than reading sys.argv directly. It not only handles positional arguments, but options as well, and it will generate usage messages.
 +
It's <i>very highly recommended</i> that you spend at least a few minutes reading through the [https://docs.python.org/3/howto/argparse.html Argparse Tutorial].
 +
 
 +
=== Datetime ===
 +
Since Python is a <i>batteries included</i> language, it's important to get accustomed to using some of the modules in the standard library. Since we are dealing with dates and times, you are required to work with the datetime module. The full docs can be found [https://docs.python.org/3/library/datetime.html here].
 +
 
 +
Datetime objects can be initialized from strings that match a particular format. One example is provided for you in the codebase: [https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime the strptime] function.
 +
 
 +
Once you have created datetime objects, you can do useful things with them:
 +
* <code> d1 > d2 </code> will return True if d1 is <b>later</b> than d2.
 +
* <code> d2 - d1 </code> will return a <i>timedelta</i> object, which is an amount of time between d2 and d1.
 +
 
 +
More interesting methods are in the [https://docs.python.org/3/library/datetime.html#datetime.datetime.fold table] in the docs.
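As a minimal sketch of the operations above, the following parses two timestamps taken from the sample usage_data_file. The name <code>LAST_FORMAT</code> is only illustrative, and the format string is an assumption based on the timestamp layout shown in the overview; the codebase may define its own.

```python
from datetime import datetime

# Assumed format for timestamps printed by `last -F`,
# e.g. "Tue Feb 13 16:53:42 2018". LAST_FORMAT is an illustrative name.
LAST_FORMAT = "%a %b %d %H:%M:%S %Y"

login = datetime.strptime("Tue Feb 13 16:53:42 2018", LAST_FORMAT)
logout = datetime.strptime("Tue Feb 13 16:57:02 2018", LAST_FORMAT)

print(logout > login)      # True, because logout is later than login
session = logout - login   # subtracting two datetimes gives a timedelta
print(session)             # 0:03:20
```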
 +
 
 +
=== Timedelta ===
 +
When performing some operations with datetime objects, we may see timedelta objects being created. In math, <b>delta</b> refers to a difference. So a timedelta object is essentially an object that represents a duration.
 +
 
 +
You can do useful things with timedelta objects:
 +
* <code> delta1 + delta2 </code> will add up two durations. For example, if delta1 is two hours and delta2 is three hours, then this operation will return five hours.
 +
* <code> str(delta1) </code> will represent the timedelta in a friendly format: H:MM:SS if the duration is less than 24 hours. Longer durations are prefixed with a day count.
 +
 
 +
More interesting methods are in the [https://docs.python.org/3/library/datetime.html#datetime.datetime.fold table] in the docs.
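A short sketch of adding durations follows. The three session lengths are the ones for user rchan in the sample usage_data_file; the variable names are illustrative only.

```python
from datetime import timedelta

delta1 = timedelta(hours=2)
delta2 = timedelta(hours=3)
print(delta1 + delta2)   # 5:00:00

# sum() needs a timedelta as its start value, because the default start is the int 0.
sessions = [
    timedelta(minutes=23),
    timedelta(minutes=3, seconds=20),
    timedelta(minutes=33),
]
grand_total = sum(sessions, timedelta())
print(str(grand_total))  # 0:59:20
```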
  
 
== Instructions ==
 
== Instructions ==
 +
 +
Accept the Assignment #2 via the link on Blackboard, and clone the Github repository on a Linux machine of your choosing. Your code should only be located in the file "assignment2.py".
  
 
=== Program Name and valid command line arguments ===
 
=== Program Name and valid command line arguments ===
Name your Python3 script as <code>ur_[student_id].py</code>. Create a symbolic link to your script as ur.py (e.g. use the command ln -s ur_rchan.py ur.py to create the link) so that you can refer to your script as ur.py. Your script must accept one or more "file name" as its command line parameters and other optional parameters as shown below. Your python script should produce the following usage text when run with the --help option:
+
Your script must accept one or more "file name" as its command line parameters and other optional parameters as shown below. Your python script should produce the following usage text when run with the --help option:
 
<pre>
 
<pre>
[rchan@centos7 a1]$ python3 ./ur.py -h
+
[eric@centos7 a1]$ python3 ./assignment2.py -h
usage: ur_rchan.py [-h] [-l {user,host}] [-r RHOST] [-t {daily,weekly,monthly}]
+
usage: assignment2.py [-h] [-l {user,host}] [-r RHOST] [-t {daily,weekly}]
            [-u USER] [-v]
+
                      [-d DATE] [-u USER] [-v]
            F [F ...]
+
                      [files]
  
 
Usage Report based on the last command
 
Usage Report based on the last command
  
 
positional arguments:
 
positional arguments:
   F                    list of files to be processed
+
   files                 file to be processed, if blank, will call last
  
 
optional arguments:
 
optional arguments:
Line 69: Line 85:
 
   -r RHOST, --rhost RHOST
 
   -r RHOST, --rhost RHOST
 
                         usage report for the given remote host IP
 
                         usage report for the given remote host IP
   -t {daily,weekly,monthly}, --type {daily,weekly,monthly}
+
   -t {daily,weekly}, --time {daily,weekly}
                         type of report: daily, weekly, and monthly
+
                         type of report: day or week
 +
  -d DATE, --date DATE  specify date for report
 
   -u USER, --user USER  usage report for the given user name
 
   -u USER, --user USER  usage report for the given user name
   -v, --verbose        tune on output verbosity
+
   -v, --verbose        turn on output verbosity
 +
 
 +
Copyright 2022 - Eric Brauer
 +
 
  
Copyright 2020 - Raymond Chan
 
 
</pre>
 
</pre>
Replace the last line with your own full name  
+
Replace the last line with your own full name.
  
 +
Compare the usage output you have now with the one above. There is one option missing; you will need to change the <code>argparse</code> function to implement it.
  
If there is only one file name provided at the command line, read the login/logout records from the contents of the given file. If the file name is "online", get the record on the system your script is being execute using the Linux command "last -iwF". The format of each line in the file should be the same as the output of 'last -Fiw'. Filter out incomplete login/logout record (hints: check for the number of fields in each record).
+
You will see that there is an 'args' object in assignment2.py. Once the <code>parse_command_args()</code> function is called, it will return an args object. The command line arguments will be stored as attributes of that object. <b>Do not use sys.argv to parse arguments.</b>
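To illustrate how options end up as attributes of the args object, here is a rough sketch of a parser that would produce usage text similar to the one above. It is not the <code>parse_command_args()</code> function from the repository, which may be written differently. The function name <code>build_parser</code>, the long name <code>--list</code>, and the help text for <code>-l</code> are assumptions, since that part of the usage text is not shown here.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the usage text; not the repository's own code."""
    parser = argparse.ArgumentParser(
        description="Usage Report based on the last command",
        epilog="Copyright 2022 - Student Name",
    )
    # nargs="?" makes the positional optional, so args.files is a str or None
    parser.add_argument("files", nargs="?",
                        help="file to be processed, if blank, will call last")
    # "--list" and its help text are assumed names, not taken from the usage text
    parser.add_argument("-l", "--list", choices=["user", "host"],
                        help="generate a user or remote host list")
    parser.add_argument("-r", "--rhost",
                        help="usage report for the given remote host IP")
    parser.add_argument("-t", "--time", choices=["daily", "weekly"],
                        help="type of report: day or week")
    parser.add_argument("-d", "--date", help="specify date for report")
    parser.add_argument("-u", "--user",
                        help="usage report for the given user name")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="turn on output verbosity")
    return parser

# Passing a list here avoids touching the real command line in this example.
args = build_parser().parse_args(["-l", "user", "-d", "2018-02-14", "usage_data_file"])
print(args.list, args.date, args.files, args.verbose)
```

Each option is read back as an attribute named after its long form, for example <code>args.date</code> or <code>args.verbose</code>.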
  
If there is more than one file name provided, merge all the files together with the first one at the top and the last one at the bottom. Read and process the file contents in that order in your program.
+
If a file name is provided at the command line, read the login/logout records from the contents of the given file. If there is no file name, get the records from the system your script is being executed on using the Linux command "last -Fiw". The format of each line in the file should be the same as the output of 'last -Fiw'. Filter out incomplete login/logout records (hint: check the number of fields in each record).
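One possible shape for this step is sketched below. The function names and the constant <code>EXPECTED_FIELDS</code> are hypothetical; the value 15 is simply the number of whitespace-separated fields in a complete line of the sample usage_data_file, so verify it against real output on your own system.

```python
import subprocess

# Assumption: a complete record in the sample usage_data_file splits into 15 fields
# (user, tty, host, 5 login fields, "-", 5 logout fields, duration).
EXPECTED_FIELDS = 15

def read_records(filename=None):
    """Return the lines of `filename`, or the output of `last -Fiw` if no file is given."""
    if filename:
        with open(filename) as f:
            return f.read().splitlines()
    # universal_newlines=True returns str instead of bytes (works on older Python 3 too)
    proc = subprocess.Popen(["last", "-Fiw"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    output, _ = proc.communicate()
    return output.splitlines()

def complete_records(lines):
    """Keep only the lines that have the expected number of fields."""
    return [line for line in lines if len(line.split()) == EXPECTED_FIELDS]

sample = [
    "rchan    pts/9        10.40.91.236     Tue Feb 13 16:53:42 2018 - Tue Feb 13 16:57:02 2018  (00:03)",
    "rchan    pts/9        10.40.91.236     Tue Feb 13 16:53:42 2018   still logged in",
    "",
]
print(len(complete_records(sample)))  # 1
```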
  
 
=== Header ===
 
=== Header ===
Line 87: Line 107:
 
All your Python codes for this assignment must be placed in a <font color='red'><b><u>single source file</u></b></font>. Please include the following declaration by <b><u>you</u></b> as the <font color='blue'><b>script level docstring</b></font> in your Python source code file (replace [Student_id] with your Seneca email user name, and "Student Name" with your own name):
 
All your Python codes for this assignment must be placed in a <font color='red'><b><u>single source file</u></b></font>. Please include the following declaration by <b><u>you</u></b> as the <font color='blue'><b>script level docstring</b></font> in your Python source code file (replace [Student_id] with your Seneca email user name, and "Student Name" with your own name):
  
<source>OPS435 Assignment 2 - Winter 2020
+
<source>OPS435 Assignment 2 - Fall 2022
Program: <b>ur_[seneca_id].py</b>
+
Program: assignment2.py
Author: "<font color='red'>Student Name</font>"
+
Author: "Student Name"
The python code in this file <b>ur_[seneca_id].py</b> is original work written by
+
The python code in this file assignment2.py is original work written by
"<font color='red'>Student Name</font>". No code in this file is copied from any other source  
+
"Student Name". No code in this file is copied from any other source  
 
including any person, textbook, or on-line resource except those provided
 
including any person, textbook, or on-line resource except those provided
 
by the course instructor. I have not shared this python file with anyone
 
by the course instructor. I have not shared this python file with anyone
Line 99: Line 119:
 
</source>
 
</source>
  
=== Sample outputs ===
+
=== Use of Github ===
The following are the reports generated by the usage report script (ur.py) with the "usage_data_file" mentioned in the overview section. You can download the file [https://scs.senecac.on.ca/~raymond.chan/ops435/a2/usage_data_file here] to test your ur.py script.
+
You will once again be graded partly on <b>correct use of version control</b>, that is use of numerous commits with sensible commit messages. In professional practice, this is critically important for the timely delivery of code. You will be expected to use:
 +
<ol>
 +
<li><code>git add assignment2.py</code>
 +
<li><code>git commit -m "a message that describes the change"</code>
 +
<li><code>git push</code>
</ol>
 +
 
 +
after completing each step. There is no penalty for "too many commits", there is no such thing!
 +
 
 +
== Suggested Process ==
 +
<ol>
 +
<li> Read the rest of this document, try and understand what is expected.
 +
<li> Use the invite link posted to Blackboard to accept the assignment, and clone the repo to a Linux machine.
 +
<li> Run the script itself. Investigate argparse. <b>In the main block, print(args).</b> Experiment with the various options.
 +
<li> Read the usage output in the docs, what option must you implement? Go ahead and implement it. <b>Commit the change.</b>
 +
<li> Use the check script to check your work: <code>./checkA2.py -f -v TestHelp</code>. It should succeed.
 +
<li> Investigate the <code>parse_for_user()</code> function, with the <code>usage_data_file</code>.
 +
<li> <code>parse_for_user()</code> should take the list of lines from the file, and instead return a list of usernames. <b>In main, print the title header and the output. Commit the change.</b>
 +
<li> <b>Once you have `output` --> `parse_for_user()` --> correct output being printed, use if conditions to print only when `-l user` is in the command line arguments.</b>
 +
<li> Test using <code>./checkA2.py -f -v TestList</code>. You should see some tests succeeding, but some failing. Use the check script to start implementing the functions needed for <b>-l host</b>.
 +
<li> <b>Continue committing these changes as you proceed.</b> Your script should now be passing the TestList tests.
 +
<li> Now implement the -d <date> option. This will filter your user list based on the date provided by the user.
 +
<li> Use <code>./checkA2.py -f -v TestDate</code> to check your work. <b>You have completed the first milestone!</b>
 +
<li> The next stage will be to implement the daily/weekly reports.
 +
<li> Use <code>./checkA2.py -f -v TestDaily</code> and <code>./checkA2.py -f -v TestWeekly</code> to check your work. <b>You have completed the second milestone!</b>
 +
<li> Perform final checks and document your code. Write <b>why</b> your code is doing what it does, rather than <b>what</b> it's doing. You should have 100% of tests succeeding.
 +
</ol>
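The user-list step in the process above could be sketched as follows. This is only one possible approach: the exact signature of <code>parse_for_user()</code> in the repository may differ, and returning the names sorted and without duplicates is an assumption based on the sample outputs.

```python
def parse_for_user(lines):
    """Return a sorted list of the unique user names found in `lines`.

    Assumes the user name is the first whitespace-separated field of each line.
    """
    users = set()
    for line in lines:
        fields = line.split()
        if fields:                 # skip blank lines
            users.add(fields[0])
    return sorted(users)

sample = [
    "rchan    pts/9        10.40.91.236     Tue Feb 13 16:53:42 2018 - Tue Feb 13 16:57:02 2018  (00:03)",
    "cwsmith  pts/10       10.40.105.130    Wed Feb 14 23:09:12 2018 - Thu Feb 15 02:11:23 2018  (03:02)",
    "rchan    pts/2        10.40.91.236     Tue Feb 13 16:22:00 2018 - Tue Feb 13 16:45:00 2018  (00:23)",
]
print(parse_for_user(sample))  # ['cwsmith', 'rchan']
```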
 +
 
 +
== Output Format ==
 +
The format of your log tables should be identical to the sample output below, in order to minimize test check error. The horizontal banner between title and data should be composed of equal signs (=), and be the length of the title string.
 +
List tables should need no extra formatting.
 +
For daily/weekly tables with two columns, the first column should be 10 characters long and be left-aligned.
 +
The second column should be 15 characters long and be right-aligned.
 +
 
 +
<pre>
 +
Daily Usage Report for rchan < same number of characters
 +
============================ <
 +
Date                Usage < right justified
 +
13/02/2018        0:26:20 <
 +
15/02/2018        0:33:00
 +
Total            0:59:20
 +
llllllllllrrrrrrrrrrrrrrr    < left column is 10 chars, right column is 15
 +
^left just.
 +
</pre>
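The column widths described above can be produced with format specifications. The helper names below are hypothetical and the sketch only shows one way to build the table.

```python
from datetime import timedelta

def format_row(left, right):
    """Left column: 10 chars, left-aligned. Right column: 15 chars, right-aligned."""
    # !s converts to str first; timedelta objects do not accept a format spec directly
    return f"{left!s:<10}{right!s:>15}"

def format_report(title, rows, total):
    """Build a report with a banner of equal signs as long as the title."""
    lines = [title, "=" * len(title), format_row("Date", "Usage")]
    lines += [format_row(label, usage) for label, usage in rows]
    lines.append(format_row("Total", total))
    return "\n".join(lines)

rows = [
    ("13/02/2018", timedelta(minutes=26, seconds=20)),
    ("15/02/2018", timedelta(minutes=33)),
]
print(format_report("Daily Usage Report for rchan", rows,
                    timedelta(minutes=59, seconds=20)))
```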
 +
 
 +
=== Sample Outputs ===
 +
The following are the reports generated by the usage report script (assignment2.py) with the "usage_data_file" mentioned in the overview section.
 
==== User List ====
 
==== User List ====
 
The following is the user list extracted from the usage_data_file created by the command:
 
The following is the user list extracted from the usage_data_file created by the command:
 
<pre>
 
<pre>
[rchan@centos7 a2]$ ./ur.py -l user usage_data_file
+
[eric@centos7 a2]$ ./assignment2.py -l user usage_data_file
 
</pre>
 
</pre>
  
Line 119: Line 184:
 
The following is the remote host list extracted from the usage_data_file created by the command:
 
The following is the remote host list extracted from the usage_data_file created by the command:
 
<pre>
 
<pre>
[rchan@centos7 a2]$ ./ur.py -l host usage_data_file
+
[eric@centos7 a2]$ ./assignment2.py -l host usage_data_file
 
</pre>
 
</pre>
  
Line 131: Line 196:
 
</pre>
 
</pre>
  
==== Daily Usage Report by User ====
+
==== The Verbose Option ====
The following is a Daily Usage Report created for user rchan by the following command:
+
Either of the following two tests can be modified with the <code>--verbose</code> option. You shouldn't have to do anything to get this working:
 +
<pre>
 +
[eric@centos7 a2]$ ./assignment2.py -l host -v usage_data_file
 +
</pre>
 +
 
 +
<pre>
 +
Files to be processed: usage_data_file
 +
Type of args for files <class 'str'>
 +
Host list for usage_data_file
 +
=============================
 +
10.40.105.130
 +
10.40.91.236
 +
10.40.91.247
 +
10.43.115.162
 +
</pre>
 +
 
 +
==== List For Specific Day ====
 +
Specifying a <code>--date</code> in YYYY-MM-DD format should list all users or hosts that were logged in at some point during that date, even if their start time or end time is different. For example, user <code>cwsmith</code> logged in on Feb 14 and logged off on Feb 15, but they show up when the following command is run:
 +
<pre>
 +
[eric@centos7 a2]$ ./assignment2.py -l user -d 2018-02-14 usage_data_file
 +
</pre>
 +
 
 +
<pre>
 +
User list for usage_data_file
 +
=============================
 +
cwsmith
 +
</pre>
 +
 
 +
This should work for hosts as well:
 +
<pre>
 +
[eric@centos7 a2]$ ./assignment2.py -l host -d 2018-02-14 usage_data_file
 +
</pre>
 +
 
 +
<pre>
 +
Host list for usage_data_file
 +
=============================
 +
10.40.105.130
 +
</pre>
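The "logged in at some point during that date" rule is an interval overlap test. A minimal sketch, with a hypothetical function name, is shown here using the cwsmith session from the sample data.

```python
from datetime import date, datetime, timedelta

def active_on(login, logout, day):
    """Return True if the session from `login` to `logout` overlaps the calendar day `day`."""
    day_start = datetime.combine(day, datetime.min.time())   # 00:00:00 on that day
    day_end = day_start + timedelta(days=1)                  # 00:00:00 on the next day
    return login < day_end and logout >= day_start

login = datetime(2018, 2, 14, 23, 9, 12)
logout = datetime(2018, 2, 15, 2, 11, 23)
print(active_on(login, logout, date(2018, 2, 14)))  # True
print(active_on(login, logout, date(2018, 2, 13)))  # False
```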
 +
 
 +
If the user types in an invalid date, the script should halt and print the following error message:
 
<pre>
 
<pre>
[rchan@centos7 a2]$ ./ur.py -u rchan -t daily usage_data_file
+
[eric@centos7 a2]$ ./assignment2.py -l host -d 2018-02-xx usage_data_file
 
</pre>
 
</pre>
  
 
<pre>
 
<pre>
Daily Usage Report for rchan
+
Date not recognized. Use YYYY-MM-DD format.
============================
 
Date         Usage in Seconds
 
2018 02 15        1980
 
2018 02 13        1580
 
Total            3560
 
 
</pre>
 
</pre>
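Date validation can lean on the ValueError that <code>strptime</code> raises for text that does not match the format. The function name and the exit status of 1 below are assumptions; only the error message comes from the sample above.

```python
import sys
from datetime import datetime

def parse_report_date(text):
    """Convert a YYYY-MM-DD string to a date, or halt with the expected message."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        print("Date not recognized. Use YYYY-MM-DD format.")
        sys.exit(1)   # assumed exit status; the assignment only says the script should halt

print(parse_report_date("2018-02-14"))  # 2018-02-14
```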
 +
 +
 +
==== Daily Usage Report by User ====
 +
The following Daily Usage Report is created for user rchan.
  
 
<pre>
 
<pre>
[rchan@centos a2]$ ./ur.py -u cwsmith -t daily usage_data_file
+
[eric@centos a2]$ ./assignment2.py -u rchan -t daily usage_data_file
 
</pre>
 
</pre>
 +
 
<pre>
 
<pre>
Daily Usage Report for cwsmith
+
Daily Usage Report for rchan
==============================
+
============================
Date         Usage in Seconds
+
Date               Usage
2018 03 13       2280
+
13/02/2018        0:26:20
2018 02 15        7883
+
15/02/2018        0:33:00
2018 02 14       3047
+
Total             0:59:20
Total           13210
 
 
</pre>
 
</pre>
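A daily report boils down to grouping session durations by date. The sketch below assumes each session is counted entirely on its login date, which is what the sample report for 10.40.105.130 suggests; the function name, the field positions, and <code>LAST_FORMAT</code> are assumptions based on the sample file.

```python
from datetime import datetime, timedelta

LAST_FORMAT = "%a %b %d %H:%M:%S %Y"   # assumed layout of `last -F` timestamps

def daily_usage(records, user):
    """Map 'DD/MM/YYYY' to the total session time for `user`, keyed by login date."""
    usage = {}
    for line in records:
        fields = line.split()
        if fields[0] != user:
            continue
        # fields 3-7 hold the login timestamp, field 8 is "-", fields 9-13 the logout
        login = datetime.strptime(" ".join(fields[3:8]), LAST_FORMAT)
        logout = datetime.strptime(" ".join(fields[9:14]), LAST_FORMAT)
        key = login.strftime("%d/%m/%Y")
        usage[key] = usage.get(key, timedelta()) + (logout - login)
    return usage

sample = [
    "rchan    pts/9        10.40.91.236     Tue Feb 13 16:53:42 2018 - Tue Feb 13 16:57:02 2018  (00:03)",
    "cwsmith  pts/10       10.40.105.130    Wed Feb 14 23:09:12 2018 - Thu Feb 15 02:11:23 2018  (03:02)",
    "rchan    pts/2        10.40.91.236     Tue Feb 13 16:22:00 2018 - Tue Feb 13 16:45:00 2018  (00:23)",
]
for day, total in daily_usage(sample, "rchan").items():
    print(day, total)   # 13/02/2018 0:26:20
```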
 +
 +
Also be sure to test with the <code>--verbose</code> option.
  
 
==== Daily Usage Report by Remote Host====
 
==== Daily Usage Report by Remote Host====
 
The following is a Daily Usage Report created for the Remote Host 10.40.105.130 by the command:
 
The following is a Daily Usage Report created for the Remote Host 10.40.105.130 by the command:
 
<pre>
 
<pre>
[rchan@centos7 a2]$ ./ur.py -r 10.40.105.130 -t daily usage_data_file
+
[eric@centos7 a2]$ ./assignment2.py -r 10.40.105.130 -t daily usage_data_file
 
</pre>
 
</pre>
  
Line 168: Line 273:
 
Daily Usage Report for 10.40.105.130
 
Daily Usage Report for 10.40.105.130
 
====================================
 
====================================
Date         Usage in Seconds
+
Date               Usage
2018 02 15        7883
+
14/02/2018       3:02:11
2018 02 14        3047
+
13/02/2018        2:12:49
2018 02 13       7969
+
Total             5:15:00
Total           18899
 
 
</pre>
 
</pre>
  
 
==== Weekly Usage Report by User ====
 
==== Weekly Usage Report by User ====
The following is a Weekly Usage Report created for user rchan by the command:
+
The following is a Weekly Usage Report created for user cwsmith by the command:
 
<pre>
 
<pre>
[rchan@centos7 a2]$ ./ur.py -u rchan -t weekly usage_data_file
+
[eric@centos7 a2]$ ./assignment2.py -u cwsmith -t weekly usage_data_file  
 
</pre>
 
</pre>
  
 
<pre>
 
<pre>
Weekly Usage Report for rchan
+
Weekly Usage Report for cwsmith
=============================
+
===============================
Week #        Usage in Seconds
+
Date                Usage
2018 07           3560
+
2018 06          3:02:11
Total            3560
+
2018 10           0:38:00
 +
Total            3:40:11
 
</pre>
 
</pre>
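The week labels in the sample above (2018 06 for mid-February, 2018 10 for mid-March) are consistent with the <code>%U</code> format code, which numbers weeks starting on Sunday. This is an inference from the sample output, not something the assignment states, so confirm it with the check script.

```python
from datetime import datetime

feb_login = datetime(2018, 2, 14, 23, 9, 12)
mar_login = datetime(2018, 3, 13, 18, 8, 52)

print(feb_login.strftime("%Y %U"))   # 2018 06
print(mar_login.strftime("%Y %U"))   # 2018 10

# For comparison, ISO week numbering gives a different answer for the same date.
print(feb_login.isocalendar()[1])    # 7
```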
  
 +
==== Weekly Usage Report by Remote Host ====
 +
The following is a Weekly Usage Report created for the remote host 10.43.115.162 by the command:
 
<pre>
 
<pre>
[rchan@centos7 a2]$ ./ur.py -u cwsmith -t weekly usage_data_file
+
[eric@centos7 a2]$ ./assignment2.py -r 10.43.115.162 -t weekly usage_data_file
 
</pre>
 
</pre>
  
 
<pre>
 
<pre>
Weekly Usage Report for cwsmith
+
Weekly Usage Report for 10.43.115.162
===============================
+
=====================================
Week #        Usage in Seconds
+
Date                Usage
2018 11           2280
+
2018 06           0:02:31
2018 07          10930
+
Total            0:02:31
Total            13210
+
</pre>
  
 +
==== Daily Report From Online ====
 +
Running the script with <b>no filename</b> as a file argument should create a subprocess.Popen object and run the command <code>last -Fiw</code>.
 +
<pre>
 +
[eric@mtrx-node06pd ~]$ ./assignment2.py -l user
 
</pre>
 
</pre>
  
==== Weekly Usage Report by Remote Host ====
+
(Example Output from Matrix):
The following is a Weekly Usage Report created for the remote host 10.40.105.130 by the command:
 
 
<pre>
 
<pre>
[rchan@centos7 a2]$ ./ur.py -r 10.40.105.130 -t weekly usage_data_file
+
User list for online
 +
====================
 +
aabbas28
 +
aaddae1
 +
aali309
 +
aaljajah
 +
aalves-staffa
 +
aanees1
 +
aarham
 +
aassankanov
 +
abalandin
 +
abhaseen
 +
abholay
 +
acamuzcu
 +
acchikoti
 +
adas20
 +
adeel.javed
 +
...
 
</pre>
 
</pre>
  
 
<pre>
 
<pre>
Weekly Usage Report for 10.40.105.130
+
[eric@mtrx-node06pd ~]$ ./assignment2.py -u abholay -t daily
=====================================
 
Week #        Usage in Seconds
 
2018 07          18899
 
Total            18899
 
 
</pre>
 
</pre>
  
== Suggested work-flow for this assignment ==
+
<pre>
=== create a <font color="blue">private</font> repository on github ===
+
Daily Usage Report for abholay
* Use the same github account you used for your assignment 1 repository.
+
==============================
* Create a private repository named "ops435-a2" for this assignment.
+
Date                Usage
* Populate your private repository with appropriate files. Please check out the sample repository <b><font color='blue'>[https://github.com/rayfreeping/ops435-a2 here]</font></b>
+
16/07/2020      00:13:09
 +
17/07/2020      00:08:59
 +
Total            00:22:08
  
=== Add collaborator to your ops435-a2 private repository ===
+
</pre>
* Add your professor's github account as one of the collaborators to your ops435-a2 private repository. This will allow your professor to pull the contents of your ops435-a2 repository and also to review and suggest changes and fixes to your algorithm and/or python script.
 
* <b><font color='blue'>Make sure that your professor accepted your invitation from github.com.</font></b>
 
 
 
=== Detail Algorithm Document ===
 
Follow the standard computation procedure: input - process - ouput when creating the algorithm document for this assignment.
 
==== input ====
 
* get data (command line arguments/options) from the user using the functions provided by the argparse module
 
* according to the arguments/options given at the command line, take appropriate processing action.
 
==== processing ====
 
* based on the file(s) specified, read the contents of each file and use appropriate objects to store it
 
* based on the command line arguments/options, process the data accordingly, which includes
 
** data preprocessing (split a multi-day record into single day record)
 
** record processing (preform required computation)
 
==== output ====
 
* output the required report based on the processed data
 
==== identify and select appropriate python objects and functions ====
 
The following python functions (to be created, you may have more) are useful in handling the following sub-tasks:
 
* reads login records from files and filters out unwanted records
 
* convert login records into proper python object type so that it can be processed using as much built-in functions as possible
 
* create function which generates daily usage reports by user and/or by remote host
 
* create function which generates weekly usage reports by user and/or by remote host
 
  
To  help you with this assignment, you can use the ur_template.py in the sample ops435-a2 repository as a starting point in designing your own Python Usage Report script.
+
Please note that there will be no unit test for this, but it is still a requirement.
<font color='blue'><b>If you don't have enough time to create all the functions for the data processing steps, you should study the functions in the ur_funcs.py (provided by your teacher), pick and use the one that may help. If you use any of the functions from ur_funcs.py, there will be a cost of 10% to your overall grade. If you create all the functions yourself, you will get a bonus of 10%.</b></font>
 
  
=== Python script coding and debugging ===
+
== First Milestone ==
 +
By the first deadline, you should have your <code>TestHelp</code>, <code>TestList</code> and <code>TestDate</code> tests all passing. Make sure that the code is in your GitHub repository. I will use a pull request comment to give feedback, suggest changes or get you unstuck.
 +
== Second Milestone ==
 +
By the second deadline, you should have <code>TestDaily</code> as well as <code>TestWeekly</code> tests passing. Again, make sure that the code pushed to GitHub includes your latest work.
 +
== Python script coding and debugging ==
 
For each function, identify what type of objects should be passed to the function, and what type of objects should be returned to the caller.
 
For each function, identify what type of objects should be passed to the function, and what type of objects should be returned to the caller.
 
Once you have finished coding a function, you should start a Python3 interactive shell, import your functions and manually test each function and verify its correctness.
 
Once you have finished coding a function, you should start a Python3 interactive shell, import your functions and manually test each function and verify its correctness.
=== Final Test===
+
== Final Test==
 
Once you have all the individual function tested and that each is working properly, perform the final test with test data provided by your professor and verify that your script produces the correct results before submitting your python program on Blackboard. Upload all the files for this assignment 2 to your vm in myvmlab and perform the final test.
 
Once you have all the individual function tested and that each is working properly, perform the final test with test data provided by your professor and verify that your script produces the correct results before submitting your python program on Blackboard. Upload all the files for this assignment 2 to your vm in myvmlab and perform the final test.
  
== Sample login/logout records file and sample test run results==
 
* can be found from the sample repository github.com/rayfreeping/ops435-a2
 
  
 
== Rubric ==
 
== Rubric ==
Line 263: Line 368:
 
! Task !!  Maximum mark !! Actual mark
 
! Task !!  Maximum mark !! Actual mark
 
|-
 
|-
| User Requirement Document||20 ||
+
| First Milestone || 10 ||
 
|-
 
|-
| Program usage and Options || 20 ||
+
| Second Milestone || 10 ||
 
|-
 
|-
| Generate user name list || 10 ||
+
| Additional Check: 'online' || 5 ||
 
|-
 
|-
| Generate remote host IP list|| 10 ||
+
| GitHub Use || 10 ||
 
|-
 
|-
| Daily Usage Report by User || 10 ||
+
| List Functions || 10 ||  
 
|-
 
|-
| Daily Usage Report by Remote Host || 10 ||
+
| Daily/Weekly Functions || 10 ||
 
|-
 
|-
| Weekly Usage Report by User || 10 ||
+
| Date Functions || 10 ||  
 
|-
 
|-
| Weekly Usage Report by Remote Host || 10 ||
+
| Error checking and exception handling || 10 ||
 +
|-
 +
| Overall Design/Coherence || 10 ||
 +
|-
 +
| Documentation || 15 ||
 
|-
 
|-
 
 
| '''Total''' || 100 ||  
 
| '''Total''' || 100 ||  
 
 
|}
 
|}
  
 
== Submission ==
 
== Submission ==
* Stage 1: upload your algorithm document file to your ops435-a2 repository in github.com by March 20, 2020
+
* Stage 1: Complete the first milestone on GitHub by November 25, 2022.
* Stage 2: upload your python script for this assignment to your ops435-a2 repository in github.com and to your vm in myvmlab by March 26, 2020
+
* Stage 2: Complete the second milestone on GitHub by December 2, 2022.
* Stage 3: After fully tested and debugged your python script for this assignment, update your algorithm, your python script, and your est results to your ops435-a2 repository in github.com. Also submit the algorithm document, the python script and final test result to blackboard by April 3, 2020
+
* Stage 3: Use commits to push your python script for this assignment to Github.com. The final state of your repository will be looked at on December 9, 2022 at 11:59 PM.
 +
* Additionally: Copy your python script into a Word document and submit to Blackboard by December 9, 2022 at 11:59 PM.

Latest revision as of 16:18, 2 December 2022


Assignment 2 - Usage Report

Weight: 15% of the overall grade

Due Date: Please follow the three stages of submission schedule:

  • Complete the requirements for the first milestone and push to GitHub by November 25, 2022 at 11:59 PM,
  • Complete the requirements for the second milestone and push to GitHub by December 2, 2022 at 11:59 PM,
  • Complete your Python script and push to GitHub by December 9, 2022 at 11:59 PM, and
  • Copy your Python script into a Word document and submit to Blackboard by December 9, 2022 at 11:59 PM.

Overview

Most system administrators would like to know the utilization of their systems by their users. On a Linux system, each user's login records are normally stored in the binary file /var/log/wtmp. The login records in this binary file can not be viewed or edited directly using normal Linux text commands like 'less', 'cat', etc. The 'last' command is often used to display the login records stored in this file in a human readable form. Please check the man page of the 'last' command for available options. The following is the contents of the file named "usage_data_file", which is a sample output of the 'last' command with the '-Fiw' flag on:

$ last -Fiw > usage_data_file
$ cat usage_data_file
rchan    pts/9        10.40.91.236     Tue Feb 13 16:53:42 2018 - Tue Feb 13 16:57:02 2018  (00:03)    
cwsmith  pts/10       10.40.105.130    Wed Feb 14 23:09:12 2018 - Thu Feb 15 02:11:23 2018  (03:02)
rchan    pts/2        10.40.91.236     Tue Feb 13 16:22:00 2018 - Tue Feb 13 16:45:00 2018  (00:23)    
rchan    pts/5        10.40.91.236     Tue Feb 15 16:22:00 2018 - Tue Feb 15 16:55:00 2018  (00:33)    
asmith   pts/2        10.43.115.162    Tue Feb 13 16:19:29 2018 - Tue Feb 13 16:22:00 2018  (00:02)    
tsliu2   pts/4        10.40.105.130    Tue Feb 13 16:17:21 2018 - Tue Feb 13 16:30:10 2018  (00:12)    
cwsmith  pts/13       10.40.91.247     Tue Mar 13 18:08:52 2018 - Tue Mar 13 18:46:52 2018  (00:38)    
asmith   pts/11       10.40.105.130    Tue Feb 13 14:07:43 2018 - Tue Feb 13 16:07:43 2018  (02:00)

It is always desirable to have daily or monthly usage reports by user or by remote host based on the above information.

Tasks for this assignment

In this assignment, you will create a script that can generate usage reports based off of output from the last command or from a file in a similar format. You will use usage_data_file for testing but the script should also be tested on some other Linux machines and on Matrix.

Depending on the options selected, your script should list users or remote IP addresses, either overall or limited to a specific date. It should also generate daily or weekly usage reports for specific users or remote hosts.

Allowed Python Modules

  • the os, sys modules
  • the argparse module
  • The datetime module
  • The subprocess module

Argparse

Argparse is a much better way of dealing with command line arguments than parsing sys.argv by hand. It handles not only positional arguments but options as well, and it will generate usage messages. It is highly recommended that you spend at least a few minutes reading through the Argparse Tutorial.
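
As a rough illustration of what argparse does, here is a minimal sketch with one optional positional argument and one option restricted to a fixed set of choices. The function name build_parser is made up for this example, and the parser is deliberately incomplete: it is not the parser provided in assignment2.py, only a small model of two entries from the usage text shown later on this page.

```python
import argparse


def build_parser():
    """Build a small illustrative parser (not the full assignment parser)."""
    parser = argparse.ArgumentParser(
        description="Usage Report based on the last command")
    # nargs="?" makes the positional argument optional: zero or one value.
    parser.add_argument("files", nargs="?",
                        help="file to be processed, if blank, will call last")
    # choices restricts the accepted values and shows up in the usage text.
    parser.add_argument("-l", "--list", choices=["user", "host"],
                        help="generate user name or remote host IP")
    return parser


if __name__ == "__main__":
    # Passing a list to parse_args() lets you experiment without a shell.
    args = build_parser().parse_args(["-l", "user", "usage_data_file"])
    print(args.list, args.files)
```

Each command line argument ends up as an attribute of the returned object, so args.list and args.files can be tested with ordinary if conditions.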

Datetime

Since Python is a "batteries included" language, it's important to get accustomed to using some of the modules in the standard library. Since we are dealing with dates and times, you are required to work with the datetime module. The full docs can be found here.

Datetime objects can be initialized from strings that match a particular format. One example is provided for you in the codebase: the strptime function.

Once you have created datetime objects, you can do useful things with them:

  • d1 > d2 will return True if d1 is later than d2.
  • d2 - d1 will return a timedelta object, which is an amount of time between d2 and d1.

More interesting methods are in the table in the docs.
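
The two operations above can be tried with timestamps taken from the first record in usage_data_file. This is only a sketch: the constant name LAST_FORMAT is chosen for this example, and the format string is an assumption based on how the timestamps appear in the sample 'last -F' output (it also assumes English day and month names).

```python
from datetime import datetime

# Format matching timestamps such as "Tue Feb 13 16:53:42 2018".
LAST_FORMAT = "%a %b %d %H:%M:%S %Y"   # name chosen for this example

login = datetime.strptime("Tue Feb 13 16:53:42 2018", LAST_FORMAT)
logout = datetime.strptime("Tue Feb 13 16:57:02 2018", LAST_FORMAT)

print(logout > login)    # True: logout is later than login
print(logout - login)    # 0:03:20, a timedelta object
```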

Timedelta

When performing some operations with datetime objects, we may see timedelta objects being created. In math, delta refers to a difference. So a timedelta object is essentially an object that represents a duration.

You can do useful things with timedelta objects:

  • delta1 + delta2 will add up two durations. For example, if delta1 is two hours and delta2 is three hours, then this operation will return five hours.
  • str(delta1) will represent the timedelta in a friendly format: H:MM:SS if the duration is less than 24 hours.

More interesting methods are in the table in the docs.
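
As a small worked example, the three rchan sessions from usage_data_file can be added up by hand. The durations below were computed from the login and logout times in the sample file; the variable names are only for illustration.

```python
from datetime import timedelta

# The three rchan sessions from usage_data_file, written out by hand.
sessions = [
    timedelta(minutes=3, seconds=20),   # 16:53:42 to 16:57:02
    timedelta(minutes=23),              # 16:22:00 to 16:45:00
    timedelta(minutes=33),              # 16:22:00 to 16:55:00
]

# sum() needs a timedelta as its start value; the default start is the int 0.
total = sum(sessions, timedelta())
print(total)                        # 0:59:20

# Durations of 24 hours or more are printed with a day count in front.
print(str(timedelta(hours=25)))     # 1 day, 1:00:00
```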

Instructions

Accept Assignment #2 via the link on Blackboard, and clone the GitHub repository on a Linux machine of your choosing. Your code should only be located in the file "assignment2.py".

Program Name and valid command line arguments

Your script must accept one or more "file name" as its command line parameters and other optional parameters as shown below. Your python script should produce the following usage text when run with the --help option:

[eric@centos7 a1]$ python3 ./assignment2.py -h
usage: assignment2.py [-h] [-l {user,host}] [-r RHOST] [-t {daily,weekly}]
                      [-d DATE] [-u USER] [-v]
                      [files]

Usage Report based on the last command

positional arguments:
  files                 file to be processed, if blank, will call last

optional arguments:
  -h, --help            show this help message and exit
  -l {user,host}, --list {user,host}
                        generate user name or remote host IP from the given
                        files
  -r RHOST, --rhost RHOST
                        usage report for the given remote host IP
  -t {daily,weekly}, --time {daily,weekly}
                        type of report: day or week
  -d DATE, --date DATE  specify date for report
  -u USER, --user USER  usage report for the given user name
  -v, --verbose         turn on output verbosity

Copyright 2022 - Eric Brauer


Replace the last line with your own full name.

Compare the usage output you have now with the one above. One option is missing; you will need to change the argparse function to implement it.

You will see that there is an 'args' object in assignment2.py. Once the parse_command_args() function is called, it will return an args object. The command line arguments will be stored as attributes of that object. Do not use sys.argv to parse arguments.

If a file name is provided at the command line, read the login/logout records from the contents of the given file. If there is no file name, get the records from the system your script is being executed on, using the Linux command "last -iwF". The format of each line in the file should be the same as the output of 'last -Fiw'. Filter out incomplete login/logout records (hint: check the number of fields in each record).
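
The field-count hint can be sketched as follows. The function name complete_records is hypothetical, and the count of 15 fields is an assumption inferred from the sample lines in usage_data_file; check it against real 'last -Fiw' output before relying on it.

```python
def complete_records(lines):
    """Return the split fields of every line that looks like a full record."""
    records = []
    for line in lines:
        fields = line.split()
        # A complete sample line has 15 whitespace-separated fields:
        # user, tty, host, 5 for the login time, "-", 5 for the logout
        # time, and the duration in parentheses.
        if len(fields) == 15:
            records.append(fields)
    return records


sample = [
    "rchan    pts/9   10.40.91.236  Tue Feb 13 16:53:42 2018 - Tue Feb 13 16:57:02 2018  (00:03)",
    "rchan    pts/3   10.40.91.236  Tue Feb 13 17:00:00 2018   still logged in",
    "",
]
print(len(complete_records(sample)))   # 1
```

Lines for sessions that are still open, blank lines, and trailer lines have a different number of fields and are skipped.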

Header

All of your Python code for this assignment must be placed in a single source file. Please include the following declaration as the script level docstring in your Python source code file (replace [Student_id] with your Seneca email user name, and "Student Name" with your own name):

OPS435 Assignment 2 - Fall 2022
Program: assignment2.py
Author: "Student Name"
The python code in this file assignment2.py is original work written by
"Student Name". No code in this file is copied from any other source 
including any person, textbook, or on-line resource except those provided
by the course instructor. I have not shared this python file with anyone
or anything except for submission for grading.  
I understand that the Academic Honesty Policy will be enforced and violators 
will be reported and appropriate action will be taken.

Use of Github

You will once again be graded partly on correct use of version control, that is use of numerous commits with sensible commit messages. In professional practice, this is critically important for the timely delivery of code. You will be expected to use:

  1. git add assignment2.py
  2. git commit -m "a message that describes the change"
  3. git push

after completing each step. There is no penalty for "too many commits"; there is no such thing!

    Suggested Process

    1. Read the rest of this document, try and understand what is expected.
    2. Use the invite link posted to Blackboard to accept the assignment, and clone the repo to a Linux machine.
    3. Run the script itself. Investigate argparse. In the main block, print(args). Experiment with the various options.
    4. Read the usage output in the docs, what option must you implement? Go ahead and implement it. Commit the change.
    5. Use the check script to check your work: ./checkA2.py -f -v TestHelp. It should succeed.
    6. Investigate the parse_for_user() function, with the usage_data_file.
    7. parse_for_user() should take the list of lines from the file, and instead return a list of usernames. In main, print the title header and the output. Commit the change.
    8. Once you have `output` --> `parse_for_user()` --> correct output being printed, use if conditions to print only when `-l user` is in the command line arguments.
    9. Test using ./checkA2.py -f -v TestList. You should see some tests succeeding, but some failing. Use the check script to start implementing the functions needed for -l host.
    10. Continue committing these changes as you proceed. Your script should now be passing the TestList tests.
    11. Now implement the -d <date> option. This will filter your user list based on the date provided by the user.
    12. Use ./checkA2.py -f -v TestDate to check your work. You have completed the first milestone!
    13. The next stage will be to implement the daily/weekly reports.
    14. Use ./checkA2.py -f -v TestDaily and ./checkA2.py -f -v TestWeekly to check your work. You have completed the second milestone!
    15. Perform last checks and document your code. Write **why** your code is doing what it does, rather than **what** it's doing. You should have 100% of tests succeeding.

    Output Format

    The format of your log tables should be identical to the sample output below, in order to minimize test check errors. The horizontal banner between the title and the data should be composed of equal signs (=) and be the same length as the title string. List tables should need no extra formatting. For daily/weekly tables with two columns, the first column should be 10 characters long and left-aligned. The second column should be 15 characters long and right-aligned.

    Daily Usage Report for rchan < same number of characters
    ============================ <
    Date                Usage < right justified
    13/02/2018        0:26:20 <
    15/02/2018        0:33:00
    Total             0:59:20
    llllllllllrrrrrrrrrrrrrrr    < left column is 10 chars, right column is 15
    ^left just.
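
One way to get these column widths is with format specifications in f-strings. This is a sketch only: the function name format_report and its parameters are invented for this example and are not part of the provided code.

```python
from datetime import timedelta


def format_report(title, rows, total):
    """Return the report as a list of lines; rows holds (label, usage) pairs."""
    lines = [title, "=" * len(title)]
    # Left column: 10 characters, left-aligned. Right column: 15, right-aligned.
    lines.append(f"{'Date':<10}{'Usage':>15}")
    for label, usage in rows:
        # timedelta does not accept a width in a format spec, so use str().
        lines.append(f"{label:<10}{str(usage):>15}")
    lines.append(f"{'Total':<10}{str(total):>15}")
    return lines


report = format_report(
    "Daily Usage Report for rchan",
    [("15/02/2018", timedelta(minutes=33))],
    timedelta(minutes=33),
)
print("\n".join(report))
```

Returning a list of lines instead of printing inside the function makes it easier to test the function from an interactive shell.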
    

    Sample Outputs

    The following are the reports generated by the usage report script (assignment2.py) with the "usage_data_file" mentioned in the overview section.

    User List

    The following is the user list extracted from the usage_data_file created by the command:

    [eric@centos7 a2]$ ./assignment2.py -l user usage_data_file
    
    User list for usage_data_file
    =============================
    asmith
    cwsmith
    rchan
    tsliu2
    

    Remote Host List

    The following is the remote host list extracted from the usage_data_file created by the command:

    [eric@centos7 a2]$ ./assignment2.py -l host usage_data_file
    
    Host list for usage_data_file
    =============================
    10.40.105.130
    10.40.91.236
    10.40.91.247
    10.43.115.162
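
Both lists above are the unique values of one column, sorted as plain strings. A minimal sketch, assuming the user name is the first field and the remote host is the third; the function names here are hypothetical and are not the parse_for_user() function from the provided code.

```python
def unique_sorted_field(lines, index):
    """Return the sorted, de-duplicated values found in column `index`."""
    values = set()
    for line in lines:
        fields = line.split()
        if len(fields) > index:
            values.add(fields[index])
    return sorted(values)


def print_list(title, values):
    """Print a title, a banner of equal signs of the same length, then the values."""
    print(title)
    print("=" * len(title))
    for value in values:
        print(value)


sample = [
    "rchan    pts/9   10.40.91.236   Tue Feb 13 16:53:42 2018 - Tue Feb 13 16:57:02 2018  (00:03)",
    "cwsmith  pts/10  10.40.105.130  Wed Feb 14 23:09:12 2018 - Thu Feb 15 02:11:23 2018  (03:02)",
    "rchan    pts/2   10.40.91.236   Tue Feb 13 16:22:00 2018 - Tue Feb 13 16:45:00 2018  (00:23)",
]
print_list("User list for usage_data_file", unique_sorted_field(sample, 0))
```

Note that the hosts in the sample output are in string order, not numeric order: 10.40.105.130 comes before 10.40.91.236 because the character "1" sorts before "9".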
    

    The Verbose Option

    Either of the two preceding tests can be modified with the --verbose option. You shouldn't have to do anything to get this working:

    [eric@centos7 a2]$ ./assignment2.py -l host -v usage_data_file
    
    Files to be processed: usage_data_file
    Type of args for files <class 'str'>
    Host list for usage_data_file
    =============================
    10.40.105.130
    10.40.91.236
    10.40.91.247
    10.43.115.162
    

    List For Specific Day

    Specifying a --date in YYYY-MM-DD format should list all users or hosts that were logged in at some point during that date, even if their start time or end time is different. For example, user cwsmith logged in on Feb 14 and logged off on Feb 15, but they show up when the following command is run:

    [eric@centos7 a2]$ ./assignment2.py -l user -d 2018-02-14 usage_data_file
    
    User list for usage_data_file
    =============================
    cwsmith
    

    This should work for hosts as well:

    [eric@centos7 a2]$ ./assignment2.py -l host -d 2018-02-14 usage_data_file
    
    Host list for usage_data_file
    =============================
    10.40.105.130
    

    If the user types in an invalid date, the script should halt and print the following error message:

    [eric@centos7 a2]$ ./assignment2.py -l host -d 2018-02-xx usage_data_file
    
    Date not recognized. Use YYYY-MM-DD format.
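
The date check and the "logged in at some point during that date" rule could be sketched like this. The function names are invented for this example, and the exit status of 1 is an assumption; the requirement above only says that the script should halt and print the message.

```python
import sys
from datetime import datetime


def parse_report_date(text):
    """Convert a YYYY-MM-DD string to a date, or halt with the error message."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        print("Date not recognized. Use YYYY-MM-DD format.")
        sys.exit(1)   # assumed exit status


def active_on(login, logout, day):
    """Return True if the session between two datetimes touches the given date."""
    return login.date() <= day <= logout.date()


# cwsmith logged in on Feb 14 and logged off on Feb 15.
login = datetime(2018, 2, 14, 23, 9, 12)
logout = datetime(2018, 2, 15, 2, 11, 23)
print(active_on(login, logout, parse_report_date("2018-02-14")))   # True
print(active_on(login, logout, parse_report_date("2018-02-13")))   # False
```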
    


    Daily Usage Report by User

    The following is a Daily Usage Report created for user rchan by the command:

    [eric@centos a2]$ ./assignment2.py -u rchan -t daily usage_data_file
    
    Daily Usage Report for rchan
    ============================
    Date                Usage
    13/02/2018        0:26:20
    15/02/2018        0:33:00
    Total             0:59:20
    

    Also be sure to test with the --verbose option.

    Daily Usage Report by Remote Host

    The following is a Daily Usage Report created for the remote host 10.40.105.130 by the command:

    [eric@centos7 a2]$ ./assignment2.py -r 10.40.105.130 -t daily usage_data_file
    
    Daily Usage Report for 10.40.105.130
    ====================================
    Date                Usage
    14/02/2018        3:02:11
    13/02/2018        2:12:49
    Total             5:15:00
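
The figures above can be reproduced by grouping each session under its login date. This is a sketch under two assumptions inferred from the sample output, not stated requirements: a session that crosses midnight is counted entirely on its login date (the cwsmith session appears as 3:02:11 on 14/02/2018), and rows appear in the order the dates are first seen in the file. The names daily_usage and LAST_FORMAT are made up for this example.

```python
from datetime import datetime, timedelta

LAST_FORMAT = "%a %b %d %H:%M:%S %Y"   # name chosen for this example


def daily_usage(sessions):
    """Map 'DD/MM/YYYY' to total usage; sessions are (login, logout) strings."""
    usage = {}
    for login_text, logout_text in sessions:
        login = datetime.strptime(login_text, LAST_FORMAT)
        logout = datetime.strptime(logout_text, LAST_FORMAT)
        key = login.strftime("%d/%m/%Y")
        # Start from an empty timedelta the first time a date is seen.
        usage[key] = usage.get(key, timedelta()) + (logout - login)
    return usage


# The three sessions from host 10.40.105.130 in usage_data_file.
sessions = [
    ("Wed Feb 14 23:09:12 2018", "Thu Feb 15 02:11:23 2018"),
    ("Tue Feb 13 16:17:21 2018", "Tue Feb 13 16:30:10 2018"),
    ("Tue Feb 13 14:07:43 2018", "Tue Feb 13 16:07:43 2018"),
]
usage = daily_usage(sessions)
for day, amount in usage.items():
    print(day, amount)
print("Total", sum(usage.values(), timedelta()))
```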
    

    Weekly Usage Report by User

    The following is a Weekly Usage Report created for user cwsmith by the command:

    [eric@centos7 a2]$ ./assignment2.py -u cwsmith -t weekly usage_data_file 
    
    Weekly Usage Report for cwsmith
    ===============================
    Date                Usage
    2018 06           3:02:11
    2018 10           0:38:00
    Total             3:40:11
    

    Weekly Usage Report by Remote Host

    The following is a Weekly Usage Report created for the remote host 10.43.115.162 by the command:

    [eric@centos7 a2]$ ./assignment2.py -r 10.43.115.162 -t weekly usage_data_file
    
    Weekly Usage Report for 10.43.115.162
    =====================================
    Date                Usage
    2018 06           0:02:31
    Total             0:02:31
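
The week labels in the samples ("2018 06" for mid-February, "2018 10" for March 13) are consistent with the strftime directive %U, which numbers weeks starting on Sunday. That is an inference from the sample output rather than a stated requirement, and the function name week_label is hypothetical.

```python
from datetime import datetime


def week_label(moment):
    """Return 'YYYY WW', where WW is the week of the year starting on Sunday."""
    return moment.strftime("%Y %U")


print(week_label(datetime(2018, 2, 14, 23, 9, 12)))   # 2018 06
print(week_label(datetime(2018, 3, 13, 18, 8, 52)))   # 2018 10
```

The similar directive %W starts weeks on Monday and would give 07 for February 14, 2018, which does not match the sample.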
    

    Daily Report From Online

    Running the script with no file name should create a subprocess.Popen object that runs the command last -Fiw.

    [eric@mtrx-node06pd ~]$ ./assignment2.py -l user
    

    (Example Output from Matrix):

    User list for online
    ====================
    aabbas28
    aaddae1
    aali309
    aaljajah
    aalves-staffa
    aanees1
    aarham
    aassankanov
    abalandin
    abhaseen
    abholay
    acamuzcu
    acchikoti
    adas20
    adeel.javed
    ...
    
    [eric@mtrx-node06pd ~]$ ./assignment2.py -u abholay -t daily
    
    Daily Usage Report for abholay
    ==============================
    Date                Usage
    16/07/2020       00:13:09
    17/07/2020       00:08:59
    Total            00:22:08
    
    

    Please note that there will be no unit test for this, but it is still a requirement.
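
Because there is no unit test for the online case, it helps to keep the subprocess call in a small function that can be tried by hand. A minimal sketch, assuming the output should end up as a list of lines like the ones read from a file; the function name read_command_lines and its command parameter are invented for this example.

```python
import subprocess


def read_command_lines(command=("last", "-Fiw")):
    """Run a command and return its standard output as a list of lines."""
    # universal_newlines=True returns text instead of bytes and also works
    # on older Python 3 versions that lack the text= keyword.
    process = subprocess.Popen(list(command), stdout=subprocess.PIPE,
                               universal_newlines=True)
    # communicate() waits for the command to finish and returns (stdout, stderr).
    output, _ = process.communicate()
    return output.splitlines()
```

Calling read_command_lines() with no arguments runs last -Fiw. Passing a different command makes it possible to check the function on a machine where last has no records.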

    First Milestone

    By the first deadline, you should have your TestHelp, TestList and TestDate tests all passing. Make sure that the code is in your GitHub repository. I will use a pull request comment to give feedback, suggest changes or get you unstuck.

    Second Milestone

    By the second deadline, you should have TestDaily as well as TestWeekly tests passing. Again, make sure that the code pushed to GitHub includes your latest work.

    Python script coding and debugging

    For each function, identify what type of objects should be passed to the function, and what type of objects should be returned to the caller. Once you have finished coding a function, you should start a Python3 interactive shell, import your functions and manually test each function and verify its correctness.

    Final Test

    Once you have tested all the individual functions and each is working properly, perform the final test with the test data provided by your professor and verify that your script produces the correct results before submitting your Python program on Blackboard. Upload all the files for this assignment to your VM in myvmlab and perform the final test.


    Rubric

    Task                                    Maximum mark    Actual mark
    First Milestone                         10
    Second Milestone                        10
    Additional Check: 'online'              5
    GitHub Use                              10
    List Functions                          10
    Daily/Weekly Functions                  10
    Date Functions                          10
    Error checking and exception handling   10
    Overall Design/Coherence                10
    Documentation                           15
    Total                                   100

    Submission

    • Stage 1: Complete the first milestone on GitHub by November 25, 2022.
    • Stage 2: Complete the second milestone on GitHub by December 2, 2022.
    • Stage 3: Use commits to push your python script for this assignment to Github.com. The final state of your repository will be looked at on December 9, 2022 at 11:59 PM.
    • Additionally: Copy your python script into a Word document and submit to Blackboard by December 9, 2022 at 11:59 PM.