Difference between revisions of "OPS435 Assignment 2 for Section A"

(Created page with "Category:OPS435-PythonCategory:rchan =Assignment 2 - Usage Report= '''Weight:''' 10% of the overall grade '''Due Date:''' Please follow the three stages of submissio...")
 
(Second Milestone (due August 2))
 
(20 intermediate revisions by the same user not shown)
[[Category:OPS435-Python]][[Category:rchan]]
+
[[Category:OPS435-Python]][[Category:ebrauer]]
 +
= Overview: du Improved =
 +
<code>du</code> is a tool for inspecting directories. It will return the contents of a directory along with how much drive space they are using. However, its output can be hard to parse quickly, as it usually returns file sizes as a number of bytes:
  
=Assignment 2 - Usage Report=
+
<code><b>user@host ~ $ du --max-depth 1 /usr/local/lib</b></code>
'''Weight:''' 10% of the overall grade
+
<pre>
 
+
164028 /usr/local/lib/heroku
'''Due Date:''' Please follow the three stages of submission schedule:
+
11072 /usr/local/lib/python2.7
* Complete the algorithm document for this assignment script by July 31, 2020 and submit on Blackboard by 9:00 PM,
+
92608 /usr/local/lib/node_modules
* Complete your Python script and push to Github by August 14, 2020 at 9:00 PM, and
+
8 /usr/local/lib/python3.8
* Copy your Python script into a Word document and submit to Blackboard by August 14, 2020 at 9:00 PM.
+
267720 /usr/local/lib
 +
</pre>
 +
You will therefore be creating a tool called <b>duim (du improved)</b>. Your script will call du and return the contents of a specified directory, and generate a bar graph for each subdirectory. The bar graph will represent the drive space of each subdirectory as a percentage of the total drive space for the specified directory.
 +
An example of the output your finished script might produce is this:
  
==Overview==
+
<code><b>user@host ~ $ ./duim.py -H /usr/local/lib</b></code>
Most system administrators would like to know the utilization of their systems by their users. On a Linux system, each user's login records are normally stored in the binary file /var/log/wtmp. The login records in this binary file can not be viewed or edited directly using normal Linux text commands like 'less', 'cat', etc. The 'last' command is often used to display the login records stored in this file in a human readable form. Please check the man page of the 'last' command for available options. The following is the contents of the file named "usage_data_file", which is a sample output of the 'last' command with the '-Fiw' flag on:
 
 
<pre>
 
<pre>
$ last -Fiw > usage_data_file
+
61 % [============       ] 160.2 M /usr/local/lib/heroku
$ cat usage_data_file
+
   4 % [=                  ] 10.8 M /usr/local/lib/python2.7
rchan    pts/9       10.40.91.236    Tue Feb 13 16:53:42 2018 - Tue Feb 13 16:57:02 2018  (00:03)   
+
  34 % [=======            ] 90.4 M /usr/local/lib/node_modules
cwsmith  pts/10      10.40.105.130    Wed Feb 14 23:09:12 2018 - Thu Feb 15 02:11:23 2018  (03:02)
+
   0 % [                    ] 8.0 K /usr/local/lib/python3.8
rchan    pts/2        10.40.91.236    Tue Feb 13 16:22:00 2018 - Tue Feb 13 16:45:00 2018  (00:23)   
+
Total: 261.4 M /usr/local/lib
rchan    pts/5        10.40.91.236    Tue Feb 15 16:22:00 2018 - Tue Feb 15 16:55:00 2018  (00:33)   
 
asmith  pts/2        10.43.115.162    Tue Feb 13 16:19:29 2018 - Tue Feb 13 16:22:00 2018  (00:02)   
 
tsliu2   pts/4       10.40.105.130    Tue Feb 13 16:17:21 2018 - Tue Feb 13 16:30:10 2018  (00:12)   
 
cwsmith pts/13      10.40.91.247    Tue Mar 13 18:08:52 2018 - Tue Mar 13 18:46:52 2018  (00:38)   
 
asmith   pts/11      10.40.105.130    Tue Feb 13 14:07:43 2018 - Tue Feb 13 16:07:43 2018  (02:00)
 
 
</pre>
 
</pre>
It is always desirable to have daily or monthly usage reports by user or by remote host based on the above information.
 
  
== Tasks for this assignment ==
+
Notice that the total size of the target directory (/usr/local/lib) is around 260 Megabytes. Of that 260 Megabytes, 160 Megabytes can be found in the heroku subdirectory.
In this assignment, you should perform the following activities:
+
 
# Complete a detailed algorithm for producing monthly usage reports by user or by remote host based on the information stored in any given files generated from the 'last' command.
+
160 MB represents 61% of the total 260 MB. The percentages don't have to add up to 100%, since with these arguments we are excluding files in the target directory. You may choose to add an option to your script to print files as well.
# Once you have completed the detailed algorithm, you should then <b>design the structure of your python script</b> by identifying the appropriate python objects, functions and modules to be used for each task in your algorithm and the main control logic. Make sure to identify the following:
 
## input data,
 
## computation tasks, and
 
## outputs.
 
# Implement your computational solution using a single python script. You can use any built-in functions and functions from the python modules listed in the "Allowed Python Modules" section below to implement your solution.
 
# Test and review your working python code to see whether you can improve the interface of each function to facilitate better code re-use (this process is called <b>refactoring</b>).
 
  
== Allowed Python Modules ==
+
The bar chart in this example is 20 characters long, but this must be dynamic. The 20 characters do <i>not</i> include the square brackets. The resolution of the bar chart must become more accurate as you increase the total length. For example, if the user specifies a length of 100 total characters, in this example 61 of those characters would be equal signs and 39 would be spaces.
* the <b>os, sys</b> modules
 
* the <b>argparse</b> module
 
* The <b>time</b> module
 
* The <b>subprocess</b> module
 
** [https://docs.python.org/3/howto/argparse.html Argparse Tutorial] - should read this first.
 
** [https://docs.python.org/3/library/argparse.html Argparse API reference information page]
 
  
== Instructions ==
+
The output of each subdirectory should include percentage, size in bytes (or Human readable if the user uses the -H option), the bar chart and the name of the subdirectory.
 +
Specific formatting of the final output will be up to you, but should be formatted in such a way that the output is easy to read. (ie. use columns!)
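As a minimal sketch of the human-readable sizes and column layout described above: the helper names <code>human_size</code> and <code>format_row</code> are hypothetical and not part of the starting code. The sketch assumes that the numbers reported by du are counts of 1K blocks, which is consistent with the sample output (164028 shown as 160.2 M); adjust the starting unit if your du reports something else.

```python
def human_size(kibibytes):
    """Convert a size in 1K blocks (assumed unit) to a short string like '160.2 M'."""
    size = float(kibibytes)
    for unit in ("K", "M", "G"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    # Anything larger is reported in terabytes.
    return f"{size:.1f} T"


def format_row(percent, bar, size_text, path):
    """Lay out one line of output in fixed-width columns (widths are a free choice)."""
    return f"{percent:>3.0f} % [{bar}] {size_text:>8} {path}"
```

Right-aligning the percentage and the size keeps the bar graphs and directory names lined up from row to row.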
  
Accept the Assignment #2 via the link on Blackboard, and clone the Github repository on a Linux machine of your choosing. Rename "a2_template.py" to "a2_<your myseneca username>.py", just as we did in Assignment 1. You may also want to create a symbolic link using <code>ln -s a2_<myseneca_id>.py a2.py</code> to save time.
+
You will be required to fulfill some specific requirements before completing your script. Read on...
  
=== Program Name and valid command line arguments ===
+
= Assignment Requirements =
Name your Python3 script <code>a2_[student_id].py</code>. Your script must accept one or more file names as its command line parameters and other optional parameters as shown below. Your python script should produce the following usage text when run with the --help option:
 
<pre>
 
[eric@centos7 a1]$ python3 ./a2.py -h
 
usage: new_template.py [-h] [-l {user,host}] [-r RHOST] [-t {daily,monthly}]
 
                      [-u USER] [-s] [-v]
 
                      F [F ...]
 
  
Usage Report based on the last command
+
== Starting Code ==
 +
The first step for the assignment will be to accept the assignment using the invite code provided by your instructor. You will need to create a GitHub account to do this. (If you already have a GitHub account, you may use this).
  
positional arguments:
+
In your repository you will find a file called <b>duim.py</b>. This file contains starting code. You will complete the assignment inside duim.py. <b>Do not rename this file or the functions inside; if you do, unit tests will fail and you will lose marks!</b>
  F                    list of files to be processed
 
  
optional arguments:
+
Also in your repository you will find <b>checkA2.py</b>. You can use this check script to check your work.
  -h, --help            show this help message and exit
 
  -l {user,host}, --list {user,host}
 
                        generate user name or remote host IP from the given
 
                        files
 
  -r RHOST, --rhost RHOST
 
                        usage report for the given remote host IP
 
  -t {daily,monthly}, --type {daily,monthly}
 
                        type of report: daily or monthly
 
  -u USER, --user USER  usage report for the given user name
 
  -s, --seconds        return times in seconds
 
  -v, --verbose        turn on output verbosity
 
  
Copyright 2020 - Eric Brauer
+
== Permitted Modules ==
 +
<b><font color='blue'>Your python script is allowed to import only the <u>os, subprocess, argparse and sys</u> modules from the standard library.</font></b>
  
</pre>
+
== Required Functions ==
Replace the last line with your own full name.  
+
You will need to complete the functions inside the provided file called <code>duim.py</code>. The provided <code>checkA2.py</code> will be used to test these functions.
  
Compare the usage output you have now with the one above. There is one option missing, you will need to change the <code>argparse</code> function to implement it.
+
* <code>call_du_sub()</code> should take the target directory as an argument and return a list of strings returned by the command <b>du -d 1 <target directory></b>.
 +
** Use subprocess.Popen.
 +
** '-d 1' specifies a <i>max depth</i> of 1. Your list shouldn't include files, just a list of subdirectories in the target directory.
 +
** Your list should <u>not</u> contain newline characters.
 +
* <code>percent_to_graph()</code> should take two arguments: percent and the total chars. It should return a 'bar graph' as a string.
 +
** Your function should check that the percent argument is a valid number between 0 and 100. It should fail if it isn't. You can <code>raise ValueError</code> in this case.
 +
** <b>total chars</b> refers to the total number of characters that the bar graph will be composed of. You can use equal signs <code>=</code> or any other character that makes sense, but the empty space <b>must be composed of spaces</b>, at least until you have passed the first milestone.
 +
** The string returned by this function should only be composed of these two characters. For example, calling <code>percent_to_graph(50, 10)</code> should return:
 +
    '=====     '
 +
<b>Please note that the '' characters should <u>not</u> be part of the output, they are here to indicate that this is a string!</b>
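The requirements for <code>percent_to_graph()</code> above can be sketched roughly as follows. This is one possible approach, not the reference solution: it assumes the scaling is simply percent × total chars ÷ 100, rounded to the nearest integer, and it uses <code>=</code> as the fill character.

```python
def percent_to_graph(percent, total_chars):
    """Return a bar of '=' and spaces that is exactly total_chars long."""
    # Reject anything that is not a number in the range 0..100.
    if isinstance(percent, bool) or not isinstance(percent, (int, float)):
        raise ValueError("percent must be a number")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Scale from the 0..100 range to the 0..total_chars range.
    filled = round(percent * total_chars / 100)
    # The rest of the bar is padded with spaces.
    return "=" * filled + " " * (total_chars - filled)
```

With this sketch, <code>percent_to_graph(50, 10)</code> gives five equal signs followed by five spaces, and a length of 100 at 61 percent gives 61 equal signs.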
 +
* <code>create_dir_dict</code> should take a list as the argument, and should return a dictionary.
 +
** The list can be the list returned by <code>call_du_sub()</code>.
 +
** The dictionary that you return should have the full directory name as <i>key</i>, and the number of bytes in the directory as the <i>value</i>. This value should be an integer. For example, using the example of <b>/usr/local/lib</b>, the function would return:
 +
    {'/usr/local/lib/heroku': 164028, '/usr/local/lib/python2.7': 11072, ...}
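A rough sketch of how <code>call_du_sub()</code> and <code>create_dir_dict()</code> could fit together is shown here. It assumes that each line of du output is a size, some whitespace (a tab in GNU du), and then the path; error handling for unreadable directories is left out.

```python
import subprocess


def call_du_sub(target_dir):
    """Run 'du -d 1 <target_dir>' and return its output as a list of lines."""
    process = subprocess.Popen(
        ["du", "-d", "1", target_dir],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return str instead of bytes
    )
    stdout, _ = process.communicate()
    # splitlines() drops the newline characters; skip any empty lines.
    return [line for line in stdout.splitlines() if line]


def create_dir_dict(du_lines):
    """Turn lines like '164028<TAB>/usr/local/lib/heroku' into {path: size}."""
    dir_dict = {}
    for line in du_lines:
        # Split only once so that paths containing spaces stay intact.
        size, path = line.split(maxsplit=1)
        dir_dict[path] = int(size)
    return dir_dict
```

Passing the command as a list avoids the need for a shell and keeps directory names with spaces from being split into several arguments.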
  
You will notice that there is an 'args' object in a2_template.py. Once the <code>parse_command_args()</code> function is called, it will return an args object. The command line arguments will be stored as attributes of that object. <b>Do not use sys.argv to parse arguments.</b>
+
== Additional Functions ==
 +
You may create any other functions that you think appropriate, especially when you begin to build additional functionality. Part of your evaluation will be on how "re-usable" your functions are, and sensible use of arguments and return values.  
  
If there is only one file name provided at the command line, read the login/logout records from the contents of the given file. If the file name is "online", get the records on the system where your script is being executed using the Linux command "last -iwF". The format of each line in the file should be the same as the output of 'last -Fiw'. Filter out incomplete login/logout records (hint: check the number of fields in each record).
+
== Use of GitHub ==
 +
You will be graded partly on the quality of your Github commits. You may make as many commits as you wish; the number will have no impact on your grade. The only exception to this is <b>assignments with very few commits.</b> These will receive low marks for GitHub use and may be flagged for possible academic integrity violations.
 +
<b><font color='blue'>Assignments that do not adhere to these requirements may not be accepted.</font></b>
  
If there is more than one file name provided, merge all the files together with the first one at the top and the last one at the bottom. Read and process the file contents in that order in your program.
+
Professionals generally follow these guidelines:
 +
* commit their code after every significant change,  
 +
* the code <i>should hopefully</i> run without errors after each commit, and
 +
* every commit has a descriptive commit message.
  
=== Header ===
+
After completing each function, make a commit and push your code.
  
All your Python code for this assignment must be placed in a <font color='red'><b><u>single source file</u></b></font>. Please include the following declaration by <b><u>you</u></b> as the <font color='blue'><b>script level docstring</b></font> in your Python source code file (replace [Student_id] with your Seneca email user name, and "Student Name" with your own name):
+
After fixing a problem, make a commit and push your code.
  
<source>OPS435 Assignment 2 - Summer 2020
+
<b><u>GitHub is your backup and your proof of work.</u></b>  
Program: a2_[seneca_id].py
 
Author: "Student Name"
 
The python code in this file a2_[seneca_id].py is original work written by
 
"Student Name". No code in this file is copied from any other source
 
including any person, textbook, or on-line resource except those provided
 
by the course instructor. I have not shared this python file with anyone
 
or anything except for submission for grading. 
 
I understand that the Academic Honesty Policy will be enforced and violators
 
will be reported and appropriate action will be taken.
 
</source>
 
  
=== Use of Github ===
+
These guidelines are not always possible, but you will be expected to follow these guidelines as much as possible. Break your problem into smaller pieces, and work iteratively to solve each small problem. Test your code after each small change you make, and address errors as soon as they arise. It will make your life easier!
You will once again be graded partly on <b>correct use of version control</b>, that is use of numerous commits with sensible commit messages. In professional practice, this is critically important for the timely delivery of code. You will be expected to use:
 
<ol>
 
<li><code>git add *.py</code>
 
<li><code>git commit -m "a message that describes the change"</code>
 
<li><code>git push</code>
</ol>
 
  
after completing each step. There is no penalty for "too many commits", there is no such thing!
+
== Coding Standard ==
 +
Your python script must follow the following coding guide:
 +
* [https://www.python.org/dev/peps/pep-0008/ PEP-8 -- Style Guide for writing Python Code]
  
=== Suggested Process ===
+
=== Documentation ===
<ol>
+
There are three types of comments in programming and your assignment should contain each:
<li> Read the rest of this document, try and understand what is expected.
+
* The top-level docstring should contain information about what your script does. This is included in the duim.py file. <b>Please complete the top-level docstring.</b>
<li> Use the invite link posted to Blackboard to accept the assignment, and clone the repo to a Linux machine.
+
* Use Python's function docstrings to document how each of the functions work. The docstring should describe what each function does.
<li> Copy a2_template.py into a2_<myseneca_id>.py. Replace with your Myseneca username.
+
* Your script should also include in-line comments to explain anything that isn't immediately obvious to a beginner programmer. For these comments, it's always better to explain <i>why</i> your code is doing what it does rather than <i>what</i> it's doing. Also: <b><u>It is expected that you will be able to explain how each part of your code works in detail.</u></b>
<li> Run the script itself. Investigate argparse. Experiment with the various options, particularly -v. Read the docs, what option must you implement? Go ahead and implement it. Test with print() for now. <b>Commit the change.</b>
 
<li> Investigate the `parse_user()` function, with the <code>usage_data_file</code>. This should take the list of lines from the file, and instead return a list of usernames. <b>Commit the change.</b>
 
<li> Use argparse with `-l user` `usage_data_file` to call the `parse_user()` function. <b>Commit the change.</b>
 
<li> Write a function to print the list from `parse_for_user()`. Now you have input -> processing -> output. <b>Continue committing these changes as you proceed.</b>
 
<li> Implement the same things as parse_for_user but for `parse_for_hosts`. Output should be sorted.  
 
<li> Compare your output with the output below.
 
<li> Write the `parse_for_daily()` function using the pseudocode given. This should be taking the list of lines from your file, and output a dictionary with start dates in DD/MM/YYYY format as the key and usage in seconds as the value.
 
<li> <code> {'01/01/1980': 1200, '02/01/1980': 2400, '03/01/1980': 2200} </code>
 
<li> Once your `parse_for_daily()` function works, call it with the argparse options, and display the contents.
 
<li> Write (or modify) a function to do the same for remote hosts.
 
<li> Implement the outputting of the duration in HH:MM:SS instead of seconds. It's recommended you write a function to take in seconds and return a string. Call this when the `-s` option is absent. Make sure this is working with remote hosts as well. You should now have x of y tests passing.
 
<li> Finally, implement the `--monthly` option. Create a new function and get it working. start with seconds, then duration and make sure it works with remote as well.
 
<li> Perform last checks and document your code. Write **why** your code is doing what it does, rather than **what** it's doing. You should have 100% of tests succeeding.
 
</ol>
 
  
=== Output Format ===
+
=== Authorship Declaration ===
The format of your log tables should be identical to the sample output below, in order to minimize test check error. The horizontal banner between title and data should be composed of equal signs (=), and be the length of the title string.
+
All your Python code for this assignment must be placed in the provided Python file called <b>duim.py</b>. <u>Do not change the name of this file.</u> Please complete the declaration <b><u>as part of the top-level docstring</u></b> in your Python source code file (replace "Student Name" with your own name).
List tables should need no extra formatting.
 
For daily/monthly tables with two columns, the first column should be 10 characters long and be left-aligned.
 
The second column should be 15 characters long and be right-aligned.  
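The table layout described above can be sketched with Python's format specifications. The function name <code>format_report</code> is hypothetical; the column widths (10 left-aligned, 15 right-aligned) and the banner of equal signs come from the description above.

```python
def format_report(title, rows, total):
    """Build a two-column report: a 10-wide left column and a 15-wide right column."""
    lines = [
        title,
        "=" * len(title),  # banner is as long as the title
        f"{'Date':<10}{'Usage':>15}",
    ]
    for date, usage in rows:
        # str() lets the same code handle seconds (int) or HH:MM:SS (str).
        lines.append(f"{date:<10}{str(usage):>15}")
    lines.append(f"{'Total':<10}{str(total):>15}")
    return "\n".join(lines)
```

For example, <code>format_report("Daily Usage Report for rchan", [("13/02/2018", 1580), ("15/02/2018", 1980)], 3560)</code> yields a table in the shape of the sample outputs.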
 
  
=== Sample Outputs ===
+
= Submission Guidelines and Process =
The following are the reports generated by the usage report script (ur.py) with the "usage_data_file" mentioned in the overview section. You can download the file [https://scs.senecac.on.ca/~raymond.chan/ops435/a2/usage_data_file here] to test your ur.py script.
 
==== User List ====
 
The following is the user list extracted from the usage_data_file created by the command:
 
<pre>
 
[eric@centos7 a2]$ ./a2.py -l user usage_data_file
 
</pre>
 
  
<pre>
+
== Clone Your Repo (ASAP) ==
User list for usage_data_file
+
The first step will be to clone the Assignment 2 repository. The invite link will be provided to you by your professor. <b>You will need a free GitHub account to complete this assignment.</b> If you already have an existing GitHub account, you may use it.
=============================
 
asmith
 
cwsmith
 
rchan
 
tsliu2
 
</pre>
 
  
==== Remote Host List ====
+
The repo will contain a check script, a README file, and the file where you will enter your code.
The following is the remote host list extracted from the usage_data_file created by the command:
 
<pre>
 
[eric@centos7 a2]$ ./a2.py -l host usage_data_file
 
</pre>
 
  
<pre>
+
== The First Milestone (due July 26) ==
Host list for usage_data_file
+
For the first milestone you will have two functions to complete.
=============================
+
* <code>call_du_sub</code> will take one argument and return a list. The argument is a target directory. The function will use <code>subprocess.Popen</code> to run the command <b>du -d 1 <target_directory></b>.
10.40.105.130
+
* <code>percent_to_graph</code> will take two arguments and return a string.
10.40.91.236
 
10.40.91.247
 
10.43.115.162
 
</pre>
 
  
==== Daily Usage Report by User ====
+
In order to complete <code>percent_to_graph()</code>, it's helpful to know the equation for converting a number from one scale to another.
The following are Daily Usage Reports created for user rchan. The output can be displayed either in seconds:
 
<pre>
 
[eric@centos7 a2]$ ./a2.py -u rchan -t daily usage_data_file --seconds
 
</pre>
 
  
<pre>
+
[[File:Scaling-formula.png]]
Daily Usage Report for rchan
 
============================
 
Date                Usage
 
13/02/2018          1580
 
15/02/2018          1980
 
Total                3560
 
</pre>
 
  
...or by omitting the <code>--seconds</code> option, in HH:MM:SS format.  
+
In this equation, <code>x</code> refers to your input value percent and <code>y</code> refers to the number of symbols to print. The max of percent is 100 and the min of percent is 0.
<pre>
+
Be sure that you are rounding to an integer, and then print that number of symbols to represent the percentage. The number of spaces that you print will be the remainder: the total number of characters minus the number of symbols.
[eric@centos a2]$ ./a2.py -u rchan -t daily usage_data_file
 
</pre>
 
  
<pre>
+
Test your functions with the Python interpreter. Use <code>python3</code>, then:
Daily Usage Report for rchan
+
    import duim
============================
+
    duim.percent_to_graph(50, 10)
Date                Usage
 
13/02/2018      00:26:20
 
15/02/2018      00:33:00
 
Total            00:59:20
 
</pre>
 
It's recommended you get the seconds working first, then create a function to convert seconds to HH:MM:SS.
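One way to write such a conversion function is with <code>divmod</code>. This is a sketch only, and the name <code>seconds_to_hms</code> is an assumption rather than a name required by the assignment.

```python
def seconds_to_hms(total_seconds):
    """Convert a number of seconds into a zero-padded HH:MM:SS string."""
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For instance, the total of 3560 seconds in the report above converts to <code>00:59:20</code>.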
 
  
==== Daily Usage Report by Remote Host====
+
To test with the check script, run the following:
The following is a Daily Usage Report created for the Remote Host 10.40.105.130 by the command:
 
<pre>
 
[eric@centos7 a2]$ ./a2.py -r 10.40.105.130 -t daily usage_data_file -s
 
</pre>
 
  
<pre>
+
<code>python3 checkA2.py -f -v TestPercent</code>
Daily Usage Report for 10.40.105.130
 
====================================
 
Date            Usage
 
14/02/2018      10931
 
13/02/2018        7969
 
Total            18900
 
</pre>
 
  
Just as you did with <code>--user</code>, your script should also display the time in HH:MM:SS by omitting the <code>--seconds</code> option.
+
<code>python3 checkA2.py -f -v TestDuSub</code>
  
==== Monthly Usage Report by User ====
+
== Second Milestone (due August 2) ==
The following is a Monthly Usage Report created for user rchan by the command:
+
For the second milestone you will have two more functions to complete.
<pre>
+
* <code>create_dir_dict</code> will take your list from <code>call_du_sub</code> and return a dictionary.
[eric@centos7 a2]$ ./a2.py -u rchan -t monthly usage_data_file -s
+
** Every item in your list should create a key in your dictionary.
</pre>
+
** Your dictionary values should be a number of bytes.
  
<pre>
+
For example: <code>{'/usr/lib/local': 33400}</code>
Monthly Usage Report for rchan
+
==============================
+
** Again, test using your Python interpreter or the check script.
Date                Usage
 
02/2018              3560
 
Total                3560
 
</pre>
 
  
<pre>
+
To run the check script, enter the following:
[eric@centos7 a2]$ ./a2.py -u cwsmith -t monthly usage_data_file
 
</pre>
 
  
<pre>
+
<code>python3 checkA2.py -f -v TestDirDict</code>
Monthly Usage Report for cwsmith
 
================================
 
Date                Usage
 
02/2018          03:02:11
 
03/2018          00:38:00
 
Total            03:40:11
 
</pre>
 
  
==== Monthly Usage Report by Remote Host ====
+
You will be using a module in the standard library called <b>Argparse</b>. This will help handle more complex sets of options and arguments than simply using sys.argv.
The following is a Monthly Usage Report created for the remote host 10.40.105.130 by the command:
+
Refer to the argparse documentation to complete the <code>parse_command_args</code> function. At minimum, your assignment should handle the following options and arguments:
<pre>
 
[eric@centos7 a2]$ ./a2.py -r 10.40.105.130 -t monthly usage_data_file
 
</pre>
 
  
<pre>
+
* -h will print a usage message. This will automatically be created by argparse itself, you will not need to implement this. However, refer carefully to the sample output and ensure that your help message matches the required output.
Monthly Usage Report for 10.40.105.130
+
* -H will print file sizes in Human readable format. For example, 1024 bytes will be printed as 1K, 1024 kilobytes will be printed as 1M, and so on.
======================================
+
* -l <number> will set the maximum length of the bar graph. The default should be 20 characters. This option will require an option argument that is an integer.
Date                Usage
+
* Your script will also require one positional argument which contains the target directory for scanning.
02/2018          05:15:00
 
Total            05:15:00
 
</pre>
 
  
As discussed before, this command should also accept the <code>--seconds</code> option.
+
Your assignment should be able to produce the following:
  
==== List Users With Verbose ====
+
<code><b>user@host ~ $ python3 duim.py -h</b></code>
Calling any of the previous commands with the <code>--verbose</code> option should cause the script to output more information:
 
 
<pre>
 
<pre>
[eric@centos7 a2]$ ./a2.py -l user usage_data_file -v
+
usage: duim.py [-h] [-H] [-l LENGTH] [target]
</pre>
 
  
<pre>
+
DU Improved -- See Disk Usage Report with bar charts
Files to be processed: ['usage_data_file']
 
Type of args for files <class 'list'>
 
User list for usage_data_file
 
=============================
 
asmith
 
cwsmith
 
rchan
 
tsliu2
 
</pre>
 
  
<pre>
+
positional arguments:
[eric@centos7 a2]$ ./a2.py -r 10.40.105.130 -t monthly usage_data_file -v
+
  target                The directory to scan.
</pre>
 
  
<pre>
+
optional arguments:
Files to be processed: ['usage_data_file']
+
  -h, --help            show this help message and exit
Type of args for files <class 'list'>
+
  -H, --human-readable  print sizes in human readable format (e.g. 1K 23M 2G)
usage report for remote host: 10.40.105.130
+
  -l LENGTH, --length LENGTH
usage report type: monthly
+
                        Specify the length of the graph. Default is 20.
Monthly Usage Report for 10.40.105.130
 
======================================
 
Date                Usage
 
02/2018          05:15:00
 
Total            05:15:00
 
</pre>
 
  
==== Daily Report From Online ====
+
Copyright 2021
Running the script with "online" as a file argument should call a subprocess.Popen object and run the command <code>last -Fiw</code>.
 
<pre>
 
[eric@mtrx-node06pd ~]$ ./a2.py -l user online
 
 
</pre>
 
</pre>
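A parser that produces a help message of this general shape could be sketched as follows. The option names and help strings are taken from the usage text; the use of <code>nargs="?"</code>, the default target of <code>"."</code>, and the optional <code>argv</code> parameter (handy for testing) are assumptions, so check the starting code and the check script before relying on them.

```python
import argparse


def parse_command_args(argv=None):
    """Parse the command line; argv=None means argparse reads sys.argv itself."""
    parser = argparse.ArgumentParser(
        description="DU Improved -- See Disk Usage Report with bar charts",
        epilog="Copyright 2021",
    )
    # -h/--help is added automatically by argparse.
    parser.add_argument(
        "-H", "--human-readable",
        action="store_true",
        help="print sizes in human readable format (e.g. 1K 23M 2G)",
    )
    parser.add_argument(
        "-l", "--length",
        type=int,
        default=20,
        help="Specify the length of the graph. Default is 20.",
    )
    # nargs="?" makes the positional argument optional (assumed design choice).
    parser.add_argument(
        "target",
        nargs="?",
        default=".",
        help="The directory to scan.",
    )
    return parser.parse_args(argv)
```

The parsed values are then available as attributes, for example <code>args.human_readable</code>, <code>args.length</code> and <code>args.target</code>.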
  
(Example Output from Matrix):
+
Use the following to test your code:
<pre>
 
User list for online
 
====================
 
aabbas28
 
aaddae1
 
aali309
 
aaljajah
 
aalves-staffa
 
aanees1
 
aarham
 
aassankanov
 
abalandin
 
abhaseen
 
abholay
 
acamuzcu
 
acchikoti
 
adas20
 
adeel.javed
 
...
 
</pre>
 
  
<pre>
+
<code>python3 checkA2.py -f -v TestArgs</code>
[eric@mtrx-node06pd ~]$ ./a2.py -u adas20 -t daily online
 
</pre>
 
  
<pre>
+
== Minimum Viable Product ==  
Daily Usage Report for abholay
+
Once you have achieved the Milestones, you will have to do the following to get a minimum viable product:
==============================
+
* In your <code>if __name__ == '__main__'</code> block, you will have to call the parse_command_args function. Experiment with print statements so that you understand how each option and argument are stored.
Date                Usage
+
** If the user has entered more than one argument, or their argument isn't a valid directory, print an error message.
16/07/2020      00:13:09
+
** If the user doesn't specify any target, use the current directory.
17/07/2020      00:08:59
+
* Call <code>call_du_sub</code> with the target directory.
Total            00:22:08
+
* Pass the return value from that function to <code>create_dir_dict</code>
 +
* You may wish to create one or more functions to do the following:
 +
** Use the total size of the target directory to calculate percentage.
 +
** For each subdirectory of target directory, you will need to calculate a percentage, using the total of the target directory.
 +
** Once you've calculated percentage, call <code>percent_to_graph</code> with a max_size of your choice.
 +
** For every subdirectory, print <i>at least</i> the percent, the bar graph, and the name of the subdirectory.
 +
** The target directory <b>should not</b> have a bar graph.
  
</pre>
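The minimum viable product steps can be tied together roughly as in the sketch below. The function <code>report_lines</code> is a hypothetical helper, and the small <code>percent_to_graph</code> here is only a stand-in (without validation) so that the example runs on its own; in your script you would call your own completed functions.

```python
def percent_to_graph(percent, total_chars):
    """Stand-in bar builder so this sketch is self-contained."""
    filled = round(percent * total_chars / 100)
    return "=" * filled + " " * (total_chars - filled)


def report_lines(dir_dict, target, length=20):
    """Return one output line per subdirectory plus a total line for the target."""
    total = dir_dict[target]
    lines = []
    for path, size in dir_dict.items():
        if path == target:
            continue  # the target directory itself gets no bar graph
        percent = size / total * 100 if total else 0
        bar = percent_to_graph(percent, length)
        lines.append(f"{percent:3.0f} % [{bar}] {size} {path}")
    lines.append(f"Total: {total} {target}")
    return lines
```

Returning a list of strings instead of printing directly keeps the function easy to test and leaves the final formatting choices to the caller.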
+
== Additional Features ==
  
=== Detail Algorithm Document ===
+
After completing the above, you are expected to add some additional features. Some improvements you could make are:
Follow the standard computation procedure: input - process - output when creating the algorithm document for this assignment.
 
==== input ====
 
* get data (command line arguments/options) from the user using the functions provided by the argparse module
 
* according to the arguments/options given at the command line, take appropriate processing action.  
 
==== processing ====
 
* based on the file(s) specified, read the contents of each file and use appropriate objects to store it
 
* based on the command line arguments/options, process the data accordingly, which includes
 
** data preprocessing (split a multi-day record into single day record)
 
** record processing (perform required computation)
 
==== output ====
 
* output the required report based on the processed data
 
==== identify and select appropriate python objects and functions ====
 
The following python functions (to be created, you may have more) are useful in handling the following sub-tasks:
 
* reads login records from files and filters out unwanted records
 
* convert login records into proper python object types so that they can be processed using as many built-in functions as possible
 
* create functions which generate daily usage reports by user and/or by remote host
 
* create functions which generate monthly usage reports by user and/or by remote host
 
  
To help you with this assignment, you should use the a2_template.py in the repository as a starting point in designing your own Python Usage Report script.
+
* Format the output in a way that is easy to read.
 +
* Add colour to the output.
 +
* Include files in the output.
 +
* Include a threshold, so that results that are less than a user-specified size get excluded from results.
 +
* Add more error checking, print a usage message to the user.
 +
* Accept more options from the user.
 +
* Sort the output by percentage, or by filename.
  
=== Python script coding and debugging ===
+
It is expected that the additional features you provide will be useful and non-trivial. They should not require super-user privileges and should not require the installation of additional packages to work (i.e. I shouldn't have to run pip to make your assignment work).
For each function, identify what type of objects should be passed to the function, and what type of objects should be returned to the caller.
 
Once you have finished coding a function, you should start a Python3 interactive shell, import your functions and manually test each function and verify its correctness.
 
=== Final Test===
 
Once you have tested all the individual functions and confirmed that each works properly, perform the final test with the test data provided by your professor and verify that your script produces the correct results before submitting your python program on Blackboard. Upload all the files for Assignment 2 to your vm in myvmlab and perform the final test.
 
  
 +
== The Assignment (due August 6, 11:59pm) ==
 +
* Be sure to make your final commit before the deadline. Don't forget to also use <code>git push</code> to push your code into the online repository!
 +
* Then, copy the contents of your <b>duim.py</b> file into a Word document, and submit it to Blackboard. <i>I will use GitHub to evaluate your deadline, but submitting to Blackboard tells me that you wish to be evaluated.</i>
  
== Rubric ==
+
= Rubric =
  
 
{| class="wikitable" border="1"
 
{| class="wikitable" border="1"
 
! Task !!  Maximum mark !! Actual mark
 
! Task !!  Maximum mark !! Actual mark
 
|-
 
|-
| Algorithm Submission || 10 ||
+
| Program Authorship Declaration || 5 ||
 
|-
 
|-
| Check Script Results || 30 ||
+
| required functions design || 5 ||
 
|-
 
|-
| Additional Check: 'online' || 5 ||
+
| required functions readability || 5 ||
 
|-
 
|-
| GitHub Use || 15 ||
+
| main loop design || 10 ||
 
|-
 
|-
| List Functions || 5 ||  
+
| main loop readability || 10 ||
 
|-
 
|-
| Daily/Monthly Functions || 10 ||
+
| output function design || 5 ||
 
|-
 
|-
| Output Functions || 5 ||  
+
| output function readability || 5 ||
 
|-
 
|-
| Other Functions || 5 ||
+
| additional features implemented || 20 ||
 
|-
 
|-
| Overall Design/Coherence || 10 ||
+
| docstrings and comments || 5 ||
 
|-
 
|-
| Documentation || 5 ||
+
| First Milestone || 10 ||
 
|-
 
|-
| '''Total''' || 100 ||  
+
| Second Milestone || 10 ||
 +
|-
 +
| github.com repository: Commit messages and use || 10 ||
 +
|-
 +
|'''Total''' || 100 ||  
 +
 
 
|}
 
|}
  
== Submission ==
+
= Due Date and Final Submission requirement =
* Stage 1: Submit your algorithm document file to Blackboard by July 31, 2020.
+
 
* Stage 2: Use commits to push your python script for this assignment to Github.com. The final state of your repository will be looked at on August 14, 2020 at 9:00 PM.
+
Please submit the following files by the due date:
* Stage 3: Copy your python script into a Word document and submit to Blackboard by August 14, 2020 at 9:00 PM.
+
* [ ] your python script, named as 'duim.py', in your repository, and also '''submitted to Blackboard''', by August 6 at 11:59pm.

Latest revision as of 09:56, 15 July 2021

Overview: du Improved

du is a tool for inspecting directories. It will return the contents of a directory along with how much drive space they are using. However, it can be difficult to parse its output quickly, as it usually returns file sizes as a number of bytes:

user@host ~ $ du --max-depth 1 /usr/local/lib

164028	/usr/local/lib/heroku
11072	/usr/local/lib/python2.7
92608	/usr/local/lib/node_modules
8	/usr/local/lib/python3.8
267720	/usr/local/lib

You will therefore be creating a tool called duim (du improved). Your script will call du and return the contents of a specified directory, and generate a bar graph for each subdirectory. The bar graph will represent the drive space as a percentage of the total drive space for the specified directory. An example of the output your finished script might produce is this:

user@host ~ $ ./duim.py -H /usr/local/lib

 61 % [============        ] 160.2 M	/usr/local/lib/heroku
  4 % [=                   ] 10.8 M	/usr/local/lib/python2.7
 34 % [=======             ] 90.4 M	/usr/local/lib/node_modules
  0 % [                    ] 8.0 K	/usr/local/lib/python3.8
Total: 261.4 M 	 /usr/local/lib

Notice that the total size of the target directory (/usr/local/lib) is around 260 Megabytes. Of that 260 Megabytes, 160 Megabytes can be found in the heroku subdirectory.

160 MB represents 61% of the total 260 MB. The percentages don't have to add up to 100%, since with these arguments we are excluding files in the target directory. You may choose to add an option to your script to print files as well.

The bar chart in this example is 20 characters long, but this must be dynamic. The 20 characters do not include the square brackets. The resolution of the bar chart must become finer as the total length increases. For example, if the user specifies a length of 100 total characters, in this example 61 of those characters would be equal signs and 39 would be spaces.

The output for each subdirectory should include the percentage, the size in bytes (or in human readable form if the user uses the -H option), the bar chart and the name of the subdirectory. Specific formatting of the final output will be up to you, but it should be formatted in such a way that the output is easy to read. (i.e., use columns!)

You will be required to fulfill some specific requirements before completing your script. Read on...

Assignment Requirements

Starting Code

The first step for the assignment will be to accept the assignment using the invite code provided by your instructor. You will need to create a GitHub account to do this. (If you already have a GitHub account, you may use this).

In your repository you will find a file called duim.py. This file contains starting code. You will complete the assignment inside duim.py. Do not rename this file or the functions inside it, or the unit tests will fail and you will lose marks!

Also in your repository you will find checkA2.py. You can use this check script to check your work.

Permitted Modules

Your python script is allowed to import only the os, subprocess, argparse and sys modules from the standard library.

Required Functions

You will need to complete the functions inside the provided file called duim.py. The provided checkA2.py will be used to test these functions.

  • call_du_sub() should take the target directory as an argument and return a list of strings returned by the command du -d 1 <target directory>.
    • Use subprocess.Popen.
    • '-d 1' specifies a max depth of 1. Your list shouldn't include files, just a list of subdirectories in the target directory.
    • Your list should not contain newline characters.
  • percent_to_graph() should take two arguments: percent and the total chars. It should return a 'bar graph' as a string.
    • Your function should check that the percent argument is a valid number between 0 and 100. It should fail if it isn't. You can raise ValueError in this case.
    • total chars refers to the total number of characters that the bar graph will be composed of. You can use equal signs = or any other character that makes sense, but the empty space must be composed of spaces, at least until you have passed the first milestone.
    • The string returned by this function should only be composed of these two characters. For example, calling percent_to_graph(50, 10) should return:
   '=====     '

Please note that the quotation marks should not be part of the output; they are here to indicate that this is a string!

  • create_dir_dict should take a list as the argument, and should return a dictionary.
    • The list can be the list returned by call_du_sub().
    • The dictionary that you return should have the full directory name as key, and the number of bytes in the directory as the value. This value should be an integer. For example, using the example of /usr/local/lib, the function would return:
   {'/usr/local/lib/heroku': 164028, '/usr/local/lib/python2.7': 11072, ...}

Additional Functions

You may create any other functions that you think appropriate, especially when you begin to build additional functionality. Part of your evaluation will be on how "re-usable" your functions are, and sensible use of arguments and return values.

Use of GitHub

You will be graded partly on the quality of your GitHub commits. You may make as many commits as you wish; the number will have no impact on your grade. The only exception to this is assignments with very few commits. These will receive low marks for GitHub use and may be flagged for possible academic integrity violations. Assignments that do not adhere to these requirements may not be accepted.

Professionals generally follow these guidelines:

  • commit their code after every significant change,
  • the code should hopefully run without errors after each commit, and
  • every commit has a descriptive commit message.

After completing each function, make a commit and push your code.

After fixing a problem, make a commit and push your code.

GitHub is your backup and your proof of work.

These guidelines are not always possible, but you will be expected to follow these guidelines as much as possible. Break your problem into smaller pieces, and work iteratively to solve each small problem. Test your code after each small change you make, and address errors as soon as they arise. It will make your life easier!

Coding Standard

Your python script must follow the following coding guide:

Documentation

There are three types of comments in programming and your assignment should contain each:

  • The top-level docstring should contain information about what your script does. This is included in the duim.py file. Please complete the top-level docstring.
  • Use Python's function docstrings to document how each of the functions work. The docstring should describe what each function does.
  • Your script should also include in-line comments to explain anything that isn't immediately obvious to a beginner programmer. For these comments, it's always better to explain why your code is doing what it does rather than what it's doing. Also: It is expected that you will be able to explain how each part of your code works in detail.

Authorship Declaration

All your Python code for this assignment must be placed in the provided Python file called duim.py. Do not change the name of this file. Please complete the declaration as part of the top-level docstring in your Python source code file (replace "Student Name" with your own name).

Submission Guidelines and Process

Clone Your Repo (ASAP)

The first step will be to clone the Assignment 2 repository. The invite link will be provided to you by your professor. You will need a free GitHub account to complete this assignment. If you already have an existing GitHub account, you may use it.

The repo will contain a check script, a README file, and the file where you will enter your code.

The First Milestone (due July 26)

For the first milestone you will have two functions to complete.

  • call_du_sub will take one argument and return a list. The argument is a target directory. The function will use subprocess.Popen to run the command du -d 1 <target_directory>.
  • percent_to_graph will take two arguments and return a string.
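
As a rough illustration of the call_du_sub description above, here is a minimal sketch. It assumes that du is available on the PATH and that error output can simply be discarded; the starting code in duim.py may structure this differently.

```python
import subprocess


def call_du_sub(target_dir):
    """Run 'du -d 1 <target_dir>' and return its output as a list of lines."""
    # Passing the command as a list avoids shell quoting problems.
    # Assumption: stderr (e.g. permission errors) is captured and ignored.
    process = subprocess.Popen(
        ['du', '-d', '1', target_dir],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    stdout, _ = process.communicate()
    # splitlines() drops the trailing newline characters from each entry.
    return stdout.splitlines()
```

Each returned string still holds the size and the path together; separating them is left to create_dir_dict.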

In order to complete percent_to_graph(), it's helpful to know the equation for converting a number from one scale to another.

[Figure: Scaling-formula.png]

In this equation, ``x`` refers to your input value percent and ``y`` will refer to the number of symbols to print. The max of percent is 100 and the min of percent is 0. Be sure that you are rounding to an integer, and then print that number of symbols to represent the percentage. The number of spaces that you print will be the inverse.
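
The scaling step can be sketched as follows. This assumes the usual linear rescaling, y = (x - in_min) / (in_max - in_min) * (out_max - out_min) + out_min, which with a range of 0 to 100 for percent and 0 to total_chars for the output reduces to percent * total_chars / 100. The choice of the '=' symbol and of round() is also an assumption, and this is one possible approach rather than the required solution.

```python
def percent_to_graph(percent, total_chars):
    """Return a bar graph string of total_chars characters for percent."""
    # Reject anything that is not a number between 0 and 100.
    if not isinstance(percent, (int, float)) or not 0 <= percent <= 100:
        raise ValueError('percent must be a number between 0 and 100')
    # Linear rescaling from 0..100 to 0..total_chars, rounded to an integer.
    filled = round(percent * total_chars / 100)
    # The remaining characters are spaces (the "inverse" of the symbols).
    return '=' * filled + ' ' * (total_chars - filled)
```

With this sketch, percent_to_graph(50, 10) gives five equal signs followed by five spaces, matching the example in Required Functions.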

Test your functions with the Python interpreter. Use python3, then:

   import duim
   duim.percent_to_graph(50, 10)

To test with the check script, run the following:

python3 checkA2.py -f -v TestPercent

python3 checkA2.py -f -v TestDuSub

Second Milestone (due August 2)

For the second milestone you will have two more functions to complete.

  • create_dir_dict will take your list from call_du_sub and return a dictionary.
    • Every item in your list should create a key in your dictionary.
    • Your dictionary values should be a number of bytes.

For example: {'/usr/lib/local': 33400}

    • Again, test using your Python interpreter or the check script.

To run the check script, enter the following:

python3 checkA2.py -f -v TestDirDict
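
The create_dir_dict behaviour described above might look like the sketch below. It assumes every list item has the form "<size><whitespace><path>", as in the du sample output, and it is an illustration rather than the reference solution.

```python
def create_dir_dict(du_list):
    """Convert lines of du output into a {directory: size} dictionary."""
    dir_dict = {}
    for line in du_list:
        # Split only once so that paths containing spaces stay intact.
        size, path = line.split(None, 1)
        dir_dict[path] = int(size)
    return dir_dict
```

Splitting with a maximum of one split keeps directory names that contain spaces in one piece.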

You will be using a module in the standard library called argparse. This will help handle more complex sets of options and arguments than simply using sys.argv. Refer to the argparse documentation to complete the parse_command_args function. At minimum, your assignment should handle the following options and arguments:

  • -h will print a usage message. This will automatically be created by argparse itself, you will not need to implement this. However, refer carefully to the sample output and ensure that your help message matches the required output.
  • -H will print file sizes in Human readable format. For example, 1024 bytes will be printed as 1K, 1024 kilobytes will be printed as 1M, and so on.
  • -l <number> will set the maximum length of the bar graph. The default should be 20 characters. This option will require an option argument that is an integer.
  • Your script will also require one positional argument which contains the target directory for scanning.

Your assignment should be able to produce the following:

user@host ~ $ python3 duim.py -h

usage: duim.py [-h] [-H] [-l LENGTH] [target]

DU Improved -- See Disk Usage Report with bar charts

positional arguments:
  target                The directory to scan.

optional arguments:
  -h, --help            show this help message and exit
  -H, --human-readable  print sizes in human readable format (e.g. 1K 23M 2G)
  -l LENGTH, --length LENGTH
                        Specify the length of the graph. Default is 20.

Copyright 2021

Use the following to test your code:

python3 checkA2.py -f -v TestArgs
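
A parser consistent with the help message above could be sketched like this. The argv parameter and the default target of '.' are assumptions added for illustration (the parse_command_args function in the starting code may take no arguments), and the exact heading wording in the help output depends on the Python version.

```python
import argparse


def parse_command_args(argv=None):
    """Parse the duim command line; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description='DU Improved -- See Disk Usage Report with bar charts',
        epilog='Copyright 2021',
    )
    # -h/--help is added automatically by argparse.
    parser.add_argument('-H', '--human-readable', action='store_true',
                        help='print sizes in human readable format (e.g. 1K 23M 2G)')
    parser.add_argument('-l', '--length', type=int, default=20,
                        help='Specify the length of the graph. Default is 20.')
    # nargs='?' makes the positional optional, shown as [target] in the usage line.
    # Assumption: fall back to the current directory when no target is given.
    parser.add_argument('target', nargs='?', default='.',
                        help='The directory to scan.')
    return parser.parse_args(argv)
```

The parsed values are then available as args.human_readable, args.length and args.target.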

Minimum Viable Product

Once you have achieved the Milestones, you will have to do the following to get a minimum viable product:

  • In your if __name__ == '__main__' block, you will have to call the parse_command_args function. Experiment with print statements so that you understand how each option and argument are stored.
    • If the user has entered more than one argument, or their argument isn't a valid directory, print an error message.
    • If the user doesn't specify any target, use the current directory.
  • Call call_du_sub with the target directory.
  • Pass the return value from that function to create_dir_dict
  • You may wish to create one or more functions to do the following:
    • Use the total size of the target directory to calculate percentage.
    • For each subdirectory of target directory, you will need to calculate a percentage, using the total of the target directory.
    • Once you've calculated percentage, call percent_to_graph with a max_size of your choice.
    • For every subdirectory, print at least the percent, the bar graph, and the name of the subdirectory.
    • The target directory should not have a bar graph.
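
The percentage step in the list above can be sketched as follows. The function name percent_of_total is hypothetical, and truncating to a whole percent is an assumption chosen because it reproduces the figures in the sample output (61, 4, 34 and 0).

```python
def percent_of_total(dir_dict, target):
    """Return {subdirectory: whole percent of the target's total size}."""
    total = dir_dict[target]
    percentages = {}
    for path, size in dir_dict.items():
        if path == target:
            # The target itself is the total, so it gets no percentage or bar.
            continue
        # Integer arithmetic truncates toward zero; guard against a zero total.
        percentages[path] = size * 100 // total if total else 0
    return percentages
```

Each percentage can then be passed to percent_to_graph along with the length chosen by the user.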

Additional Features

After completing the above, you are expected to add some additional features. Some improvements you could make are:

  • Format the output in a way that is easy to read.
  • Add colour to the output.
  • Include files in the output.
  • Include a threshold, so that results that are less than a user-specified size get excluded from results.
  • Add more error checking, print a usage message to the user.
  • Accept more options from the user.
  • Sort the output by percentage, or by filename.

It is expected that the additional features you provide will be useful and non-trivial. They should not require super-user privileges or the installation of additional packages to work. (i.e., I shouldn't have to run pip to make your assignment work).

The Assignment (due August 6, 11:59pm)

  • Be sure to make your final commit before the deadline. Don't forget to also use git push to push your code into the online repository!
  • Then, copy the contents of your duim.py file into a Word document, and submit it to Blackboard. I will use GitHub to evaluate your deadline, but submitting to Blackboard tells me that you wish to be evaluated.

Rubric

Task                                              Maximum mark    Actual mark
Program Authorship Declaration                    5
required functions design                         5
required functions readability                    5
main loop design                                  10
main loop readability                             10
output function design                            5
output function readability                       5
additional features implemented                   20
docstrings and comments                           5
First Milestone                                   10
Second Milestone                                  10
github.com repository: Commit messages and use    10
Total                                             100

Due Date and Final Submission requirement

Please submit the following files by the due date:

  • your python script, named 'duim.py', in your repository, and also submitted to Blackboard, by August 6 at 11:59pm.