Difference between revisions of "Tutorial10: Shell Scripting - Part 1"

From CDOT Wiki
(Tutorial Reference Material)
 
(723 intermediate revisions by the same user not shown)
Line 1: Line 1:
=INTRODUCTION TO SHELL SCRIPTING=
+
=USING SED & AWK UTILITIES=
 
<br>
 
<br>
 
===Main Objectives of this Practice Tutorial===
 
  
:* Understand the process for planning prior to writing a shell script.
+
:* Use the '''sed''' command to '''manipulate text''' contained in a file.
  
:* Understand the purpose of a she-bang line contained at the top of a shell script.
+
:* List and explain several '''addresses''' and '''instructions''' associated with the '''sed''' command.
  
:* Set permissions for a shell script and properly execute it.
+
:* Use the '''sed''' command as a '''filter''' with Linux pipeline commands.
  
:* Understand and use environment and user-defined variables within a shell script.
+
:* Use the '''awk''' command to '''manipulate text''' contained in a file.
  
:* Understand the purpose of control flow statements used with shell scripts.
+
:* List and explain '''comparison operators''', '''variables''' and '''actions''' associated with the '''awk''' command.
  
:* Use the test command to test various conditions.
+
:* Use the '''awk''' command as a '''filter''' with Linux pipeline commands.
 
+
<br><br>
:* Use the if logic statement and the for loop statement within shell scripts.
 
  
 
===Tutorial Reference Material===
 
Line 31: Line 30:
 
|- valign="top" style="padding-left:15px;"
 
|- valign="top" style="padding-left:15px;"
  
|colspan="2" |Course Notes:<ul><li>[https://ict.senecacollege.ca/~murray.saul/uli101/ULI101-Week10.pdf PDF] | [https://ict.senecacollege.ca/~murray.saul/uli101/ULI101-Week10.pptx PPTX]</li></ul>
+
|colspan="2" |'''Slides''':<ul><li>Week 11 Lecture 1 Notes:<br> [[Media:ULI101-Week11.1.pdf | PDF]] | [https://matrix.senecacollege.ca/~chris.johnson/ULI101/ULI101-Week11.1.pptx PPTX]</li><li>Week 11 Lecture 2 Notes:<br> [[Media:ULI101-Week11.2.pdf | PDF]] | [https://matrix.senecacollege.ca/~jason.carman/slides/ULI101-Week11.2.pptx PPTX] <br></li></ul>
  
  
|  style="padding-left:15px;" |Shell Scripting
+
|  style="padding-left:15px;" |'''Text Manipulation:'''
* [https://searchdatacenter.techtarget.com/definition/shell-script Purpose]
+
* [https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux Purpose of using the sed utility]
* [https://www.youtube.com/watch?v=cQepf9fY6cE Creating and Running a Shell Script]<br>
+
* [https://www.digitalocean.com/community/tutorials/how-to-use-the-awk-language-to-manipulate-text-in-linux Purpose of using the awk utility]
Variables
 
* Environment
 
* User Defined
 
  
|  style="padding-left:15px;"|Control Flow Statements
+
|  style="padding-left:15px;" |'''Commands:'''
* Purpose
+
* [https://man7.org/linux/man-pages/man1/sed.1p.html sed]
* test command
+
* [https://man7.org/linux/man-pages/man1/awk.1p.html awk]
* if statement
 
* for loop
 
  
|colspan="1" style="padding-left:15px;" width="30%"|Brauer Instructional Videos:<ul><li>[https://www.youtube.com/watch?v=kxEP-KUhOSg&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=5 Introduction to Shell Scripting]</li><li>[https://www.youtube.com/watch?v=XVTwbINXnk4&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=6 Using Variables and Control Flow Statements in Shell Scripting]</ul>
+
 
 +
|colspan="1" style="padding-left:15px;" width="30%"|'''Brauer Instructional Videos:'''<ul><li>[https://www.youtube.com/watch?v=npU6S61AIko&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=14 Using the sed Utility]</li><li>[https://www.youtube.com/watch?v=OV3XzjDYgJo&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=13 Using the awk Utility]</ul>
 
|}
 
|}
  
 
= KEY CONCEPTS =
 
  
A shell script is a computer program designed to be run by the Unix shell, a command-line interpreter.<br> The various dialects of shell scripts are considered to be scripting languages.
 
  
Reference:  https://en.wikipedia.org/wiki/Shell_script
+
===Using the sed Utility===
  
===Creating & Executing Shell Scripts===
 
  
[[Image:ipso.png|thumb|right|500px|An IPSO Diagram (INPUT, PROCESSING, STORAGE, OUTPUT) can be used to map-out and then list the sequence of steps to assist when coding your shell script.]]
+
'''Usage:'''
It is recommended to '''plan''' out on a piece of paper the purpose of the shell script.<br>You can do this by creating a simple '''IPSO''' diagram (stands for '''INPUT''', '''PROCESSING''', '''STORAGE''', '''OUTPUT''').
 
  
First, list the INPUTS into the script (eg. prompting user for data, reading data from file, etc), then list the expected OUTPUTS from the script. You can then list the steps to process the INPUT to provide the OUTPUT (including file storage).
+
'''<span style="color:blue;font-weight:bold;font-family:courier;">Syntax:  sed [-n] 'address instruction' filename</span>'''
  
Once you have planned your shell script by listing the sequence of steps in your script, you need to create a file (using a '''text editor''') that will contain your Linux commands.<br>'''NOTE:'''  Avoid using filenames of already existing Linux Commands to avoid confusion. Using shell script filenames that include the file extension of the shell that the script will run within is recommended.
 
  
'''Using a Shebang Line'''
+
'''How it Works:'''
  
[[Image:shebang.png|thumb|right|200px|The '''shebang line''' <u>must</u> appear on the '''first line''' and at the '''beginning''' of the shell script.]]If you are learning Bash scripting by reading other people’s code you might have noticed<br>that the first line in the scripts starts with the #! characters and the path to the Bash interpreter.
+
* The sed command reads all lines in the input file and exposes each line, one at a time, to the expression<br>(i.e. the area contained within quotes).
This sequence of characters (#!) is called '''shebang''' and is used to tell the operating system<br>which interpreter to use to parse the rest of the file. Reference: https://linuxize.com/post/bash-shebang/
+
* The expression can be within single quotes or double quotes.
 +
* The expression contains an address (match condition) and an instruction (operation).
 +
* If the line matches the address, then it will perform the instruction.
 +
* Lines will display by default unless the '''-n''' option is used to suppress the default display.
 +
<br>
 +
'''Address:'''
  
The '''shebang line''' <u>must</u> appear on the '''first line''' and at the '''beginning''' of the shell script,<br>otherwise, it will be treated as a regular comment and ignored.
+
* Can use a line number, to select a specific line (for example: '''5''')
 +
* Can specify a range of line numbers (for example: '''5,7''')
 +
* Regular expressions are contained within forward slashes (e.g. /regular-expression/)
 +
* Can specify a regular expression to select all lines that match a pattern (e.g. '''/^[0-9].*[0-9]$/''')
 +
* If NO address is present, the instruction will apply to ALL lines
  
'''Setting Permissions &amp; Running a Shell Script'''
 
  
To run your shell script by name, you need to assign execute permissions for the user.<br>To run the shell script, you can execute it using a relative, absolute, or relative-to-home pathname
+
[[Image:sed.png|right|500px|]]
 +
'''Instruction:'''
 +
*'''Action''' to take for matched line(s)
 +
*Refer to table on right-side for list of some<br>'''common instructions''' and their purpose
 +
<br><br>
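The way an '''address''' and an '''instruction''' combine can be sketched with a small throwaway file. This is only an illustration: the file name <code>sample.txt</code> and its four lines are assumptions made up for this example and are not part of the tutorial data.

```shell
# Create a small sample file (hypothetical data, for illustration only)
workdir=$(mktemp -d)
printf '%s\n' 'alpha 1' 'beta' 'gamma 3' '4 delta 5' > "$workdir/sample.txt"

# Line-number address: print only line 2 (-n suppresses the default display)
line2=$(sed -n '2 p' "$workdir/sample.txt")

# Range address: print lines 2 through 3
range=$(sed -n '2,3 p' "$workdir/sample.txt")

# Regular-expression address: lines that begin AND end with a digit
digits=$(sed -n '/^[0-9].*[0-9]$/ p' "$workdir/sample.txt")

# No address: the instruction (substitute) is applied to ALL lines
upper=$(sed 's/a/A/g' "$workdir/sample.txt")

echo "$line2"
echo "$range"
echo "$digits"
echo "$upper"

rm -r "$workdir"
```

Each command substitution captures what sed would otherwise display, so the four results can be compared side by side.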
  
'''Example:<br><br><span style="font-family:courier;">chmod u+x myscript.bash<br>./myscript.bash<br>/home/username/myscript.bash<br>~/myscript.bash</span>
+
===Using the awk Utility===
'''
+
  
===Using Variables in Shell Scripts===
+
'''Usage:'''
  
'''Definition'''
+
<span style="color:blue;font-weight:bold;font-family:courier;">awk [-F] 'selection-criteria {action}' file-name</span>
  
'''Variables''' are used to '''store information''' to be referenced and manipulated in a computer program.<br>They also provide a way of labeling data with a descriptive name, so our programs can be understood<br>more clearly by the reader and ourselves.<br>'''Reference:''' https://launchschool.com/books/ruby/read/variables
 
  
 +
'''How It Works:'''
  
'''Environment Variables'''
+
* The '''awk''' command reads all lines in the input file and exposes each line to the expression (contained within quotes) for processing.
 +
*The '''expression''' (contained in quotes) represents '''selection criteria''',  and '''action''' to execute contained within braces '''{}'''
 +
* If the selection criteria are matched, then the action (between braces) is executed.
 +
* The '''-F''' option can be used to specify the '''field delimiter''' (separator) character, overriding the default of whitespace<br>eg. '''awk -F";"'''  (would indicate a semi-colon delimited input file).
 +
<br>
 +
'''Selection Criteria'''
  
[[Image:environment.png|thumb|right|500px|Examples of using '''Environment''' and '''User Defined''' variables.]]Shell environment variables shape the working environment whenever you are logged in. Some common shell variables are displayed via Linux commands in the diagram on the right-side.<br>(You can issue the pipeline command '''set | more''' to view all variables.)
+
* You can use a regular expression, enclosed within slashes, as a pattern. For example: '''/pattern/'''
 +
* The ~ operator tests whether a field or variable matches a regular expression. For example:  '''$1 ~ /^[0-9]/'''
 +
* The '''!~''' operator tests for no match. For example: '''$2 !~ /line/'''
 +
* You can perform both numeric and string comparisons using relational operators ( '''>''' , '''>=''' , '''<''' , '''<=''' , '''==''' , '''!=''' ).
 +
* You can combine any of the patterns using the Boolean operators '''||''' (OR) and '''&&''' (AND).
 +
* You can use built-in variables (like NR or "record number" representing line number) with comparison operators.<br>For example: '''NR >=1 && NR <= 5'''  
 +
<br>
 +
'''Action (execution):'''
  
Placing a dollar sign ('''$''') prior to the variable name will cause the variable to expand to the value contained in the variable.
+
* Action to be executed is contained within braces '''{}'''
 +
* The '''print''' command can be used to display text (fields).
 +
* You can use parameters which represent fields within records (lines) within the expression of the awk utility.
 +
* The parameter '''$0''' represents all of the fields contained in the record (line).
 +
* The parameters '''$1''', '''$2''', '''$3''' … '''$9''' represent the first, second, third, and so on up to the ninth field contained within the record.
 +
* Fields beyond the ninth are referenced the same way, without braces (for example:  '''$10''', '''$11''', '''$12''', etc.); the brace form '''${10}''' applies to shell positional parameters, not to awk.
 +
* You can use built-in '''variables''' (such as '''NR''' or "record number" representing line number)<br>eg. '''{print NR,$0}'''  (will print record number, then entire record).
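The selection criteria and actions listed above can be tried out on a few made-up records. This is a minimal sketch: the semi-colon delimited file <code>sample.txt</code> and its contents are assumptions for illustration only and do not come from the tutorial's data files.

```shell
# Create a small semi-colon delimited sample file (hypothetical data)
workdir=$(mktemp -d)
printf '%s\n' 'ford;mustang;1965;45000' \
              'chevy;impala;1960;8000' \
              'ford;thunderbird;1984;3500' > "$workdir/sample.txt"

# Regular-expression pattern with no action: the whole record is printed
fords=$(awk -F';' '/ford/' "$workdir/sample.txt")

# Field match (~) combined with a numeric comparison using && (AND)
expensive=$(awk -F';' '$1 ~ /^ford/ && $4 > 10000 {print $2,$4}' "$workdir/sample.txt")

# The !~ operator selects records whose field does NOT match
not_ford=$(awk -F';' '$1 !~ /ford/ {print $1}' "$workdir/sample.txt")

# Built-in variable NR used both as selection criteria and in the action
numbered=$(awk 'NR >= 2 && NR <= 3 {print NR,$0}' "$workdir/sample.txt")

echo "$fords"
echo "$expensive"
echo "$not_ford"
echo "$numbered"

rm -r "$workdir"
```

Note that the last command does not need '''-F''' because it only prints '''$0''' (the entire record) and never refers to an individual field.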
  
 +
=INVESTIGATION 1: USING THE SED UTILITY=
  
'''User Defined Variables'''
+
<span style="color:red;">'''ATTENTION''': Effective '''May 9, 2022''' - this online tutorial will be required to be completed by '''Friday in week 11 by midnight'''<br>to obtain a grade of '''2%''' towards this course</span><br><br>
  
'''User-defined variables''' are variables which can be '''created by the user''' and exist in the session. This means that no one can access user-defined variables that have been set by another user,<br>and when the session is closed these variables expire.<br>'''Reference:''' https://mariadb.com/kb/en/user-defined-variables/
+
In this investigation, you will learn how to manipulate text using the '''sed''' utility.
  
Data can be stored and removed within a variable using an equal sign.<br>The '''read''' command can be used to prompt the user to enter data into a variable.<br>Refer to the diagram on the right-side to see how user-defined variables are assigned data.
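A short sketch of assigning, expanding, and clearing a user-defined variable follows. The variable names and values are assumptions chosen for illustration, and a here-document stands in for the user typing at the keyboard so that the example runs without interaction.

```shell
# Store data in a variable with an equal sign (no spaces around it)
course="ULI101"

# A dollar sign before the name expands the variable to its value
greeting="Welcome to $course"

# Assigning nothing after the equal sign removes the stored data
temp="scratch"
temp=

# read stores one line of standard input in a variable;
# the here-document below supplies that line instead of the keyboard
read -r answer <<EOF
yes
EOF

echo "$greeting (answer: $answer)"
```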
 
  
'''Positional Parameters and Special Parameters'''
+
'''Perform the Following Steps:'''
  
[[Image:positional.png|thumb|right|220px|Examples of using '''positional''' and '''special''' parameters.]]A '''positional parameter''' is a variable within a shell program; its value is set from an argument specified on the command line that invokes the program.
+
# '''Login''' to your matrix account and confirm you are located in your '''home''' directory.<br><br>
Positional parameters are numbered and are referred to with a preceding "'''$'''": '''$1''', '''$2''', '''$3''', and so on. The positional parameter $0 refers to either the name of the shell where the command was issued, or the name of the shell script being executed. If using '''positional parameters''' greater than '''9''', then you need to include the number within braces.<br>Examples: '''echo ${10}''', '''ls ${23}'''
+
# Issue a Linux command to create a directory called '''sed'''<br><br>
 +
# Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br>
 +
# Issue the following Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/data.txt</nowiki></span><br><br>
 +
# Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]<br><br>The '''p''' instruction with the '''sed''' command is used to<br>'''print''' (i.e. ''display'') the contents of a text file.<br><br>
 +
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears twice'''.<br><br>The reason why standard output appears twice is that the sed command<br>(without the '''-n option''') displays all lines by default, in addition to any lines displayed by the '''p''' instruction.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>for <u>confirmation</u> of running these pipeline commands when you run a '''checking script''' later in this investigation.<br><br>
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see each line displayed only once.<br><br>You can specify an '''address''' to display lines using the sed utility<br>(eg. a '''line #''' or a range of '''line #s''').<br><br>
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br>What other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar method was demonstrated in the vi editor in tutorial 9).<br><br>
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''q''' instruction to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it reads a particular line or a line matching a pattern.<br><br>
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice? How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions''' to select lines that match a pattern. In fact,<br>the sed command was one of the <u>first</u> Linux commands that used regular expressions.<br><br>The rules remain the same for using regular expressions as demonstrated in '''tutorial 9'''<br>except the regular expression must be contained within '''forward slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</span> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command using regular expressions with '''anchors'''.]]
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br>
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br>
 +
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br>
 +
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script<br>until you receive a congratulations message, then you can proceed.<br><br>
  
The '''shift''' command can be used with positional parameters to shift positional parameters<br>to the left by one or more positions.
+
:In the next investigation, you will learn how to manipulate text using the '''awk''' utility.<br><br>
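The instructions practised in this investigation can also be tried without downloading anything. The following is a minimal sketch that feeds made-up lines to sed through a pipeline; the sample text and file names are assumptions for illustration and are not the contents of '''data.txt'''.

```shell
# Four sample lines (hypothetical data) held in a variable
input=$(printf '%s\n' 'TUTORIAL one' 'TUTORIAL two' 'TUTORIAL three' 'TUTORIAL four')

# s instruction limited to the address range 2,3;
# lines outside the range pass through unchanged
substituted=$(printf '%s\n' "$input" | sed '2,3 s/TUTORIAL/LESSON/g')

# q instruction: display lines up to and including line 2, then quit
first_two=$(printf '%s\n' "$input" | sed '2 q')

# sed used as a filter: keep only the names that end with txt
txt_only=$(printf '%s\n' 'a.txt' 'b.sh' 'c.txt' | sed -n '/txt$/ p')

echo "$substituted"
echo "$first_two"
echo "$txt_only"
```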
  
There are a few ways to assign values as positional parameters:
+
=INVESTIGATION 2: USING THE AWK UTILITY =
:*Use the '''set''' command with the values as argument after the set command
 
:*Run a shell script containing arguments
 
  
 +
In this investigation, you will learn how to use the awk utility to manipulate text and generate reports.
  
There is a group of '''special parameters''' that can be used for shell scripting.<br>A few of these special parameters and their purpose are displayed below:<br>'''$*''' , '''"$*"''' , '''"$@"''' , '''$#''' , '''$?'''
+
'''Perform the Following Steps:'''
  
Refer to the diagram to the right for examples using positional and special parameters.
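As a rough sketch of how positional and special parameters behave, the following uses the '''set''' command to assign three made-up values (the values themselves are assumptions for illustration) and then shifts them.

```shell
# set assigns its arguments as the positional parameters $1, $2, $3
set apple banana cherry
count_before=$#      # $# is the number of positional parameters
first_before=$1

# shift moves the positional parameters to the left by one position
shift
count_after=$#
first_after=$1

# "$@" expands to each remaining parameter as a separate word
joined=""
for arg in "$@"
do
    joined="$joined[$arg]"
done

echo "$count_before $first_before"
echo "$count_after $first_after"
echo "$joined"
```

After the shift, the original first value is discarded and the former '''$2''' becomes '''$1'''.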
+
# Change to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br>
 +
# Issue a Linux command to create a directory called '''awk'''<br><br>
 +
# Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br>
 +
# Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt</nowiki></span><br><br>
 +
# Issue the '''cat''' command to quickly view the contents of the '''cars.txt''' file.<br><br>The "'''print'''" action (command) is the <u>default</u> action of awk to print<br>all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]
 +
# Issue the following linux command to display all lines (i.e. records) in the '''cars.txt''' database that match the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files for <u>confirmation</u> of running these pipeline commands when you run a '''checking script''' later in this investigation.<br><br>
 +
# Issue the following linux pipeline command to display records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice that the same matching records are displayed <u>without</u> specifying an '''action''', since '''print''' is the default action.<br><br>You can use ''builtin'' '''variables''' with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br> '''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br>
 +
# Issue the following linux pipeline command to display the '''model''', '''year''', '''quantity''' and '''price'''<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to test whether a particular field matches a pattern.<br><br>
 +
# Issue the following linux pipeline command to display all '''plymouths''' ('''plym''')<br>by '''model name''', '''year''', '''quantity''' and '''price''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the awk expression,<br>they are not confused with redirection symbols.<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br>
 +
# Issue the following linux pipeline command to display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br>
 +
# Issue the following linux pipeline command to display '''price''',<br>'''quantity''', '''model''' and '''car make''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $5,$4,$2,$1}' cars.txt | tee awk-5.txt</span><br><br>
 +
# Issue the following linux pipeline command to display the '''car make''',<br>'''model''' and '''quantity''' of cars that '''begin''' with the '''letter 'f'''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br>
 +
# Issue the following linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br>
 +
# Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt</nowiki></span><br><br>
 +
# Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br>
 +
# Issue the following linux pipeline command to display the '''model'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, it does not recognize the second and fourth fields.<br><br>You need to issue awk with the '''-F''' option to indicate that this file's fields are separated (delimited) by a semi-colon.<br><br>
 +
# Issue the following linux pipeline command to display the '''model'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br>
 +
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
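The effect of the '''-F''' option can be seen on a single made-up record. This is a sketch only: the record below is an assumption for illustration and is not taken from '''cars2.txt'''.

```shell
# One semi-colon delimited record (hypothetical data)
record='ford;mustang;1965;45;10000'

# Without -F there is no whitespace to split on, so the whole record is $1
# and NF (number of fields) is 1; $2 and $4 are therefore empty
fields_default=$(echo "$record" | awk '{print NF}')
no_delim=$(echo "$record" | awk '$1 ~ /^f/ {print $2,$4}')

# With -F";" the record is split into five fields
fields_semicolon=$(echo "$record" | awk -F';' '{print NF}')
with_delim=$(echo "$record" | awk -F';' '$1 ~ /^f/ {print $2,$4}')

echo "fields without -F: $fields_default, output: '$no_delim'"
echo "fields with -F:    $fields_semicolon, output: '$with_delim'"
```

Printing '''NF''' is a quick way to check whether awk is splitting a file the way you expect before selecting individual fields.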
  
===Using Control Flow Statements in Shell Scripts===
+
= LINUX PRACTICE QUESTIONS =
 
<table align="right"><tr valign="top"><td>[[Image:test-1.png|thumb|right|140px|Examples of simple comparisons using the test command.]]</td><td>[[Image:test-2.png|thumb|right|140px|Examples of using additional comparisons using the test command.]]</td></table>
 
Control flow statements are used to make your shell scripts more flexible so they can adapt to changing situations.
 
  
The special parameter '''$?''' is used to determine the exit status of the previously issued Linux command.
+
The purpose of this section is to obtain '''extra practice''' to help with '''quizzes''', your '''midterm''', and your '''final exam'''.
The exit status will either display a zero (representing TRUE) or a non-zero number (representing FALSE). This can be used to determine if a Linux command was correctly or incorrectly executed.
 
  
The '''test''' Linux command is used to test conditions to see if they are '''TRUE''' (i.e. value '''zero''') or '''FALSE''' (i.e. value '''non-zero''') so they can be used with control flow statements to control the sequence of a shell script.
+
Here is a link to the MS Word Document of ALL of the questions displayed below but with extra room to answer on the document to
 +
simulate a quiz:
  
You CANNOT use the '''>''' or '''<''' symbols when using the test command since these are redirection symbols. Instead, you need to use options when performing numerical comparisons.
+
https://ict.senecacollege.ca/~murray.saul/uli101/uli101_week11_practice.docx
Refer to the table below for test options and their purposes.
 
  
There are <u>other</u> comparison options that can be used with the test command such as testing to see if a regular file or directory pathname exists, or if the regular file pathname is non-empty.
+
Your instructor may take-up these questions during class. It is up to the student to attend classes in order to obtain the answers to the following questions. Your instructor will NOT provide these answers in any other form (eg. e-mail, etc).
  
Refer to diagrams to the right involving the test command.
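A few of these test options can be sketched as follows. The numbers and the temporary file are assumptions for illustration; the exit status is captured with <code>||</code> so that the example also runs in shells that stop on a failed command.

```shell
# Numeric comparisons use options such as -gt and -lt instead of > and <
gt_status=0
test 5 -gt 3 || gt_status=$?      # condition is TRUE, so the status stays 0

lt_status=0
test 5 -lt 3 || lt_status=$?      # condition is FALSE, so the status is non-zero

# File tests on a temporary file (the pathname comes from mktemp)
tmpfile=$(mktemp)
exists=no;   test -f "$tmpfile" && exists=yes      # regular file exists
nonempty=no; test -s "$tmpfile" && nonempty=yes    # file is still empty
echo "data" > "$tmpfile"
filled=no;   test -s "$tmpfile" && filled=yes      # file is now non-empty
isdir=no;    test -d "$tmpfile" && isdir=yes       # it is not a directory
rm "$tmpfile"

echo "$gt_status $lt_status $exists $nonempty $filled $isdir"
```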
 
  
'''Logic Statements'''
+
'''Review Questions:'''
  
A '''logic statement''' is used to determine which Linux commands to be executed based<br>on the result of a condition (i.e. TRUE (zero value) or FALSE (non-zero value)).
+
'''Part A: Display Results from Using the sed Utility'''
  
<table align="right"><tr valign="top"><td>[[Image:logic-1.png|thumb|right|250px|Example of using the '''if''' logic control-flow statement.]]</td><td>[[Image:loop-1.png|thumb|right|250px|Example of using the '''for''' looping control-flow statement.]]</td></table>
+
Note the contents from the following tab-delimited file called '''~murray.saul/uli101/stuff.txt''':
 +
(this file pathname exists for checking your work)
  
There are several logic statements, but we will just concentrate on the if statement.
+
<pre>
<pre style="width:20%">
+
Line one.
if test condition
+
This is the second line.
  then
+
This is the third.
    command(s)
+
This is line four.
fi
+
Five.
 +
Line six follows
 +
Followed by 7
 +
Now line 8
 +
and line nine
 +
Finally, line 10
 
</pre>
 
</pre>
  
Refer to the diagram relating to logic statements on the right side for an example.
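A minimal sketch of the '''if''' statement with the '''test''' command is shown below. The variable names and the passing mark of 50 are assumptions for illustration.

```shell
# Hypothetical values for illustration
mark=72
outcome="fail"

# The commands between then and fi run only when the test is TRUE (exit status zero)
if test "$mark" -ge 50
then
    outcome="pass"
fi

echo "$outcome"
```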
 
  
'''Loop Statements'''
+
Write the results of each of the following Linux commands for the above-mentioned file:
  
A '''loop statement''' is a series of steps or sequence of statements executed repeatedly, zero or more times, as long as the given condition is satisfied.<br>Reference: https://www.chegg.com/homework-help/definitions/loop-statement-3
 
  
There are several loops, but we will look at the for loop using a list.
+
# <span style="font-family:courier;font-weight:bold">sed -n '3,6 p' ~murray.saul/uli101/stuff.txt</span><br><br>
 +
# <span style="font-family:courier;font-weight:bold">sed '4 q' ~murray.saul/uli101/stuff.txt</span><br><br>
 +
# <span style="font-family:courier;font-weight:bold">sed '/the/ d' ~murray.saul/uli101/stuff.txt</span><br><br>
 +
# <span style="font-family:courier;font-weight:bold">sed 's/line/NUMBER/g' ~murray.saul/uli101/stuff.txt</span>
  
<pre style="width:20%">
 
for item in list
 
do
 
    command(s)
 
done
 
</pre>
 
  
Refer to the diagram relating to looping statements on the right side for an example.
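A minimal sketch of the '''for''' loop using a list is shown below. The list items and variable names are assumptions for illustration.

```shell
total=0
count=0

# The loop body runs once for each item in the list
for item in 10 20 30
do
    total=$((total + item))
    count=$((count + 1))
done

echo "items: $count, total: $total"
```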
+
'''Part B: Writing Linux Commands Using the sed Utility'''
  
=INVESTIGATION 1: CREATING A SHELL SCRIPT=
+
Write a single Linux command to perform the specified tasks for each of the following questions.
  
<br>
 
In this section, you will learn how to ...
 
  
 +
# Write a Linux sed command to display only lines 5 to 9 for the file: '''~murray.saul/uli101/stuff.txt'''<br><br>
 +
# Write a Linux sed command to display only lines that begin with the pattern "and" for the file: '''~murray.saul/uli101/stuff.txt'''<br><br>
 +
# Write a Linux sed command to display only lines that end with a digit for the file: '''~murray.saul/uli101/stuff.txt'''<br><br>
 +
# Write a Linux sed command to save lines that match the pattern “line” (upper or lowercase) for the file: '''~murray.saul/uli101/stuff.txt''' and save results (overwriting previous contents) to: '''~/results.txt'''<br><br>
  
  
 +
'''Part C: Writing Linux Commands Using the awk Utility'''
  
'''Perform the Following Steps:'''
+
Note the contents from the following tab-delimited file called '''~murray.saul/uli101/stuff.txt''':
 +
(this file pathname exists for checking your work)
  
# x<br><br>
+
<pre>
 +
Line one.
 +
This is the second line.
 +
This is the third.
 +
This is line four.
 +
Five.
 +
Line six follows
 +
Followed by 7
 +
Now line 8
 +
and line nine
 +
Finally, line 10
 +
</pre>
  
In the next investigation, you will ...<br><br>
 
  
=INVESTIGATION 2: USING VARIABLES IN SHELL SCRIPTS =
+
'''Write the results of each of the following Linux commands for the above-mentioned file:'''
  
In this section, you will learn how to ...
 
  
 +
# <span style="font-family:courier;font-weight:bold">awk ‘NR == 3 {print}’ ~murray.saul/uli101/stuff.txt</span><br><br>
 +
# <span style="font-family:courier;font-weight:bold">awk ‘NR >= 2 && NR <= 5 {print}’ ~murray.saul/uli101/stuff.txt</span><br><br>
 +
# <span style="font-family:courier;font-weight:bold">awk ‘$1 ~ /This/ {print $2}’ ~murray.saul/uli101/stuff.txt</span><br><br>
 +
# <span style="font-family:courier;font-weight:bold">awk ‘$1 ~ /This/ {print $3,$2}’ ~murray.saul/uli101/stuff.txt</span><br><br>
  
  
'''Perform the Following Steps:'''
+
'''Part D: Writing Linux Commands Using the awk Utility'''
 
 
# x<br><br>
 
 
 
In the next investigation, you will ...
 
 
 
=INVESTIGATION 3: USING CONTROL FLOW STATEMENTS IN SHELL SCRIPTS =
 
 
 
In this section, you will learn how to ...
 
 
 
 
 
 
 
'''Perform the Following Steps:'''
 
  
# x<br>
 
  
= LINUX PRACTICE QUESTIONS =
+
Write a single Linux command to perform the specified tasks for each of the following questions.
  
The purpose of this section is to obtain '''extra practice''' to help with '''quizzes''', your '''midterm''', and your '''final exam'''.
 
  
Here is a link to the MS Word Document of ALL of the questions displayed below but with extra room to answer on the document to
+
# Write a Linux awk command to display all records for the file: '''~/cars''' whose fifth field is greater than 10000.<br><br>
simulate a quiz:
+
# Write a Linux awk command to display the first and fourth fields for the file: '''~/cars''' whose fifth field begins with a number.<br><br>
 +
# Write a Linux awk command to display the second and third fields for the file: '''~/cars''' for records that match the pattern “chevy”.<br><br>
 +
# Write a Linux awk command to display the first and second fields for all the records contained in the file: '''~/cars'''<br><br>
  
https://ict.senecacollege.ca/~murray.saul/uli101/uli101_week10_practice.docx
 
 
Your instructor may take-up these questions during class. It is up to the student to attend classes in order to obtain the answers to the following questions. Your instructor will NOT provide these answers in any other form (eg. e-mail, etc).
 
 
 
'''Review Questions:'''
 
  
# x
 
# x
 
# x
 
# x
 
# x
 
# x
 
# x
 
# x
 
  
  
 
[[Category:ULI101]]
 
[[Category:ULI101]]

Latest revision as of 06:17, 29 April 2022

USING SED & AWK UTILITIES


Main Objectives of this Practice Tutorial

  • Use the sed command to manipulate text contained in a file.
  • List and explain several addresses and instructions associated with the sed command.
  • Use the sed command as a filter with Linux pipeline commands.
  • Use the awk command to manipulate text contained in a file.
  • List and explain comparison operators, variables and actions associated with the awk command.
  • Use the awk command as a filter with Linux pipeline commands.



Tutorial Reference Material

Course Notes
Linux Command/Shortcut Reference
YouTube Videos
Slides:


Text Manipulation: Commands:


Brauer Instructional Videos:

KEY CONCEPTS

Using the sed Utility

Usage:

Syntax: sed [-n] 'address instruction' filename


How it Works:

  • The sed command reads the input file one line at a time and applies the expression
    (i.e. the area contained within quotes) to each line.
  • The expression can be within single quotes or double quotes.
  • The expression contains an address (match condition) and an instruction (operation).
  • If the line matches the address, then sed performs the instruction on that line.
  • Lines are displayed by default unless the -n option is used to suppress the default display.


Address:

  • Can use a line number, to select a specific line (for example: 5)
  • Can specify a range of line numbers (for example: 5,7)
  • Regular expressions are contained within forward slashes (e.g. /regular-expression/)
  • Can specify a regular expression to select all lines that match a pattern (e.g. /^[0-9].*[0-9]$/)
  • If NO address is present, the instruction will apply to ALL lines


[Image: Sed.png, table of common sed instructions]

Instruction:

  • Action to take for matched line(s)
  • The instructions used in this tutorial are p (print), d (delete), q (quit)
    and s (substitute).
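
As a minimal sketch of how an address and an instruction combine, the commands below run against a small hypothetical file called sample.txt. The file and its contents are assumptions made for illustration only; it is not one of the tutorial files.

```shell
# Create a small hypothetical sample file (not one of the tutorial files).
workdir=$(mktemp -d)
printf 'alpha 1\nbeta two\ngamma 3\ndelta four\n' > "$workdir/sample.txt"

# Line-number address: print only line 2.
line2=$(sed -n '2 p' "$workdir/sample.txt")

# Range address: print lines 2 to 3.
range=$(sed -n '2,3 p' "$workdir/sample.txt")

# Regular-expression address: print only lines that end with a digit.
digits=$(sed -n '/[0-9]$/ p' "$workdir/sample.txt")

# No address: the instruction (here d, delete) applies to ALL lines,
# so nothing is left to display.
none=$(sed 'd' "$workdir/sample.txt")

echo "$line2"
echo "$range"
echo "$digits"
```

Each command captures its output in a variable so the results can be compared side by side; in normal use you would simply run the sed command on its own.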



Using the awk Utility

Usage:

awk [-F delimiter] 'selection-criteria {action}' file-name


How It Works:

  • The awk command reads the input file one line (record) at a time and applies the expression (contained within quotes) to each record.
  • The expression represents selection criteria, followed by an action to execute contained within braces {}
  • If the selection criteria are matched, then the action (between braces) is executed.
  • The -F option can be used to specify the field delimiter (separator) character
    eg. awk -F";" (would indicate a semi-colon delimited input file).


Selection Criteria

  • You can use a regular expression, enclosed within slashes, as a pattern. For example: /pattern/
  • The ~ operator tests whether a field or variable matches a regular expression. For example: $1 ~ /^[0-9]/
  • The !~ operator tests for no match. For example: $2 !~ /line/
  • You can perform both numeric and string comparisons using relational operators ( > , >= , < , <= , == , != ).
  • You can combine any of the patterns using the Boolean operators || (OR) and && (AND).
  • You can use built-in variables (like NR or "record number" representing line number) with comparison operators.
    For example: NR >= 1 && NR <= 5
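
The selection criteria above can be tried out on a small data set. This is only a sketch: the file fruit.txt and its three space-separated fields are hypothetical and are not part of the tutorial files.

```shell
# Create a small hypothetical data file with three fields per record.
workdir=$(mktemp -d)
printf 'apple 3 red\nbanana 12 yellow\ncherry 7 red\n42 0 none\n' > "$workdir/fruit.txt"

# Regular expression as the pattern: records that contain "an".
re=$(awk '/an/ {print $1}' "$workdir/fruit.txt")

# ~ operator: first field begins with a digit.
starts_digit=$(awk '$1 ~ /^[0-9]/ {print $1}' "$workdir/fruit.txt")

# !~ operator: third field does NOT match "red".
not_red=$(awk '$3 !~ /red/ {print $1}' "$workdir/fruit.txt")

# Numeric and string comparisons combined with && (AND).
combo=$(awk '$2 > 5 && $3 == "red" {print $1}' "$workdir/fruit.txt")

# Built-in variable NR with comparison operators: first two records only.
first_two=$(awk 'NR >= 1 && NR <= 2 {print $1}' "$workdir/fruit.txt")

echo "$re"; echo "$starts_digit"; echo "$not_red"; echo "$combo"; echo "$first_two"
```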


Action (execution):

  • Action to be executed is contained within braces {}
  • The print command can be used to display text (fields).
  • You can use parameters which represent fields within records (lines) within the expression of the awk utility.
  • The parameter $0 represents all of the fields contained in the record (line).
  • The parameters $1, $2, $3 ... $9 represent the first, second, third and up to the ninth fields contained within the record.
  • Fields beyond the ninth are referenced the same way (for example: $10, $11, $12, etc.). Unlike shell positional parameters, awk field numbers do not need braces.
  • You can use built-in variables (such as NR or "record number" representing line number)
    eg. {print NR,$0} (will print record number, then entire record).
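
A short sketch of the print action with field parameters and NR follows. The eleven-field record is a made-up example; note that awk accepts $10 and $11 directly, without braces.

```shell
# A hypothetical record with eleven space-separated fields.
record='a b c d e f g h i j k'

# $0 is the entire record.
whole=$(echo "$record" | awk '{print $0}')

# $1 and $3 are the first and third fields.
some=$(echo "$record" | awk '{print $1, $3}')

# Fields past the ninth are written as plain $10 and $11 in awk.
tenth=$(echo "$record" | awk '{print $10, $11}')

# NR prints the record (line) number in front of each record.
numbered=$(printf 'x\ny\n' | awk '{print NR, $0}')

echo "$whole"; echo "$some"; echo "$tenth"; echo "$numbered"
```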

INVESTIGATION 1: USING THE SED UTILITY

ATTENTION: Effective May 9, 2022 - this online tutorial will be required to be completed by Friday in week 11 by midnight
to obtain a grade of 2% towards this course


In this investigation, you will learn how to manipulate text using the sed utility.


Perform the Following Steps:

  1. Log in to your matrix account and confirm you are located in your home directory.

  2. Issue a Linux command to create a directory called sed

  3. Issue a Linux command to change to the sed directory and confirm that you are located in the sed directory.

  4. Issue the following Linux command to download the data.txt file
    (copy and paste to save time):
    wget https://ict.senecacollege.ca/~murray.saul/uli101/data.txt

  5. Issue the more command to quickly view the contents of the data.txt file.
    When finished, exit the more command by pressing the letter q

    The p instruction with the sed command is used to
    print (i.e. display) the contents of a text file.
    Issuing the p instruction without the -n option (which suppresses the default output) will display each line twice.

  6. Issue the following Linux command:
    sed 'p' data.txt

    NOTE: You should notice that each line appears twice.

    The reason each line appears twice is that the sed command (without the -n option)
    displays every line by default, and the p instruction prints each line once more.

    We will use pipeline commands to both display stdout on the screen and save it to files.
    A checking script, run later in this investigation, will use those files to confirm that you ran these pipeline commands.

  7. Issue the following Linux pipeline command:
    sed -n 'p' data.txt | tee sed-1.txt

    What do you notice? You should see each line displayed only once.
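
The effect of the -n option can be checked by counting output lines. This is a sketch using a hypothetical three-line file, demo.txt, rather than data.txt.

```shell
# Create a hypothetical three-line file.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/demo.txt"

# Without -n: default display plus the p instruction (each line twice).
with_default=$(sed 'p' "$workdir/demo.txt" | wc -l | tr -d ' ')

# With -n: only the p instruction displays lines (each line once).
suppressed=$(sed -n 'p' "$workdir/demo.txt" | wc -l | tr -d ' ')

echo "without -n: $with_default lines"   # 6
echo "with -n:    $suppressed lines"     # 3
```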

    You can specify an address to display lines using the sed utility
    (eg. line #, line #s or range of line #s).

  8. Issue the following Linux pipeline command:
    sed -n '1 p' data.txt | tee sed-2.txt

    You should see the first line of the text file displayed.
    What other command is used to only display the first line in a file?

    Using the sed command to display a range of lines.
  9. Issue the following Linux pipeline command:
    sed -n '2,5 p' data.txt | tee sed-3.txt

    What is displayed? How would you modify the sed command to display the line range 10 to 50?

    The s instruction is used to substitute text
    (a similar method was demonstrated in the vi editor in tutorial 9).

  10. Issue the following Linux pipeline command:
    sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more

    What do you notice? View the original contents of lines 2 to 5 in the data.txt file
    in another shell to confirm that the substitution occurred.
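
As a sketch of how the address range and the g flag interact with the s instruction, the example below uses a hypothetical file, pets.txt. The g flag replaces every match on a line; without it, only the first match on each line is replaced.

```shell
# Create a hypothetical three-line file.
workdir=$(mktemp -d)
printf 'cat cat\ncat dog\ncat cat\n' > "$workdir/pets.txt"

# Lines 1 to 2 only, every occurrence on each line (g flag).
all=$(sed '1,2 s/cat/CAT/g' "$workdir/pets.txt")

# Lines 1 to 2 only, first occurrence on each line (no g flag).
first_only=$(sed '1,2 s/cat/CAT/' "$workdir/pets.txt")

echo "$all"
echo "---"
echo "$first_only"
```

In both cases line 3 is outside the address range, so it is displayed unchanged.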

    Using the sed command with the q instruction to display up to a line number, then quit.
    The q instruction terminates (quits) the sed utility as soon as it reads a particular line number or a line that matches a pattern.

  11. Issue the following Linux pipeline command:
    sed '11 q' data.txt | tee sed-5.txt

    What did you notice? How many lines were displayed
    before the sed command exited?
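
A small sketch of the q instruction, using numbers generated by printf instead of a file. The input values are arbitrary.

```shell
# Quit after reading line 3: lines 1 to 3 are displayed, the rest are never read.
by_number=$(printf '%s\n' 1 2 3 4 5 | sed '3 q')

# Quit at the first line that matches a pattern.
by_pattern=$(printf '%s\n' red green blue pink | sed '/blue/ q')

echo "$by_number"
echo "$by_pattern"
```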

    You can use regular expressions to select lines that match a pattern. In fact,
    the sed command was one of the first Linux commands that used regular expressions.

    The rules remain the same for using regular expressions as demonstrated in tutorial 9
    except the regular expression must be contained within forward slashes
    (eg. /regexp/ ).

    Using the sed command using regular expressions with anchors.
  12. Issue the following Linux pipeline command:
    sed -n '/^The/ p' data.txt | tee sed-6.txt

    What do you notice?

  13. Issue the following Linux pipeline command:
    sed -n '/d$/ p' data.txt | tee sed-7.txt

    What do you notice?
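
The two anchors can be compared on a hypothetical four-line file, anchors.txt (an assumption for illustration; your data.txt results will differ).

```shell
# Create a hypothetical file to test the ^ and $ anchors.
workdir=$(mktemp -d)
printf 'The end\nAt the start\nThe river\nround\n' > "$workdir/anchors.txt"

# ^ anchors the pattern to the beginning of the line.
starts=$(sed -n '/^The/ p' "$workdir/anchors.txt")

# $ anchors the pattern to the end of the line.
ends=$(sed -n '/d$/ p' "$workdir/anchors.txt")

echo "$starts"
echo "---"
echo "$ends"
```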

    The sed utility can also be used as a filter to manipulate text that
    was generated from Linux commands.

    Using the sed command with pipeline commands.
  14. Issue the following Linux pipeline command:
    who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more

    What did you notice?

  15. Issue the following Linux pipeline command:
    ls | sed -n '/txt$/ p' | tee sed-9.txt

    What did you notice?
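
Because the output of who and ls depends on your system, the sketch below feeds fixed, made-up text through the same sed filters so the result is predictable.

```shell
# Made-up file names standing in for the output of ls.
names=$(printf 'notes.txt\nphoto.png\nreport.txt\ntxt-backup\n' | sed -n '/txt$/ p')

# Made-up login lines standing in for the output of who.
users=$(printf 'alice pts/0\nzed pts/1\nmark pts/2\n' | sed -n '/^[a-m]/ p')

echo "$names"
echo "---"
echo "$users"
```

Here sed reads from standard input, so no file name is given on the sed command line.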

  16. Issue the following to run a checking script:
    ~uli101/week11-check-1

    If you encounter errors, make corrections and re-run the checking script
    until you receive a congratulations message, then you can proceed.

In the next investigation, you will learn how to manipulate text using the awk utility.

INVESTIGATION 2: USING THE AWK UTILITY

In this investigation, you will learn how to use the awk utility to manipulate text and generate reports.

Perform the Following Steps:

  1. Change to your home directory and issue a command to confirm
    you are located in your home directory.

  2. Issue a Linux command to create a directory called awk

  3. Issue a Linux command to change to the awk directory and confirm you are located in the awk directory.

    Let's download a database file that contains information regarding classic cars.

  4. Issue the following linux command (copy and paste to save time):
    wget https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt

  5. Issue the cat command to quickly view the contents of the cars.txt file.

    The "print" action (command) is the default action of awk to print
    all selected lines that match a pattern.

    This action (contained in braces) can provide more options
    such as printing specific fields of selected lines (or records) from a database.

    Using the awk command to display matches of the pattern ford.
  6. Issue the following Linux command to display all lines (i.e. records) in the cars.txt database that match the pattern (or "make") called ford:
    awk '/ford/ {print}' cars.txt

    We will use pipeline commands to both display stdout on the screen and save it to files. A checking script, run later in this investigation, will use those files to confirm that you ran these pipeline commands.

  7. Issue the following Linux pipeline command to display records
    in the cars.txt database that contain the pattern (i.e. make) ford:
    awk '/ford/' cars.txt | tee awk-1.txt

    What do you notice? You should notice that all matching lines are displayed even though no action was specified, because print is the default action.

    You can use built-in variables with the print command for further processing.
    We will discuss the following variables in this tutorial:

    $0 - Current record (entire line)
    $1 - First field in record
    $n - nth field in record
    NR - Record Number (order in database)
    NF - Number of fields in current record

    For a listing of more variables, please consult your course notes.

    Using the awk command to print search results by field number.

  8. Issue the following linux pipeline command to display the model, year, quantity and price
    in the cars.txt database for makes of chevy:
    awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt

    Notice that a space is the delimiter for the fields that appear as standard output.
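
The space in the output comes from the comma in the print action. A quick sketch with a single made-up, tab-separated record shows the difference:

```shell
# With a comma, print separates the fields with a space in the output.
with_comma=$(printf 'ford\tmustang\t65\n' | awk '{print $1,$2}')

# Without a comma, the fields are joined together with nothing between them.
without_comma=$(printf 'ford\tmustang\t65\n' | awk '{print $1 $2}')

echo "$with_comma"      # ford mustang
echo "$without_comma"   # fordmustang
```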

    The tilde character ~ is used to test whether a particular field matches a pattern.

  9. Issue the following linux pipeline command to display all plymouths (plym)
    by model, year, quantity and price:
    awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt

    You can also use comparison operators to specify conditions for processing with matched patterns
    when using the awk command. Since they are used WITHIN the awk expression,
    they are not confused with redirection symbols.

    Using the awk command to display results based on comparison operators.
    <     Less than
    <=   Less than or equal
    >     Greater than
    >=   Greater than or equal
    ==   Equal
    !=    Not equal

  10. Issue the following linux pipeline command to display the car make, model, quantity and price of all vehicles whose prices are less than $5,000:
    awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt

    What do you notice?

  11. Issue the following linux pipeline command to display price,
    quantity, model and car make of vehicles whose prices are less than $5,000:
    awk '$5 < 5000 {print $5,$4,$2,$1}' cars.txt | tee awk-5.txt

  12. Issue the following linux pipeline command to display the car make,
    model and quantity of cars whose make begins with the letter 'f':
    awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt

    Using the awk command to display combined search results based on compound operators.
    Combined pattern searches can be made
    by using compound operator symbols:

    &&     (and)
    ||        (or)

  13. Issue the following linux pipeline command to list all fords
    whose price is greater than $10,000:
    awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt
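
To see the difference between && and ||, here is a sketch that uses a tiny hypothetical inventory file, inv.txt. The field order (make, model, year, quantity, price) and all values are assumptions for illustration and are not the contents of cars.txt.

```shell
# Create a hypothetical tab-delimited inventory: make, model, year, quantity, price.
workdir=$(mktemp -d)
printf 'ford\tmodelA\t65\t4\t12000\nford\tmodelB\t70\t2\t3000\nchevy\tmodelC\t68\t1\t15000\n' > "$workdir/inv.txt"

# && (and): BOTH conditions must be true for a record to be selected.
and_result=$(awk '$1 ~ /ford/ && $5 > 10000 {print $2}' "$workdir/inv.txt")

# || (or): a record is selected when EITHER condition is true.
or_result=$(awk '$1 ~ /chevy/ || $5 < 5000 {print $2}' "$workdir/inv.txt")

echo "$and_result"   # modelA
echo "$or_result"
```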

  14. Issue the following linux command (copy and paste to save time):
    wget https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt

  15. Issue the cat command to quickly view the contents of the cars2.txt file.

  16. Issue the following linux pipeline command to display the year
    and quantity of cars that begin with the letter 'f' for the cars2.txt database:
    awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt

    What did you notice?

    The problem is that the cars2.txt database separates each field by a semi-colon (;) instead of TAB.
    Therefore, awk does not recognize the second and fourth fields.

    You need to issue awk with the -F option to indicate that this file's fields are separated (delimited) by a semi-colon.

  17. Issue the following linux pipeline command to display the year
    and quantity of cars that begin with the letter 'f' for the cars2.txt database:
    awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt

    What did you notice this time?
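
The effect of the -F option can be confirmed with the NF variable. The sketch below uses one made-up semi-colon delimited record, not the actual cars2.txt data.

```shell
# A hypothetical semi-colon delimited record.
record='ford;mustang;65;4;10000'

# Default delimiter (whitespace): the whole line is treated as ONE field.
nf_default=$(echo "$record" | awk '{print NF}')
second_default=$(echo "$record" | awk '{print $2}')

# With -F";" the record is split into five fields.
nf_semicolon=$(echo "$record" | awk -F";" '{print NF}')
second_fourth=$(echo "$record" | awk -F";" '{print $2,$4}')

echo "fields without -F: $nf_default"     # 1
echo "fields with -F:    $nf_semicolon"   # 5
echo "$second_fourth"                     # mustang 4
```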

  18. Issue the following to run a checking script:
    ~uli101/week11-check-2

    If you encounter errors, make corrections and re-run the checking script until you
    receive a congratulations message, then you can proceed.

LINUX PRACTICE QUESTIONS

The purpose of this section is to obtain extra practice to help with quizzes, your midterm, and your final exam.

Here is a link to the MS Word Document of ALL of the questions displayed below but with extra room to answer on the document to simulate a quiz:

https://ict.senecacollege.ca/~murray.saul/uli101/uli101_week11_practice.docx

Your instructor may take-up these questions during class. It is up to the student to attend classes in order to obtain the answers to the following questions. Your instructor will NOT provide these answers in any other form (eg. e-mail, etc).


Review Questions:

Part A: Display Results from Using the sed Utility

Note the contents from the following tab-delimited file called ~murray.saul/uli101/stuff.txt: (this file pathname exists for checking your work)

Line one.
This is the second line.
This is the third.
This is line four.
Five.
Line six follows
Followed by 7
Now line 8
and line nine
Finally, line 10


Write the results of each of the following Linux commands for the above-mentioned file:


  1. sed -n '3,6 p' ~murray.saul/uli101/stuff.txt

  2. sed '4 q' ~murray.saul/uli101/stuff.txt

  3. sed '/the/ d' ~murray.saul/uli101/stuff.txt

  4. sed 's/line/NUMBER/g' ~murray.saul/uli101/stuff.txt


Part B: Writing Linux Commands Using the sed Utility

Write a single Linux command to perform the specified tasks for each of the following questions.


  1. Write a Linux sed command to display only lines 5 to 9 for the file: ~murray.saul/uli101/stuff.txt

  2. Write a Linux sed command to display only lines that begin with the pattern “and” for the file: ~murray.saul/uli101/stuff.txt

  3. Write a Linux sed command to display only lines that end with a digit for the file: ~murray.saul/uli101/stuff.txt

  4. Write a Linux sed command to save lines that match the pattern “line” (upper or lowercase) for the file: ~murray.saul/uli101/stuff.txt and save results (overwriting previous contents) to: ~/results.txt


Part C: Writing Linux Commands Using the awk Utility

Note the contents from the following tab-delimited file called ~murray.saul/uli101/stuff.txt: (this file pathname exists for checking your work)

Line one.
This is the second line.
This is the third.
This is line four.
Five.
Line six follows
Followed by 7
Now line 8
and line nine
Finally, line 10


Write the results of each of the following Linux commands for the above-mentioned file:


  1. awk 'NR == 3 {print}' ~murray.saul/uli101/stuff.txt

  2. awk 'NR >= 2 && NR <= 5 {print}' ~murray.saul/uli101/stuff.txt

  3. awk '$1 ~ /This/ {print $2}' ~murray.saul/uli101/stuff.txt

  4. awk '$1 ~ /This/ {print $3,$2}' ~murray.saul/uli101/stuff.txt


Part D: Writing Linux Commands Using the awk Utility


Write a single Linux command to perform the specified tasks for each of the following questions.


  1. Write a Linux awk command to display all records for the file: ~/cars whose fifth field is greater than 10000.

  2. Write a Linux awk command to display the first and fourth fields for the file: ~/cars whose fifth field begins with a number.

  3. Write a Linux awk command to display the second and third fields for the file: ~/cars for records that match the pattern “chevy”.

  4. Write a Linux awk command to display the first and second fields for all the records contained in the file: ~/cars