Changes

Jump to: navigation, search

Tutorial11: Sed & Awk Utilities

18,743 bytes added, 10:23, 26 July 2023
no edit summary
{{Admon/caution|DO NOT USE THIS VERSION OF THE LAB. LOOK FOR LAB 10: SED & AWK|'''This is an archive version.'''}}
 
=USING SED & AWK UTILTIES=
<br>
===Main Objectives of this Practice Tutorial===
:* xUse the '''sed''' command to '''manipulate text''' contained in a file. :* List and explain several '''addresses''' and '''instructions''' associated with the '''sed''' command. :* Use the '''sed''' command as a '''filter''' with Linux pipeline commands.
:* xUse the '''awk''' command to '''manipulate text''' contained in a file.
:* xList and explain '''comparison operators''', '''variables''' and '''actions''' associated with the '''awk''' command.
:* xUse the '''awk''' command as a '''filter''' with Linux pipeline commands.<br><br>
===Tutorial Reference Material===
|- valign="top" style="padding-left:15px;"
|colspan="2" |Course Notes'''Slides''':<ul><li>Week 11 Lecture 1 Notes:<br> [[Media:ULI101-Week11.1.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murraychris.sauljohnson/ULI101/uli101ULI101-Week11.1.pptx PPTX]</li><li>Week 11 Lecture 2 Notes:<br> [[Media:ULI101-Week11.2.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murrayjason.saulcarman/uli101slides/ULI101-Week11.2.pptx PPTX]<br></li></ul>
| style="padding-left:15px;" |'''Text Manipulation:'''* [https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux Purpose of using the sed utility]* [https://www.digitalocean.com/community/tutorials/how-to-use-the-awk-language-to-manipulate-text-in-linux Purpose of using the awk utility]
| style="padding-left:15px;" |Man Pages'''Commands:'''
* [https://man7.org/linux/man-pages/man1/sed.1p.html sed]
* [https://man7.org/linux/man-pages/man1/awk.1p.html awk]
|colspan="1" style="padding-left:15px;" width="30%"|'''Brauer Instructional Videos:'''<ul><li>[https://www.youtube.com/watch?v=npU6S61AIko&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=14 Using the sed Utility]</li><li>[https://www.youtube.com/watch?v=OV3XzjDYgJo&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=13 Using the awk Utility]</ul>
|}
= KEY CONCEPTS =
===Manipulating Text===
x===Using the sed Utility===  '''Usage:''' '''<span style="color:blue;font-weight:bold;font-family:courier;">Syntax: sed [-n] 'address instruction' filename</span>'''   '''How it Works:'''
* The sed command reads all lines in the input file and will be exposed to the expression<br>(i.e. area contained within quotes) one line at a time.
* The expression can be within single quotes or double quotes.
* The expression contains an address (match condition) and an instruction (operation).
* If the line matches the address, then it will perform the instruction.
* Lines will display be default unless the '''–n''' option is used to suppress default display
<br>
'''Address:'''
===Using * Can use a line number, to select a specific line (for example: '''5''')* Can specify a range of line numbers (for example: '''5,7''')* Regular expressions are contained within forward slashes (e.g. /regular-expression/)* Can specify a regular expression to select all lines that match a pattern (e.g '''/^[0-9].*[0-9]$/''') * If NO address is present, the sed Utility===instruction will apply to ALL lines
x
[[Image:sed.png|right|500px|]]
'''Instruction:'''
*'''Action''' to take for matched line(s)
*Refer to table on right-side for list of some<br>'''common instructions''' and their purpose
<br><br>
===Using the awk Utility===
x
=INVESTIGATION 1'''Usage: USING THE SED UTILITY='''
<brspan style="color:blue;font-weight:bold;font-family:courier;">awk [-F] 'selection-criteria {action}’ file-name</span>In this section, you will learn how to ...
'''How It Works:'''
* The '''awk''' command reads all lines in the input file and will be exposed to the expression (contained within quotes) for processing.
*The '''expression''' (contained in quotes) represents '''selection criteria''', and '''action''' to execute contained within braces '''{}'''
* if selection criteria is matched, then action (between braces) is executed.
* The '''–F''' option can be used to specify the default '''field delimiter''' (separator) character<br>eg. '''awk –F”;”''' (would indicate a semi-colon delimited input file).
<br>
'''Selection Criteria'''
* You can use a regular expression, enclosed within slashes, as a pattern. For example: '''Perform /pattern/'''* The ~ operator tests whether a field or variable matches a regular expression. For example: '''$1 ~ /^[0-9]/'''* The '''!~''' operator tests for no match. For example: '''$2 !~ /line/'''* You can perform both numeric and string comparisons using relational operators ( '''>''' , '''>=''' , '''<''' , '''<=''' , '''==''' , '''!=''' ).* You can combine any of the Following Stepspatterns using the Boolean operators '''||''' (OR) and '''&&''' (AND).* You can use built-in variables (like NR or "record number" representing line number) with comparison operators.<br>For example: '''NR >=1 && NR <= 5''' <br>'''Action (execution):'''
# x<br>* Action to be executed is contained within braces '''{}'''* The '''print''' command can be used to display text (fields).* You can use parameters which represent fields within records (lines) within the expression of the awk utility.* The parameter '''$0''' represents all of the fields contained in the record (line).* The parameters '''$1''', '''$2''', '''$3''' … '''$9''' represent the first, second and third to the 9th fields contained within the record. * Parameters greater than nine requires the value of the parameter to be placed within braces (for example: '''${10}''','''${11}''','''${12}''', etc.)* You can use built-in '''variables''' (such as '''NR''' or "record number" representing line number)<br>eg. '''{print NR,$0}''' (will print record number, then entire record).
In the next investigation, you will ...<br><br>=INVESTIGATION 1: USING THE SED UTILITY=
<span style=INVESTIGATION 2"color:red;">'''ATTENTION''': USING THE AWK UTILITY = In Effective '''May 9, 2022''' - this section, you online tutorial will learn how be required to be completed by '''Friday in week 11 by midnight'''<br>to ...obtain a grade of '''2%''' towards this course</span><br><br>
In this investigation, you will learn how to manipulate text using the '''sed''' utility.
'''Perform the Following Steps:'''
# x'''Login''' to your matrix account and confirm you are located in your '''home''' directory.<br><br># Issue a Linux command to create a directory called '''sed'''<br><br># Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br># Issue the following Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/data.txt</nowiki></span><br><br># Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]<br><br>The '''p''' instruction with the '''sed''' command is used to<br>'''print''' (i.e. ''display'') the contents of a text file.<br><br># Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears twice'''.<br><br>The reason why standard output appears twice is that the sed command<br>(without the '''-n option''') displays all lines regardless of an address used.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>for <u>confirmation</u> of running these pipeline commands when run a '''checking-script''' later in this investigation.<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see only one line.<br><br>You can specify an '''address''' to display lines using the sed utility<br>(eg. ''line #'', '''line #s''' or range of '''line #s''').<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br>What other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar to method was demonstrated in the vi editor in tutorial 9).<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''-q''' option to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it is read in a particular line or matching pattern.<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice? How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions''' to select lines that match a pattern. In fact,<br>the sed command was one of the <u>first</u> Linux commands that used regular expression.<br><br>The rules remain the same for using regular expressions as demonstrated in '''tutorial 9'''<br>except the regular expression must be contained within '''forward slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</span> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command using regular expressions with '''anchors'''.]]# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br># Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script<br>until you receive a congratulations message, then you can proceed.<br><br>
:In the next investigation, you will ... =INVESTIGATION 3: USING SED &amp; AWK UTILITIES AS FILTERS = In this section, you will learn how to manipulate text using the '''awk''' utility...<br><br>
=INVESTIGATION 2: USING THE AWK UTILITY =
In this investigation, you will learn how to use the awk utility to manipulate text and generate reports.
'''Perform the Following Steps:'''
# xChange to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br># Issue a Linux command to create a directory called '''awk'''<br><br># Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br># Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt</nowiki></span><br><br># Issue the '''cat''' command to quickly view the contents of the '''cars.txt''' file.<br><br>The "'''print'''" action (command) is the <u>default</u> action of awk to print<br>all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]# Issue the following linux command all to display all lines (i.e. records) in the '''cars.txt''' database that matches the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files for <u>confirmation</u> of running these pipeline commands when run a '''checking-script''' later in this investigation.<br><br># Issue the following linux pipeline command all to display records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice ALL lines displayed <u>without</u> using '''search criteria'''.<br><br>You can use ''builtin'' '''variables''' with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br> '''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br># Issue the following linux pipeline command to display the '''model''', '''year''', '''quantity''' and price<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to search for a pattern or display standard output for a particular field.<br><br># Issue the following linux pipeline command to display all '''plymouths''' ('''plym''')<br>by '''model name''', '''price''' and '''quantity''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the awk expression,<br>they are not confused with redirection symbols<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br># Issue the following linux pipeline command to display display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br># Issue the following linux pipeline command to display display '''price''',<br>'''quantity''', '''model''' and '''car make''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $5,$4,$2,$1}' cars.txt | tee awk-5.txt</span><br><br># Issue the following linux pipeline command to display the '''car make''',<br>'''year''' and '''quantity''' of cars that '''begin''' with the '''letter 'f'''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br># Issue the following linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br># Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt</nowiki></span><br><br># Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br># Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, it does not recognize the second and fourth fields.<br><br>You need to issue awk with the -F option to indicate that this file's fields are separated (delimited) by a semi-colorn.<br><br># Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br># Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
= LINUX PRACTICE QUESTIONS =
# <span style="font-family:courier;font-weight:bold">sed -n '3,6 p' ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">sed '4 q' ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">sed '/the/ d' ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">sed 's/line/NUMBER/g' ~murray.saul/uli101/stuff.txt</span>
# Write a Linux sed command to display only lines 5 to 9 for the file: '''~murray.saul/uli101/stuff.txt'''<br><br># Write a Linux sed command to display only lines the begin the pattern “and” for the file: '''~murray.saul/uli101/stuff.txt'''<br><br># Write a Linux sed command to display only lines that end with a digit for the file: '''~murray.saul/uli101/stuff.txt'''<br><br># Write a Linux sed command to save lines that match the pattern “line” (upper or lowercase) for the file: '''~murray.saul/uli101/stuff.txt ''' and save results (overwriting previous contents) to: '''~/results.txt'''<br><br>
# <span style="font-family:courier;font-weight:bold">awk ‘NR == 3 {print}’ ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">awk ‘NR >= 2 && NR <= 5 {print}’ ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">awk ‘$1 ~ /This/ {print $2}’ ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">awk ‘$1 ~ /This/ {print $3,$2}’ ~murray.saul/uli101/stuff.txt</span><br><br>     
'''Part D: Writing Linux Commands Using the awk Utility'''
Write a single Linux command to perform the specified tasks for each of the following questions.
# Write a Linux awk command to display all records for the file: '''~/cars''' whose fifth field is greater than 10000.<br><br>
# Write a Linux awk command to display the first and fourth fields for the file: '''~/cars''' whose fifth field begins with a number.<br><br>
# Write a Linux awk command to display the second and third fields for the file: '''~/cars''' for records that match the pattern “chevy”.<br><br>
# Write a Linux awk command to display the first and second fields for all the records contained in the file: '''~/cars'''<br><br>

Navigation menu