Open main menu

CDOT Wiki β

Changes

Tutorial11: Sed & Awk Utilities

16,771 bytes added, 10:23, 26 July 2023
no edit summary
{{Admon/caution|DO NOT USE THIS VERSION OF THE LAB. LOOK FOR LAB 10: SED & AWK|'''This is an archive version.'''}}
 
=USING SED & AWK UTILTIES=
<br>
===Main Objectives of this Practice Tutorial===
:* Learn how to issue Use the '''sed''' command to '''manipulate text ''' contained in a file.
:* List and understand explain several '''addresses''' and '''instructions ''' associated with the '''sed''' command.
:* Use the '''sed''' command as a '''filter to manipulate text''' with Linux pipeline commands.
:* Learn how to issue Use the '''awk''' command to '''manipulate text ''' contained in a file.
:* List and explain '''comparison operators''', '''variables''' and '''actions''' associated with the '''awk''' command. :* Use the '''awk''' command as a '''filter to manipulate text''' with Linux pipeline commands.
<br><br>
 
===Tutorial Reference Material===
|- valign="top" style="padding-left:15px;"
|colspan="2" |Course Notes'''Slides''':<ul><li>Week 11 Lecture 1 Notes:<br> [[Media:ULI101-Week11.1.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murraychris.sauljohnson/ULI101/uli101ULI101-Week11.1.pptx PPTX]</li><li>Week 11 Lecture 2 Notes:<br> [[Media:ULI101-Week11.2.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murrayjason.saulcarman/uli101slides/ULI101-Week11.2.pptx PPTX]<br></li></ul>
| style="padding-left:15px;" |'''Text Manipulation:'''* [https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux Purpose of using the sed utility]* [https://www.digitalocean.com/community/tutorials/how-to-use-the-awk-language-to-manipulate-text-in-linux Purpose of using the awk utility]
| style="padding-left:15px;" |Man Pages'''Commands:'''
* [https://man7.org/linux/man-pages/man1/sed.1p.html sed]
* [https://man7.org/linux/man-pages/man1/awk.1p.html awk]
|colspan="1" style="padding-left:15px;" width="30%"|'''Brauer Instructional Videos:'''<ul><li>[https://www.youtube.com/watch?v=npU6S61AIko&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=14 Using the sed Utility]</li><li>[https://www.youtube.com/watch?v=OV3XzjDYgJo&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=13 Using the awk Utility]</ul>
|}
===Using the sed Utility===
Sed Utility Usage:
'''Usage:'''
'''<span style="color:blue;font-weight:bold;font-family:courier;">Syntax: sed [-n] 'address instruction' filename</span>'''
'''How it Works:'''
 
* The sed command reads all lines in the input file and will be exposed to the expression<br>(i.e. area contained within quotes) one line at a time.
* The expression can be within single quotes or double quotes.
* The expression contains an address (match condition) and an instruction (operation).
* If the line matches the address, then it will perform the instruction.
* Lines will display be default unless the '''–n''' option is used to suppress default display
<br>
'''Address:'''
*can Can use a line number, to select a specific line (for example: '''5''') *can Can specify a range of line numbers (for example: '''5,7''') *can Regular expressions are contained within forward slashes (e.g. /regular-expression/)* Can specify a regular expression to select all lines that match a pattern (e.g '''/^happy[0-9].*[0-9]$/''') *Note: when using regular expressions, you must delimit them with a forward-slash"/"<br>default If NO address (if none is specified) present, the instruction will match every lineapply to ALL lines
 [[Image:sed.png|thumb|right|500px|Instructions to take action if text matches an address.]]
'''Instruction:'''
*'''Action ''' to take for matched line(s)*Refer to table on right-side for list of some <br>'''common instructions ''' and their purpose<br><br>
===Using the awk Utility===
x
 
=INVESTIGATION 1: USING THE SED UTILITY=
<br>In this section, you will learn how to ...'''Usage:'''
<span style="color:blue;font-weight:bold;font-family:courier;">awk [-F] 'selection-criteria {action}’ file-name</span>
'''How It Works:'''
* The '''Perform awk''' command reads all lines in the input file and will be exposed to the expression (contained within quotes) for processing.*The '''expression''' (contained in quotes) represents '''selection criteria''', and '''action''' to execute contained within braces '''{}'''* if selection criteria is matched, then action (between braces) is executed.* The '''–F''' option can be used to specify the Following Steps:default '''field delimiter''' (separator) character<br>eg. '''awk –F”;”''' (would indicate a semi-colon delimited input file).<br>'''Selection Criteria'''
# x* You can use a regular expression, enclosed within slashes, as a pattern. For example: '''/pattern/'''* The ~ operator tests whether a field or variable matches a regular expression. For example: '''$1 ~ /^[0-9]/'''* The '''!~''' operator tests for no match. For example: '''$2 !~ /line/'''* You can perform both numeric and string comparisons using relational operators ( '''>''' , '''>=''' , '''<''' , '''<=''' , '''==''' , '''!=''' ).* You can combine any of the patterns using the Boolean operators '''||''' (OR) and '''&&''' (AND).* You can use built-in variables (like NR or "record number" representing line number) with comparison operators.<br>For example: '''NR >=1 && NR <= 5''' <br>'''Action (execution):'''
In * Action to be executed is contained within braces '''{}'''* The '''print''' command can be used to display text (fields).* You can use parameters which represent fields within records (lines) within the expression of the awk utility.* The parameter '''$0''' represents all of the next investigationfields contained in the record (line).* The parameters '''$1''', you will .'''$2''', '''$3''' … '''$9''' represent the first, second and third to the 9th fields contained within the record.* Parameters greater than nine requires the value of the parameter to be placed within braces (for example: '''${10}''','''${11}''','''${12}''', etc.)* You can use built-in '''variables''' (such as '''NR''' or "record number" representing line number)<br><br>eg. '''{print NR,$0}''' (will print record number, then entire record).
=INVESTIGATION 21: USING THE AWK SED UTILITY =
In <span style="color:red;">'''ATTENTION''': Effective '''May 9, 2022''' - this section, you online tutorial will learn how be required to be completed by '''Friday in week 11 by midnight'''<br>to ...obtain a grade of '''2%''' towards this course</span><br><br>
In this investigation, you will learn how to manipulate text using the '''sed''' utility.
'''Perform the Following Steps:'''
# x'''Login''' to your matrix account and confirm you are located in your '''home''' directory.<br><br># Issue a Linux command to create a directory called '''sed'''<br><br># Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br># Issue the following Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/data.txt</nowiki></span><br><br># Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]<br><br>The '''p''' instruction with the '''sed''' command is used to<br>'''print''' (i.e. ''display'') the contents of a text file.<br><br># Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears twice'''.<br><br>The reason why standard output appears twice is that the sed command<br>(without the '''-n option''') displays all lines regardless of an address used.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>for <u>confirmation</u> of running these pipeline commands when run a '''checking-script''' later in this investigation.<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see only one line.<br><br>You can specify an '''address''' to display lines using the sed utility<br>(eg. ''line #'', '''line #s''' or range of '''line #s''').<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br>What other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar to method was demonstrated in the vi editor in tutorial 9).<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''-q''' option to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it is read in a particular line or matching pattern.<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice? How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions''' to select lines that match a pattern. In fact,<br>the sed command was one of the <u>first</u> Linux commands that used regular expression.<br><br>The rules remain the same for using regular expressions as demonstrated in '''tutorial 9'''<br>except the regular expression must be contained within '''forward slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</span> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command using regular expressions with '''anchors'''.]]# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br># Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br># Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script<br>until you receive a congratulations message, then you can proceed.<br><br>
:In the next investigation, you will ... =INVESTIGATION 3: USING SED &amp; AWK UTILITIES AS FILTERS = In this section, you will learn how to manipulate text using the '''awk''' utility...<br><br>
=INVESTIGATION 2: USING THE AWK UTILITY =
In this investigation, you will learn how to use the awk utility to manipulate text and generate reports.
'''Perform the Following Steps:'''
# xChange to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br># Issue a Linux command to create a directory called '''awk'''<br><br># Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br># Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt</nowiki></span><br><br># Issue the '''cat''' command to quickly view the contents of the '''cars.txt''' file.<br><br>The "'''print'''" action (command) is the <u>default</u> action of awk to print<br>all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]# Issue the following linux command all to display all lines (i.e. records) in the '''cars.txt''' database that matches the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files for <u>confirmation</u> of running these pipeline commands when run a '''checking-script''' later in this investigation.<br><br># Issue the following linux pipeline command all to display records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice ALL lines displayed <u>without</u> using '''search criteria'''.<br><br>You can use ''builtin'' '''variables''' with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br> '''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br># Issue the following linux pipeline command to display the '''model''', '''year''', '''quantity''' and price<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to search for a pattern or display standard output for a particular field.<br><br># Issue the following linux pipeline command to display all '''plymouths''' ('''plym''')<br>by '''model name''', '''price''' and '''quantity''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the awk expression,<br>they are not confused with redirection symbols<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br># Issue the following linux pipeline command to display display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br># Issue the following linux pipeline command to display display '''price''',<br>'''quantity''', '''model''' and '''car make''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $5,$4,$2,$1}' cars.txt | tee awk-5.txt</span><br><br># Issue the following linux pipeline command to display the '''car make''',<br>'''year''' and '''quantity''' of cars that '''begin''' with the '''letter 'f'''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br># Issue the following linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br># Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt</nowiki></span><br><br># Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br># Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, it does not recognize the second and fourth fields.<br><br>You need to issue awk with the -F option to indicate that this file's fields are separated (delimited) by a semi-colorn.<br><br># Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br># Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
= LINUX PRACTICE QUESTIONS =
# <span style="font-family:courier;font-weight:bold">sed -n '3,6 p' ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">sed '4 q' ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">sed '/the/ d' ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">sed 's/line/NUMBER/g' ~murray.saul/uli101/stuff.txt</span>
# <span style="font-family:courier;font-weight:bold">awk ‘NR == 3 {print}’ ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">awk ‘NR >= 2 && NR <= 5 {print}’ ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">awk ‘$1 ~ /This/ {print $2}’ ~murray.saul/uli101/stuff.txt</span><br><br># <span style="font-family:courier;font-weight:bold">awk ‘$1 ~ /This/ {print $3,$2}’ ~murray.saul/uli101/stuff.txt</span><br><br>