
Tutorial11: Sed & Awk Utilities

11,170 bytes added, 10:23, 26 July 2023
{{Admon/caution|DO NOT USE THIS VERSION OF THE LAB. LOOK FOR LAB 10: SED & AWK|'''This is an archive version.'''}}
 
=USING SED & AWK UTILITIES=
<br>
===Main Objectives of this Practice Tutorial===
:* Use the '''sed''' command to '''manipulate text''' contained in a file.
:* List and explain several '''addresses''' and '''instructions''' associated with the '''sed''' command.
:* Use the '''sed''' command as a '''filter''' with Linux pipeline commands.
:* Use the '''awk''' command to '''manipulate text''' contained in a file.
:* List and explain '''comparison operators''', '''variables''' and '''actions''' associated with the '''awk''' command.
:* Use the '''awk''' command as a '''filter''' with Linux pipeline commands.
<br><br>
 
===Tutorial Reference Material===
|- valign="top" style="padding-left:15px;"
|colspan="2" |'''Slides:'''<ul><li>Week 11 Lecture 1 Notes:<br> [[Media:ULI101-Week11.1.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murraychris.sauljohnson/ULI101/uli101ULI101-Week11.1.pptx PPTX]</li><li>Week 11 Lecture 2 Notes:<br> [[Media:ULI101-Week11.2.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murrayjason.saulcarman/uli101slides/ULI101-Week11.2.pptx PPTX]<br></li></ul>
| style="padding-left:15px;" |'''Text Manipulation:'''
* [https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux Purpose of using the sed utility]
* [https://www.digitalocean.com/community/tutorials/how-to-use-the-awk-language-to-manipulate-text-in-linux Purpose of using the awk utility]
| style="padding-left:15px;" |'''Commands:'''
* [https://man7.org/linux/man-pages/man1/sed.1p.html sed]
* [https://man7.org/linux/man-pages/man1/awk.1p.html awk]
|colspan="1" style="padding-left:15px;" width="30%"|'''Brauer Instructional Videos:'''<ul><li>[https://www.youtube.com/watch?v=npU6S61AIko&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=14 Using the sed Utility]</li><li>[https://www.youtube.com/watch?v=OV3XzjDYgJo&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=13 Using the awk Utility]</ul>
|}
'''Usage:'''
'''<span style="color:blue;font-weight:bold;font-family:courier;">Syntax: sed [-n] 'address instruction' filename</span>'''
'''How it Works:'''
* The sed command reads the input file one line at a time, and each line is exposed to the expression<br>(i.e. the area contained within quotes).
* The expression can be within single quotes or double quotes.
* The expression contains an address (match condition) and an instruction (operation).
* If the line matches the address, then sed performs the instruction on that line.
* Lines are displayed by default unless the '''-n''' option is used to suppress the default display.
<br>
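As a minimal sketch of the default-display behaviour described above, the commands below build a small three-line sample file and count the output lines with and without the '''-n''' option. The file name <code>sample.txt</code> and its contents are assumptions made for illustration and are not part of the tutorial data.

```shell
# Create a hypothetical three-line sample file in a temporary directory
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/sample.txt"

# Without -n: each line is displayed by default AND again by 'p' (6 lines)
twice=$(sed 'p' "$workdir/sample.txt" | wc -l)

# With -n: the default display is suppressed, so 'p' shows each line once (3 lines)
once=$(sed -n 'p' "$workdir/sample.txt" | wc -l)

echo "without -n: $twice lines, with -n: $once lines"
```

The doubled line count in the first case comes from the default display plus the '''p''' instruction acting on the same line.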
'''Address:'''
* Can use a line number to select a specific line (for example: '''5''')
* Can specify a range of line numbers (for example: '''5,7''')
* Regular expressions are contained within forward slashes (e.g. /regular-expression/)
* Can specify a regular expression to select all lines that match a pattern (e.g. '''/^happy[0-9].*[0-9]$/''')
* If NO address is present, the instruction will apply to ALL lines
 [[Image:sed.png|thumb|right|500px|Instructions to take action if text matches an address.]]
'''Instruction:'''
* '''Action''' to take for matched line(s)
* Refer to the table on the right side for a list of some<br>'''common instructions''' and their purpose<br><br>
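The address forms listed above can be tried out with the following sketch. It assumes a hypothetical five-line file (<code>sample.txt</code>, invented for this illustration) and uses only the '''p''' and '''s''' instructions.

```shell
# Hypothetical five-line sample file used only for this illustration
workdir=$(mktemp -d)
printf 'happy1 day 7\nline two\nline three\nhappy2 end 9\nlast line\n' > "$workdir/sample.txt"

# Line-number address: print only line 2
single=$(sed -n '2 p' "$workdir/sample.txt")

# Range address: print lines 2 to 3
range=$(sed -n '2,3 p' "$workdir/sample.txt")

# Regular-expression address: print only the lines matching the pattern
regex=$(sed -n '/^happy[0-9].*[0-9]$/ p' "$workdir/sample.txt")

# No address: the 's' instruction is applied to ALL lines
all=$(sed 's/line/LINE/g' "$workdir/sample.txt")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$single" "$range" "$regex" "$all"
```

In the last command no address is given, so the substitution is attempted on every line, and lines that do not contain the text are displayed unchanged.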
===Using the awk Utility===
'''Usage:'''
<span style="color:blue;font-weight:bold;font-family:courier;">awk [-F] 'selection-criteria {action}' file-name</span>
'''How It Works:'''
* The '''awk''' command reads all lines in the input file, and each line is exposed to the expression (contained within quotes) for processing.
* The '''expression''' (contained in quotes) represents '''selection criteria''', and '''action''' to execute contained within braces '''{}'''
* If the selection criteria is matched, then the action (between braces) is executed.
* The '''-F''' option can be used to specify the '''field delimiter''' (separator) character<br>eg. '''awk -F";"''' (would indicate a semi-colon delimited input file).<br>
'''Selection Criteria'''
* You can use a regular expression, enclosed within slashes, as a pattern. For example: '''/pattern/'''
* The '''~''' operator tests whether a field or variable matches a regular expression. For example: '''$1 ~ /^[0-9]/'''
* The '''!~''' operator tests for no match. For example: '''$2 !~ /line/'''
* You can perform both numeric and string comparisons using relational operators ('''>''' , '''>=''' , '''<''' , '''<=''' , '''==''' , '''!=''' ).
* You can combine any of the patterns using the Boolean operators '''||''' (OR) and '''&&''' (AND).
* You can use built-in variables (like NR or "record number" representing line number) with comparison operators.<br>For example: '''NR >=1 && NR <= 5'''<br>
* If no pattern is specified, awk selects all lines in the input.
* If no action is specified, awk copies the selected lines to standard output.
'''Action (execution):'''
* Action to be executed is contained within braces '''{}'''
* The '''print''' command can be used to display text (fields).
* You can use parameters which represent fields within records (lines) within the expression of the awk utility.
* The parameter '''$0''' represents all of the fields contained in the record (line).
* The parameters '''$1''', '''$2''', '''$3''' … '''$9''' represent the first, second, third and so on up to the ninth field contained within the record.
* Fields beyond the ninth are referenced in the same way (for example: '''$10''', '''$11''', '''$12''', etc.). Unlike shell positional parameters, awk does not use braces around the field number.
* You can use built-in '''variables''' (such as '''NR''' or "record number" representing line number)<br>eg. '''{print NR,$0}''' (will print record number, then entire record).
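The selection criteria and actions described above can be sketched as follows. The file <code>stock.txt</code> and its three records (name, quantity, price) are hypothetical and exist only to make the example self-contained.

```shell
# Hypothetical whitespace-delimited sample records: name, quantity, price
workdir=$(mktemp -d)
printf 'apple 10 250\nbanana 4 90\ncherry 7 1200\n' > "$workdir/stock.txt"

# Selection criteria only: with no action, matching records are printed whole
matched=$(awk '/an/' "$workdir/stock.txt")

# Field test with ~ plus an action that prints chosen fields
fields=$(awk '$1 ~ /^c/ {print $1,$3}' "$workdir/stock.txt")

# Built-in variables: NR (record number) and NF (number of fields)
numbered=$(awk 'NR >= 2 && NR <= 3 {print NR,NF,$0}' "$workdir/stock.txt")

printf '%s\n%s\n%s\n' "$matched" "$fields" "$numbered"
```

The comma between items in '''print''' separates them with a single space in the output, which is why the selected fields appear space-delimited.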
[[Image:awk.png|thumb|right|300px|Operators used with the '''awk''' command.]]
'''Patterns: Relational Operators'''
* The operators in the table on the right side can be used with the awk utility for pattern searching.
* Since those symbols are used within the expression, they are NOT confused with redirection symbols.<br><br><br><br>
=INVESTIGATION 1: USING THE SED UTILITY=
<span style="color:red;">'''ATTENTION''': Effective '''May 9, 2022''' - this online tutorial will be required to be completed by '''Friday in week 11 by midnight'''<br>to obtain a grade of '''2%''' towards this course</span><br><br>
<br>In this investigation, you will learn how to manipulate text using the '''sed''' utility.
'''Perform the Following Steps:'''
# '''Login''' to your matrix account and confirm you are located in your '''home''' directory.<br><br>
# Issue a Linux command to create a directory called '''sed'''<br><br>
# Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br>
# Issue the following Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/data.txt</nowiki></span><br><br>
# Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]<br><br>The '''p''' instruction in sed is used to<br>'''print''' (i.e. ''display'') the contents of a text file.<br><br>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears twice'''.<br><br>The reason why standard output appears twice is that the sed command<br>(without the '''-n option''') displays every line by default, in addition to the lines displayed by the '''p''' instruction.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>for <u>confirmation</u> of running these pipeline commands when you run a '''checking-script''' later in this investigation.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see each line displayed only once.<br><br>You can specify an '''address''' to display lines using the sed utility<br>(eg. ''line #'', '''line #s''' or range of '''line #s''').<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br>What other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What do you think is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar method was demonstrated in the vi editor in tutorial 9).<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''q''' instruction to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it reads a particular line or matching pattern.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice? How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions''' to select lines that match a pattern. In fact,<br>the sed command was one of the <u>first</u> Linux commands that used regular expressions.<br><br>The rules remain the same for using regular expressions as demonstrated in '''tutorial 9'''<br>except the regular expression must be contained within delimiters such as '''forward slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</span> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command using regular expressions with '''anchors'''.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script<br>until you receive a congratulations message, then you can proceed.<br><br>
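For practice away from the matrix server, the filter technique used in this investigation can be sketched without '''data.txt'''. In the example below, <code>printf</code> stands in for the output of a real command, and all file names and text are invented for illustration.

```shell
# Simulate command output with printf (the file names are made up),
# then filter it with sed and save a copy with tee
workdir=$(mktemp -d)
printf 'notes.txt\nbackup.tar\nreport.txt\nzebra.log\n' |
  sed -n '/txt$/ p' | tee "$workdir/filtered.txt"

# Range address combined with substitution; lines outside 2,3 are unchanged
printf 'TUTORIAL 1\nTUTORIAL 2\nTUTORIAL 3\nTUTORIAL 4\n' |
  sed '2,3 s/TUTORIAL/LESSON/g' | tee "$workdir/subst.txt"

# 'q' quits after the addressed line, so only the first two lines appear
printf 'one\ntwo\nthree\n' | sed '2 q' | tee "$workdir/quit.txt"
```

Because '''tee''' writes to both the screen and a file, the saved copies can be inspected afterwards with '''cat''' to confirm what each sed command produced.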
:In the next investigation, you will learn how to manipulate text using the '''awk''' utility.<br><br>
=INVESTIGATION 2: USING THE AWK UTILITY =
In this investigation, you will learn how to use the awk utility to manipulate text and generate reports.
'''Perform the Following Steps:'''
# Change to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br>
# Issue a Linux command to create a directory called '''awk'''<br><br>
# Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br>
# Issue the following Linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt</nowiki></span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars.txt''' file.<br><br>The "'''print'''" action (command) is the <u>default</u> action of awk to print<br>all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]
# Issue the following Linux command to display all lines (i.e. records) in the '''cars.txt''' database that match the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files for <u>confirmation</u> of running these pipeline commands when you run a '''checking-script''' later in this investigation.<br><br>
# Issue the following Linux pipeline command to display all records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice the same matching records displayed, even <u>without</u> specifying the '''print''' action.<br><br>You can use ''builtin'' '''variables''' with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br>'''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br>
# Issue the following Linux pipeline command to display the '''model''', '''year''', '''quantity''' and '''price'''<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to test whether a particular field matches a pattern.<br><br>
# Issue the following Linux pipeline command to display all '''plymouths''' ('''plym''')<br>by '''model name''', '''year''', '''quantity''' and '''price''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the awk expression,<br>they are not confused with redirection symbols.<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br>
# Issue the following Linux pipeline command to display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command to display '''price''',<br>'''quantity''', '''model''' and '''car make''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $5,$4,$2,$1}' cars.txt | tee awk-5.txt</span><br><br>
# Issue the following Linux pipeline command to display the '''car make''',<br>'''model''' and '''quantity''' of cars that '''begin''' with the '''letter 'f'''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br>
# Issue the following Linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br>
# Issue the following Linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt</nowiki></span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br>
# Issue the following Linux pipeline command to display the '''model'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, awk does not recognize the second and fourth fields.<br><br>You need to issue awk with the '''-F''' option to indicate that this file's fields are separated (delimited) by a semi-colon.<br><br>
# Issue the following Linux pipeline command to display the '''model'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
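The effect of the field delimiter can also be reproduced on a small scale. The sketch below assumes two hypothetical files, one TAB-delimited and one semicolon-delimited, laid out as make, model, year, quantity and price. The records are invented for illustration and are not the contents of '''cars.txt''' or '''cars2.txt'''.

```shell
# Hypothetical records: make, model, year, quantity, price (values invented)
workdir=$(mktemp -d)
printf 'ford\tmustang\t65\t45\t17000\nchevy\timpala\t60\t102\t4000\nfiat\t600\t65\t115\t450\n' > "$workdir/tabs.txt"
printf 'ford;mustang;65;45;17000\nchevy;impala;60;102;4000\nfiat;600;65;115;450\n' > "$workdir/semi.txt"

# Comparison and compound operators on TAB-delimited fields
cheap=$(awk '$5 < 5000 {print $1,$5}' "$workdir/tabs.txt")
pricey_f=$(awk '$1 ~ /^f/ && $5 > 10000 {print $0}' "$workdir/tabs.txt")

# Without -F the whole semicolon-delimited line is field 1, so $2 and $4 are empty
no_sep=$(awk '$1 ~ /^f/ {print $2,$4}' "$workdir/semi.txt")

# With -F";" the fields are split on semicolons as intended
with_sep=$(awk -F";" '$1 ~ /^f/ {print $2,$4}' "$workdir/semi.txt")

printf '%s\n%s\n[%s]\n%s\n' "$cheap" "$pricey_f" "$no_sep" "$with_sep"
```

The bracketed output shows only blank space for the command that omits '''-F''', which is the same symptom seen with '''cars2.txt''' above.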
= LINUX PRACTICE QUESTIONS =
# <span style="font-family:courier;font-weight:bold">sed -n '3,6 p' ~murray.saul/uli101/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">sed '4 q' ~murray.saul/uli101/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">sed '/the/ d' ~murray.saul/uli101/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">sed 's/line/NUMBER/g' ~murray.saul/uli101/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">awk 'NR == 3 {print}' ~murray.saul/uli101/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">awk 'NR >= 2 && NR <= 5 {print}' ~murray.saul/uli101/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">awk '$1 ~ /This/ {print $2}' ~murray.saul/uli101/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">awk '$1 ~ /This/ {print $3,$2}' ~murray.saul/uli101/stuff.txt</span><br><br>
