Changes


Tutorial11: Sed & Awk Utilities

3,501 bytes added, 10:23, 26 July 2023
no edit summary
{{Admon/caution|DO NOT USE THIS VERSION OF THE LAB. LOOK FOR LAB 10: SED & AWK|'''This is an archive version.'''}}
 
=USING SED & AWK UTILITIES=
<br>
:* Use the '''sed''' command to '''manipulate text''' contained in a file.
:* List and explain several '''addresses''' and '''instructions''' associated with the '''sed''' command.
:* Use the '''sed''' command as a '''filter''' with Linux pipeline commands.
:* Use the '''awk''' command to '''manipulate text''' contained in a file.
:* List and explain several '''comparison operators''', '''variables''' and '''actions''' associated with the '''awk''' command.
:* Use the '''awk''' command as a '''filter''' with Linux pipeline commands.
|- valign="top" style="padding-left:15px;"
|colspan="2" |'''Slides''':<ul><li>Week 11 Lecture 1 Notes:<br> [[Media:ULI101-Week11.1.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murraychris.sauljohnson/ULI101/uli101ULI101-Week11.1.pptx PPTX]</li><li>Week 11 Lecture 2 Notes:<br> [[Media:ULI101-Week11.2.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murrayjason.saulcarman/uli101slides/ULI101-Week11.2.pptx PPTX]<br></li></ul>
| style="padding-left:15px;" |'''Text Manipulation:'''
* [https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux Purpose of using the sed utility]
* [https://www.digitalocean.com/community/tutorials/how-to-use-the-awk-language-to-manipulate-text-in-linux Purpose of using the awk utility]
| style="padding-left:15px;" |'''Commands:'''
* [https://man7.org/linux/man-pages/man1/sed.1p.html sed]
* [https://man7.org/linux/man-pages/man1/awk.1p.html awk]
|colspan="1" style="padding-left:15px;" width="30%"|'''Brauer Instructional Videos:'''<ul><li>[https://www.youtube.com/watch?v=npU6S61AIko&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=14 Using the sed Utility]</li><li>[https://www.youtube.com/watch?v=OV3XzjDYgJo&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=13 Using the awk Utility]</li></ul>
|}
'''How it Works:'''
* The sed command reads all lines in the input file, and each line is exposed to the expression<br>(i.e. the area contained within quotes) one line at a time.
* The expression can be within single quotes or double quotes.
* The expression contains an address (match condition) and an instruction (operation).
* If the line matches the address, then it will perform the instruction.
* Lines will display by default unless the '''-n''' option is used to suppress the default display<br>
'''Address:'''
* Can use a line number to select a specific line (for example: '''5''')
* Can specify a range of line numbers (for example: '''5,7''')
* Can specify a regular expression to select all lines that match a pattern (e.g. '''/^[0-9].*[0-9]$/''')
* Regular expressions must be contained within forward slashes (e.g. /regular-expression/)
* If NO address is present, the instruction will apply to ALL lines
 [[Image:sed.png|thumb|right|500px|'''Common instructions''' to take action if text matches an address.]]
'''Instruction:'''
*'''Action''' to take for matched line(s)
*Refer to table on right-side for list of some <br>'''common instructions''' and their purpose
<br><br>
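The address and instruction rules above can be sketched with a small throwaway file. This is a minimal illustration only: the five-line sample and the variable names are assumptions made for the example, not part of the lab's '''data.txt''' file.

```shell
# Build a five-line sample file (hypothetical data, not the lab's data.txt).
sample=$(mktemp)
printf 'line one\nline two\nline three\nline four\nline five\n' > "$sample"

# No address: the p instruction applies to every line.
# The -n option suppresses the default display, so each line appears once.
all_lines=$(sed -n 'p' "$sample")

# Line-number address: only line 2 is printed.
second_line=$(sed -n '2 p' "$sample")

# Range address: lines 2 through 4 are printed.
middle_lines=$(sed -n '2,4 p' "$sample")

# Regular-expression address (within forward slashes): lines ending in "e".
ends_in_e=$(sed -n '/e$/ p' "$sample")

echo "$second_line"
echo "$middle_lines"
echo "$ends_in_e"

rm -f "$sample"
```

Leaving out '''-n''' in any of these commands would display every input line in addition to the lines selected by the address.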
'''Usage:'''
<span style="color:blue;font-weight:bold;font-family:courier;">awk options [-F] 'selection-criteria {action}' file-name</span>
'''How It Works:'''
* The '''awk''' command reads all lines in the input file, and each line is exposed to the expression (contained within quotes) for processing.
* The '''expression''' (contained in quotes) represents '''selection criteria''' and an '''action''' to execute, contained within braces '''{}'''
* If the selection criteria is matched, then the action (between braces) is executed.
* The '''-F''' option can be used to specify the '''field delimiter''' (separator) character<br>eg. '''awk -F";"''' (would indicate a semi-colon delimited input file).<br>
'''Selection Criteria'''
* You can combine any of the patterns using the Boolean operators '''||''' (OR) and '''&&''' (AND).
* You can use built-in variables (like NR or "record number" representing line number) with comparison operators.<br>For example: '''NR >=1 && NR <= 5'''
<br>
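As a rough sketch of selection criteria, the commands below combine the built-in '''NR''' variable with comparison and Boolean operators. The six-line input is generated on the fly and is purely hypothetical.

```shell
# AND: select records 1 through 3 (no action given, so the default print applies).
first_three=$(printf 'a\nb\nc\nd\ne\nf\n' | awk 'NR >= 1 && NR <= 3')

# OR: select the first or the sixth record, printing the record number as well.
first_or_sixth=$(printf 'a\nb\nc\nd\ne\nf\n' | awk 'NR == 1 || NR == 6 {print NR, $0}')

echo "$first_three"
echo "$first_or_sixth"
```

Because the comparison operators sit inside the quoted awk expression, the shell does not treat '''>''' or '''<''' as redirection.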
'''Action (execution):'''
* The parameters '''$1''', '''$2''', '''$3''' … '''$9''' represent the first, second and third to the 9th fields contained within the record.
* Fields beyond the ninth are referenced the same way (for example: '''$10''', '''$11''', '''$12''', etc.). Unlike shell positional parameters, awk does not need braces around the field number.
* You can use built-in '''variables''' (such as '''NR''' or "record number" representing line number)<br>eg. '''{print NR,$0}''' (will print record number, then entire record)
* You can use the '''-F''' option with the awk command to specify the field delimiter.
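The field parameters and built-in variables can be tried out as follows. This is a sketch under stated assumptions: the two semi-colon delimited records and the twelve-word line are made up for illustration.

```shell
# Two hypothetical records delimited by semi-colons.
records=$(mktemp)
printf 'ford;mustang;1965\nchevy;impala;1967\n' > "$records"

# -F";" sets the field delimiter; NR is the record number, NF the field count.
summary=$(awk -F";" '{print NR, $2, NF}' "$records")

# Fields past the ninth are written $10, $11, $12 with no braces.
tenth_and_twelfth=$(echo "a b c d e f g h i j k l" | awk '{print $10, $12}')

echo "$summary"
echo "$tenth_and_twelfth"

rm -f "$records"
```

The comma in the '''print''' action separates the output values with a single space.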
=INVESTIGATION 1: USING THE SED UTILITY=
<span style="color:red;">'''ATTENTION''': Effective '''May 9, 2022''' - this online tutorial will be required to be completed by '''Friday in week 11 by midnight'''<br>to obtain a grade of '''2%''' towards this course</span><br><br>In this investigation, you will learn how to manipulate text using the '''sed''' utility.
# Issue a Linux command to create a directory called '''sed'''<br><br>
# Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br>
# Issue the following Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/data.txt</nowiki></span><br><br>
# Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span><br><br>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]The '''p''' instruction with the '''sed''' command is used to<br>'''print''' (i.e. ''display'') the contents of a text file.<br><br>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears twice'''.<br><br>The reason why standard output appears twice is that the sed command<br>(without the '''-n option''') displays all lines whether or not they match an address.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>for <u>confirmation</u> of running these pipeline commands when you run a '''checking-script''' later in this investigation.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see each line displayed only once.<br><br>You can specify an '''address''' to display lines using the sed utility<br>(eg. '''line #''', '''line #s''' or range of '''line #s''').<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br>What other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar method was demonstrated in the vi editor in tutorial 9).<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''q''' instruction to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it has read a particular line or matching pattern.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice? How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions''' to select lines that match a pattern. In fact,<br>the sed command was one of the <u>first</u> Linux commands that used regular expressions.<br><br>The rules remain the same for using regular expressions as demonstrated in '''tutorial 9'''<br>except the regular expression must be contained within '''forward slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</span> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command using regular expressions with '''anchors'''.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
:In the next investigation, you will learn how to manipulate text using the '''awk''' utility.<br><br>
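As a recap of the '''s''' and '''q''' instructions and of using sed as a filter, here is a minimal sketch that runs against a made-up four-line file rather than '''data.txt'''. The file contents and variable names are assumptions for illustration only.

```shell
# Hypothetical four-line file standing in for data.txt.
notes=$(mktemp)
printf 'TUTORIAL 1\nTUTORIAL 2\nTUTORIAL 3\nThe end\n' > "$notes"

# s instruction limited to the address range 2,3: line 1 is left unchanged.
substituted=$(sed '2,3 s/TUTORIAL/LESSON/g' "$notes")

# q instruction: sed displays lines up to and including line 2, then quits.
first_two=$(sed '2 q' "$notes")

# sed as a filter: keep only piped-in names that end in "txt".
txt_names=$(printf 'b.txt\na.log\nc.txt\n' | sed -n '/txt$/ p')

echo "$substituted"
echo "$first_two"
echo "$txt_names"

rm -f "$notes"
```

Note that the substitution only changes what is displayed. The file itself is not modified, which is why the lab has you compare against the original contents of the file.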
=INVESTIGATION 2: USING THE AWK UTILITY =
In this investigation, you will learn how to use the awk utility to manipulate text and generate reports. 
'''Perform the Following Steps:'''
# Change to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br>
# Issue a Linux command to create a directory called '''awk'''<br><br>
# Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br>
# Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt</nowiki></span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars.txt''' file.<br><br>The "'''print'''" action (command) is the <u>default</u> action of awk to print<br>all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]
# Issue the following Linux command to display all lines (i.e. records) in the '''cars.txt''' database that match the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>for <u>confirmation</u> of running these pipeline commands when you run a '''checking-script''' later in this investigation.<br><br>
# Issue the following Linux pipeline command to display records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice that the same matching records are displayed even <u>without</u> specifying the '''print''' action, since printing is the default action.<br><br>You can use '''built-in''' variables with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br>'''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br>
# Issue the following Linux pipeline command to display the '''model''', '''year''', '''quantity''' and '''price'''<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space " "''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to match a pattern against a particular field.<br><br>
# Issue the following Linux pipeline command to display all '''plymouths''' ('''plym''')<br>by '''model name''', '''price''' and '''quantity''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the awk expression,<br>they are not confused with redirection symbols<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br>
# Issue the following Linux pipeline command to display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command to display the '''price''', '''quantity''', '''model''' and '''car make''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $5,$4,$2,$1}' cars.txt | tee awk-5.txt</span><br><br>
# Issue the following Linux pipeline command to display the '''car make''',<br>'''year''' and '''quantity''' of cars that '''begin''' with the '''letter 'f'''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br>
# Issue the following Linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br>
# Issue the following Linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt</nowiki></span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br>
# Issue the following Linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, awk does not recognize the second and fourth fields.<br><br>You need to issue awk with the -F option to indicate that this file's fields are separated (delimited) by a semi-colon.<br><br>
# Issue the following Linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
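The comparison operators, the tilde match and the '''-F''' option can be sketched together on a small semi-colon delimited file. The records below are invented for this example (make;model;year;quantity;price) and are not the contents of '''cars.txt''' or '''cars2.txt'''.

```shell
# Hypothetical records: make;model;year;quantity;price
cars=$(mktemp)
printf 'ford;mustang;65;45;17000\nford;ltd;83;15;9000\nchevy;nova;79;60;3000\nfiat;600;65;115;450\nplym;fury;77;73;2500\n' > "$cars"

# Comparison operator on field 5: makes and prices under 5000.
cheap=$(awk -F";" '$5 < 5000 {print $1, $5}' "$cars")

# Tilde match on field 1: makes that begin with the letter f.
f_makes=$(awk -F";" '$1 ~ /^f/ {print $1}' "$cars")

# Compound operator: fords priced above 10000 (entire record).
pricey_fords=$(awk -F";" '$1 ~ /ford/ && $5 > 10000 {print $0}' "$cars")

# Without -F, each whole line is field 1, so $2 is empty for every match.
no_delimiter=$(awk '$1 ~ /^f/ {print $2}' "$cars")

echo "$cheap"
echo "$f_makes"
echo "$pricey_fords"

rm -f "$cars"
```

The last command mirrors the situation with a semi-colon delimited file: awk still finds the matching records, but it has no second field to print until the delimiter is supplied with '''-F'''.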
= LINUX PRACTICE QUESTIONS =
