
Tutorial11: Sed & Awk Utilities

3,720 bytes added, 10:23, 26 July 2023
no edit summary
{{Admon/caution|DO NOT USE THIS VERSION OF THE LAB. LOOK FOR LAB 10: SED & AWK|'''This is an archive version.'''}}
 
=USING SED & AWK UTILITIES=
<br>
:* Use the '''sed''' command to '''manipulate text''' contained in a file.
:* List and explain several '''addresses''' and '''instructions''' associated with the '''sed''' command.
:* Use the '''sed''' command as a '''filter''' with Linux pipeline commands.
:* Use the '''awk''' command to '''manipulate text''' contained in a file.
:* List and explain several '''comparison operators''', '''variables''' and '''actions''' associated with the '''awk''' command.
:* Use the '''awk''' command as a '''filter''' with Linux pipeline commands.
|- valign="top" style="padding-left:15px;"
|colspan="2" |'''Slides:'''<ul><li>Week 11 Lecture 1 Notes:<br> [[Media:ULI101-Week11.1.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murraychris.sauljohnson/ULI101/uli101ULI101-Week11.1.pptx PPTX]</li><li>Week 11 Lecture 2 Notes:<br> [[Media:ULI101-Week11.2.pdf | PDF]] | [https://ictmatrix.senecacollege.ca/~murrayjason.saulcarman/uli101slides/ULI101-Week11.2.pptx PPTX]<br></li></ul>
| style="padding-left:15px;" |'''Text Manipulation:'''
* [https://www.digitalocean.com/community/tutorials/the-basics-of-using-the-sed-stream-editor-to-manipulate-text-in-linux Purpose of using the sed utility]
* [https://www.digitalocean.com/community/tutorials/how-to-use-the-awk-language-to-manipulate-text-in-linux Purpose of using the awk utility]
| style="padding-left:15px;" |'''Commands:'''
* [https://man7.org/linux/man-pages/man1/sed.1p.html sed]
* [https://man7.org/linux/man-pages/man1/awk.1p.html awk]
|colspan="1" style="padding-left:15px;" width="30%"|'''Brauer Instructional Videos:'''<ul><li>[https://www.youtube.com/watch?v=npU6S61AIko&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=14 Using the sed Utility]</li><li>[https://www.youtube.com/watch?v=OV3XzjDYgJo&list=PLU1b1f-2Oe90TuYfifnWulINjMv_Wr16N&index=13 Using the awk Utility]</li></ul>
|}
'''How it Works:'''
* The sed command reads all lines in the input file and exposes each line to the expression<br>(i.e. the area contained within quotes), one line at a time.
* The expression can be within single quotes or double quotes.
* The expression contains an address (match condition) and an instruction (operation).
* If the line matches the address, then it will perform the instruction.
* Lines will display by default unless the '''-n''' option is used to suppress default display<br>
'''Address:'''
* Can use a line number to select a specific line (for example: '''5''')
* Can specify a range of line numbers (for example: '''5,7''')
* Can specify a regular expression to select all lines that match a pattern (e.g. '''/^[0-9].*[0-9]$/''')
* Regular expressions must be contained within forward slashes (e.g. '''/regular-expression/''')
* If NO address is present, the instruction will apply to ALL lines
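The address forms listed above can be tried on a small throwaway file. The following is only a sketch: the file name '''sample.txt''' and its contents are assumptions made for illustration and are not part of the lab's data.

```shell
# Build a small sample file (hypothetical contents, not the lab's data.txt)
workdir=$(mktemp -d)
printf '%s\n' 'one' '2 apples 4' 'three' '4 pears 8' 'five' > "$workdir/sample.txt"

# Line-number address: print only line 3
single=$(sed -n '3 p' "$workdir/sample.txt")

# Range address: print lines 2 through 4
range=$(sed -n '2,4 p' "$workdir/sample.txt")

# Regular-expression address: lines that begin AND end with a digit
regex=$(sed -n '/^[0-9].*[0-9]$/ p' "$workdir/sample.txt")

# No address: the instruction applies to every line
all=$(sed -n 'p' "$workdir/sample.txt")

printf '%s\n' "$single" "---" "$range" "---" "$regex" "---" "$all"
```

Each result is captured in a shell variable only so that it can be inspected separately; running the sed commands directly displays the same lines.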
 [[Image:sed.png|thumb|right|500px|'''Common instructions''' to take action if text matches an address.]]
'''Instruction:'''
*'''Action''' to take for matched line(s)
*Refer to table on right-side for list of some <br>'''common instructions''' and their purpose
<br><br>
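As a rough illustration of how an address combines with an instruction, the sketch below uses the '''p''', '''s''' and '''q''' instructions covered in this tutorial, plus '''d''' (delete), a standard sed instruction that is assumed here and may or may not appear in the table. The file '''notes.txt''' and its contents are hypothetical.

```shell
# Hypothetical four-line sample file
workdir=$(mktemp -d)
printf '%s\n' 'TUTORIAL one' 'TUTORIAL two' 'TUTORIAL three' 'TUTORIAL four' > "$workdir/notes.txt"

# p: print; with -n only the addressed line (line 2) is displayed
printed=$(sed -n '2 p' "$workdir/notes.txt")

# s: substitute text, limited by the address to lines 1 to 2
substituted=$(sed '1,2 s/TUTORIAL/LESSON/g' "$workdir/notes.txt")

# q: quit after line 3 has been read (lines 1 to 3 are displayed)
quit_early=$(sed '3 q' "$workdir/notes.txt")

# d: delete line 1 from the output (the file itself is not changed)
deleted=$(sed '1 d' "$workdir/notes.txt")

printf '%s\n' "$printed" "---" "$substituted" "---" "$quit_early" "---" "$deleted"
```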
'''Usage:'''
<span style="color:blue;font-weight:bold;font-family:courier;">awk options [-F] 'selection-criteria {action}' file-name</span>
'''How It Works:'''
* The '''awk''' command reads all lines in the input file and exposes each line to the expression (contained within quotes) for processing.
* The '''expression''' (contained in quotes) consists of '''selection criteria''' and an '''action''' to execute, contained within braces '''{}'''
* If the selection criteria is matched, then the action (between braces) is executed.
* The '''-F''' option can be used to specify the '''field delimiter''' (separator) character in place of the default whitespace<br>eg. '''awk -F";"''' (would indicate a semi-colon delimited input file).<br>
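The interplay of selection criteria, action and the '''-F''' option might be sketched as follows. The file '''garage.txt''' and its semi-colon delimited records are assumptions made up for this example.

```shell
# Hypothetical semi-colon delimited file
workdir=$(mktemp -d)
printf '%s\n' 'ford;mustang;65' 'chevy;nova;79' 'ford;ltd;83' > "$workdir/garage.txt"

# Selection criteria /ford/ plus an action; -F";" sets the field delimiter
models=$(awk -F";" '/ford/ {print $2}' "$workdir/garage.txt")

# No selection criteria: the action runs for every record
all_makes=$(awk -F";" '{print $1}' "$workdir/garage.txt")

# No action: each matching record is printed whole (the default action)
whole=$(awk -F";" '/chevy/' "$workdir/garage.txt")

printf '%s\n' "$models" "---" "$all_makes" "---" "$whole"
```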
'''Selection Criteria'''
* You can combine any of the patterns using the Boolean operators '''||''' (OR) and '''&&''' (AND).
* You can use built-in variables (like NR or "record number" representing line number) with comparison operators.<br>For example: '''NR >=1 && NR <= 5'''
<br>
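A minimal sketch of combining selection criteria with '''&&''' and '''||''' is shown below, assuming a hypothetical two-column file called '''letters.txt'''.

```shell
# Hypothetical six-record file: a letter and a number on each line
workdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' 'd 4' 'e 5' 'f 6' > "$workdir/letters.txt"

# AND: records 1 through 3 (no action, so whole records are printed)
first_three=$(awk 'NR >= 1 && NR <= 3' "$workdir/letters.txt")

# OR: the first or the sixth record, printing only the first field
ends=$(awk 'NR == 1 || NR == 6 {print $1}' "$workdir/letters.txt")

# Field comparisons can be combined in the same way
filtered=$(awk '$2 > 2 && $1 != "e" {print $1}' "$workdir/letters.txt")

printf '%s\n' "$first_three" "---" "$ends" "---" "$filtered"
```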
'''Action (execution):'''
* The parameters '''$1''', '''$2''', '''$3''' … '''$9''' represent the first, second, third (and so on, up to the ninth) fields contained within the record.
* Fields beyond the ninth are referenced in the same way (for example: '''$10''', '''$11''', '''$12''', etc.). Braces such as '''${10}''' are shell positional-parameter syntax and cause a syntax error inside an awk expression.
* You can use built-in variables (such as '''NR''', '''NF''' and '''FILENAME''') in the awk expression. For example, '''NR''' is the "record number" representing the line number:<br>eg. '''{print NR,$0}''' (will print the record number, then the entire record)
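The field parameters and built-in variables can be explored with a short sketch such as the one below. The file '''fields.txt''' is hypothetical; its first record deliberately has eleven fields so that a field past the ninth can be printed, which awk writes simply as '''$10'''.

```shell
# Hypothetical file: record 1 has eleven fields, record 2 has three
workdir=$(mktemp -d)
printf '%s\n' 'a b c d e f g h i j k' 'x y z' > "$workdir/fields.txt"

# NR: record number, followed by $0 (the entire record)
numbered=$(awk '{print NR, $0}' "$workdir/fields.txt")

# NF: number of fields in the current record
counts=$(awk '{print NF}' "$workdir/fields.txt")

# A field past the ninth is written like any other field
tenth=$(awk 'NR == 1 {print $10}' "$workdir/fields.txt")

printf '%s\n' "$numbered" "---" "$counts" "---" "$tenth"
```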
=INVESTIGATION 1: USING THE SED UTILITY=
<span style="color:red;">'''ATTENTION''': Effective '''May 9, 2022''' - this online tutorial will be required to be completed by '''Friday in week 11 by midnight'''<br>to obtain a grade of '''2%''' towards this course</span><br><br> In this investigation, you will learn how to manipulate text using the '''sed''' utility.
# Issue a Linux command to create a directory called '''sed'''<br><br>
# Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br>
# Issue the following Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/data.txt</nowiki></span><br><br>
# Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span><br><br>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]The '''p''' instruction with the '''sed''' command is used to<br>'''print''' (i.e. ''display'') the contents of a text file.<br><br>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears twice'''.<br><br>The reason why standard output appears twice is that the sed command<br>(without the '''-n''' option) displays all lines regardless of any address used, and the '''p''' instruction then prints each line again.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>to <u>confirm</u> that you ran these pipeline commands when you run a '''checking script''' later in this investigation.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see each line displayed only once.<br><br>You can specify an '''address''' to display lines using the sed utility<br>(eg. '''line #''', '''line #s''' or range of '''line #s''').<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br>What other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar method was demonstrated in the vi editor in tutorial 9).<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''q''' instruction to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it reads a particular line or matching pattern.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice? How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions''' to select lines that match a pattern. In fact,<br>the sed command was one of the <u>first</u> Unix commands to support regular expressions.<br><br>The rules remain the same for using regular expressions as demonstrated in '''tutorial 9'''<br>except the regular expression must be contained within '''forward slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</span> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command using regular expressions with '''anchors'''.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
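The use of sed as a filter in a pipeline, with '''tee''' saving a copy of the output, can be rehearsed away from the lab files with a sketch like the following. The directories, file names and the simulated '''who''' output are all assumptions made for illustration.

```shell
# Hypothetical directory to list, and a separate directory for saved output
workdir=$(mktemp -d)
outdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/notes.md"

# Filter the output of ls: keep only names ending in "txt";
# tee saves a copy to a file while still passing the text along
txt_files=$(ls "$workdir" | sed -n '/txt$/ p' | tee "$outdir/listing.out")

# Simulated "who" output (made-up user names), filtered to names starting with a to m
early_users=$(printf '%s\n' 'alice pts/0' 'mona pts/1' 'zed pts/2' | sed -n '/^[a-m]/ p')

printf '%s\n' "$txt_files" "---" "$early_users"
```

The saved copy in '''listing.out''' holds the same lines that were displayed, which is the property the checking script relies on when it inspects the files created with '''tee'''.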
:In the next investigation, you will learn how to manipulate text using the '''awk''' utility.<br><br>
=INVESTIGATION 2: USING THE AWK UTILITY=
In this investigation, you will learn how to use the awk utility to manipulate text and generate reports. 
'''Perform the Following Steps:'''
# Change to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br>
# Issue a Linux command to create a directory called '''awk'''<br><br>
# Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br>
# Issue the following Linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt</nowiki></span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars.txt''' file.<br><br>The "'''print'''" action (command) is the <u>default</u> action of awk to print<br>all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]
# Issue the following Linux command to display all lines (i.e. records) in the '''cars.txt''' database that match the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files to <u>confirm</u> that you ran these pipeline commands when you run a '''checking script''' later in this investigation.<br><br>
# Issue the following Linux pipeline command to display records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice that ALL matching lines are displayed <u>without</u> specifying the '''print''' action, since printing is the default action.<br><br>You can use '''built-in variables''' with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br>'''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br>
# Issue the following Linux pipeline command to display the '''model''', '''year''', '''quantity''' and '''price'''<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to match a pattern against a particular field.<br><br>
# Issue the following Linux pipeline command to display all '''plymouths''' ('''plym''')<br>by '''model''', '''year''', '''quantity''' and '''price''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the awk expression,<br>they are not confused with redirection symbols.<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br>
# Issue the following Linux pipeline command to display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command to display the '''price''',<br>'''quantity''', '''model''' and '''car make''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $5,$4,$2,$1}' cars.txt | tee awk-5.txt</span><br><br>
# Issue the following Linux pipeline command to display the '''car make''',<br>'''model''' and '''quantity''' of cars whose make '''begins''' with the letter '''f''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br>
# Issue the following Linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br>
# Issue the following Linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt</nowiki></span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br>
# Issue the following Linux pipeline command to display the '''model'''<br>and '''quantity''' of cars whose make '''begins''' with the letter '''f''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, awk does not recognize the second and fourth fields.<br><br>You need to issue awk with the '''-F''' option to indicate that this file's fields are separated (delimited) by a semi-colon.<br><br>
# Issue the following Linux pipeline command to display the '''model'''<br>and '''quantity''' of cars whose make '''begins''' with the letter '''f''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~uli101/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
:: After you complete the Review Questions sections to get additional practice, then work on your '''online assignment 3''', '''sections 4 to 6''' labelled: '''More Scripting (add)''', '''Yet More Scripting (oldfiles)''', and '''sed And awk'''.<br><br>
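The effect of the field delimiter can be reproduced on a small scale with the sketch below. The file '''demo2.txt''' and its records are hypothetical stand-ins for a semi-colon delimited database; they are not the contents of '''cars2.txt'''.

```shell
# Hypothetical semi-colon delimited records (make;model;year;quantity;price)
workdir=$(mktemp -d)
printf '%s\n' 'ford;mustang;65;45;10000' 'fiat;600;65;115;450' 'chevy;nova;79;60;3000' > "$workdir/demo2.txt"

# Without -F, awk splits on whitespace, so each record is one big field
# and $2 and $4 are empty
no_delim=$(awk '$1 ~ /^f/ {print $2,$4}' "$workdir/demo2.txt")

# With -F";" the fields are recognized
with_delim=$(awk -F";" '$1 ~ /^f/ {print $2,$4}' "$workdir/demo2.txt")

# Compound criteria work on the delimited fields as well
pricey=$(awk -F";" '$1 ~ /ford/ && $5 > 9000 {print $0}' "$workdir/demo2.txt")

printf '%s\n' "[$no_delim]" "---" "$with_delim" "---" "$pricey"
```

The brackets around the first result are there only to make the otherwise blank output visible.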
= LINUX PRACTICE QUESTIONS =
