Changes

Jump to: navigation, search

Tutorial11: Sed & Awk Utilities

2,196 bytes added, 19:44, 10 May 2021
INVESTIGATION 1: USING THE SED UTILITY
* You can combine any of the patterns using the Boolean operators '''||''' (OR) and '''&&''' (AND).
* You can use built-in variables (like NR or "record number" representing line number) with comparison operators.<br>For example: '''NR >=1 && NR <= 5'''
<br>
'''Action (execution):'''
* The parameters '''$1''', '''$2''', '''$3''' … '''$9''' represent the first, second and third to the 9th fields contained within the record.
* Parameters greater than nine requires the value of the parameter to be placed within braces (for example: '''${10}''','''${11}''','''${12}''', etc.)
* You can use built-in '''variables''' (like NR or such as '''NR''' or "record number" �representing representing line number)<br>eg. '''{print NR,$0}''' (will print record number, then entire record)��.
=INVESTIGATION 1: USING THE SED UTILITY=
<span style="color:red;">'''ATTENTION''': Depending on your ULI101 instruction, you may be required to complete this tutorial for marks in the course.<br>Please refer to your instructor's course notes in terms of marked assigned to the course.<br><br>The due date for successfully completing this tutorial (i.e. '''tutorial 11''') is by '''Friday by midnight''' next week (i.e. '''Week 11''').<br>If your instructor has NOT assigned marks for completing this tutorial, you can perform it for practice purposes.</span><br><br> 
In this investigation, you will learn how to manipulate text using the '''sed''' utility.
# Issue a Linux command to create a directory called '''sed'''<br><br>
# Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br>
# Issue the following linux Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/data.txt</nowiki></span><br><br># Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span><br><br>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]# <br><br>The '''p''' instruction with the '''sed''' command is used to <br>'''print or display the ''' (i.e. ''display'') the contents of a text file. <br><br># Issue the following linux Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears '''twice'''.<br><br>The reason why standard output appears twice is that the sed command<br>(without the '''-n option''') displays all lines regardless of an address used.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files<br>for <u>confirmation</u> of running these pipeline commands when run a '''checking-script''' later in this investigation.<br><br># Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see only one line.<br><br>You can specify an '''address ''' to display lines using the sed utility<br>(eg. ''line #'', '''line #s''' or range of '''line #s''') when using the sed utility.<br><br># Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br><brWhat other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]# Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar to method was demonstrated in the vi editor in tutorial 9).<br><br># Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''-q''' option to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it is read in a particular line or matching pattern.<br><br># Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice?How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions ''' to select lines that match a pattern.In fact,<br>The rules remain the same for using regular expressions as demonstratedsed command was one of the <bru>in tutorial9 except the first</u> Linux commands that used regular expression must be contained.<br><br>within The rules remain the same for using regular expressions as demonstrated in '''delimiterstutorial 9''' such as <br>except the regular expression must be contained within '''forward slash slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</" when using the sed utilityspan> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command using regular expressions with '''anchors'''.]]# Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br># Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from linux Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]# Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br># Issue the following linux Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">bash /home/murray.saul/myscripts/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script<br>until you receive a congratulations message, then you can proceed.<br><br>
# Change to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br>
# Issue a Linux command to create a directory called '''awk'''<br><br>
# Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br>
# Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars.txt</nowiki></span><br><br>
# Issue the '''morecat''' command to quickly view the contents of the '''cars.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span><br><br>The "'''print'''" action (command) is the <u>default</u> action of awkto print<br>to print all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields ''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]
# Issue the following linux command all to display all lines (i.e. records) in the '''cars.txt''' database that matches the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save to files for <u>confirmation</u> of running these pipeline commands when run a '''checking-script''' later in this investigation.<br><br>
# Issue the following linux pipeline command all to display records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice ALL lines displayed <u>without </u> using a '''search criteria'''.<br><br>You can use these ''builtin'' '''variables''' with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br> '''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br>
# Issue the following linux pipeline command to display the '''model''', '''year''', '''quantity''' and price<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to search for a pattern or display standard output for a particular field.<br><br>
# Issue the following linux pipeline command to display all '''plymouths''' ('''plymsplym''')<br>by '''model name''', '''price''' and '''quantity''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plymsplym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the awk expression,<br>they are not confused with redirection symbols<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br>
# Issue the following linux pipeline command to display display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br>
# Issue the following linux pipeline command to display display '''car make''',<br>'''model''', '''quantity''' and '''price''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-5.txt</span><br><br>
# Issue the following linux pipeline command to display the '''car make''',<br>'''year''' and '''quantity''' of cars that '''begin''' with the '''letter 'f'''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br>
# Issue the following linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br>
# Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/cars2.txt</nowiki></span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br>
# Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, it does not recognize the second and fourth fields.<br><br>You need to issue awk with the -F option to indicate that this file's fields are separated (delimited) by a semi-colorn.<br><br>
# Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">bash /home/murray.saul/myscripts/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
:: After you complete the Review Questions sections to get additional practice,<br>then work on your '''online assignment 3, section 2: Awk &amp; Sed'''<br><br>
13,420
edits

Navigation menu