Tutorial 11 - SED & AWK

17,156 bytes added, 12:16, 8 June 2023
Protected "Tutorial 11 - SED & AWK": OER transfer ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite))
Content under development
=USING SED & AWK UTILITIES=
===Main Objectives of this Practice Tutorial===
= INVESTIGATION 1: USING THE SED UTILITY =
<span style="color:red;">'''ATTENTION''': The due date for successfully completing this tutorial (i.e. tutorial 11) is by Friday, April 21 @ 11:59 PM (Week 14).</span><br>
 
In this investigation, you will learn how to manipulate text using the '''sed''' utility.
 
 
'''Perform the Following Steps:'''
 
# '''Login''' to your matrix account and confirm you are located in your '''home''' directory.<br><br>
# Issue a Linux command to create a directory called '''sed'''<br><br>
# Issue a Linux command to <u>change</u> to the '''sed''' directory and confirm that you are located in the '''sed''' directory.<br><br>
# Issue the following Linux command to download the data.txt file<br>('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">cp ~osl640/tutorial11/data.txt .</span><br><br>
# Issue the '''more''' command to quickly view the contents of the '''data.txt''' file.<br>When finished, exit the more command by pressing the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span>[[Image:sed-1.png|thumb|right|300px|Issuing the '''p''' instruction without using the '''-n''' option (to suppress original output) will display lines twice.]]<br><br>The '''p''' instruction with the '''sed''' command is used to<br>'''print''' (i.e. ''display'') the contents of a text file.<br><br>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed 'p' data.txt</span><br><br>'''NOTE: You should notice that each line appears twice'''.<br><br>Each line appears twice because the sed command<br>(without the '''-n option''') prints every line by default, and the '''p''' instruction then prints it again.<br><br>We will use '''pipeline commands''' to both display stdout to the screen and save it to files,<br>so that a '''checking script''' you run later in this investigation can <u>confirm</u> that you ran these commands.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n 'p' data.txt | tee sed-1.txt</span><br><br>What do you notice? You should see each line displayed only once.<br><br>You can specify an '''address''' to select which lines the sed utility displays<br>(eg. a '''line #''', or a range of '''line #s''').<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '1 p' data.txt | tee sed-2.txt</span><br><br>You should see the first line of the text file displayed.<br>What other command is used to only display the first line in a file?<br><br>[[Image:sed-2.png|thumb|right|500px|Using the sed command to display a '''range''' of lines.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '2,5 p' data.txt | tee sed-3.txt</span><br><br>What is displayed? How would you modify the sed command to display the line range 10 to 50?<br><br>The '''s''' instruction is used to '''substitute''' text<br>(a similar method was demonstrated in the vi editor in tutorial 9).<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '2,5 s/TUTORIAL/LESSON/g' data.txt | tee sed-4.txt | more</span><br><br>What do you notice? View the original contents of lines 2 to 5 in the '''data.txt''' file<br>in another shell to confirm that the substitution occurred.<br><br>[[Image:sed-3.png|thumb|right|500px|Using the sed command with the '''q''' instruction to display up to a line number, then quit.]]The '''q''' instruction terminates or '''quits''' the execution of the sed utility as soon as it reaches a particular line number or a line that matches a pattern.<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed '11 q' data.txt | tee sed-5.txt</span><br><br>What did you notice? How many lines were displayed<br>before the sed command exited?<br><br>You can use '''regular expressions''' to select lines that match a pattern. In fact,<br>the sed command was one of the <u>first</u> Linux commands that used regular expressions.<br><br>The rules remain the same for using regular expressions as demonstrated in '''tutorial 9'''<br>except the regular expression must be contained within '''forward slashes'''<br>(eg. <span style="font-family:courier;font-weight:bold;">/regexp/</span> ).<br><br>[[Image:sed-4.png|thumb|right|400px|Using the sed command with regular expressions that contain '''anchors'''.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/^The/ p' data.txt | tee sed-6.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">sed -n '/d$/ p' data.txt | tee sed-7.txt</span><br><br>What do you notice?<br><br>The '''sed''' utility can also be used as a '''filter''' to manipulate text that<br>was generated from Linux commands.<br><br>[[Image:sed-5.png|thumb|right|400px|Using the sed command with '''pipeline''' commands.]]
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">who | sed -n '/^[a-m]/ p' | tee sed-8.txt | more</span><br><br>What did you notice?<br><br>
# Issue the following Linux pipeline command:<br><span style="color:blue;font-weight:bold;font-family:courier;">ls | sed -n '/txt$/ p' | tee sed-9.txt</span><br><br>What did you notice?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~osl640/week11-check-1</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script<br>until you receive a congratulations message, then you can proceed.<br><br>
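The sed instructions practised in this investigation can be tried together on a small throwaway file. The following is only a minimal sketch: the sample file, its contents and the <code>sample</code> variable are hypothetical and are not part of the course files (it does not use '''data.txt''').

```shell
# Hypothetical four-line sample file (NOT the course's data.txt)
sample=$(mktemp)
printf '%s\n' 'The first line' 'A TUTORIAL on sed' 'The TUTORIAL ended' 'last word' > "$sample"

# 'p' without -n: each line is printed by default and again by p (4 lines become 8)
sed 'p' "$sample" | wc -l

# -n suppresses the default output, so p prints each line once
sed -n 'p' "$sample" | wc -l

# Address range: print only lines 2 to 3
sed -n '2,3 p' "$sample"

# Substitute only within lines 2 to 3; all other lines pass through unchanged
sed '2,3 s/TUTORIAL/LESSON/g' "$sample"

# Print up to and including line 2, then quit
sed '2 q' "$sample"

# Regular expressions as addresses: lines that start with "The", lines that end with "d"
sed -n '/^The/ p' "$sample"
sed -n '/d$/ p' "$sample"
```

Because none of these commands uses redirection back into the file, the sample file itself is never modified; sed only changes what is written to standard output.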
 
:In the next investigation, you will learn how to manipulate text using the '''awk''' utility.<br><br>
= INVESTIGATION 2: USING THE AWK UTILITY =
In this investigation, you will learn how to use the awk utility to manipulate text and generate reports.
 
'''Perform the Following Steps:'''
 
# Change to your '''home''' directory and issue a command to '''confirm'''<br>you are located in your ''home'' directory.<br><br>
# Issue a Linux command to create a directory called '''awk'''<br><br>
# Issue a Linux command to <u>change</u> to the '''awk''' directory and confirm you are located in the '''awk''' directory.<br><br>Let's download a database file that contains information regarding classic cars.<br><br>
# Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">cp ~osl640/cars.txt .</span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars.txt''' file.<br><br>The "'''print'''" action (command) is the <u>default</u> action of awk to print<br>all selected lines that match a '''pattern'''.<br><br>This '''action''' (contained in braces) can provide more options<br>such as printing '''specific fields''' of selected lines (or records) from a database.<br><br>[[Image:awk-1.png|thumb|right|400px|Using the awk command to display matches of the pattern '''ford'''.]]
# Issue the following linux command to display all lines (i.e. records) in the '''cars.txt''' database that match the pattern (or "make") called '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/ {print}' cars.txt</span><br><br>We will use '''pipeline commands''' to both display stdout to the screen and save it to files, so that a '''checking script''' you run later in this investigation can <u>confirm</u> that you ran these commands.<br><br>
# Issue the following linux pipeline command to display records<br>in the '''cars.txt''' database that contain the pattern (i.e. make) '''ford''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/ford/' cars.txt | tee awk-1.txt</span><br><br>What do you notice? You should notice the same matching lines displayed <u>without</u> specifying the '''print''' action, since print is the default action.<br><br>You can use ''builtin'' '''variables''' with the '''print''' command for further processing.<br>We will discuss the following variables in this tutorial:<br><br>[[Image:awk-2.png|thumb|right|400px|Using the awk command to print search results by '''field number'''.]]'''$0''' - Current record (entire line)<br>'''$1''' - First field in record<br>'''$n''' - nth field in record<br>'''NR''' - Record Number (order in database)<br> '''NF''' - Number of fields in current record<br><br>For a listing of more variables, please consult your course notes.<br><br>
# Issue the following linux pipeline command to display the '''model''', '''year''', '''quantity''' and '''price'''<br>in the '''cars.txt''' database for makes of '''chevy''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '/chevy/ {print $2,$3,$4,$5}' cars.txt | tee awk-2.txt</span><br><br>Notice that a '''space''' is the delimiter for the fields that appear as standard output.<br><br>The '''tilde character''' '''~''' is used to test whether a particular field matches a pattern.<br><br>
# Issue the following linux pipeline command to display all '''plymouths''' ('''plym''')<br>by '''model name''', '''year''', '''quantity''' and '''price''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /plym/ {print $2,$3,$4,$5}' cars.txt | tee awk-3.txt</span><br><br>You can also use '''comparison operators''' to specify conditions for processing with matched patterns<br>when using the awk command. Since they are used WITHIN the quoted awk expression,<br>they are not confused with redirection symbols.<br><br>[[Image:awk-3.png|thumb|right|400px|Using the awk command to display results based on '''comparison operators'''.]]'''<''' &nbsp;&nbsp;&nbsp;&nbsp;Less than<br>'''<=''' &nbsp;&nbsp;Less than or equal<br>'''>''' &nbsp;&nbsp;&nbsp;&nbsp;Greater than<br>'''>=''' &nbsp;&nbsp;Greater than or equal<br>'''==''' &nbsp;&nbsp;Equal<br>'''!=''' &nbsp;&nbsp;&nbsp;Not equal<br><br>
# Issue the following linux pipeline command to display the '''car make''', '''model''', '''quantity''' and '''price''' of all vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-4.txt</span><br><br>What do you notice?<br><br>
# Issue the following linux pipeline command to display the '''car make''',<br>'''model''', '''quantity''' and '''price''' of vehicles whose '''prices are less than $5,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$5 < 5000 {print $1,$2,$4,$5}' cars.txt | tee awk-5.txt</span><br><br>
# Issue the following linux pipeline command to display the '''car make''',<br>'''model''' and '''quantity''' of cars whose make '''begins''' with the '''letter 'f'''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $1,$2,$4}' cars.txt | tee awk-6.txt</span><br><br>[[Image:awk-4.png|thumb|right|400px|Using the awk command to display combined search results based on '''compound operators'''.]]Combined pattern searches can be made<br>by using '''compound operator''' symbols:<br><br>'''&&''' &nbsp;&nbsp;&nbsp;&nbsp;(and)<br>'''||''' &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(or)<br><br>
# Issue the following linux pipeline command to list all '''fords'''<br>whose '''price is greater than $10,000''':<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /ford/ && $5 > 10000 {print $0}' cars.txt | tee awk-7.txt</span><br><br>
# Issue the following linux command ('''copy and paste''' to save time):<br><span style="color:blue;font-weight:bold;font-family:courier;">cp ~osl640/tutorial11/cars2.txt .</span><br><br>
# Issue the '''cat''' command to quickly view the contents of the '''cars2.txt''' file.<br><br>
# Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-8.txt</span><br><br>What did you notice?<br><br>The problem is that the '''cars2.txt''' database separates each field by a semi-colon (''';''') <u>instead</u> of '''TAB'''.<br>Therefore, awk treats each entire line as a single field and does not recognize the second and fourth fields.<br><br>You need to issue awk with the '''-F''' option to indicate that this file's fields are separated (delimited) by a semi-colon.<br><br>
# Issue the following linux pipeline command to display the '''year'''<br>and '''quantity''' of cars that '''begin''' with the '''letter 'f'''' for the '''cars2.txt''' database:<br><span style="color:blue;font-weight:bold;font-family:courier;">awk -F";" '$1 ~ /^f/ {print $2,$4}' cars2.txt | tee awk-9.txt</span><br><br>What did you notice this time?<br><br>
# Issue the following to run a checking script:<br><span style="color:blue;font-weight:bold;font-family:courier;">~osl640/week11-check-2</span><br><br>If you encounter errors, make corrections and '''re-run''' the checking script until you<br>receive a congratulations message, then you can proceed.<br><br>
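The awk techniques practised in this investigation can be tried together on a small throwaway database. The following is only a minimal sketch: the sample records, the assumed field order (make, model, year, quantity, price) and the <code>cars</code> and <code>cars2</code> variables are hypothetical and are not the course's '''cars.txt''' or '''cars2.txt''' files.

```shell
# Hypothetical tab-delimited sample database (NOT the course's cars.txt)
# Assumed field order: make, model, year, quantity, price
cars=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  ford mustang 1965 45 10000 \
  chevy malibu 1999 50 3000 \
  ford ltd 1983 15 10500 \
  plym fury 1970 73 2500 > "$cars"

# Pattern only: print is the default action, so these two commands are equivalent
awk '/ford/ {print}' "$cars"
awk '/ford/' "$cars"

# Match a pattern against one field with ~, then print selected fields
awk '$1 ~ /plym/ {print $2,$3,$4,$5}' "$cars"

# Comparison operator applied to the fifth (price) field
awk '$5 < 5000 {print $1,$2,$4,$5}' "$cars"

# Compound condition: fords whose price is greater than 10000
awk '$1 ~ /ford/ && $5 > 10000 {print $0}' "$cars"

# Same records delimited by semi-colons instead of tabs
cars2=$(mktemp)
tr '\t' ';' < "$cars" > "$cars2"

# Without -F, each whole line is field 1, so $2 and $4 are empty
awk '$1 ~ /^f/ {print $2,$4}' "$cars2"

# With -F";" awk splits each record on the semi-colon
awk -F";" '$1 ~ /^f/ {print $2,$4}' "$cars2"
```

Note that the comma in <code>print $2,$4</code> is what places a space between the fields in the output; it is unrelated to the input delimiter set with '''-F'''.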
:: Complete the Review Questions sections to get additional practice.
= LINUX PRACTICE QUESTIONS =
 
The purpose of this section is to obtain '''extra practice''' to help with '''quizzes''', your '''midterm''', and your '''final exam'''.
 
Here is a link to the [https://matrix.senecacollege.ca/~osl640/questions/osl640_week11_practice.docx MS Word Document of ALL of the questions] displayed below but with extra room to answer on the document to
simulate a quiz:
 
Your instructor may take-up these questions during class. It is up to the student to attend classes in order to obtain the answers to the following questions. Your instructor will NOT provide these answers in any other form (eg. e-mail, etc).
 
 
'''Review Questions:'''
 
'''Part A: Display Results from Using the sed Utility'''
 
Note the contents from the following tab-delimited file called '''~osl640/stuff.txt''':
(this file pathname exists for checking your work)
 
<pre>
Line one.
This is the second line.
This is the third.
This is line four.
Five.
Line six follows
Followed by 7
Now line 8
and line nine
Finally, line 10
</pre>
 
 
Write the results of each of the following Linux commands for the above-mentioned file:
 
 
# <span style="font-family:courier;font-weight:bold">sed -n '3,6 p' ~osl640/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">sed '4 q' ~osl640/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">sed '/the/ d' ~osl640/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">sed 's/line/NUMBER/g' ~osl640/stuff.txt</span>
 
 
'''Part B: Writing Linux Commands Using the sed Utility'''
 
Write a single Linux command to perform the specified tasks for each of the following questions.
 
 
# Write a Linux sed command to display only lines 5 to 9 for the file: '''~osl640/stuff.txt'''<br><br>
# Write a Linux sed command to display only lines that begin with the pattern “and” for the file: '''~osl640/stuff.txt'''<br><br>
# Write a Linux sed command to display only lines that end with a digit for the file: '''~osl640/stuff.txt'''<br><br>
# Write a Linux sed command to save lines that match the pattern “line” (upper or lowercase) for the file: '''~osl640/stuff.txt''' and save results (overwriting previous contents) to: '''~/results.txt'''<br><br>
 
 
'''Part C: Display Results from Using the awk Utility'''
 
Note the contents from the following tab-delimited file called '''~osl640/stuff.txt''':
(this file pathname exists for checking your work)
 
<pre>
Line one.
This is the second line.
This is the third.
This is line four.
Five.
Line six follows
Followed by 7
Now line 8
and line nine
Finally, line 10
</pre>
 
 
'''Write the results of each of the following Linux commands for the above-mentioned file:'''
 
 
# <span style="font-family:courier;font-weight:bold">awk 'NR == 3 {print}' ~osl640/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">awk 'NR >= 2 && NR <= 5 {print}' ~osl640/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">awk '$1 ~ /This/ {print $2}' ~osl640/stuff.txt</span><br><br>
# <span style="font-family:courier;font-weight:bold">awk '$1 ~ /This/ {print $3,$2}' ~osl640/stuff.txt</span><br><br>
 
 
'''Part D: Writing Linux Commands Using the awk Utility'''
 
 
Write a single Linux command to perform the specified tasks for each of the following questions.
 
 
# Write a Linux awk command to display all records for the file: '''~/cars''' whose fifth field is greater than 10000.<br><br>
# Write a Linux awk command to display the first and fourth fields for the file: '''~/cars''' whose fifth field begins with a number.<br><br>
# Write a Linux awk command to display the second and third fields for the file: '''~/cars''' for records that match the pattern “chevy”.<br><br>
# Write a Linux awk command to display the first and second fields for all the records contained in the file: '''~/cars'''<br><br>
 
 
[[Category:OSL640]]
