Changes

Jump to: navigation, search

Tutorial9: Regular Expressions

456 bytes added, 11:51, 7 March 2021
no edit summary
# Issue the following Linux pipeline command to display whole positive or negative integers:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^[+-][0-9][0-9]*$" numbers1.dat | tee signed.txt</span><br><br>What did you notice? Positive and negative numbers display, not not '''unsigned''' numbers.<br><br>[[Image:regexps-8.png|thumb|right|300px|Simultaneous '''anchoring''' of regular expressions using '''character class''' and '''zero or more occurrences''' to display '''signed''' and '''unsigned''' integers.]]
# Issue the following Linux pipeline command to display only whole numbers<br>(with or without a positive or negative sign):<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^[+-]*[0-9][0-9]*$" numbers1.dat | tee all.txt</span><br><br>Did this command work?<br><br>
# Issue the following command to check that you created those hard links: <br><span style="color:blue;font-weight:bold;font-family:courier;">bash /home/murray.saul/scriptsmyscripts/week9-check-1</span><br><br>If you encounter errors, then view the feedback to make corrections, and then re-run the checking script. If you receive a congratulation message that there are no errors, then proceed with this tutorial.<br><br>
: Although very useful, '''complex''' regular expressions do NOT entirely solve our problem of displaying only '''valid''' unsigned and signed numbers.<br>In the next investigation, you will learn how to use '''extended''' regular expressions that will completely solve this issue.<br>
# Issue the following Linux pipeline command using the repetition shortcuts <span style="font-weight:bold;font-family:courier;">"+"</span> and <span style="font-weight:bold;font-family:courier;">"?"</span>:<br><span style="color:blue;font-weight:bold;font-family:courier;">egrep "^[+-]?[0-9]+$" numbers2.dat | tee better-number2.txt</span><br><br>You should have seen the same results, but the extended regular expression required less typing.<br><br>
# Issue the following Linux pipeline command to display '''signed''', '''unsigned''', '''whole''', and '''decimal''' numbers:<br><span style="color:blue;font-weight:bold;font-family:courier;">egrep "^[+-]{0,1}[0-9]{1,}[.]{0,1}[0-9]*$" numbers2.dat | tee better-number3.txt</span><br><br>Were all signed and unsigned intergers and decimal numbers displayed?<br><br>
# Issue the follwoing command to check that you correctly issued<br>those ''Linux pipeline commands'': <br><span style="color:blue;font-weight:bold;font-family:courier;">bash /home/murray.saul/scriptsmyscripts/week9-check-2</span><br><br>If you encounter errors, then view the feedback to make corrections, and then re-run the checking script.<br>If you receive a congratulation message that there are no errors, then proceed with this tutorial.<br><br>You can also use extended regular expression symbols for '''grouping'''.<br>For example, you can search for repetitions of GROUPS of characters (like a word)<br>as opposed to just a single character or a GROUP of numbers as opposed to a single digit.<br><br>
# Issue the following linux pipeline command to download another data file called '''words.dat''':<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/words.dat</nowiki></span><br><br>
# View the contents of the '''words.dat''' file using the '''more''' command and quickly view the contents of this file.<br>You should notice valid and more invalid numbers contained in this file. When finished, exit the more command.<br><br>
# Reissue the previous pipeline command including a space in brackets:<br><span style="color:blue;font-weight:bold;font-family:courier;">egrep -i "(the ){2,}" words.dat | tee word-search2.txt</span><br><br>[[Image:eregexps-3.png|thumb|right|330px|Using '''extended''' regular expression symbols (such as '''grouping''') to refine matches of repetition of '''words''' (as opposed to ''characters'').]]The <span style="font-weight:bold;font-family:courier;">"|"</span> (or) symbol (same symbol as "pipe") can be used within the grouping symbols to allow matching of additional groups of characters.<br>Again, it is important to follow the character groupings with the space character<br><br>
# Issue the following linux pipeline command to search for 2 or more occurrences of the word "'''the '''" <u>or</u> the word "'''and '''":<br><span style="color:blue;font-weight:bold;font-family:courier;">egrep -i "(the |and ){2,}" words.dat | tee word-search3.txt</span><br><br>
# Issue the following Linux command to check that you correctly issued<br>those ''Linux pipeline commands'' using the '''tee''' command to create those text files:<br><span style="color:blue;font-weight:bold;font-family:courier;">bash /home/murray.saul/scriptsmyscripts/week9-check-3</span><br><br>If you encounter errors, then view the feedback to make corrections, and then re-run the checking script.<br>If you receive a congratulation message that there are no errors, then proceed with this tutorial.<br><br>
: The '''grep''' and '''egrep''' Linux commands are NOT the only Linux commands that use regular expressions.<br>In the next investigation, you will apply regular expressions to a number of Linux commands that you already learned in this course.
# Make certain that you are located in your '''~/regexps''' directory on your ''Matrix'' account.<br><br>
# Let's look at using regular expressions with the '''man''' command.<br>Issue the following linux command :<br><span style="color:blue;font-weight:bold;font-family:courier;">man ls</span><br><br>[[Image:other-re-1.png|thumb|right|300px|Entering '''/sort''' in the '''man, more and less commands ''' command can search for the string "'''sort'''".]]
# We want to search for an option that can sort the file listing.<br>Type the following regular expression below and press '''ENTER''':<br><span style="color:blue;font-weight:bold;font-family:courier;">/sort</span><br><br>'''FYI:''' The '''grep''' and '''egrep''' Linux commands contain the regular expressions within quotes,<br>but '''most''' other Linux commands specify regular expressions using forward slashes<br>(e.g. <span style="font-weight:bold;font-family:courier;">/regular expression</span> &nbsp; or &nbsp; <span style="font-weight:bold;font-family:courier;">/regular expression</span>).<br><br>
# Scroll throughout the man pages for the ls command to view matches for the pattern "'''sort'''"<br>(You can press '''SPACE''' or key combination '''alt-b''' to move forward and backwards one screen respectively).<br><br>
# Press the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span> to '''exit''' the ''man'' pages for '''ls'''.<br><br>Let's use regular expressions with the '''more''' command.<br><br>
# Issue the following Linux command to download another data file called '''large-file.txt''':<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/large-file.txt</nowiki></span><br><br>
# View the contents of the '''large-file.txt''' file using the '''more''' command and quickly view the contents of this file.<br><br>[[Image:other-re-2.png|thumb|right|300px|Entering '''/uli101''' in the '''more''' and '''less''' commands can search for the string "'''uli101'''".]]
# Issue the following Linux command to view the contents of the '''large-file.txt''':<br><span style="color:blue;font-weight:bold;font-family:courier;">more large-file.txt</span><br><br>
#We want to search for a pattern '''uli101''' within this text file.<br>Type the following regular expression and press ENTER:<br><span style="color:blue;font-weight:bold;font-family:courier;">/uli101</span><br><br>You should see the pattern "uli101" on the second line at the top.<br><br>
#Search for the next occurrence of the pattern '''uli101''' by '''re-typing'''<br>the following regular expression and pressing ENTER:<br><span style="color:blue;font-weight:bold;font-family:courier;">/uli101</span><br><br>you should now see the '''second occurrence''' of this pattern near the top.<br><br>
# Press the letter <span style="color:blue;font-weight:bold;font-family:courier;">q</span> to exit the '''more''' command.<br><br>
# Try the same search techniques with the '''less''' command.<br><br>Does it work the same for the ''less'' command as it did for the ''more'' command?<br><br>[[Image:other-re-3.png|thumb|right|300px|Entering '''/uli101''' in the '''vi''' command can search for the string "'''uli101'''".]]
#Let's learn how to perform a simple '''search and replace''' within the '''vi''' utility by using regular expressions.<br>Issue the following Linux command to edit the '''large-file.txt''' file:<br><span style="color:blue;font-weight:bold;font-family:courier;">vi large-file.txt</span><br><br>Let's first perform a simple search within this text file.<br><br>
# Press the '''ESC''' key to make certain you are in '''COMMAND''' mode.<br><br>
# Type the following and press '''ENTER''':<br><span style="color:blue;font-weight:bold;font-family:courier;">/uli101</span><br><br>You should move to the '''first occurrence''' of the pattern: '''uli101'''.<br><br>Let's search for the '''uli101''' pattern, but replace it in capitals (i.e '''ULI101''').<br><br>In vi, to issue a command, you need to enter '''LAST LINE''' mode then issue a command.<br>Let's issue a command from LAST LINE mode to search and replace '''uli101''' to '''ULI101'''.<br><br>[[Image:other-re-4.png|thumb|right|500px|In l'''ast line''' MODE in the '''vi''' text editor, issuing a command using regular expressions to convert '''uli101''' to '''ULI101'''.]]
# Making certain that you are command mode in vi, type the following and press ENTER:<br><span style="color:blue;font-weight:bold;font-family:courier;">:%s/uli101/ULI101/g</span><br><br>
# Navigate throughout the text file to confirm that ALL occurrences of uli101 have been replaced with ULI101.<br><br>
13,420
edits

Navigation menu