13,420
edits
Changes
→INVESTIGATION 1: SIMPLE & COMPLEX REGULAR EXPRESSIONS
# Issue the following linux Linux command to '''download''' a text file to your '''home''' directory:<br><span style="color:blue;font-weight:bold;font-family:courier;">wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/textfile1.txt</nowiki></span><br><br>
# View the contents of the '''textfile1.txt''' file using the '''more''' command see what data is contained in this file.<br><br>Although there are several Linux commands that use regular expressions,<br>we will be using the '''grep''' command for this investigation.<br><br>[[Image:regexps-1.png|thumb|right|250px|Output of '''grep''' command matching simple regular expression "'''the'''" (only lowercase). Notice the pattern matches larger words like "'''their'''" or "'''them'''".]]
#Issue the following Linux command to match the pattern "'''the'''" within '''textfile1.txt''':<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "the" textfile1.txt</span><br><br>Take a few moments to view the output and observe the matched patterns.<br><br>
# Now, issue the grep Linux command with the <span style="font-weight:bold;font-family:courier;">-i</span> option to ignore case sensitively:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep -i "the" textfile1.txt</span><br><br>What do you notice is different when issuing this command?<br><br>You will notice that the pattern "'''the'''" is matched including larger words like "'''them'''" and "'''their'''".<br>You can issue the '''grep''' command with the -w option to only match the pattern as a '''word'''.<br><br>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep -w -i "the" textfile1.txt</span><br><br>You should now see only strings of text that match the word '''"the"''' (upper or lower case).<br><br>Matching literal or simple regular expressions can be useful, but are '''limited'''<br>in what pattens they can match. For example, you may want to<br>search for a pattern located at the '''beginning''' or '''end''' of the string.<br><br>There are other regular expression symbols that provide more '''precise''' search pattern matching.<br>These special characters are known as '''complex''' and '''extended''' regular expressions symbols.<br> In this investigation, we will focus on complex ''regular expressions'' and then discuss<br>''extended regular expressions'' in INVESTIGATION 2.<br><br><table align="right"><tr valign="top"><td>[[Image:regexps-2.png|thumb|right|280px|Anchoring regular expressions at the '''beginning''' of text.]]</td><td>[[Image:regexps-3.png|thumb|right|250px|Anchoring regular expressions at the '''ending''' of text.]]</td></tr></table>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep -w -i "^the" textfile1.txt</span><br><br>The '''^''' symbol is referred to as an '''anchor'''.<br>In this case, it only matches<br>the word "'''the'''" (both upper or lowercase) at the <u>beginning</u> of the string.<br><br>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep -w -i "the$" textfile1.txt</span><br><br>The '''$''' symbol is used to anchor patterns at the <u>end</u> of the string.<br><br>