Tutorial9: Regular Expressions

62 bytes removed, 12:30, 27 February 2021
INVESTIGATION 1: SIMPLE & COMPLEX REGULAR EXPRESSIONS
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep -w -i "^the" textfile1.txt</span><br><br>The '''^''' symbol is referred to as an '''anchor'''. In this case, it matches the word "'''the'''" (in either upper or lowercase) only at the <u>beginning</u> of the string.<br>The '''$''' symbol is used to anchor patterns at the end of strings.<br><br>
# Issue the following Linux command:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep -w -i "the$" textfile1.txt</span><br><br>What do you notice?<br><br>
# Issue the following Linux pipeline command to anchor the word "the" at both the beginning and the end of the string:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep -w -i "^the$" textfile1.txt | more</span><br><br>What do you notice?<br><br>Anchoring patterns at both the <u>beginning</u> and the <u>end</u> of strings lets you build more precise search patterns.<br>We will now demonstrate '''simultaneous anchoring''' with other complex regular expression symbols.<br><br>
# Issue the following command to match strings that begin with any 3 characters (that is, strings at least 3 characters long):<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^..." textfile1.txt | more</span><br><br>What do you notice?<br><br>
# Issue the following command to match strings that consist of exactly 3 characters (anchored at both the beginning and the end):<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^...$" textfile1.txt | more</span><br><br>What do you notice?<br><br>
# Issue the following command to match strings that begin with 3 digits:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^[0-9][0-9][0-9]" textfile1.txt | more</span><br><br>
# Issue the following command to match strings that end with 3 uppercase letters:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "[A-Z][A-Z][A-Z]$" textfile1.txt | more</span><br><br>
# Issue the following command to match strings that consist of only 3 digits:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^[0-9][0-9][0-9]$" textfile1.txt | more</span><br><br>The '''*''' complex regular expression symbol is often confused with the '''*''' used in filename expansion. Unlike filename expansion, it does NOT represent zero or more occurrences of '''any character''', but zero or more occurrences of the character that comes <u>before</u> the * symbol.<br><br>
# To demonstrate, issue the following command to display zero or more occurrences of the letter x:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "x*" textfile1.txt | more</span><br><br>You will notice that every line of the file is displayed, even lines that contain no x at all, because any line matches "zero occurrences" of the letter x.<br><br>
# Issue the following command to display strings that contain at least one occurrence of the letter x:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "xx*" textfile1.txt | more</span><br><br>Why did this work? Because the pattern specifies one occurrence of the letter x, followed by zero or MORE occurrences of the letter x.<br><br>If you combine the complex regular expression symbols '''.*''', they act like zero or more occurrences of any character (like '''*''' does in filename expansion).<br><br>
# Issue the following command to match strings that begin and end with a digit, with anything (or nothing) in between:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^[0-9].*[0-9]$" textfile1.txt | more</span><br><br>Using '''simultaneous anchors''' combined with the .* symbols can help you refine your search patterns.<br><br>
# Issue the following Linux pipeline command to display strings that begin with a capital letter, end with a digit, and contain a capital X somewhere in between:<br><span style="color:blue;font-weight:bold;font-family:courier;">grep "^[A-Z].*X.*[0-9]$" textfile1.txt | more</span><br><br>Let's look at another series of examples involving '''filtering''' with numbers, so that only strings containing valid numbers are displayed.<br><br>
# Issue the following Linux command to create the '''regexps''' directory: <span style="color:blue;font-weight:bold;font-family:courier;">mkdir ~/regexps</span><br><br>
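To see what each anchor does without depending on the contents of textfile1.txt, the anchoring steps can be reproduced on a small sample. This is a minimal sketch: the sample lines and the temporary file are made up for illustration and are not part of the tutorial's files. It assumes GNU grep (for the '''-w''' option) and sets the C locale so that ranges such as [A-Z] cover only ASCII letters. The comment after each command lists the lines it prints for this sample only.

```shell
# Use the C locale so [A-Z] and [0-9] match plain ASCII ranges only
export LC_ALL=C

# Hypothetical sample data (NOT the contents of textfile1.txt)
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'The end' 'see the' 'the' 'abc' 'abcd' '123' '1234' 'xyzABC' > "$sample"

echo '--- ^the : "the" only at the beginning'
grep -w -i '^the' "$sample"          # The end, the

echo '--- the$ : "the" only at the end'
grep -w -i 'the$' "$sample"          # see the, the

echo '--- ^the$ : the whole line is "the"'
grep -w -i '^the$' "$sample"         # the

echo '--- ^...$ : exactly 3 characters'
grep '^...$' "$sample"               # the, abc, 123

echo '--- ^[0-9][0-9][0-9] : begins with 3 digits'
grep '^[0-9][0-9][0-9]' "$sample"    # 123, 1234

echo '--- [A-Z][A-Z][A-Z]$ : ends with 3 uppercase letters'
grep '[A-Z][A-Z][A-Z]$' "$sample"    # xyzABC

echo '--- ^[0-9][0-9][0-9]$ : exactly 3 digits'
grep '^[0-9][0-9][0-9]$' "$sample"   # 123
```

Comparing the "begins with 3 digits" and "exactly 3 digits" results shows how adding the '''$''' anchor removes "1234" from the output.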
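The difference between '''*''' in a regular expression and '''*''' in filename expansion can be checked the same way. This sketch again uses a made-up sample file rather than textfile1.txt, and the counts and matches in the comments apply only to that sample. It shows that "x*" matches every line, that "xx*" requires at least one x, and that "^[0-9].*[0-9]$" needs at least two characters.

```shell
# Use the C locale so [A-Z] and [0-9] match plain ASCII ranges only
export LC_ALL=C

# Hypothetical sample data (NOT the contents of textfile1.txt);
# the third line is intentionally empty
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'box' 'apple' '' '12 apples 3' '7' 'A big X9' 'Ax9' > "$sample"

# x* can match zero x's, so EVERY line matches (even the empty one)
wc -l < "$sample"                    # 7
grep -c 'x*' "$sample"               # 7

# xx* means one x followed by zero or more x's: at least one lowercase x
grep 'xx*' "$sample"                 # box, Ax9

# .* is "zero or more of any character"; this pattern needs at least
# 2 characters, so the single-digit line "7" does NOT match
grep '^[0-9].*[0-9]$' "$sample"      # 12 apples 3

# Capital letter first, a capital X somewhere after it, digit last
grep '^[A-Z].*X.*[0-9]$' "$sample"   # A big X9
```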