Changes

Tutorial9: Regular Expressions

49 bytes removed, 14:13, 27 February 2021

INVESTIGATION 2: EXTENDED REGULAR EXPRESSIONS

# Issue the following Linux command to download another data file called '''numbers2.dat''': wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/numbers2.dat</nowiki>

# View the contents of the '''numbers2.dat''' file using the '''more''' command. You should notice that it contains valid numbers as well as more invalid numbers. When finished, exit the more command.

# Issue the following Linux command to display only whole numbers (with or without a positive or negative sign): grep "^[+-]*[0-9][0-9]*$" numbers2.dat You should notice that '''multiple''' '''+''' or '''-''' '''signs''' appear before some numbers. This happens because '''*''' matches zero or MORE occurrences of a + or - sign. Using the '''extended regular expression''' symbols that specify '''minimum''' and '''maximum''' repetitions, '''{min,max}''', can solve that problem.

# Issue the following Linux command (using extended regular expression symbols) to display only whole numbers (with or without a positive or negative sign): grep "^[+-]{0,1}[0-9]{1,}$" numbers2.dat '''NOTE:''' Most likely, there were '''NO results'''. This is because the '''grep command was NOT issued correctly to use extended regular expression symbols'''. You would need to issue either '''grep -E''' or, more simply, the '''egrep''' command. The egrep command recognizes the extended regular expression symbols as well as the ones used so far, and is used instead of grep for the rest of this tutorial.

# Reissue the above command using '''egrep''' instead of '''grep''': egrep "^[+-]{0,1}[0-9]{1,}$" numbers2.dat | tee better-number1.txt This time the command should work correctly because you used the '''egrep''' command. '''NOTE:''' In extended regular expressions, the '''?''' symbol is shorthand for the '''{0,1}''' repetition symbols and the '''+''' symbol is shorthand for the '''{1,}''' repetition symbols.

# Reissue the above pipeline command using the '''?''' and '''+''' symbols: egrep "^[+-]?[0-9]+$" numbers2.dat | tee better-number2.txt You should see the same results, but this extended regular expression required less typing.
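The sketch below runs the same patterns against a small sample so you can compare them without numbers2.dat. The sample values are hypothetical and made up for illustration. They are not the contents of numbers2.dat. It uses '''grep -E''', which the steps above describe as equivalent to egrep.

```shell
# Hypothetical sample data (NOT the real numbers2.dat), one value per line
sample='5
+5
-17
++5
--8
+-3
abc
3.14'

# Basic regex: [+-]* allows ZERO OR MORE signs, so ++5, --8 and +-3 also match
bre_star=$(printf '%s\n' "$sample" | grep '^[+-]*[0-9][0-9]*$')

# Without -E, grep treats { and } as literal characters, so nothing matches here
bre_braces=$(printf '%s\n' "$sample" | grep '^[+-]{0,1}[0-9]{1,}$')

# Extended regex: {0,1} allows at most one sign, {1,} requires one or more digits
ere_braces=$(printf '%s\n' "$sample" | grep -E '^[+-]{0,1}[0-9]{1,}$')

# Same match using the shorthand ? (same as {0,1}) and + (same as {1,})
ere_short=$(printf '%s\n' "$sample" | grep -E '^[+-]?[0-9]+$')

printf 'grep [+-]*:\n%s\n\n' "$bre_star"
printf 'grep {0,1} (no -E):\n%s\n\n' "${bre_braces:-<no output>}"
printf 'grep -E {0,1}:\n%s\n\n' "$ere_braces"
printf 'grep -E ? and +:\n%s\n' "$ere_short"
```

The last two patterns should print the same three lines (5, +5, -17). The first pattern also prints the values that have repeated signs.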
# Issue the following Linux pipeline command to display signed or unsigned whole and decimal numbers: egrep "^[+-]{0,1}[0-9]{1,}[.]{0,1}[0-9]*$" numbers2.dat | tee better-number3.txt
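To see which values the decimal-number pattern accepts, here is a sketch against a few hypothetical values. These are illustrative only and are not taken from numbers2.dat. It uses '''grep -E''' in place of egrep.

```shell
# Hypothetical sample values (NOT the real numbers2.dat)
sample='42
-7
+3.5
0.25
5.
.5
1.2.3
+-4'

# Optional sign, one or more digits, an optional single dot, then any number of digits
decimals=$(printf '%s\n' "$sample" | grep -E '^[+-]{0,1}[0-9]{1,}[.]{0,1}[0-9]*$')
printf '%s\n' "$decimals"
```

This prints 42, -7, +3.5, 0.25 and 5. The pattern needs at least one digit before the optional dot, so it accepts "5." but rejects ".5".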

# Issue the following Linux command to check that you correctly issued those ''Linux pipeline commands'' using the '''tee''' command to create those text files: bash /home/murray.saul/scripts/week9-check-2 If you encounter errors, view the feedback to make corrections, and then re-run the checking script. If you receive a congratulations message that there are no errors, then proceed with this tutorial.

You can also use extended regular expression symbols for '''grouping'''. For example, you can search for repetitions of a GROUP of characters (like a word) as opposed to just a single character, or a GROUP of digits as opposed to a single digit.
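The sketch below shows what grouping changes. It uses hypothetical sample words and '''grep -E''' (equivalent to egrep). Without parentheses, a repetition symbol applies only to the single character before it. With parentheses, it applies to the whole group.

```shell
# Hypothetical sample words, one per line
sample='haaa
hahaha
ha'

# No grouping: {3} repeats only the "a", so this matches "haaa"
single=$(printf '%s\n' "$sample" | grep -E '^ha{3}$')

# Grouping: {3} repeats the whole group "ha", so this matches "hahaha"
group=$(printf '%s\n' "$sample" | grep -E '^(ha){3}$')

printf 'ha{3}   -> %s\n' "$single"
printf '(ha){3} -> %s\n' "$group"
```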

# Issue the following Linux command to download another data file called '''words.dat''': wget <nowiki>https://ict.senecacollege.ca/~murray.saul/uli101/words.dat</nowiki>

# View the contents of the '''words.dat''' file using the '''more''' command. When finished, exit the more command.

# Issue the following Linux pipeline command to display lines containing two or more consecutive occurrences of the word "the": egrep -i "(the){2,}" words.dat | tee word-search1.txt '''NOTE:''' You should NOT see any output, because a space should be included at the end of the word "'''the'''". Words are usually separated by spaces. The pattern (the){2,} only matches "thethe" (no space after each repetition of the pattern), not "'''the the'''", so there were no matches.
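To see the effect of the trailing space, here is a comparison of the two patterns on a few hypothetical lines that are not from words.dat. It uses '''grep -E -i''' (equivalent to egrep -i).

```shell
# Hypothetical sample lines (NOT the real words.dat)
sample='I saw the the cat
Find thethe here
the cat sat'

# No space in the group: matches only "the" repeated with nothing between, as in "thethe"
nospace=$(printf '%s\n' "$sample" | grep -E -i '(the){2,}')

# Space inside the group: matches "the the " (words separated by spaces)
withspace=$(printf '%s\n' "$sample" | grep -E -i '(the ){2,}')

printf 'no space:   %s\n' "$nospace"
printf 'with space: %s\n' "$withspace"
```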

# Reissue the previous pipeline command, including a space inside the parentheses: egrep -i "(the ){2,}" words.dat | tee word-search2.txt

The "|" (or) symbol can be used within the grouping symbols to allow matching of additional groups of characters. Again, it is important to follow each character grouping with a space character.

# Issue the following Linux pipeline command to search for 2 or more occurrences of the word "'''the'''" or the word "'''and'''": egrep -i "(the |and ){2,}" words.dat | tee word-search3.txt
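Here is a sketch of the "|" (or) symbol inside a group, run against hypothetical lines that are not from words.dat. It uses '''grep -E -i''' (equivalent to egrep -i). Any mix of "the " and "and " repeated two or more times in a row matches, in any letter case.

```shell
# Hypothetical sample lines (NOT the real words.dat)
sample='the and the end
and and then
the cat and dog
THE AND stop'

# Two or more consecutive groups, each either "the " or "and "; -i ignores case
either=$(printf '%s\n' "$sample" | grep -E -i '(the |and ){2,}')
printf '%s\n' "$either"
```

The line "the cat and dog" is not printed, because its "the " and "and " are not next to each other.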

# Issue the following Linux command to check that you correctly issued those ''Linux pipeline commands'' using the '''tee''' command to create those text files: bash /home/murray.saul/scripts/week9-check-3 If you encounter errors, view the feedback to make corrections, and then re-run the checking script. If you receive a congratulations message that there are no errors, then proceed with this tutorial.

Msaul

