ULI101 Week 9
Revision as of 09:10, 31 August 2017
Regular Expressions
Regular expressions define a set of strings using a simple expression or pattern. They are used mainly for searching and/or replacing strings, and are supported by various Unix utilities such as:
- vi
- grep
- awk
- sed
Regular expressions match input within a line. Regular expressions are very different from shell meta-characters.
Literal Matching
A literal pattern contains no special characters and matches only itself, hence the term literal matching. It can match entire words or parts of them. Some examples of literal matching are:
String literal | Will match |
/disk/ | diskette, disk, disks |
/my book/ | my book, dummy book |
The slash characters (/) at the start and end of the regular expression patterns used above and below are not part of the pattern. The first slash shows where the string to match begins and the last slash shows where it ends.
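As a rough illustration of literal matching, the sketch below uses Python's standard `re` module as a stand-in for grep or vi (an assumption; the page itself does not use Python). The extra sample words `desk`, `my books` and `mybook` are made up for contrast.

```python
import re

# A literal pattern matches itself anywhere inside a string.
words = ["diskette", "disk", "disks", "desk"]
matched_words = [w for w in words if re.search(r"disk", w)]
print(matched_words)  # ['diskette', 'disk', 'disks']

# The space is part of the literal, so "mybook" does not match.
phrases = ["my book", "dummy book", "my books", "mybook"]
matched_phrases = [p for p in phrases if re.search(r"my book", p)]
print(matched_phrases)  # ['my book', 'dummy book', 'my books']
```

Note that the pattern carries no slashes here: they only delimit the pattern in the tables and are not matched.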
Regular Expression Delimiters
Each regular expression should be delimited. This is particularly important to prevent the shell from interpreting special characters.
Delimiters mark the beginning and end of the regular expression. Depending on the situation and the utility, different characters can be used as a delimiter; for example, in grep the delimiter is usually the double quote.
Special Characters
Special characters and expressions can be used to build regular expressions for pattern matching. Standard special characters include:
., *, [], ^, $
Depending on the utility and its version, some support only standard regular expressions and some also support extended regular expressions. Although some of these characters look like shell expansion characters, they usually mean something else. Whenever you wish to match a special character literally, it must be quoted (escaped), typically with a preceding backslash.
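A minimal sketch of quoting a special character, again assuming Python's `re` module as a stand-in for the Unix utilities. The sample strings are hypothetical.

```python
import re

# Unescaped, the period stands for any single character.
loose = bool(re.search(r"3.14", "3x14"))
# Escaped with a backslash, it matches only a literal period.
strict_miss = bool(re.search(r"3\.14", "3x14"))
strict_hit = bool(re.search(r"3\.14", "3.14"))

print(loose, strict_miss, strict_hit)  # True False True
```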
Period '.'
Period is the only wildcard in regular expressions (RE). The period character in a regular expression can be viewed as a placeholder (match) for any single character. Some examples of period being used in regular expressions:
Period (.) in RE | Will match |
/.nix/ | Unix, unix |
/leaf./ | leafs, leafy |
/a.t/ | a t, ant, act |
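The period examples can be checked with a short sketch. It assumes Python's `re` module behaves like the utilities for these simple patterns; the non-matching samples (`nix`, `leaf`, `at`) are added here to show that the period must consume exactly one character.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that contain a match for pattern."""
    return [c for c in candidates if re.search(pattern, c)]

nix = matching(r".nix", ["Unix", "unix", "nix"])
leaf = matching(r"leaf.", ["leafs", "leafy", "leaf"])
a_t = matching(r"a.t", ["a t", "ant", "act", "at"])

print(nix)   # ['Unix', 'unix']
print(leaf)  # ['leafs', 'leafy']
print(a_t)   # ['a t', 'ant', 'act']
```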
Asterisk '*'
Represents zero or more occurrences of the regular expression directly preceding the asterisk (*). By itself, it does not match anything - it is NOT a wildcard. It is used in conjunction with literal matches, a period or other special characters. Some examples of asterisk in regular expressions:
Asterisk (*) in RE | Will match |
/cart*/ | car, carpool, cart, caret |
/of.*ice/ | office, off ice |
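A small sketch of the asterisk, assuming Python's `re` module as a stand-in. The samples `cat`, `ofice` and `ice` are hypothetical additions: `ofice` shows that `.*` may also match zero characters.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that contain a match for pattern."""
    return [c for c in candidates if re.search(pattern, c)]

# "t*" means zero or more t, so every word containing "car" matches.
cart = matching(r"cart*", ["car", "carpool", "cart", "caret", "cat"])
# ".*" means zero or more of any character between "of" and "ice".
office = matching(r"of.*ice", ["office", "off ice", "ofice", "ice"])

print(cart)    # ['car', 'carpool', 'cart', 'caret']
print(office)  # ['office', 'off ice', 'ofice']
```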
Square brackets '[]'
Enclose a character class or group, similar to the shell. Any single character within the brackets will be matched. A hyphen can be used to define a range of characters. Most special characters lose their special meaning inside the brackets. A caret at the beginning of the list means exclusion ([^a] matches any single character other than a). Some examples of square brackets in regular expressions are:
Square brackets ([]) in RE | Will match |
/practi[cs]ing/ | practicing and practising but not practicsing |
/file[12s]/ | file1, file2, and files but not file12s |
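The bracket examples can be sketched as follows, assuming Python's `re` module as a stand-in. One detail is worth seeing in code: the bracket matches exactly one character, so the whole string `file12s` is not matched, although a search still finds the `file1` part inside it. The sample `file3` and the string `abcde` are made up.

```python
import re

spellings = ["practicing", "practising", "practicsing"]
spelled = [s for s in spellings if re.search(r"practi[cs]ing", s)]

names = ["file1", "file2", "files", "file12s", "file3"]
# fullmatch requires the whole string to match the pattern.
whole = [n for n in names if re.fullmatch(r"file[12s]", n)]
# search accepts a match anywhere, so the "file1" inside "file12s" counts.
partial = [n for n in names if re.search(r"file[12s]", n)]

# A range with a leading caret excludes the characters a to c.
excluded = re.findall(r"[^a-c]", "abcde")

print(spelled)   # ['practicing', 'practising']
print(whole)     # ['file1', 'file2', 'files']
print(partial)   # ['file1', 'file2', 'files', 'file12s']
print(excluded)  # ['d', 'e']
```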
Caret '^'
Matches strings at the beginning of the line (anchoring the match there). It is special only at the beginning of the regular expression; elsewhere it is matched literally. Inside square brackets, as the first character, it means character exclusion. Some examples of caret in regular expressions are:
Caret (^) in RE | Will match |
/^[0-9]/ | any line that begins with a digit |
/1[^0-3]/ | 1a, 1., 145, but not 10 and 13 |
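A brief sketch of both uses of the caret, assuming Python's `re` module as a stand-in. The sample lines are hypothetical. Only the anchor and the bracket exclusion are shown, since the two tools are assumed to agree on those.

```python
import re

lines = ["7 days", "day 7", "42"]
# Anchor: the digit must be the first character of the line.
starts_with_digit = [l for l in lines if re.search(r"^[0-9]", l)]

tokens = ["1a", "1.", "145", "10", "13"]
# Exclusion: a 1 followed by any character other than 0, 1, 2 or 3.
one_then_other = [t for t in tokens if re.search(r"1[^0-3]", t)]

print(starts_with_digit)  # ['7 days', '42']
print(one_then_other)     # ['1a', '1.', '145']
```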
Dollar '$'
Matches strings at the end of the line (anchoring matches to the end of the line). Examples of dollar in regular expressions are:
Dollar ($) in RE | Will match |
/A$/ | only match lines that end with capital letter A |
/50$/ | only lines ending in 50 like: 50, 150 |
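The end-of-line anchor can be sketched the same way, assuming Python's `re` module as a stand-in and made-up sample lines without trailing newlines.

```python
import re

grades = ["GRADE A", "A grade", "grade a"]
ends_with_A = [g for g in grades if re.search(r"A$", g)]

amounts = ["50", "150", "501", "total 250"]
ends_with_50 = [a for a in amounts if re.search(r"50$", a)]

print(ends_with_A)   # ['GRADE A']
print(ends_with_50)  # ['50', '150', 'total 250']
```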
Grouping '()' and grouping with alternation '(|)'
Parentheses can be used to create bracketed regular expressions. The parentheses group the regular expression inside. The parentheses are not matched, only what is inside. Grouping offers alternation (choice), represented by the pipe (|). When a grouped expression is followed by a quantifier such as the asterisk, the quantifier applies to the entire group. Examples of grouping with and without alternation are:
Grouping (()) in RE | Will match |
/“(Mr|Mrs) Smith”/ | “Mr Smith” and “Mrs Smith” |
/a(abc)*z/ | az, aabcz, aabcabcz |
grep requires the -E option (extended regular expressions) to enable grouping as written here.
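A short sketch of grouping and alternation, assuming Python's `re` module as a stand-in. Unlike grep, Python needs no option to treat `(`, `)` and `|` as special. The non-matching samples `Ms Smith` and `aabz` are hypothetical, and the quotation marks from the table are left out of the pattern.

```python
import re

names = ["Mr Smith", "Mrs Smith", "Ms Smith"]
# Alternation: either "Mr" or "Mrs" may precede " Smith".
smiths = [n for n in names if re.search(r"(Mr|Mrs) Smith", n)]

strings = ["az", "aabcz", "aabcabcz", "aabz"]
# The asterisk applies to the whole group "abc", not just to "c".
repeated = [s for s in strings if re.search(r"a(abc)*z", s)]

print(smiths)    # ['Mr Smith', 'Mrs Smith']
print(repeated)  # ['az', 'aabcz', 'aabcabcz']
```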
Search and Replace in vi
Utilities such as vi are able to perform string substitution. Such substitution is done using regular expressions. You need to be careful when using non-alphanumeric characters - quote them if necessary. The substitution syntax in vi is:
:[address]s/original-string/replacement-string[/g]
where
- [address] - specifies a line range (optional). If no address range is given, only the current line is used for substitution.
- [/g] - specifies global substitution on the line (optional). If /g is not given, only the first occurrence on each line is substituted.
Example vi Substitution Ranges
In the following examples, the first occurrence of 'a' on each addressed line is substituted with 'A'.
Where to replace 'a' with 'A' | vi command needed |
on line 7 only | :7s/a/A/ |
on lines 7 to 10 (inclusive) | :7,10s/a/A/ |
in the entire document (range is %) | :%s/a/A/ |
from the beginning of document to current line (1,.) | :1,.s/a/A/ |
between current and next 5 lines (inclusive) (.,.+5) | :.,.+5s/a/A/ |
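To make the effect of the address range and the /g flag concrete, here is a rough Python emulation. It is only a sketch: the helper `substitute`, its parameters and the sample text are hypothetical and not part of vi, and it assumes `re.sub` is a fair stand-in for the vi `s` command on simple patterns.

```python
import re

def substitute(lines, first, last, pattern, replacement, every=False):
    """Emulate :first,last s/pattern/replacement/[g] on 1-based, inclusive line numbers."""
    count = 0 if every else 1  # in re.sub, count=0 replaces every occurrence
    result = []
    for number, line in enumerate(lines, start=1):
        if first <= number <= last:
            line = re.sub(pattern, replacement, line, count=count)
        result.append(line)
    return result

text = ["banana", "papaya", "alpaca"]

one_line = substitute(text, 2, 2, "a", "A")                        # like :2s/a/A/
all_lines = substitute(text, 1, len(text), "a", "A")               # like :%s/a/A/
all_global = substitute(text, 1, len(text), "a", "A", every=True)  # like :%s/a/A/g

print(one_line)    # ['banana', 'pApaya', 'alpaca']
print(all_lines)   # ['bAnana', 'pApaya', 'Alpaca']
print(all_global)  # ['bAnAnA', 'pApAyA', 'AlpAcA']
```

Without the global flag only the first 'a' on each addressed line changes, which is why `:%s/a/A/` still leaves later occurrences on every line untouched.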