6.2.5 Regular expressions

String variables can be modified using the search-and-replace string operator1, =$\sim $, which takes a regular expression with a syntax similar to that expected by the shell command sed and applies it to the specified string variable.2 In the following example, the first instance of the letter s in the string variable twister is replaced with the letters th:

pyxplot> twister="seven silver soda syphons"
pyxplot> twister =$\sim $ s/s/th/
pyxplot> print twister
theven silver soda syphons

Note that only the s (substitute) command of sed is implemented in Pyxplot. Any character can be used in place of the / characters in the above example, for example:

twister =~ s's'th'

Flags can be passed, as in sed or perl, to modify the precise behaviour of the regular expression. In the following example the g flag is used to perform a global search-and-replace of all instances of the letter s with the letters th:

pyxplot> twister="seven silver soda syphons"
pyxplot> twister =$\sim $ s/s/th/g
pyxplot> print twister
theven thilver thoda thyphonth

Table 6.3 lists all of the regular expression flags recognised by the =$\sim $ operator.

g

Replace all matches of the pattern; by default, only the first match is replaced.

i

Perform case-insensitive matching, such that expressions like [A-Z] will match lowercase letters, too.

l

Make $\backslash $w, $\backslash $W, $\backslash $b, $\backslash $B, $\backslash $s and $\backslash $S dependent on the current locale.

m

When specified, the pattern character \^{} matches the beginning of the string and the beginning of each line immediately following each newline. The pattern character $ matches at the end of the string and the end of each line immediately preceding each newline. By default, \^{} matches only the beginning of the string, and $ only the end of the string and immediately before the newline, if present, at the end of the string.

s

Make the . special character match any character at all, including a newline; without this flag, . will match anything except a newline.

u

Make $\backslash $w, $\backslash $W, $\backslash $b, $\backslash $B, $\backslash $s and $\backslash $S dependent on the Unicode character properties database.

x

This flag allows the user to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class or preceded by an un-escaped backslash. When a line contains a #, neither in a character class nor preceded by an un-escaped backslash, all characters from the left-most such # through to the end of the line are ignored.

Table 6.3: A list of the flags accepted by the =$\sim $ operator. Most are rarely used, but the g flag is very useful.

Footnotes

  1. Programmers with experience of perl will recognise this syntax.
  2. Regular expression syntax is a massive subject, and is beyond the scope of this manual. The official GNU documentation for the sed command is heavy reading, but there are many more accessible tutorials on the web.