Techopedia Explains Regular Expression
Utilities, text editors and programming languages use regular expressions to search for and manipulate text that matches a pattern. Some languages, such as Tcl, AWK, Perl and Ruby, build regular expressions into the core of the language, while others, such as Java, C++ and C, provide them through libraries. Because implementations differ, often subtly, a regular expression that works well in one application might not work in another.
Regular expressions can be incredibly powerful. Essentially, if a text pattern can be defined precisely, a regular expression can usually be written to match it. A simple task might be finding every sentence that ends in "that" and replacing that word with "which". The task becomes more complex if the same replacement should apply only to the 3rd and 5th matches. It becomes more complicated still if the set of characters to match depends on the frequency and location of earlier matches.
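The replacement task described above can be sketched in Python with the standard `re` module. This is only an illustration under stated assumptions: the pattern treats "end of a sentence" as the word "that" immediately followed by `.`, `!` or `?`, and the helper names `replace_all` and `replace_selected` are hypothetical.

```python
import re
from itertools import count

# Assumption: a sentence "ends in that" when the whole word "that"
# is immediately followed by ., ! or ? (checked with a lookahead).
SENTENCE_FINAL_THAT = re.compile(r"\bthat(?=[.!?])")

def replace_all(text):
    """Replace every sentence-final "that" with "which"."""
    return SENTENCE_FINAL_THAT.sub("which", text)

def replace_selected(text, occurrences=(3, 5)):
    """Replace only the listed occurrences (1-based) of the match."""
    counter = count(1)

    def swap(match):
        # Called once per match, in left-to-right order.
        if next(counter) in occurrences:
            return "which"
        return match.group(0)  # leave this occurrence unchanged

    return SENTENCE_FINAL_THAT.sub(swap, text)

sample = "a that. b that. c that. d that. e that."
print(replace_all(sample))
print(replace_selected(sample))
```

Passing a function instead of a string as the replacement is what lets the code count matches and decide per occurrence. Other regular expression implementations may need a different approach for the same task.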
The three main components of a regular expression are anchors, which specify the position of a pattern in relation to a line of text; character sets, which match one character out of a set of candidates at a single position; and modifiers, which specify the number of times the previous character set is repeated.
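As a small illustration of these three components, the following Python sketch checks whether a line looks like a simple identifier. The pattern and the function name `is_identifier` are assumptions chosen for the example, not part of any standard.

```python
import re

# ^ and $            anchors: tie the pattern to the start and end of the line
# [A-Za-z_]          character set: one letter or underscore at this position
# [A-Za-z0-9_]*      character set plus modifier: zero or more further characters
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_identifier(line):
    """Return True if the whole line matches the identifier pattern."""
    return IDENTIFIER.search(line) is not None

for candidate in ["total_1", "1total", "a-b"]:
    print(candidate, is_identifier(candidate))
```

Without the anchors, the same pattern would also succeed on "1total", because it could match the "total" part in the middle of the line.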
The operations that help in building regular expressions are:
- Quantification: Quantifiers dictate how often the preceding element is allowed to occur.
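- Grouping: Parentheses define the scope and precedence of operators.
- Boolean Conditions: An OR condition can be stated between characters or groups, usually with the | operator. An AND condition has no dedicated operator in most implementations and is typically expressed through other constructs, such as lookahead where it is supported.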
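The operations listed above can be illustrated with a few short patterns in Python's `re` module. The variable names are hypothetical, and the lookahead-based AND is one possible way to combine conditions in implementations that support lookahead, not a universal feature.

```python
import re

# Quantification: + applies only to the preceding element, here the "b".
UNGROUPED = re.compile(r"^ab+$")        # "a" followed by one or more "b"

# Grouping: parentheses make + apply to the whole sequence "ab".
GROUPED = re.compile(r"^(?:ab)+$")      # one or more repetitions of "ab"

# OR: alternation inside a group.
EITHER = re.compile(r"^gr(?:a|e)y$")    # "gray" or "grey"

# AND (assumption: the engine supports lookahead): both lookaheads must hold.
BOTH = re.compile(r"^(?=.*[0-9])(?=.*[a-z]).+$")  # a digit AND a lowercase letter

def matches(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.search(text) is not None

print(matches(UNGROUPED, "abbb"), matches(GROUPED, "abbb"))
print(matches(UNGROUPED, "abab"), matches(GROUPED, "abab"))
print(matches(EITHER, "grey"), matches(BOTH, "abc1"))
```

Comparing `UNGROUPED` with `GROUPED` shows how grouping changes what a quantifier applies to: the two patterns accept different strings even though they differ only by a pair of parentheses.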
Regular expression engines match strings using algorithms based on deterministic finite automata (DFA) and non-deterministic finite automata (NFA). In a DFA, each pair of state and input symbol leads to exactly one next state, while in an NFA the same pair can lead to several possible next states. In both cases, the automaton reads a finite string of symbols and either accepts or rejects it.
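The difference between the two kinds of automaton can be sketched by hand for one small pattern. The following Python example is an illustration only, not how any particular engine is implemented: it assumes the pattern `(a|b)*abb`, with transition tables written out manually and hypothetical state names.

```python
# NFA for (a|b)*abb: a (state, symbol) pair may lead to several states.
NFA = {
    (0, "a"): {0, 1},   # on "a", stay in 0 or guess that "abb" starts here
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
NFA_START, NFA_ACCEPT = 0, {3}

# Equivalent DFA: each (state, symbol) pair leads to exactly one state.
DFA = {
    ("A", "a"): "B", ("A", "b"): "A",
    ("B", "a"): "B", ("B", "b"): "C",
    ("C", "a"): "B", ("C", "b"): "D",
    ("D", "a"): "B", ("D", "b"): "A",
}
DFA_START, DFA_ACCEPT = "A", {"D"}

def nfa_accepts(text):
    """Track the set of all states the NFA could be in after each symbol."""
    current = {NFA_START}
    for symbol in text:
        current = {nxt for state in current
                   for nxt in NFA.get((state, symbol), set())}
    return bool(current & NFA_ACCEPT)

def dfa_accepts(text):
    """Follow the single possible transition for each symbol."""
    state = DFA_START
    for symbol in text:
        state = DFA.get((state, symbol))
        if state is None:       # symbol outside the alphabet {a, b}
            return False
    return state in DFA_ACCEPT

for word in ["abb", "babb", "ab", "abba"]:
    print(word, nfa_accepts(word), dfa_accepts(word))
```

Both functions accept exactly the same strings. The NFA version has to carry a set of candidate states, while the DFA version carries a single state at the cost of a larger, precomputed table.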