C# Professional - Processing Text

talent-agile

Regular Expressions

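Regular expressions are a powerful tool for working with text. A pattern describes the text you are looking for, and operations such as searching, replacing, or extracting are then applied to whatever the pattern matches.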

With regular expressions, you can:

• Parse text to find specific character patterns
• Edit, replace or delete substrings of a text
• Extract text matching specific character patterns
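As a rough illustration of these three uses, here is a minimal sketch built on the static methods of the .NET Regex class (IsMatch, Replace and Matches). The class name RegexTasksDemo, its helper methods and the sample sentence are hypothetical and chosen only for this example. The pattern \d+ means "one or more digits"; the pattern syntax itself is described in the rest of this section.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTasksDemo
{
    // Hypothetical sample input, used only for illustration.
    public const string Sample = "Order 66 shipped, order 7 pending";

    // Parse: does the text contain at least one digit?
    public static bool ContainsDigit(string text) => Regex.IsMatch(text, @"\d");

    // Edit: replace every run of digits with a '#' placeholder.
    public static string MaskNumbers(string text) => Regex.Replace(text, @"\d+", "#");

    // Extract: collect every run of digits as a string.
    public static string[] ExtractNumbers(string text) =>
        Regex.Matches(text, @"\d+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine(ContainsDigit(Sample));                    // True
        Console.WriteLine(MaskNumbers(Sample));                      // Order # shipped, order # pending
        Console.WriteLine(string.Join(",", ExtractNumbers(Sample))); // 66,7
    }
}
```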

Pattern definition

In the basic pattern syntax, an ordinary character matches itself: the pattern t matches the single character t.

You can define a class of multiple characters using [ and ]. For example, [aeiou] will match one character that can be any lowercase vowel. You can use the - to include a range of consecutive characters in a class, as in [a-z].

Note that regular expressions are case-sensitive by default. The .NET Regex class accepts an option when a new Regex is created (RegexOptions.IgnoreCase) to specify that case should be ignored. However, it is better to state in the pattern itself which cases are accepted.
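The two ways of handling case can be compared side by side. The sketch below is only an illustration: the class name CaseSensitivityDemo, its methods and the sample text are hypothetical. Regex, IsMatch and RegexOptions.IgnoreCase are the standard .NET members.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseSensitivityDemo
{
    // Default behaviour: the pattern "hello" only matches lowercase "hello".
    public static bool MatchDefault(string text) => Regex.IsMatch(text, "hello");

    // Option-based: RegexOptions.IgnoreCase applies to the whole pattern.
    public static bool MatchWithOption(string text) =>
        new Regex("hello", RegexOptions.IgnoreCase).IsMatch(text);

    // Pattern-based: each position lists the accepted cases explicitly.
    public static bool MatchWithPattern(string text) =>
        Regex.IsMatch(text, "[Hh][Ee][Ll][Ll][Oo]");

    public static void Main()
    {
        Console.WriteLine(MatchDefault("Hello World"));     // False
        Console.WriteLine(MatchWithOption("Hello World"));  // True
        Console.WriteLine(MatchWithPattern("Hello World")); // True
    }
}
```

Spelling out the cases in the pattern is more verbose, but it keeps the behaviour visible in the pattern itself and lets you accept mixed case only where you want it, for example [Hh]ello.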

Here are the most common character patterns used in simple regular expressions.

Pattern    Matching characters
t          The single character t
[aei]      A single character of: a, e or i
[a-z]      A single character in the range from a to z
[^a-z]     A single character not in the range from a to z
\d         A decimal character (digit), equivalent to [0-9] for ASCII text
\w         A word character, equivalent to [a-zA-Z_0-9] for ASCII text
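To see what each of these patterns selects, the following sketch applies them one by one to the same input and joins the matched characters. The class CharacterClassDemo, the helper MatchedCharacters and the input string are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // Hypothetical sample input, used only for illustration.
    public const string Sample = "Hi_42 tea!";

    // Returns every character matched by the pattern, concatenated in order.
    public static string MatchedCharacters(string text, string pattern) =>
        string.Concat(Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(MatchedCharacters(Sample, "t"));      // t
        Console.WriteLine(MatchedCharacters(Sample, "[aei]"));  // iea
        Console.WriteLine(MatchedCharacters(Sample, "[a-z]"));  // itea
        Console.WriteLine(MatchedCharacters(Sample, "[^a-z]")); // H_42 !
        Console.WriteLine(MatchedCharacters(Sample, @"\d"));    // 42
        Console.WriteLine(MatchedCharacters(Sample, @"\w"));    // Hi_42tea
    }
}
```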

For any character, you can use quantifiers to specify how many repetitions of the character should be matched.

Quantifier    Definition
*             Will match zero or more repetitions
?             Will match zero or one repetition
+             Will match one or more repetitions
{N}           Will match exactly N repetitions
{N,}          Will match at least N repetitions
{M,N}         Will match between M and N repetitions
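The quantifiers can be tried out on a short input. In the sketch below, each pattern is a literal b followed by a quantified a, and the first match is printed. The class QuantifierDemo and the helper FirstMatch are hypothetical names for this example. Note that quantifiers are greedy by default, so they take as many repetitions as the input allows.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Returns the text of the first match, or "(no match)" if there is none.
    public static string FirstMatch(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Value : "(no match)";
    }

    public static void Main()
    {
        // Input with three consecutive 'a' characters after the 'b'.
        Console.WriteLine(FirstMatch("baaad", "ba*"));     // baaa
        Console.WriteLine(FirstMatch("baaad", "ba?"));     // ba
        Console.WriteLine(FirstMatch("baaad", "ba+"));     // baaa
        Console.WriteLine(FirstMatch("baaad", "ba{2}"));   // baa
        Console.WriteLine(FirstMatch("baaad", "ba{2,}"));  // baaa
        Console.WriteLine(FirstMatch("baaad", "ba{1,2}")); // baa

        // Input with no 'a' at all: * and ? still match, + does not.
        Console.WriteLine(FirstMatch("bd", "ba*"));        // b
        Console.WriteLine(FirstMatch("bd", "ba?"));        // b
        Console.WriteLine(FirstMatch("bd", "ba+"));        // (no match)
    }
}
```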

You can define anchors to match the beginning or the end of the text or a word.

Anchor    Definition
^         Will match the beginning of the text
$         Will match the end of the text
\b        Will match the boundary of a word (beginning or end)
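A short sketch can show how anchors restrict where a pattern is allowed to match. The class AnchorDemo, the helper CountMatches and the sample strings are hypothetical and used only for illustration. The end-of-text anchor is written as a plain $ in the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Counts how many times the pattern matches in the text.
    public static int CountMatches(string text, string pattern) =>
        Regex.Matches(text, pattern).Count;

    public static void Main()
    {
        // ^ and $ tie the match to the start or the end of the text.
        Console.WriteLine(Regex.IsMatch("cat and dog", "^cat")); // True
        Console.WriteLine(Regex.IsMatch("cat and dog", "^dog")); // False
        Console.WriteLine(Regex.IsMatch("cat and dog", "dog$")); // True
        Console.WriteLine(Regex.IsMatch("cat and dog", "cat$")); // False

        // \b ties the match to the beginning or the end of a word.
        string words = "cat concat category";
        Console.WriteLine(CountMatches(words, "cat"));       // 3
        Console.WriteLine(CountMatches(words, @"\bcat"));    // 2 (cat, category)
        Console.WriteLine(CountMatches(words, @"cat\b"));    // 2 (cat, concat)
        Console.WriteLine(CountMatches(words, @"\bcat\b"));  // 1 (cat)
    }
}
```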