C# Professional - Processing Text

talent-agile

Regular Expressions - Groups

When building a regular expression pattern, you can use a group to match one value out of several alternatives. Put the alternatives in parentheses, separated by the | character: (banana|ananas|apple).

This example will match any text containing banana, ananas or apple.

// { autofold
using System;
using System.Text.RegularExpressions;
class Example
{
static void Main()
{
// }
// This pattern will match any string containing banana, ananas or apple
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
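
The body of the snippet above is not shown, so here is a minimal sketch of how the pattern could be used. The class name and the sample inputs are made up for illustration; only the pattern itself comes from the text.

```csharp
using System;
using System.Text.RegularExpressions;

class AlternationExample
{
    static void Main()
    {
        // The group matches exactly one of the three alternatives.
        var pattern = new Regex("(banana|ananas|apple)");

        // Hypothetical inputs, chosen only to show a hit and a miss.
        string[] inputs = { "I like banana bread", "apple pie", "cherry tart" };

        foreach (string input in inputs)
        {
            Match match = pattern.Match(input);
            Console.WriteLine(match.Success
                ? $"\"{input}\" contains {match.Value}"
                : $"\"{input}\" has no match");
        }
    }
}
```

The first two inputs match (on banana and apple respectively), and the third does not.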

Groups are very handy when a regular expression needs to accept several options for a specific word.

Quantifiers

As with individual characters, a quantifier can be applied to a group to specify how many times the group must occur.

// { autofold
using System;
using System.Text.RegularExpressions;
class Example
{
static void Main()
{
// }
// This pattern will match any string that contains 3 consecutive occurrences of abc, def or ghi (in any combination)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
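
As a rough illustration of a quantified group, the sketch below assumes the pattern (abc|def|ghi){3}, which is a guess based on the comment in the snippet above. The sample strings are invented.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupQuantifierExample
{
    static void Main()
    {
        // {3} applies to the whole group: three consecutive occurrences,
        // each of which may be abc, def or ghi.
        var pattern = new Regex("(abc|def|ghi){3}");

        Console.WriteLine(pattern.IsMatch("abcdefghi"));    // True
        Console.WriteLine(pattern.IsMatch("xx-ghiabcabc")); // True
        Console.WriteLine(pattern.IsMatch("abcdef"));       // False: only two occurrences
        Console.WriteLine(pattern.IsMatch("abc def ghi"));  // False: occurrences are not consecutive
    }
}
```

Note that the quantifier repeats the group, not the text it matched the first time, so abcdefghi and abcabcabc are both accepted.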

Capture & Backreference

By default, each group in a pattern captures the text it matched. This is often used to extract a specific substring from a larger text.

In .NET, captured values can be retrieved through the Groups property of the Match returned by a regular expression.

Note: the first element of the Groups collection (index 0) is the whole match; captured groups start at index 1.

// { autofold
using System;
using System.Text.RegularExpressions;
class Example
{
static void Main()
{
// }
// Capture the user name using a regular expression
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
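
The following sketch shows one way to read a captured value through Match.Groups. The pattern and the input string are assumptions made for the example, not the hidden code of the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

class CaptureExample
{
    static void Main()
    {
        // Hypothetical input line.
        string input = "user_name: alice, role: admin";

        // The parentheses capture the word that follows "user_name: ".
        Match match = Regex.Match(input, @"user_name: (\w+)");

        if (match.Success)
        {
            Console.WriteLine(match.Groups[0].Value); // user_name: alice (the whole match)
            Console.WriteLine(match.Groups[1].Value); // alice (the first captured group)
        }
    }
}
```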

A captured value can also be used as a backreference later in the same pattern, which ensures that another part of the text is identical to what the group captured.

A backreference is written with the \N syntax, where N is the number of the referenced group in the pattern.

Example: user_id: (\d+) - validating email for user \1

This matches only when the user id at the end of the text is the same as the one captured at the beginning.

// { autofold
using System;
using System.Text.RegularExpressions;
class Example
{
static void Main()
{
// }
// Match when a user is validating their own email
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
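
Here is a small sketch of the backreference pattern given earlier, run against invented log lines. The second pattern, with a trailing \b, is an addition that is not in the original text: without it, \1 only has to match a prefix of the final number.

```csharp
using System;
using System.Text.RegularExpressions;

class BackreferenceExample
{
    static void Main()
    {
        // Pattern from the text: \1 must repeat whatever group 1 captured.
        var pattern = new Regex(@"user_id: (\d+) - validating email for user \1");

        Console.WriteLine(pattern.IsMatch("user_id: 42 - validating email for user 42"));  // True
        Console.WriteLine(pattern.IsMatch("user_id: 42 - validating email for user 17"));  // False
        Console.WriteLine(pattern.IsMatch("user_id: 42 - validating email for user 421")); // True: "42" is a prefix of "421"

        // Assumption: adding a word boundary makes the comparison exact.
        var strict = new Regex(@"user_id: (\d+) - validating email for user \1\b");

        Console.WriteLine(strict.IsMatch("user_id: 42 - validating email for user 42"));  // True
        Console.WriteLine(strict.IsMatch("user_id: 42 - validating email for user 421")); // False
    }
}
```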

Naming groups

A group can be given a name with the (?<name>...) syntax in the pattern.

user_name: (?<username>\w+)

Here, the capturing group is named username. The name can be used in backreferences with the \k<username> syntax, and as a key when retrieving groups from a Match object in .NET.

// { autofold
using System;
using System.Text.RegularExpressions;
class Example
{
static void Main()
{
// }
// Capture the user name using a regular expression
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
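
A minimal sketch of both uses of a group name follows. The inputs are invented, and the "owner" field in the second pattern is a hypothetical addition used only to demonstrate \k<username>.

```csharp
using System;
using System.Text.RegularExpressions;

class NamedGroupExample
{
    static void Main()
    {
        // Retrieve the captured value by group name instead of by index.
        Match match = Regex.Match("user_name: alice", @"user_name: (?<username>\w+)");
        Console.WriteLine(match.Groups["username"].Value); // alice

        // \k<username> must repeat the text captured by the named group.
        // The "owner" field is hypothetical.
        var sameUser = new Regex(@"user_name: (?<username>\w+) - owner: \k<username>\b");

        Console.WriteLine(sameUser.IsMatch("user_name: alice - owner: alice")); // True
        Console.WriteLine(sameUser.IsMatch("user_name: alice - owner: bob"));   // False
    }
}
```

Names tend to keep the code readable when a pattern contains several groups, because adding or reordering groups does not change how each value is looked up.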