Coding Castles: August 2012

Regexes (regular expressions) are an extremely useful tool, but I find myself getting tripped up by differences in their APIs and capabilities in different languages. Here's a regex Rosetta Stone, covering how to use regexes in various programming languages.

Supported syntax

Different languages and libraries support different syntaxes (supported special characters and so on).

Perl: Basic syntax, extended patterns
Python: Library reference, HOWTO
    Raw strings are useful to cut down on the number of backslashes you need.
C++: Boost.Regex syntax (Perl-compatible by default, including support for Perl extended patterns)
    Raw string literals are useful to cut down on the number of backslashes you need, if your compiler supports them.
C#: Regular Expression Language Elements
    @-quoted string literals are useful to cut down on the number of backslashes you need.
JavaScript: “Writing a Regular Expression Pattern” on the Mozilla Developer Network
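
To make the backslash point concrete, here is a small Python sketch (the sample strings are my own, chosen only for illustration). It shows that a raw string and a doubled-backslash string spell the same pattern, and what goes wrong when you forget both.

```python
import re

# The same pattern written as a regular string and as a raw string.
escaped = '\\bHello \\d+\\b'
raw = r'\bHello \d+\b'

# Both literals produce identical pattern text.
same_pattern = (escaped == raw)

# Without the raw prefix or doubled backslashes, '\b' is a backspace
# character, so the pattern silently means something else.
backspace_pattern = '\bHello'

sample = 'Say Hello 42 times'
found_raw = re.search(raw, sample) is not None
found_backspace = re.search(backspace_pattern, sample) is not None

print(same_pattern, found_raw, found_backspace)
```

The last pattern looks for a literal backspace followed by "Hello", which is almost never what you meant.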

Supported modifiers

Regexes generally support modifiers to control case sensitivity, etc.

Perl: Modifiers
Python: re module constants
C++: boost::regex_constants::syntax_option_type
C#: System.Text.RegularExpressions.RegexOptions
JavaScript: See the “flags” section of the parameters to the RegExp object.
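
As a rough illustration of how modifiers combine, here is a Python sketch (the sample text is made up). It passes two flags together, then embeds the same flags inline in the pattern, which Perl-style engines generally also allow.

```python
import re

text = 'first line\nhello 42\nlast line'

# Flags are combined with the bitwise OR operator.
combined = re.search(r'^HELLO \d+$', text, re.IGNORECASE | re.MULTILINE)

# The same flags can be embedded at the start of the pattern itself.
inline = re.search(r'(?im)^HELLO \d+$', text)

# Without the flags, neither the case nor the line anchors match.
plain = re.search(r'^HELLO \d+$', text)

print(combined.group(), inline.group(), plain)
```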

At the top of the file

To make regex functionality available in your module or source file:

Perl

Nothing necessary

Python

import re

C++

#include <boost/regex.hpp>

// Optionally:
using namespace boost;

C#

using System.Text.RegularExpressions;

JavaScript

Nothing necessary

Matching an entire string

Perl

if ($text =~ /^Hello \d+$/) { ... }

Perl regexes must be explicitly anchored using ^ and $ to match the entire string.

Python

if re.match(r'Hello \d+$', text):
    ...

re.match starts at the beginning of the string but requires $ to anchor the match to the end of the string.
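
One wrinkle worth knowing: in Python, $ also matches just before a trailing newline. The sketch below (sample strings are my own) compares $ with \Z and with re.fullmatch, which is available in Python 3.4 and later and does not need anchors at all.

```python
import re

text = 'Hello 42\n'

# '$' also matches just before a trailing newline, so this succeeds.
with_dollar = re.match(r'Hello \d+$', text)

# '\Z' matches only at the very end of the string, so this fails.
with_z = re.match(r'Hello \d+\Z', text)

# re.fullmatch (Python 3.4+) requires the whole string to match.
full = re.fullmatch(r'Hello \d+', text)
full_clean = re.fullmatch(r'Hello \d+', 'Hello 42')

print(bool(with_dollar), bool(with_z), bool(full), bool(full_clean))
```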

C++

if (regex_match(text, boost::regex("Hello \\d+"))) { ... }

Use boost::regex_match if you want to require that the entire string match.

C#

if (Regex.IsMatch(text, @"^Hello \d+$")) { ... }

C# regexes must be explicitly anchored using ^ and $ to match the entire string.

JavaScript

if (/^Hello \d+$/.test(text)) { ... }

JavaScript regexes must be explicitly anchored using ^ and $ to match the entire string.

Matching a substring

Perl

if ($text =~ /Hello \d+/) { ... }

Python

if re.search(r'Hello \d+', text):
    ...

Use re.search instead of re.match to search for a substring anywhere within the string.
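
A quick sketch of the difference, using a made-up sample string: re.match gives up because the text does not start with the pattern, while re.search finds it in the middle.

```python
import re

text = 'Well, Hello 42 there'

# re.match only tries at position 0, so it finds nothing here.
at_start = re.match(r'Hello \d+', text)

# re.search scans forward and finds the first occurrence.
anywhere = re.search(r'Hello \d+', text)

# The match object records where the substring was found.
found = anywhere.group()
span = anywhere.span()

print(at_start, found, span)
```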

C++

if (regex_search(text, boost::regex("Hello \\d+"))) { ... }

Use boost::regex_search if you want to search for a substring anywhere within the string.

C#

if (Regex.IsMatch(text, @"Hello \d+")) { ... }

JavaScript

if (/Hello \d+/.test(text)) { ... }

Performing a case-insensitive match

Perl

if ($text =~ /hello \d+/i) { ... }

Python

if re.search(r'hello \d+', text, re.I):
    ...

C++

if (regex_search(text, boost::regex("hello \\d+",
    boost::regex_constants::icase))) { ... }

C#

if (Regex.IsMatch(text, @"hello \d+", RegexOptions.IgnoreCase)) { ... }

JavaScript

if (/hello \d+/i.test(text)) { ... }

Storing a regex for later use

Perl

$r = qr/hello \d+/i;
if ($text =~ $r) { ... }

Python

r = re.compile(r'hello \d+', re.I)
if r.search(text):
    ...

Note that Python's re module automatically caches the most recently used patterns, so you won't necessarily see a performance gain by compiling a regex.
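
Even without a speed-up, a stored pattern is handy because it carries its flags with it and can be reused across calls. Here is a minimal sketch; the name GREETING and the sample lines are my own.

```python
import re

# Compile once; the IGNORECASE flag travels with the pattern object.
GREETING = re.compile(r'hello (\d+)', re.I)

lines = ['Hello 1', 'goodbye 2', 'HELLO 3']

# Pattern objects offer the same operations as the module-level functions.
numbers = [int(m.group(1)) for m in map(GREETING.search, lines) if m]

# Substitution uses the stored flags too, so both spellings are replaced.
cleaned = GREETING.sub('bye', 'hello 7 and Hello 8')

print(numbers, cleaned)
```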

C++

const boost::regex r("hello \\d+", boost::regex_constants::icase);
if (regex_search(text, r)) { ... }

C#

Regex r = new Regex(@"hello \d+", RegexOptions.IgnoreCase);
if (r.IsMatch(text)) { ... }

Note that .NET automatically caches the most recently used patterns passed to the static Regex methods, so you won't necessarily see a performance gain by storing a regex for later use. Also note that, while most other languages define “compiling” a regex as interpreting it, .NET supports compiling regexes to actual IL (see RegexOptions.Compiled).

JavaScript

var r = /hello \d+/i;
// or var r = new RegExp("hello \\d+", "i");
if (r.test(text)) { ... }

Replacing part of a string

Perl

# Replace all occurrences:
$text =~ s/Hello/Goodbye/g;
# Replace the first occurrence only:
$text =~ s/Hello/Goodbye/;

Python

# Replace all occurrences:
text = re.sub('Hello', 'Goodbye', text)
# Replace the first occurrence only:
text = re.sub('Hello', 'Goodbye', text, count=1)
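
For a slightly fuller picture, here is a Python sketch with a made-up sample string. Besides the two forms above, it shows a replacement that refers back to a captured group, a replacement computed by a function, and re.subn, which also reports how many replacements were made.

```python
import re

text = 'Hello Alice. Hello Bob.'

all_replaced = re.sub(r'Hello', 'Goodbye', text)
first_only = re.sub(r'Hello', 'Goodbye', text, count=1)

# \1 in the replacement refers to the first captured group.
swapped = re.sub(r'Hello (\w+)', r'\1, hello', text)

# The replacement can also be a function that receives each match object.
shouted = re.sub(r'Hello (\w+)', lambda m: 'Hello ' + m.group(1).upper(), text)

# re.subn returns the new string together with the replacement count.
_, replacements = re.subn(r'Hello', 'Goodbye', text)

print(all_replaced, first_only, swapped, shouted, replacements, sep='\n')
```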

C++

// Replace all occurrences:
text = regex_replace(text, boost::regex("Hello"), "Goodbye");
// Replace the first occurrence only:
text = regex_replace(text, boost::regex("Hello"), "Goodbye",
    boost::regex_constants::format_first_only);

C#

// Replace all occurrences:
text = Regex.Replace(text, "Hello", "Goodbye");
// Replace the first occurrence only:
Regex r = new Regex("Hello");
text = r.Replace(text, "Goodbye", 1);

JavaScript

// Replace all occurrences:
text = text.replace(/Hello/g, 'Goodbye');
// Replace the first occurrence only:
text = text.replace(/Hello/, 'Goodbye');

Extracting parts of a string

Perl

if (($title, $name) = $text =~ /(Mr\.|Mrs\.|Dr\.) (\w+)/) { ... }

Python

m = re.search(r'(Mr\.|Mrs\.|Dr\.) (\w+)', text)
if m:
    title, name = m.groups()
    ...
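
If positional groups get hard to follow, Python also supports named groups, and re.finditer walks over every match rather than just the first. A short sketch, where the name PERSON and the sample sentence are my own:

```python
import re

# Named groups let you refer to captures by name instead of position.
PERSON = re.compile(r'(?P<title>Mr\.|Mrs\.|Dr\.) (?P<name>\w+)')

text = 'Dr. Jones met Mrs. Smith and Mr. Brown.'

# search returns only the first match.
first_match = PERSON.search(text)
first = (first_match.group('title'), first_match.group('name'))
as_dict = first_match.groupdict()

# finditer yields a match object for every non-overlapping match.
everyone = [(p.group('title'), p.group('name')) for p in PERSON.finditer(text)]

print(first, as_dict, everyone)
```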

C++

boost::smatch m;
if (regex_search(text, m, boost::regex("(Mr\\.|Mrs\\.|Dr\\.) (\\w+)"))) {
    const std::string title = m[1].str();
    const std::string name = m[2].str();
    ...
}

C#

Match m = Regex.Match(text, @"(Mr\.|Mrs\.|Dr\.) (\w+)");
if (m.Success) {
    string title = m.Groups[1].Value;
    string name = m.Groups[2].Value;
    ...
}

JavaScript

var match = /(Mr\.|Mrs\.|Dr\.) (\w+)/.exec(text);
if (match !== null) {
    var title = match[1];
    var name = match[2];
    ...
}

Tuesday, August 28, 2012

Regex Cheat Sheet
