Regexes (regular expressions) are an extremely useful tool, but I find myself getting tripped up by differences in their APIs and capabilities across languages. Here's a regex Rosetta Stone, covering how to use regexes in various programming languages.
Supported syntax
Different languages and libraries support different syntaxes (supported special characters and so on).
- Perl
- Basic syntax, extended patterns
- Python
- Library reference, HOWTO
Raw strings are useful to cut down on the number of backslashes you need.
- C++
- Boost.Regex syntax (Perl-compatible by default, including support for Perl extended patterns)
Raw string literals are useful to cut down on the number of backslashes you need, if your compiler supports them.
- C#
- Regular Expression Language Elements
@-quoted string literals are useful to cut down on the number of backslashes you need.
- JavaScript
- “Writing a Regular Expression Pattern” on the Mozilla Developer Network
Supported modifiers
Regexes generally support modifiers to control case sensitivity, etc.
- Perl
- Modifiers
- Python
- re module constants
- C++
- boost::regex_constants::syntax_option_type
- C#
- System.Text.RegularExpressions.RegexOptions
- JavaScript
- See the “flags” section of the parameters to the RegExp object.
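As a rough illustration of how modifiers are spelled in one of these languages, here is a small Python sketch. The sample text is made up for the example; the flags themselves (re.IGNORECASE, re.MULTILINE, and the inline (?im) form) come from Python's re module.

```python
import re

# Made-up sample text with two lines that differ only in case.
text = "hello 42\nHELLO 7"

# Combine flags with bitwise OR: ignore case, and let ^ and $ match at each line.
pattern = re.compile(r"^hello \d+$", re.IGNORECASE | re.MULTILINE)
matches = pattern.findall(text)

# The same two flags written inline at the start of the pattern.
inline_matches = re.findall(r"(?im)^hello \d+$", text)

print(matches)         # ['hello 42', 'HELLO 7']
print(inline_matches)  # ['hello 42', 'HELLO 7']
```

The inline form is handy when the pattern is stored as data (in a config file, say) and there is no place to pass a flags argument.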
At the top of the file
To make regex functionality available in your module or source file:
- Perl
- Nothing necessary
- Python
import re
- C++
#include <boost/regex.hpp> // Optionally: using namespace boost;
- C#
using System.Text.RegularExpressions;
- JavaScript
- Nothing necessary
Matching an entire string
- Perl
if ($text =~ /^Hello \d+$/) { ... }
Perl regexes must be explicitly anchored using ^ and $ to match the entire string.
- Python
if re.match(r'Hello \d+$', text): ...
re.match starts at the beginning of the string but requires $ to anchor the match to the end of the string.
- C++
if (regex_match(text, boost::regex("Hello \\d+"))) { ... }
Use boost::regex_match if you want to require that the entire string match.
- C#
if (Regex.IsMatch(text, @"^Hello \d+$")) { ... }
C# regexes must be explicitly anchored using ^ and $ to match the entire string.
- JavaScript
if (/^Hello \d+$/.test(text)) { ... }
JavaScript regexes must be explicitly anchored using ^ and $ to match the entire string.
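One subtlety with the $ anchor is worth seeing in action. The sketch below is my own illustration in Python; the two helper names are hypothetical. It compares re.match plus $ against re.fullmatch (available since Python 3.4), which needs no anchors at all.

```python
import re

def is_greeting_anchored(text):
    # Hypothetical helper: re.match anchors at the start only,
    # so the pattern needs $ to pin down the end.
    return re.match(r"Hello \d+$", text) is not None

def is_greeting_full(text):
    # Hypothetical helper: re.fullmatch requires the whole string to match.
    return re.fullmatch(r"Hello \d+", text) is not None

print(is_greeting_anchored("Hello 42"))    # True
print(is_greeting_anchored("Hello 42!"))   # False
print(is_greeting_anchored("Hello 42\n"))  # True: $ also matches just before a trailing newline
print(is_greeting_full("Hello 42\n"))      # False: the newline is not part of the pattern
```

If a trailing newline must be rejected, re.fullmatch (or \Z in place of $) is the stricter choice in Python.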
Matching a substring
- Perl
if ($text =~ /Hello \d+/) { ... }
- Python
if re.search(r'Hello \d+', text): ...
Use re.search instead of re.match to search for a substring anywhere within the string.
- C++
if (regex_search(text, boost::regex("Hello \\d+"))) { ... }
Use boost::regex_search if you want to search for a substring anywhere within the string.
- C#
if (Regex.IsMatch(text, @"Hello \d+")) { ... }
- JavaScript
if (/Hello \d+/.test(text)) { ... }
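To make the difference between start-anchored matching and substring searching concrete, here is a minimal Python sketch. The sample sentence is invented for the example.

```python
import re

# Invented sample text where the greeting is not at the start.
text = "She said Hello 42 and left"

# re.match only looks at the start of the string, so it finds nothing here.
at_start = re.match(r"Hello \d+", text)

# re.search scans forward and returns the first match anywhere in the string.
anywhere = re.search(r"Hello \d+", text)

print(at_start)           # None
print(anywhere.group(0))  # Hello 42
print(anywhere.span())    # (9, 17)
```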
Performing a case-insensitive match
- Perl
if ($text =~ /hello \d+/i) { ... }
- Python
if re.search(r'hello \d+', text, re.I): ...
- C++
if (regex_search(text, boost::regex("hello \\d+", boost::regex_constants::icase))) { ... }
- C#
if (Regex.IsMatch(text, @"hello \d+", RegexOptions.IgnoreCase)) { ... }
- JavaScript
if (/hello \d+/i.test(text)) { ... }
Storing a regex for later use
- Perl
$r = qr/hello \d+/i;
if ($text =~ $r) { ... }
- Python
r = re.compile(r'hello \d+', re.I)
if r.search(text): ...
Note that Python automatically caches recently used patterns, so you won't necessarily see a performance gain by compiling a regex.
- C++
const boost::regex r("hello \\d+", boost::regex_constants::icase);
if (regex_search(text, r)) { ... }
- C#
Regex r = new Regex(@"hello \d+", RegexOptions.IgnoreCase);
if (r.IsMatch(text)) { ... }
Note that .NET automatically caches the most recently used patterns, so you won't necessarily see a performance gain by storing a regex for later use. Also note that, while most other languages use “compiling” a regex to mean parsing it into an internal representation, .NET also supports compiling regexes to actual IL (see RegexOptions.Compiled).
- JavaScript
var r = /hello \d+/i; // or: var r = new RegExp("hello \\d+", "i");
if (r.test(text)) { ... }
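A stored regex pays off most when the same pattern is applied many times. Here is a minimal Python sketch of that usage; GREETING and count_greetings are names I made up for the illustration, and the sample lines are invented.

```python
import re

# Compile once at module level, then reuse the pattern object everywhere.
GREETING = re.compile(r"hello \d+", re.IGNORECASE)

def count_greetings(lines):
    """Count how many lines contain 'hello' followed by a number."""
    return sum(1 for line in lines if GREETING.search(line))

lines = ["Hello 1", "goodbye", "say HELLO 22 now", "hello"]
print(count_greetings(lines))  # 2
```

Even where caching makes the speed difference small, giving the pattern a name keeps the regex and its flags in one place.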
Replacing part of a string
- Perl
# Replace all occurrences:
$text =~ s/Hello/Goodbye/g;
# Replace the first occurrence only:
$text =~ s/Hello/Goodbye/;
- Python
# Replace all occurrences:
text = re.sub('Hello', 'Goodbye', text)
# Replace the first occurrence only:
text = re.sub('Hello', 'Goodbye', text, count=1)
- C++
// Replace all occurrences:
text = regex_replace(text, boost::regex("Hello"), "Goodbye");
// Replace the first occurrence only:
text = regex_replace(text, boost::regex("Hello"), "Goodbye", boost::regex_constants::format_first_only);
- C#
// Replace all occurrences:
text = Regex.Replace(text, "Hello", "Goodbye");
// Replace the first occurrence only:
Regex r = new Regex("Hello");
text = r.Replace(text, "Goodbye", 1);
- JavaScript
// Replace all occurrences:
text = text.replace(/Hello/g, 'Goodbye');
// Replace the first occurrence only:
text = text.replace(/Hello/, 'Goodbye');
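For a runnable comparison of the "all" and "first only" variants, here is a short Python sketch on an invented sample string. It also shows two extras from Python's re module that go beyond the snippets above: a \1 backreference in the replacement, and re.subn, which reports how many replacements were made.

```python
import re

# Invented sample text with two occurrences of the word to replace.
text = "Hello Alice, Hello Bob"

replaced_all = re.sub(r"Hello", "Goodbye", text)
replaced_first = re.sub(r"Hello", "Goodbye", text, count=1)

# \1 in the replacement refers to the first capture group of the pattern.
reordered = re.sub(r"Hello (\w+)", r"\1 says hello", text)

# re.subn returns the new string together with the number of replacements.
new_text, n = re.subn(r"Hello", "Goodbye", text)

print(replaced_all)    # Goodbye Alice, Goodbye Bob
print(replaced_first)  # Goodbye Alice, Hello Bob
print(reordered)       # Alice says hello, Bob says hello
print(n)               # 2
```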
Extracting parts of a string
- Perl
if (($title, $name) = $text =~ /(Mr\.|Mrs\.|Dr\.) (\w+)/) { ... }
- Python
m = re.search(r'(Mr\.|Mrs\.|Dr\.) (\w+)', text)
if m:
    title, name = m.groups()
    ...
- C++
boost::smatch m;
if (regex_search(text, m, boost::regex("(Mr\\.|Mrs\\.|Dr\\.) (\\w+)"))) {
    const std::string& title = m[1].str();
    const std::string& name = m[2].str();
    ...
}
- C#
Match m = Regex.Match(text, @"(Mr\.|Mrs\.|Dr\.) (\w+)");
if (m.Success) {
    string title = m.Groups[1].Value;
    string name = m.Groups[2].Value;
    ...
}
- JavaScript
var match = /(Mr\.|Mrs\.|Dr\.) (\w+)/.exec(text);
if (match !== null) {
    var title = match[1];
    var name = match[2];
    ...
}
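Numbered groups get hard to track as patterns grow, so here is a small Python sketch of the same extraction using named groups. The (?P<name>...) syntax is Python's; TITLE_NAME and parse_person are hypothetical names chosen for the example.

```python
import re

# Same pattern as above, with each group given a name.
TITLE_NAME = re.compile(r"(?P<title>Mr\.|Mrs\.|Dr\.) (?P<name>\w+)")

def parse_person(text):
    """Return (title, name) for the first titled name in text, or None."""
    m = TITLE_NAME.search(text)
    if m is None:
        return None
    return m.group("title"), m.group("name")

print(parse_person("Please welcome Dr. Jones"))  # ('Dr.', 'Jones')
print(parse_person("no title here"))             # None
```

Named groups are still reachable by number (m.group(1), m.group(2)), so switching to them does not break code that uses the positional form.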