10stripe's Perl-Centric Regular Expression Cheat-Sheet

Regular expressions are a compact notation for searching and modifying strings. Perl has one of the more powerful and complete regular expression engines, and many other languages imitate its behavior. PHP's preg family of functions, for example, follows Perl's syntax, in contrast to PHP's older POSIX-style ereg functions.

Regular expressions are both powerful and complex. To help make sense of them, you may want to use this Perl-centric regular expression (often shortened to regex or regexp) cheat-sheet. Although it is targeted toward Perl (particularly the section on modifiers, which are very language-dependent), most of it applies to other languages as well. This is not a comprehensive overview of Perl's regex syntax, and it will not teach you that syntax from scratch, but it is a handy reference.

If you would like a print version, we strongly recommend the PDF version of this page. We have made some adjustments so that printing this HTML document should also be a workable solution (results will look best in a modern browser with strong CSS support). It won't be quite as nice as the PDF version.

Anchors

^ Start of string (equivalent: \A unless /m is used)

$ End of string, or just before a final \n (equivalent: \Z unless /m is used)

\b Word boundary: zero-width position between \w and \W (string edges count as \W)

\B Anything but a word boundary
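The anchors are easiest to compare side by side on a small two-line string. This is a sketch in Python's re module rather than Perl, assuming the behavior shown carries over; one known difference is that Python's \Z matches only at the very end of the string (like Perl's \z), so it is left out here.

```python
import re

text = "cat scatter\ncat"

# ^ without re.MULTILINE: only the very start of the string.
starts = re.findall(r"^cat", text)

# ^ with re.MULTILINE (Perl's /m): also just after each internal newline.
line_starts = re.findall(r"^cat", text, re.MULTILINE)

# \A ignores re.MULTILINE and always means "start of string".
string_start = re.findall(r"\Acat", text, re.MULTILINE)

# \b...\b: "cat" as a whole word, so the "cat" inside "scatter" is skipped.
whole_words = re.findall(r"\bcat\b", text)

# \B...\B: "cat" with word characters on both sides, i.e. inside "scatter".
inside_words = re.findall(r"\Bcat\B", text)

print(starts, line_starts, string_start, whole_words, inside_words)
```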

Subexpressions

( ) Define a subexpression

$1, $2, ... nth subexpression, in the replacement or after the match

\1, \2, ... nth subexpression, inside the pattern itself (backreference)

(?:a) Non-capturing parentheses (match a)
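In Python's re module, which this sketch uses in place of Perl, the same ideas appear under slightly different names: a backreference inside the pattern is still written \1, but a captured group is read back with match.group(1) instead of $1, and a replacement string uses \1 (or \g<1>) where Perl's s/// uses $1.

```python
import re

# \1 inside the pattern refers back to group 1: find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
repeated_word = doubled.group(1)  # Perl would read this as $1

# In the replacement, \2 and \1 play the role of Perl's $2 and $1.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@example")

# (?:...) groups without capturing, so the year is group 1, not group 2.
dated = re.match(r"(?:Jan|Feb|Mar) (\d{4})", "Feb 2009")
year = dated.group(1)

print(repeated_word, swapped, year)
```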

Case Conversion

\l Make next character lowercase

\u Make next character uppercase

\L Make the rest of the string (up to \E) lowercase

\U Make the rest of the string (up to \E) uppercase

\E End \L or \U (so they only apply before \E)

\u\L Capitalize first char, lowercase rest (sentence)
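Python's re.sub does not recognize \l, \u, \L, \U or \E in replacement text, so this sketch assumes the closest equivalent is a replacement function that edits each match before it is substituted. The helper name capitalize_word is hypothetical.

```python
import re


def capitalize_word(match):
    # Same effect as Perl's \u\L$1: uppercase the first character, lowercase the rest.
    word = match.group(1)
    return word[:1].upper() + word[1:].lower()


# Like s/(\w+)/\u\L$1/g in Perl; re.sub replaces every match by default.
titled = re.sub(r"(\w+)", capitalize_word, "hELLO wORLD")

# Like s/^(\w+)/\U$1/ in Perl: uppercase only the first word.
shouted = re.sub(r"^(\w+)", lambda m: m.group(1).upper(), "hello world")

print(titled, shouted)
```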

Look-around

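(?=a) Look-ahead (a must follow; it is not consumed)

(?<=a) Look-behind (a must precede; it is not consumed)

(?!a) Negative look-ahead (a must not follow)

(?<!a) Negative look-behind (a must not precede)

(?(1)b) Conditional; if group 1 matched, then match b

(?(1)b|c) Conditional; if group 1 matched, match b, else match c (the condition may also be a look-around)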
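The look-around forms and a group-number conditional can be tried in Python's re module, which accepts the same (?=...), (?<=...), (?!...), (?<!...) and (?(1)b|c) syntax. This is a sketch under the assumption that these small patterns behave identically in Perl; the sample strings are made up for illustration.

```python
import re

prices = "USD100 EUR200 USD300"

# (?<=USD): digits preceded by "USD"; the prefix itself is not part of the match.
usd = re.findall(r"(?<=USD)\d+", prices)

# (?<!USD)(?<!\d): digits not preceded by "USD" (or by another digit,
# which stops a match from starting partway through "100").
other = re.findall(r"(?<!USD)(?<!\d)\d+", prices)

# (?=:): words followed by a colon; the colon is not consumed.
keys = re.findall(r"\w+(?=:)", "key: value, other:x")

# (?!un): words that do not start with "un".
plain = re.findall(r"\b(?!un)\w+", "undo redo unzip zip")

# (?(1)b|c): if group 1 (an opening parenthesis) matched, require ")",
# otherwise require "-".
phone = re.compile(r"^(\()?\d{3}(?(1)\)|-)\d{4}$")
results = [bool(phone.match(s)) for s in ("(555)1234", "555-1234", "(555-1234")]

print(usd, other, keys, plain, results)
```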

Modifiers

/i Case-insensitive

/g Global match (in list context, return all matches; s///g replaces every match)

/m Multiline mode (^ and $ match internal \n)

/s Line-agnostic (. matches \n)

/x Allow arbitrary whitespace and comments
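Python's re module expresses these modifiers as flag arguments rather than trailing letters, and has no /g flag, because re.findall, re.finditer and re.sub already work on every match. The sketch below maps each Perl modifier to the Python flag assumed to be its nearest counterpart.

```python
import re

text = "Line one\nline TWO"

# /i and /m -> re.IGNORECASE | re.MULTILINE: "^line" matches at both line starts.
line_starts = re.findall(r"^line", text, re.IGNORECASE | re.MULTILINE)

# /s -> re.DOTALL: "." may now match the newline between the two lines.
crosses_newline = re.search(r"one.line", text, re.DOTALL) is not None
without_dotall = re.search(r"one.line", text) is not None

# /x -> re.VERBOSE: whitespace in the pattern is ignored and # starts a comment.
word_pair = re.compile(r"""
    (\w+)   # first word
    \s+     # whitespace between the words
    (\w+)   # second word
""", re.VERBOSE)
first_pair = word_pair.match(text).groups()

# /g -> re.findall: every match, not just the first.
all_words = re.findall(r"\w+", text)

print(line_starts, crosses_newline, without_dotall, first_pair, all_words)
```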

Basic Metacharacters

. Match any single character except \n (unless /s)

| OR; (ab|ac) matches ab or ac

[abc] Match one out of a set of characters

[^abc] Match one character not in set

[a-z] Match one character from a range; ranges can be combined, as in [a-zA-Z]

\ Escape next character, such as \/ or \( or \)
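Each basic metacharacter can be checked with a one-line findall. This sketch uses Python's re module; note that Python patterns have no / delimiters, so \/ is unnecessary there (though it is accepted and matches a plain "/").

```python
import re

# "." matches any single character except a newline.
dots = re.findall(r"c.t", "cat cot c\nt")

# | inside a group: (ab|ac) matches "ab" or "ac" but not "ad".
alternatives = re.findall(r"(ab|ac)", "ab ac ad")

# [abc] and [^abc]: one character in, or not in, the set.
vowels = re.findall(r"[aeiou]", "regex")
consonants = re.findall(r"[^aeiou]", "regex")

# [a-zA-Z]: two ranges combined in one set.
letters = re.findall(r"[a-zA-Z]+", "abc123DEF")

# \( and \): escaped parentheses match literally instead of grouping.
calls = re.findall(r"\w+\(\d+\)", "f(42) + g(7)")

print(dots, alternatives, vowels, consonants, letters, calls)
```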

Quantifiers

* Match zero or more of previous character/subexpression

+ Match one or more of previous character/subexpression

? Match 0 or 1 of previous character/subexpression

{n} Match exactly n of previous character/subexpression

{m,n} Match m to n (inclusive) of previous character/subexp.

{n,} Match n or more of previous character/subexpression

*?, ?? Lazy (non-greedy) version: match as few as possible (works for any quantifier)

*+, ?+ Possessive version: never give back characters when backtracking (Perl 5.10+; works for any quantifier)
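The quantifiers below are shown with Python's re module as a stand-in for Perl. Possessive quantifiers are left out of this sketch because Python's re only accepts them from version 3.11 onward.

```python
import re

# *, + and ?: zero or more, one or more, zero or one of the preceding item.
star = re.findall(r"ab*", "a ab abb")      # b is optional and repeatable
plus = re.findall(r"ab+", "a ab abb")      # at least one b is required
optional = re.findall(r"colou?r", "color colour")

# {n}, {m,n} and {n,}: counted repetition.
triples = re.findall(r"\d{3}", "12345678")
two_to_three = [bool(re.fullmatch(r"a{2,3}", s)) for s in ("a", "aa", "aaa", "aaaa")]
two_or_more = bool(re.fullmatch(r"a{2,}", "aaaaa"))

# Greedy .+ runs as far as it can; lazy .+? stops at the first possible ">".
html = "<b>bold</b>"
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

print(star, plus, optional, triples, two_to_three, two_or_more, greedy, lazy)
```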

Specific Characters

\w Word character (alphanumeric, underscore)

\W Opposite of \w

\s Whitespace character (space, tab, etc.)

\S Opposite of \s

\d Digit

\D Opposite of \d

[\b] Backspace (any use of \b in a character set)

\n Newline

\cX Control character; \cC matches Ctrl-C

\f Form feed

\r Carriage return

\t Tab

\v Vertical whitespace in Perl 5.10+ (vertical tab in many other flavors)

\xhh Character with hexadecimal code hh; \xf0 matches hex f0

\0oo Character with octal code oo; \021 matches octal 21
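Most of these escapes behave the same in Python's re module, used here as a stand-in for Perl. Two assumptions to keep in mind: Python has no \cX escape, and its \v always means the vertical tab, unlike Perl 5.10+, so neither is exercised below.

```python
import re

# \w, \d and \s character classes, plus \S as the opposite of \s.
words = re.findall(r"\w+", "id_7, name!")
digits = re.findall(r"\d", "a1b22")
fields = re.split(r"\s+", "a \t b\nc")
non_space = re.findall(r"\S+", "  two words ")

# [\b] inside a set is a backspace character (U+0008), not a word boundary.
has_backspace = re.search(r"[\b]", "x\by") is not None

# \t is a tab; \x41 is the character with hex code 41 ("A").
tab_split = re.split(r"\t", "left\tright")
hex_a = re.search(r"\x41", "xyzABC").group()

# \021 is the character with octal code 21 (decimal 17).
octal_hit = re.search(r"\021", "a\x11b") is not None

print(words, digits, fields, non_space, has_backspace, tab_split, hex_a, octal_hit)
```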
