
Regex Cheat Sheet

THE ULTIMATE CHEAT SHEET FOR REGULAR EXPRESSIONS

A super quick reference guide for regular expressions (regex), including symbols, ranges, grouping, assertions and some sample patterns to get you started.

ANCHORS

Anchors, or atomic zero-width assertions, specify a position in the string where a match must occur. When you use an anchor in your search expression, the regular expression engine does not advance through the string or consume characters; it looks for a match in the specified position only. For example, ^ specifies that the match must start at the beginning of a line or string. Therefore, the regular expression ^http: matches "http:" only when it occurs at the beginning of a line. The following table lists common anchors. All of them except \< and \> are supported by .NET; \< and \> come from other flavors such as GNU grep.

^      Start of string, or start of line in multi-line pattern
\A     Start of string
$      End of string, or end of line in multi-line pattern
\Z     End of string
\b     Word boundary
\B     Not word boundary
\<     Start of word
\>     End of word

CHARACTER CLASSES

Character classes in regular expressions match a selection of characters at once. For example, "\d" will match any digit from 0 to 9 inclusive. "\w" will match letters, digits and the underscore, and "\W" will match everything except those. A pattern to identify letters, numbers or whitespace could be: [\w\s]

\c     Control character
\s     White space
\S     Not white space
\d     Digit
\D     Not digit
\w     Word
\W     Not word
\x     Hexadecimal digit
\o     Octal digit

\x and \o are flavor-specific (Vim uses them this way). In most other engines \x starts a hex escape such as \x41, so [0-9A-Fa-f] and [0-7] are the portable forms.

POSIX

POSIX ("Portable Operating System Interface for uniX") is a collection of standards that define some of the functionality that a (UNIX) operating system should support. One of these standards defines two flavors of regular expressions (basic and extended). Commands involving regular expressions, such as grep and egrep, implement these flavors on POSIX-compliant UNIX systems. Several database systems also use POSIX regular expressions.

[:upper:]    Upper case letters
[:lower:]    Lower case letters
[:alpha:]    All letters
[:alnum:]    Digits and letters
[:digit:]    Digits
[:xdigit:]   Hexadecimal digits
[:punct:]    Punctuation
[:blank:]    Space and tab
[:space:]    All whitespace characters
[:cntrl:]    Control characters
[:graph:]    Printed characters
[:print:]    Printed characters and spaces
[:word:]     Digits, letters and underscore

These classes are only valid inside a bracket expression, e.g. [[:digit:]]. [:word:] is a Perl/PCRE extension rather than part of POSIX.

ASSERTIONS

Assertions are tricky to get to grips with, but once you are familiar with them, you will use them alarmingly often. They provide a way to say "I want to find every word in this document with a q in it, as long as that q isn't followed by werty":

[^\s]*q(?!werty)[^\s]*

The above code starts by matching non-whitespace characters ([^\s]*), then a q (err... q). Then the parser reaches the lookahead assertion. This makes the q conditional: the q will only be matched if the assertion is true. In this case, the assertion is a negative assertion, so it will be true if what it checks for is not found.

?=      Lookahead assertion
?!      Negative lookahead
?<=     Lookbehind assertion
?<!     Negative lookbehind
?>      Once-only subexpression
?()     Condition [if then]
?()|    Condition [if then else]
?#      Comment

Each of these is written inside parentheses, e.g. (?=abc).

QUANTIFIERS

Quantifiers allow you to specify a part of a pattern that must be matched a certain number of times. For example, if you wanted to find out if a document contained between 10 and 20 (inclusive) of the letter "a" in a row, you could use this pattern: a{10,20}

Quantifiers are "greedy" by default. So the quantifier "+", which means "one or more", will match as many items as possible. This can be a problem on occasion, so you can tell a quantifier not to be greedy (to be "lazy") by adding a ? after it. For example, on the text <b>bold</b>, the pattern <.+> matches the whole string, while <.+?> matches just <b>.

*       0 or more
+       1 or more
?       0 or 1
{3}     Exactly 3
{3,}    3 or more
{3,5}   3, 4 or 5

Add a ? to a quantifier to make it ungreedy.

ESCAPE

Regular expressions use symbols to represent certain things.
However, that presents a problem if you want to detect a character in a string where that character is a symbol. A period ("."), for example, represents "any character except the new line character" in a regular expression. If you want to find a period in a string, you can't just use "." as a pattern, because it will match just about everything. So, you need to tell the parser to treat the period as a literal period rather than a special character. You do this with an escape character: \. matches only a period.

\       Escape following character
\Q      Begin literal sequence
\E      End literal sequence

"Escaping" is a way of treating characters which have a special meaning in regular expressions literally, rather than as special characters.

SPECIAL CHARACTERS

Special characters in regular expressions represent unusual elements in text. New lines and tabs, for example, can be typed using a keyboard, but are likely to trip up programming languages. The special characters use the escape character as well, to tell the regular expression parser that the following character is to be treated as a special character rather than a normal letter or number.

\n      New line
\r      Carriage return
\t      Tab
\v      Vertical tab
\f      Form feed
\xxx    Octal character xxx (three octal digits)
\xhh    Hex character hh (two hex digits)

COMMON METACHARACTERS

A metacharacter is a special character in a program or data field that provides information about other characters. It can express ideas on how to process the characters that follow the metacharacter. For example, the backslash character is sometimes used to indicate that the characters following it are to be treated in a special way. A common metacharacter usage is the wildcard character, which can represent any one character or any string of characters.

^  [  .  $  {  *  (  \  +  )  |  ?  <  >

The escape character is usually \

GROUPS AND RANGES

Groups and ranges are very, very useful. Ranges are perhaps the easiest place to begin.
They allow you to specify a selection of characters to match. Groups are essential to regular expressions. They are most often used when you want to use "or" in a pattern, when you want to reference part of a pattern later in the same pattern, or when using regular expression string replacement.

.          Any character except new line (\n)
(a|b)      a or b
(...)      Group
(?:...)    Passive (non-capturing) group
[abc]      Range (a or b or c)
[^abc]     Not a or b or c
[a-q]      Lower case letter from a to q
[A-Q]      Upper case letter from A to Q
[0-7]      Digit from 0 to 7
\x         Group/subpattern number "x"

Ranges are inclusive.

PATTERN MODIFIERS

Pattern modifiers are used in several languages, most notably Perl. These allow you to change how the parser works. For example, the "i" modifier will tell the parser to ignore case. In Perl, regular expressions contain the same character at the beginning and end. This can be almost any character (often "/"), and is used like so:

/pattern/

Modifiers would be added at the end of this, like so:

/pattern/i

g      Global match
i*     Case-insensitive
m*     Multiple lines
s*     Treat string as single line
x*     Allow comments and whitespace in pattern
e*     Evaluate replacement
U*     Ungreedy pattern

* PCRE modifier

STRING REPLACEMENT

Groups have already been covered above; one small addition to note for string replacement is the existence of "passive" groups. These are groups that are ignored for the purposes of replacement. This is very useful when you want to match something that requires an "or" section, but don't want it in the replacement.

$n     nth non-passive group
$2     "xyz" in /^(abc(xyz))$/
$1     "xyz" in /^(?:abc)(xyz)$/
$`     Before matched string
$'     After matched string
$+     Last matched capture group
$&     Entire matched string

Some regex implementations use \ instead of $ (for example \1 instead of $1).

INFOGRAPHIC BY se up blog
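The anchors in the ANCHORS table can be tried out with the minimal sketch below. It uses Python's built-in re module; Python is our choice for illustration only, since the cheat sheet is not tied to a language. Two Python-specific assumptions apply: Python spells the absolute end-of-string anchor \Z, and it has no \< or \> word anchors, so \b is used instead.

```python
import re

text = "http://example.com\nsee http://other.org"

# ^ matches only at the start of the string by default...
default_starts = re.findall(r"^\w+", text)
# ...but also right after every newline when re.MULTILINE is set.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)
# \A means "start of string" even in multi-line mode.
absolute_starts = re.findall(r"\A\w+", text, re.MULTILINE)

# $ likewise matches at the end of each line in multi-line mode.
multiline_ends = re.findall(r"\w+$", text, re.MULTILINE)
# In Python, \Z is the absolute end of the string.
absolute_ends = re.findall(r"\w+\Z", text, re.MULTILINE)

# \b is a word boundary; \B is any position that is not one.
whole_words = re.findall(r"\bcat\b", "cat concat cats")
inside_words = re.findall(r"\Bcat", "cat concat cats")

print(default_starts, multiline_starts, absolute_starts)
print(multiline_ends, absolute_ends)
print(whole_words, inside_words)
```

Only the standalone "cat" matches \bcat\b, while \Bcat finds the "cat" inside "concat".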
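The CHARACTER CLASSES section, including its [\w\s] sample pattern, is sketched below, again assuming Python's re module. Python's re does not support the POSIX bracket classes such as [[:alpha:]], so those are not shown. The sample strings are made up for illustration.

```python
import re

sample = "Order #42 ships_today\tto Room 7B"

digits = re.findall(r"\d", sample)        # each single digit
non_space = re.findall(r"\S+", sample)    # runs of non-whitespace
words = re.findall(r"\w+", sample)        # runs of letters, digits and underscore
non_word = re.findall(r"\W", "a-b c!")    # anything that is not a word character

# The sample pattern from the text: one letter, digit, underscore or whitespace character.
word_or_space = re.findall(r"[\w\s]", "a1 _!?")

print(digits, non_space, words, non_word, word_or_space)
```

Note that "ships_today" stays a single \w+ match because the underscore counts as a word character.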
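Here is the "q not followed by werty" pattern from the ASSERTIONS section, run in Python's re module (our assumption for illustration), along with the lookbehind forms from the table. The test strings are invented. Python only accepts fixed-width lookbehinds, which the examples respect.

```python
import re

text = "qwerty quiet qwertz faq iraq qwertyuiop"

# Non-whitespace, then a q NOT followed by "werty", then more non-whitespace.
q_words = re.findall(r"[^\s]*q(?!werty)[^\s]*", text)

prices = "cost $30, tax $4, 7 items"
# Positive lookbehind: digits preceded by a literal $ (the $ is not consumed).
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)
# Negative lookbehind: whole numbers that are NOT preceded by $.
plain_numbers = re.findall(r"(?<!\$)\b\d+", prices)
# Positive lookahead: words followed by " $" (the " $" is not consumed).
before_dollar = re.findall(r"[a-z]+(?= \$)", prices)

print(q_words, dollar_amounts, plain_numbers, before_dollar)
```

"qwerty" and "qwertyuiop" are rejected because their q is followed by "werty", while "qwertz" passes.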
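Greedy versus lazy matching and the a{10,20} example from the QUANTIFIERS section can be sketched as follows, assuming Python's re module. The HTML-like string is illustrative only.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.+>", html)   # + takes as much as it can
lazy = re.findall(r"<.+?>", html)    # +? stops at the first '>' that works

# a{10,20}: between 10 and 20 a's in a row.
found = re.search(r"a{10,20}", "x" + "a" * 12 + "y")
too_short = re.search(r"a{10,20}", "x" + "a" * 9 + "y")
# On its own, a{10,20} also matches inside a longer run (taking the first 20);
# the lookarounds make it reject runs longer than 20.
run_of_25 = re.search(r"(?<!a)a{10,20}(?!a)", "a" * 25)

print(greedy, lazy)
print(len(found.group()), too_short, run_of_25)
```

The lookaround variant is one possible way to enforce "at most 20 in a row"; it is not part of the original cheat sheet.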
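The ESCAPE and SPECIAL CHARACTERS sections are illustrated below in Python's re module (an assumption; other engines differ). Python has no \Q...\E, so the sketch uses re.escape() to build a literal pattern instead.

```python
import re

text = "v1.2 vs v1x2"
unescaped = re.findall(r"v1.2", text)   # '.' means "any character"
escaped = re.findall(r"v1\.2", text)    # '\.' is a literal period

# re.escape() backslash-escapes the special characters, much like \Q...\E.
literal = re.escape("1+1=2?")
found = re.findall(literal, "is 1+1=2? yes")

# \t matches a tab; \x41 matches the character with hex code 41 ("A").
columns = re.split(r"\t", "name\tage\tcity")
hex_a = re.findall(r"\x41", "ABBA")

print(unescaped, escaped, found, columns, hex_a)
```

The unescaped pattern also matches "v1x2", which is exactly the problem the ESCAPE section describes.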
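A few rows of the GROUPS AND RANGES table are exercised in the sketch below, assuming Python's re module. It also shows a Python-specific detail: re.findall returns capturing-group contents when a group is present, which makes the difference between a normal group and a passive (?:...) group easy to see.

```python
import re

letters = re.findall(r"[a-q]", "apqrz")              # inclusive range a..q
not_abc = re.findall(r"[^abc]", "abcde")             # negated set
captured = re.findall(r"gr(a|e)y", "gray grey groy") # findall returns group 1...
whole = re.findall(r"gr(?:a|e)y", "gray grey groy")  # ...unless the group is passive
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")  # \1 reuses group 1

print(letters, not_abc, captured, whole, doubled)
```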
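The Perl-style modifiers in the PATTERN MODIFIERS section map onto flags in Python's re module, which is our illustrative assumption here. The flags used are re.IGNORECASE (i), re.MULTILINE (m), re.DOTALL (s) and re.VERBOSE (x). Global matching (g) corresponds to re.findall or re.sub. Python has no e or U flag; passing a function as the replacement to re.sub plays the role of e.

```python
import re

text = "Hello\nhello world"

case_insensitive = re.findall(r"hello", text, re.IGNORECASE)            # like /hello/i
line_starts = re.findall(r"^h\w+", text, re.IGNORECASE | re.MULTILINE)  # like /^h\w+/im
dot_all = re.search(r"Hello.hello", text, re.DOTALL)                    # like /.../s
no_dot_all = re.search(r"Hello.hello", text)                            # '.' skips \n
inline = re.findall(r"(?i)hello", text)                                 # flag set inside the pattern

# re.VERBOSE (like /x) ignores whitespace and # comments in the pattern.
phone = re.compile(r"""
    (\d{3})    # three-digit prefix
    -          # literal hyphen
    (\d{4})    # four-digit line number
""", re.VERBOSE)
parts = phone.match("555-1234").groups()

print(case_insensitive, line_starts, dot_all is not None, no_dot_all, inline, parts)
```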
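The two patterns in the STRING REPLACEMENT table can be replayed in Python's re module (our assumption for illustration). Python is one of the implementations that uses \ instead of $: it writes $1 as \1 or \g<1>, and $& as \g<0>. Perl's $` and $' have no replacement-string equivalent in Python, so the sketch slices them out of the string using the match position.

```python
import re

s = "abcxyz"

nested = re.sub(r"^(abc(xyz))$", r"[\2]", s)     # group 2 is "xyz"
passive = re.sub(r"^(?:abc)(xyz)$", r"[\1]", s)  # the passive group gets no number
whole = re.sub(r"b\w", r"<\g<0>>", s)            # \g<0> is the entire match, like $&

# Text before and after the match, like Perl's $` and $'.
m = re.search(r"cx", s)
before, after = s[:m.start()], s[m.end():]

print(nested, passive, whole, before, after)
```

Both table rows produce "xyz": the passive (?:abc) group does not take a number, so xyz becomes group 1.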


shared by LanaMc on Jan 10
We’ve put together a really simple to use Regex Cheat Sheet in the form of an infographic. Regular Expressions (Regex) is an extremely powerful and useful tool set, but so many people find their unu...



