A RegExp (Regular Expression) is a pattern that describes a set of strings according to certain rules. Regular expressions use brackets (round, square, curly) and special characters to define the rules that a matching string must follow.
In ActionScript, a RegExp pattern is written between two forward slash characters "/" ( /regexp/ ).
To start, let's see some simple patterns.
- The following regular expression: /s[ak]y/ matches the words "say" and "sky". An expression placed within square brackets, such as [ak], is called a character class (class pattern) and matches exactly one of the listed characters.
- A pattern that matches any single lowercase vowel: /[aeiou]/ (place the characters that you want to match inside square brackets).
- To allow uppercase vowels as well, add them to the class: /[aeiouAEIOU]/ (or use the "i" modifier, /[aeiou]/i; modifiers are presented below).
- To match any lowercase letter, you can write: /[abcdefghijklmnopqrstuvwxyz]/. Or you can use a range with the dash character (-): /[a-z]/ (this expression means "any one character from 'a' to 'z'").
- Similarly, the pattern /[0-9]/ matches any single digit.
To match a certain number of characters, put the quantity between curly braces, giving the minimum and maximum number of allowed repetitions.
- For example, the regular expression /[aeiou]{2,4}/ matches a run of 2, 3 or 4 consecutive vowels ("ai", "oue", "auio", etc.).
To specify that the characters within square brackets may be repeated, add "+" (one or more times) or "*" (zero or more times) after the square brackets.
- For example, /s[ak]+y/ would match: sky, saay, saaky, etc.
To repeat a subpattern of a regular expression, place that subpattern between parentheses and add the quantifier after them.
- The following RegExp, /(s[ak]y ){2,3}/ corresponds to two or three repetitions of either of the strings "say " and "sky ". This pattern would match: "say sky ", "say sky say ", etc. (Notice the space character after "y" in this RegExp; the matching strings must also have a space after each "y").
There are several special characters that are used in regular expressions.
If a circumflex accent (^) is the first symbol inside square brackets, it negates the character class: the class then matches any character that is not listed between the brackets.
- So, /[^aeiou]/ matches any character that is not a lowercase vowel.
- /[^a-z]/ matches any character that is not a lowercase letter.
When this character (^) is placed outside the square brackets, it represents the beginning of the string (or of a line, when the "m" modifier is set).
- The regular expression /^s[ak]y/ matches the substring "say" or "sky" only if it is at the beginning of the subject string.
There is also the dollar sign ($), which marks the end of the string (or of a line, with the "m" modifier).
- /s[ak]y$/ matches "say" or "sky" only if it is at the end of the subject string.
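The patterns above can be checked quickly with the test() method of the RegExp class, which is described later in this lesson. The following is only a minimal sketch; the variable names and sample strings are chosen for illustration.

```actionscript
// Character class: exactly one of "a" or "k" between "s" and "y"
var cls:RegExp = /s[ak]y/;
trace(cls.test("sky"));             // true
trace(cls.test("soy"));             // false ("o" is not in the class [ak])

// Quantifiers: "+" after the class, and {2,4} for a run of vowels
var plus:RegExp = /s[ak]+y/;
trace(plus.test("saaky"));          // true
var vowels:RegExp = /[aeiou]{2,4}/;
trace(vowels.test("tree"));         // true (the run "ee")

// Anchors: "^" for the beginning and "$" for the end of the string
var atStart:RegExp = /^s[ak]y/;
var atEnd:RegExp = /s[ak]y$/;
trace(atStart.test("a blue sky"));  // false
trace(atEnd.test("a blue sky"));    // true

// Negated class: any character that is not a lowercase letter
var notLower:RegExp = /[^a-z]/;
trace(notLower.test("abc"));        // false
trace(notLower.test("abc1"));       // true (the "1")
```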
• Here is a list of more special characters and their role in regular expressions:
- ^ - Indicates the beginning of a string
- $ - Indicates the end of a string
- . - Any single character except newline
- () - subpattern
- [] - character class (one character from those within the square brackets)
- [^] - any character except those in square brackets
- \ - Escape character (disables the special role of the character that follows it)
- + - The character (or expression) before this sign must occur one or more times
- * - The character (or expression) before this sign may occur zero or more times
- ? - The character (or expression) before this sign may occur zero or one time
- | - Alternatives (or)
- {x} - Exactly "x" occurrences
- {x,y} - Between "x" and "y" occurrences
- {x,} - At least x occurrences
- \n - Newline (line feed)
- \r - Carriage return (Windows line endings are "\r\n")
- \t - Tab
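- For example, /(ho|ca)me/ matches the words "home" and "came". The alternatives must be grouped with parentheses; inside square brackets, "|" is an ordinary character.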
To use these characters (+ , * , ? , . , ( , ) , { , [ , | , ^ , $ ...) literally in a regexp pattern, disabling their special role, you must prefix them with a backslash character "\". A forward slash must also be escaped ("\/") inside a RegExp literal, because it is the delimiter.
For example, /[0-9]\*[0-9]/ matches a multiplication between two digits, such as "3*4" ("*" is no longer a repetition factor).
• Besides these characters, there are shorthand sequences for common character classes:
- \w - Word characters: letters, digits and "_". Equivalent: [a-zA-Z0-9_]
- \W - Non-word characters. Equivalent: [^a-zA-Z0-9_]
- \s - Whitespace characters. Equivalent: [ \t\r\n\v\f]
- \S - Non-whitespace characters. Equivalent: [^ \t\r\n\v\f]
- \d - Decimal digit characters. Equivalent: [0-9]
- \D - Non-digits. Equivalent: [^0-9]
- For example: /[\d\s]+/ matches a run of one or more digits and whitespace characters.
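As a short illustration of escaping and of the shorthand sequences, the sketch below uses test() (covered later in this lesson). The patterns and the sample strings are assumptions made up for this example.

```actionscript
// "\*" is a literal asterisk; \d+ is one or more digits; \s* is optional whitespace
var product:RegExp = /\d+\s*\*\s*\d+/;
trace(product.test("12 * 7"));    // true
trace(product.test("12 + 7"));    // false

// \w accepts letters, digits and "_"; the anchors force the whole string to match
var word:RegExp = /^\w+$/;
trace(word.test("user_01"));      // true
trace(word.test("user-01"));      // false ("-" is not a word character)
```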
• Here are some examples of regular expressions:
- (.*) - any sequence of characters except newlines (by "."), repeated as often as possible (by "*")
- (fa|te)rms - matches "farms" and "terms"
- ^www.+net$ - strings that begin with "www" and end with "net"
- ^www\.[a-z0-9]+\.com$ - matches the "www.__.com" strings, the "__" can be any word that contains lowercase letters and numbers
- (^[-+][0-9]+) - a number at the beginning of the string that starts with "-" or "+"
- \<tag\>(.*?)\<\/tag\> - represents the content within <tag>...</tag>
- \<tag\>([^\<]+) - the text after <tag> up to the next "<"
- ^([a-zA-Z0-9]+[a-zA-Z0-9._%-]*@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,4})$ - Regular expression for email addresses
- ^(http://|https://)?([^/]+) - Regular expression for domain name of a URL
• Besides the special characters and the shorthand sequences, there are also special letters called modifiers (or flags). They take effect only when placed after the closing delimiter ("/regexp/mods"), and alter the behavior of the regular expression.
The most used RegExp modifiers are listed below:
- g - (global) - allows the expression to be used repeatedly on the source text until there are no more matches. When it is not set, the expression will return the first match.
- i - (ignore-case) - letters in the pattern match both upper and lower case letters (for case-insensitive comparisons).
- m - (multiline) - changes the role of "^" and "$". If "multiline" is not set, they match only the beginning and the end of the whole string; when this modifier is added, they also match the beginning and the end of each line.
- s - (dotall) - makes the dot metacharacter match all characters, including newlines.
- x - (extended) - If this modifier is set, whitespace in a RegExp pattern is ignored except when escaped or inside a character class.
You can add one or more modifiers at the end of the pattern.
- Example: /\d{3}-[a-z]+/gi - searches for all "nnn-word" substrings ("nnn" is a 3-digit number and "word" can contain uppercase letters too).
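The effect of the "i", "m", "s" and "x" modifiers can be seen in a small sketch like the one below. The sample text and the variable names are assumptions used only for illustration.

```actionscript
// Two lines of text separated by a newline
var text:String = "first line\nSecond line";

// Without modifiers, "^" matches only at the start of the whole string
var plain:RegExp = /^second/;
trace(plain.test(text));       // false

// "i" ignores case, "m" lets "^" match at the start of each line
var perLine:RegExp = /^second/im;
trace(perLine.test(text));     // true

// By default "." does not match a newline; with "s" it does
var noDotall:RegExp = /first.+Second/;
var dotall:RegExp = /first.+Second/s;
trace(noDotall.test(text));    // false
trace(dotall.test(text));      // true

// With "x", unescaped spaces in the pattern are ignored (useful for readability)
var spaced:RegExp = /\d{3} - [a-z]+/x;
trace(spaced.test("008-lessons"));   // true
```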
• Usually, regular expressions are used in ActionScript for string matching and string replacing. ActionScript has special functions for these operations.
Applying Regular Expressions
Regular expressions in ActionScript 3.0 are instances of the RegExp class.
Regular expressions can also be written as literals, between two slashes /.../.
- The general form of a regular expression is:
var reg1:RegExp = /regular_expression/g;
var reg2:RegExp = new RegExp("regular_expression", "g");
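One practical difference between the two forms is how backslashes are written: in the constructor the pattern is a String, so each backslash has to be doubled. The sketch below shows this, and also a pattern built from a variable; "keyword" is a hypothetical value used only for this illustration.

```actionscript
// Literal form: backslashes are written once
var lit:RegExp = /^\d{3}-[a-z]+$/i;
// Constructor form: the pattern is a String, so "\" must be doubled
var ctor:RegExp = new RegExp("^\\d{3}-[a-z]+$", "i");

trace(lit.test("008-Courses"));    // true
trace(ctor.test("008-Courses"));   // true

// The constructor is handy when part of the pattern comes from a variable
var keyword:String = "lesson";
var dyn:RegExp = new RegExp("\\b" + keyword + "\\b", "i");
trace(dyn.test("AS3 Lesson 5"));   // true
```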
There are two ways to use regular expressions in ActionScript: with methods of the String class ("match()", "search()", "replace()"), or with methods of the RegExp class ("test()" and "exec()").
• test() - tests whether some text contains a match for a certain pattern. Returns true if a match is found, or false otherwise.
- Syntax:
RegExp.test("string")
- Example:
// RegExp with an expression that corresponds to the pattern: "word-nr3" (nr3 = a number with 3 digits)
var reg:RegExp = /[a-z]+-\d{3}/gi;
// Strings that will be checked
var str1:String = 'CoursesWeb - Courses-008 and Tutorials-137';
var str2:String = 'AS3 Lessons-37';
// checks with test() if there is a substring in "str1" and "str2" that matches the pattern in 'reg'
trace(reg.test(str1)); // true
trace(reg.test(str2)); // false
As you can notice, the test() method returns true if the tested string contains a substring that matches the pattern stored in the "reg" variable; if no match is found, it returns false.
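One detail worth keeping in mind: when the global (g) modifier is set, test() and exec() start searching at the position stored in the lastIndex property of the RegExp object, and update that property after each match. Reusing the same global RegExp can therefore give a surprising result, as in this minimal sketch (the sample string is an assumption for illustration).

```actionscript
var reg:RegExp = /[a-z]+-\d{3}/gi;
var str:String = "Courses-008";

trace(reg.test(str));    // true
trace(reg.lastIndex);    // 11 (the position right after the match)
trace(reg.test(str));    // false, because the search starts at index 11

// Resetting lastIndex makes the next test() start from the beginning again
reg.lastIndex = 0;
trace(reg.test(str));    // true
```

If only a yes/no answer is needed, a RegExp without the "g" modifier avoids this issue.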
• search() - (belongs to the String class) searches for the specified pattern and returns the index of the first matching substring. If there is no matching substring, it returns -1.
- Syntax:
String.search(RegExp)
- Example:
// RegExp with an expression that corresponds to the pattern: "word-nr3" (nr3 = a number with 3 digits)
var reg:RegExp = /[a-z]+-\d{3}/gi;
// Strings that will be checked
var str1:String = 'CoursesWeb - Courses-008 and Tutorials-137';
var str2:String = 'AS3 Lessons-37';
// searches for the index of the substring that matches the pattern in 'reg'
trace(str1.search(reg)); // 13
trace(str2.search(reg)); // -1
• exec() - returns an object with the substring that matches a RegExp and its location in the string. The location is stored in the index property. If there is no matching substring, it returns null.
- Syntax:
RegExp.exec("string")
To find all the substrings that match a regular expression, set the global (g) modifier and call the exec() method in a while() statement.
- Example:
// RegExp with an expression that corresponds to the pattern: "word-nr3" (nr3 = a number with 3 digits)
var reg:RegExp = /[a-z]+-\d{3}/gi;
// the string to be checked
var str1:String = 'CoursesWeb - Courses-008 and Tutorials-137';
// the object which will store the substrings and their location, returned by exec()
var exc:Object;
// with while() and exec(), checks the whole string in 'str1'
while ((exc = reg.exec(str1)) != null)
{
  trace(exc[0] + ' = ' + exc.index);   // Returns: substring = location
}
/* Displays:
Courses-008 = 13
Tutorials-137 = 29
*/
If you do not use a while() statement, exec() returns only the first substring (and its location) that matches the RegExp pattern.
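When the pattern contains subpatterns in parentheses, the object returned by exec() also holds the text captured by each group, at indexes 1, 2 and so on. The sketch below is a variation of the example above with two added groups; the grouping is an assumption introduced for this illustration.

```actionscript
// Same "word-nr3" pattern, with the word and the number in separate groups
var regGroups:RegExp = /([a-z]+)-(\d{3})/gi;
var source:String = 'CoursesWeb - Courses-008 and Tutorials-137';
var res:Object;

while ((res = regGroups.exec(source)) != null)
{
  // res[0] is the whole match, res[1] and res[2] are the captured groups
  trace(res[1] + ' / ' + res[2] + ' at ' + res.index);
}
/* Displays:
Courses / 008 at 13
Tutorials / 137 at 29
*/
```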
• match() - returns an array with the substrings that match the specified pattern. If the global (g) modifier is set, it returns all matches; otherwise it returns the first one. If no match is found, it returns null.
- Syntax:
String.match(RegExp)
- Example:
// RegExp with an expression that corresponds to the pattern: "word-nr3" (nr3 = a number with 3 digits)
var reg:RegExp = /[a-z]+-\d{3}/gi;
// the string to be checked
var str1:String = 'CoursesWeb - Courses-008 and Tutorials-137';
// stores in an Array the substrings of "str1" that match "reg"
var ar_matc:Array = str1.match(reg);
// if 'ar_matc' contains at least one item, displays the first substring found
if (ar_matc != null && ar_matc.length > 0) trace(ar_matc[0]);   // Courses-008
• replace() - replaces the substring (or substrings) that match a specified RegExp with other content. If the global (g) modifier is set, the method replaces all substrings that match the pattern. If it is not set, the method replaces only the first matching substring.
It returns the new string, without changing the initial string.
- Syntax:
String.replace(RegExp, 'new_content')
- Example:
// RegExp with an expression that corresponds to the pattern: "word-nr3" (nr3 = a number with 3 digits)
var reg:RegExp = /[a-z]+-\d{3}/gi;
// the string to be checked
var str1:String = 'CoursesWeb - Courses-008 and Tutorials-137';
// stores in a String variable the new string returned by replace()
var str1_mod:String = str1.replace(reg, 'another_text');
// uses trace() to check the initial string and the new string
trace(str1); // CoursesWeb - Courses-008 and Tutorials-137
trace(str1_mod); // CoursesWeb - another_text and another_text
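The replacement does not have to be a fixed text. As a sketch, the replacement string can refer to captured groups with $1, $2, and replace() can also receive a function that builds the replacement. The grouped pattern and the toUpper() helper below are assumptions introduced for this illustration.

```actionscript
// "word-nr3" pattern with the word and the number captured in groups
var regSwap:RegExp = /([a-z]+)-(\d{3})/gi;
var txt:String = 'Courses-008 and Tutorials-137';

// $1 and $2 stand for the text captured by the first and second group
trace(txt.replace(regSwap, '$2:$1'));   // 008:Courses and 137:Tutorials

// Hypothetical helper: receives the match, the captured groups,
// the index of the match and the complete string
function toUpper(match:String, word:String, nr:String, index:int, whole:String):String
{
  return match.toUpperCase();
}
trace(txt.replace(regSwap, toUpper));   // COURSES-008 and TUTORIALS-137
```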