RegExp - Regular Expressions

RegExp (Regular Expressions) is a pattern string that describes a set of possible strings, following certain rules. Regular expressions use brackets (round, square, curly) and special characters to form these rules.
To start, let's see some simple patterns.
In PHP, a RegExp pattern is usually written as a string between two forward slash characters "/" ("/regexp/"), but you can use any other non-alphanumeric, non-whitespace character (other than the backslash) as delimiter, as long as you use the same character at both ends of the pattern and it does not appear unescaped inside the pattern itself (e.g. #regexp#).


  - The following regular expression: /s[ak]y/ matches the words: say and sky (an expression placed within square brackets, "[ak]", is called a character class and matches exactly one of the listed characters).
  - A pattern that matches any single vowel can be written as: /[aeiou]/ (by including the possible values in the character class).
  - If you wish to allow uppercase vowels, add them too, /[aeiouAEIOU]/ (or you can use the "i" modifier, /[aeiou]/i - modifiers are presented below).
  - To match any lowercase letter, you can write: /[abcdefghijklmnopqrstuvwxyz]/. Or, in a more compact form: /[a-z]/, which means "any one character in the range from 'a' to 'z'".
  - Similarly, the pattern /[0-9]/ matches any single digit.
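
The character classes above can be tried with preg_match(), a PHP function described later in this tutorial. This is only a minimal sketch: the sample words and variable names are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical sample words, used only for illustration
$words = ['say', 'sky', 'soy'];
$matched = [];

foreach ($words as $word) {
  // /s[ak]y/ accepts "a" or "k" between "s" and "y"
  if (preg_match('/s[ak]y/', $word)) {
    $matched[] = $word;
  }
}
echo implode(', ', $matched), "\n";        // say, sky

// A character class matches one character, so an unanchored pattern
// succeeds as soon as a vowel appears anywhere in the string
$hasVowel = preg_match('/[aeiou]/', 'rhythm and blues');   // 1
$noVowel  = preg_match('/[aeiou]/', 'rhythm');             // 0
```

Note that "soy" is rejected because "o" is not listed in the class [ak].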

To match a certain number of characters, put the quantity between curly braces after the class, giving the minimum and maximum number of allowed repetitions.
  - For example, the regular expression: /[aeiou]{2,4}/ matches a sequence of 2, 3 or 4 consecutive vowels ("ai", "oue", "auio", etc.).

To specify that the characters within square brackets may be repeated in the string, add "+" (one or more times) or "*" (zero or more times) after the square brackets.
  - As an example, /s[ak]+y/ would match: sky, saay, saaky, etc.
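
A short sketch of how these quantifiers behave, using preg_match(). The anchors ^ and $ (start and end of the string) are added here as an assumption, so that each pattern must cover the whole string; the test strings are hypothetical.

```php
<?php
// {2,4}: between two and four vowels, and nothing else in the string
$vowelRun = '/^[aeiou]{2,4}$/';
$three = preg_match($vowelRun, 'oue');      // 1: three vowels
$one   = preg_match($vowelRun, 'a');        // 0: too short
$five  = preg_match($vowelRun, 'aeiou');    // 0: too long

// "+" needs at least one character from the class, "*" also accepts none
$plusEmpty = preg_match('/^s[ak]+y$/', 'sy');      // 0
$starEmpty = preg_match('/^s[ak]*y$/', 'sy');      // 1
$plusMany  = preg_match('/^s[ak]+y$/', 'saaky');   // 1
```

Without the anchors, /[aeiou]{2,4}/ would also succeed on "aeiou", because it only needs to find a run of vowels somewhere inside the string.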

To repeat several parts of a regular expression together, enclose those parts in round brackets (an expression placed within round brackets is called a subpattern).
  - The following RegExp, /(s[ak]y ){2,3}/, matches two or three consecutive repetitions of the strings "say " and "sky ", in any combination. This pattern would match: "say sky ", "say sky say ", etc. (Notice the space character after "y" in this RegExp: each repetition in the matching string must also end with a space).
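
The repeated subpattern can be checked as in the sketch below. The ^ and $ anchors are an addition made for this illustration, so the whole string has to consist of the repetitions; the test strings are hypothetical.

```php
<?php
$pattern = '/^(s[ak]y ){2,3}$/';

$two     = preg_match($pattern, 'say sky ');        // 1: two repetitions
$three   = preg_match($pattern, 'sky sky say ');    // 1: three repetitions
$single  = preg_match($pattern, 'say ');            // 0: only one repetition
$noSpace = preg_match($pattern, 'say sky');         // 0: the last "y" has no space after it
```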

There are several special characters that are used in forming regular expressions.
If a circumflex accent (^) is the first symbol inside square brackets, it negates the character class: the class matches any character that is not listed between those brackets.
  - So, /[^aeiou]/ will match any character that is not a lowercase vowel.
  - /[^a-z]/ matches any character that is not a lowercase letter.
When this character (^) is placed outside the square brackets, it represents the beginning of the string (or of a line, when the "m" modifier is used).
  - The regular expression /^s[ak]y/ matches the sub-string "say" or "sky" only if it is at the beginning of the subject string.
There is also the dollar sign ($), which marks the end of the string (or of a line, when the "m" modifier is used).
  - /s[ak]y$/ will match "say" or "sky" only if it is at the end of the subject string.
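
The two roles of the circumflex and the role of the dollar sign can be compared in a small sketch like this one (the subject strings are hypothetical):

```php
<?php
// ^ outside brackets: the match must begin at the start of the string
$atStart  = preg_match('/^s[ak]y/', 'sky is blue');   // 1
$notStart = preg_match('/^s[ak]y/', 'blue sky');      // 0

// $: the match must finish at the end of the string
$atEnd = preg_match('/s[ak]y$/', 'blue sky');         // 1

// ^ inside brackets: looks for a character that is NOT in the range a-z
$hasOther  = preg_match('/[^a-z]/', 'abc1');          // 1: "1" is not a lowercase letter
$onlyLower = preg_match('/[^a-z]/', 'abc');           // 0: nothing outside a-z
```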

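• Here is a list of more special characters and their role in regular expressions:
  - | - separates alternatives (matches either the expression on its left or the one on its right).
  - . - matches any single character, except newline.
  - ? - makes the preceding element optional (zero or one occurrence).
  - For example, /(ho|ca)me/ matches the words home and came. (The alternatives must be placed in round brackets; inside square brackets "|" is just an ordinary character.)
To use these characters (+ , * , ? , . , | , ( , ) , { , [ , ^ , $ , ...) in a regexp pattern with their literal meaning, disabling their special role, you must prefix them with a backslash character "\".
    For example, /[0-9]\*[0-9]/ matches a multiplication between two digits ("*" is no longer a repetition factor).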
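
A minimal sketch of alternation and escaping. The ^ and $ anchors and the test strings are assumptions added for the illustration; preg_quote() is a standard PHP function that adds the backslashes for you.

```php
<?php
// Alternation inside round brackets: "ho" or "ca", followed by "me"
$home = preg_match('/^(ho|ca)me$/', 'home');   // 1
$came = preg_match('/^(ho|ca)me$/', 'came');   // 1
$hame = preg_match('/^(ho|ca)me$/', 'hame');   // 0

// Escaped "*": a literal asterisk between two digits
$multiplication = preg_match('/[0-9]\*[0-9]/', '3*4');   // 1
$noAsterisk     = preg_match('/[0-9]\*[0-9]/', '34');    // 0

// preg_quote() escapes the special characters of a plain string
$escaped = preg_quote('3*4', '/');
echo $escaped, "\n";        // 3\*4
```

Using preg_quote() is convenient when the text to search for comes from a variable and may contain special characters.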

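• Besides these characters, there are special sequences that shorten regular expressions:
  - \d - any digit (same as [0-9]); \D - any character that is not a digit.
  - \s - any whitespace character; \S - any non-whitespace character.
  - \w - any "word" character (letter, digit or underscore); \W - any non-word character.
  - For example: /[\d\s]+/ matches a sequence made only of digits and whitespace.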
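
As a quick sketch, the pattern with \d and \s can be anchored (an assumption made here, so that the whole string is checked) and tested on two hypothetical strings:

```php
<?php
$pattern = '/^[\d\s]+$/';   // only digits and whitespace, from start to end

$digitsAndSpaces = preg_match($pattern, '12 34 56');   // 1
$withLetters     = preg_match($pattern, '12 ab');      // 0: letters are not allowed
```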

• Besides the special characters and the sequences used to shorten regular expressions, there are also special letters called modifiers. They have a special role only if they are placed after the closing delimiter ("/regexp/mods"), and they alter the behavior of the regular expression.
The most used RegExp modifiers are listed below. You can add one or more modifiers at the end of the pattern.
  - i - case-insensitive matching.
  - m - multiline: ^ and $ also match at the beginning and end of each line.
  - s - the dot (.) also matches newline characters.
  - x - whitespace in the pattern is ignored, unless it is escaped or inside a character class.
  - Example: /\d{3}-[a-z]+/i - searches for "nnn-word" sub-strings, where "nnn" is a 3-digit number and "word" can contain uppercase letters too.
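
The effect of the "i" modifier on this pattern can be seen in a sketch like the following (the subject strings are hypothetical):

```php
<?php
// With "i", [a-z] also accepts uppercase letters
preg_match('/\d{3}-[a-z]+/i', 'Code: 123-Word', $m);
echo $m[0], "\n";          // 123-Word

// Without "i", the uppercase letters after the hyphen prevent a match
$caseSensitive = preg_match('/\d{3}-[a-z]+/', 'Code: 123-WORD');   // 0
```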

More details and examples about RegExp can be found in the PHP manual, in the section Regular Expressions (Perl-Compatible).

• Usually, regular expressions are used in PHP for string matching and string substituting. PHP has special functions for these operations.

String matching - preg_match

The preg_match() function searches a string for a match to the regular expression given in pattern.
  - Syntax:
preg_match("pattern", "string", $matches)
- "pattern" - The RegExp pattern to search for.
- "string" - The input string.
- $matches - Optional. If it is provided, it will contain the results of the search: $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
preg_match() returns 1 if the pattern matches and 0 if it does not, because it stops searching after the first match. It returns FALSE if an error occurs.
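
To make the content of $matches more concrete, here is a minimal sketch with three subpatterns. The date string and the variable names are hypothetical, used only for illustration.

```php
<?php
$str = 'Released on 2024-03-15';     // hypothetical subject string

// Three subpatterns: year, month, day
if (preg_match('/(\d{4})-(\d{2})-(\d{2})/', $str, $matches)) {
  echo $matches[0], "\n";    // 2024-03-15 (the full match)
  echo $matches[1], "\n";    // 2024 (first subpattern)
  echo $matches[2], "\n";    // 03 (second subpattern)
  echo $matches[3], "\n";    // 15 (third subpattern)
}
```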

• Let's see some examples with preg_match():
  1) Looking for the string "courses" anywhere within the given string.
<?php
$regexp = '/courses/i';                    // the pattern
$str = 'Free Courses and tutorials';       // the subject string

if (preg_match($regexp, $str)) {
  echo 'A match was found';
}
else {
  echo 'No match';
}
?>
- The "i" after the pattern delimiter indicates a case-insensitive search.
Output:
A match was found

  2) Validate an Email address.
<?php
$regexp = '/^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}$/';   // a simplified e-mail pattern; {2,} also accepts long domain endings such as .museum
$email = 'some_name789@emailserver.net';             // the e-mail address

if (preg_match($regexp, $email)) {
  echo 'Correct email address';
}
else {
  echo 'Incorrect email address';
}
?>
Output:
Correct email address

  3) Getting the URL out of an HTML link.
<?php
$regexp = '/href=["\']([^"\']+)/i';     // capture everything between the quotes of the href attribute
$url = '<a href="http://coursesweb.net/php-mysql" title="PHP MySQL">coursesweb.net</a>';

if (preg_match($regexp, $url, $matches)) {
  $href = $matches[1];
  echo $href;
}
else {
  echo 'No match found';
}
?>
Output:
http://coursesweb.net/php-mysql

preg_match() stops searching after the first match. If you want to get all matching data in a string, use preg_match_all(). This function continues searching until it reaches the end of the subject, and puts all matches in an array.
  - Example with preg_match_all(). Getting the content of all <li> tags that have class="cls":
<?php
$regexp = '#<li class="cls">(.*?)</li>#i';     // "<" and ">" need no escaping; (.*?) captures the tag content
$html = '<ul>
  <li class="cls">www.marplo.net</li>
  <li class="cls">Courses and tutorials</li>
  <li>www.google.com</li>
  <li class="cls">coursesweb.net</li>
 </ul>';

// get and print the array with all matches
if (preg_match_all($regexp, $html, $matches)) {
  $li_cls = $matches[1];
  print_r($li_cls);
}
else {
  echo 'No match found';
}
?>
Output:
Array ( [0] => www.marplo.net [1] => Courses and tutorials [2] => coursesweb.net )

String substituting - preg_replace

To perform pattern searching and replacing, use the preg_replace function.
  - Syntax:
preg_replace($pattern, $replacement, $subject)
- $pattern - The RegExp pattern to search for. It can be either a string or an array with strings.
- $replacement - The replacement string, or an array with replacement strings.
- $subject - The string, or an array with strings, to search and replace in.
If both $pattern and $replacement are arrays, each pattern is replaced by its counterpart in $replacement (the element at the same position).
preg_replace() returns an array if the $subject parameter is an array, or a string otherwise. If matches are found, the new $subject is returned; otherwise $subject is returned unchanged. NULL is returned on error.

• Let's see two examples with preg_replace():
  1) Replacing a string (coursesweb) with another string (marplo).
<?php
$regexp = '/coursesweb/i';
$replacement = 'marplo';
$str = 'Free PHP courses and tutorials: <a href="http://coursesweb.net/php-mysql" title="PHP MySQL">coursesweb.net</a>';

$new_str = preg_replace($regexp, $replacement, $str);

echo $new_str;
?>
Output:
Free PHP courses and tutorials: <a href="http://marplo.net/php-mysql" title="PHP MySQL">marplo.net</a>

  2) Using an array with RegExp patterns to replace two different values at the same time.
<?php
// first replace 7 with 9, then 10 or 15 with 7
$regexp = array('/7/', '/(10|15)/');          // array with patterns
$replacements = array('9', '7');              // array with replacements
$data = '1976-10-15';

$new_data = preg_replace($regexp, $replacements, $data);

echo $new_data;
?>
Output:
1996-7-7
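
As described in the syntax above, $subject can also be an array, in which case preg_replace() returns an array. A minimal sketch, with hypothetical sample values:

```php
<?php
$regexp = '/coursesweb/i';
$subjects = array('coursesweb.net', 'example.org');     // hypothetical array of subject strings

// each element is processed separately; elements without a match are returned unchanged
$new_subjects = preg_replace($regexp, 'marplo', $subjects);

print_r($new_subjects);
```

Here the first element becomes "marplo.net", while "example.org" contains no match and stays as it is.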