The cleanLongWords() function presented on this page cleans text that contains long runs of a repeated character and very long words in PHP.
It is useful for cleaning text that users enter in text input fields or textareas.
By default, the function allows at most 2 consecutive repetitions of a character in a word (digits are not affected) and splits words longer than 23 characters. A run is shortened only once it is at least twice the allowed length, so with the default setting "ttt" is left as it is. See the comments in the code and the example below.
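The two default steps can be tried in isolation. This is a minimal sketch, assuming the same pattern shape the function builds for a single character (here 'o') and PHP's built-in wordwrap(); the variable names are only illustrative.

```php
<?php
// Step 1: pattern for one character ('o') with 2 allowed repetitions.
// The inner group matches 2 or more 'o'; the outer {2,} requires the group at least twice,
// so only runs of 4 or more match, and $1 holds the last 2 characters of the run.
$pattern = '/([o]{2,}){2,}/i';
$longRun  = preg_replace($pattern, '$1', 'loooooong'); // run of 6 is reduced to 2
$shortRun = preg_replace($pattern, '$1', 'looong');    // run of 3 is below the threshold

// Step 2: wordwrap() with ' ' as the break string and cut = true
// inserts a space after every 23 characters of an over-long word.
$wrapped = wordwrap(str_repeat('ab', 15), 23, ' ', true);

echo $longRun, "\n";  // loong
echo $shortRun, "\n"; // looong
echo $wrapped, "\n";  // abababababababababababa bababab
```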
Code of the cleanLongWords() function
function cleanLongWords($text, $nrchr = 2, $maxword = 23) {
  // cleans text with long runs of a repeated character and very long words ( https://coursesweb.net/ )
  // $nrchr = allowed number of consecutive identical characters
  // $maxword = maximum word length (0 = do not split words)
  // characters to check; regex metacharacters are already escaped (save this file as UTF-8)
  $chr = array(
    'q','w','e','r','t','y','u','i','o','p','a','s','d','f','g','h','j','k','l','z','x','c','v','b','n','m',
    ';',',','!','@','#','%','&','_','=',':','"','`','~',"'",'$',
    'â','á','é','í','ó','ú','ý','ø','č','ć','đ','š','ž','ā','ä','ǟ','ḑ','ē','ī','ļ','ņ','ō','ȯ','ȱ','õ','ȭ','ŗ','ț','ū','ş','î','ă',
    '\^','\*','\(','\)','\{','\}','\|','\?','\.','\[','\]','\/','\\\\','\>','\<'
  );
  // uncomment next line if you want to clean consecutive digits too
  // $chr = array_merge($chr, array('1','2','3','4','5','6','7','8','9','0'));
  $patterns = array();
  foreach ($chr as $c) {
    // matches a run of at least 2 * $nrchr of the character; $1 holds the last $nrchr characters of the run
    // i = ignore case, u = UTF-8, so accented letters are matched as whole characters
    $patterns[] = '/(['. $c .']{'. $nrchr .',}){2,}/iu';
  }
  $cleaned = preg_replace($patterns, '$1', $text);
  if ($cleaned !== null) $text = $cleaned; // null means a regex error, e.g. text that is not valid UTF-8
  // if $maxword > 0, split longer words into pieces of at most $maxword bytes
  return ($maxword > 0) ? wordwrap($text, $maxword, ' ', true) : $text;
}
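Note that wordwrap() counts bytes, not characters, so in UTF-8 text a long word made of accented letters can be split earlier than expected, or in the middle of a character. If that matters, a regex-based splitter is one option. The sketch below is an assumption, not part of the original function, and the name splitLongWords() is hypothetical.

```php
<?php
// Hypothetical helper: inserts a space after every $maxword characters of an
// unbroken word, counting UTF-8 characters instead of bytes.
function splitLongWords($text, $maxword = 23) {
  if ($maxword <= 0) return $text; // 0 = do not split, as in cleanLongWords()
  // \S{n} = n non-whitespace characters; (?=\S) = only if the word continues after them
  $result = preg_replace('/\S{' . (int) $maxword . '}(?=\S)/u', '$0 ', $text);
  return ($result === null) ? $text : $result; // null = regex error, e.g. invalid UTF-8
}

$word = str_repeat('ă', 30);      // 30 characters, 60 bytes in UTF-8
echo splitLongWords($word), "\n"; // 23 x 'ă', a space, then 7 x 'ă'
```

For plain ASCII words this gives the same split points as wordwrap($text, 23, ' ', true).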
Example usage
<?php
// Add the cleanLongWords() function here
$text = '\\\\\\\\\ A ttttttteeeeeeeeexxxxxxt with loooooonnnnnngggggg woooooooooooords and coooooonnnnnnsssssseeeeeeccccccuuuuuuttttttiiiiiivvvvvveeeeeee characteeeeeerrrrrrssssss [[[[[]]]]]] ///////......';
$text1 = cleanLongWords($text);
$text2 = cleanLongWords($text, 3); // allow 3 consecutive repetitions of a character
$text3 = cleanLongWords($text, 1, 0); // no repeated characters, and words are not split
echo '$text1 - '. $text1;
echo '<br>$text2 - '. $text2;
echo '<br>$text3 - '. $text3;
?>
- Result:
$text1 - \\ A tteexxt with loonngg woords and coonnsseeccuuttiivvee characteerrss [[]] //..
$text2 - \\\\\ A ttteeexxxt with looonnnggg wooords and cooonnnssseeecccuuuttti iivvveee characteeerrrsss [[[[[]]] ///...
$text3 - \ A text with long words and consecutive characters [] /.
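In the $text2 result, `\\\\\` and `[[[[[` keep five characters although 3 are allowed, because the function shortens a run only when it is at least twice the allowed length. If a hard cap is preferred, one possible approach is a backreference pattern. This is a sketch under assumptions: the name capRepeats() is hypothetical, and unlike cleanLongWords() it applies to every character that is not a digit or whitespace rather than to a fixed list.

```php
<?php
// Hypothetical helper (not part of the original function): caps every run of the
// same non-digit, non-whitespace character at $nrchr occurrences.
function capRepeats($text, $nrchr = 2) {
  $nrchr = max(1, (int) $nrchr);
  // ([^\s\d]) captures one character; \1{n,} requires n or more further copies,
  // so the whole match is a run of n + 1 or more. i = ignore case, u = UTF-8.
  $pattern = '/([^\s\d])\1{' . $nrchr . ',}/iu';
  // replace the run with the captured character repeated $nrchr times
  $result = preg_replace($pattern, str_repeat('$1', $nrchr), $text);
  return ($result === null) ? $text : $result; // null = regex error, e.g. invalid UTF-8
}

echo capRepeats('[[[[[]]]]]] ///////......', 3), "\n";   // [[[]]] ///...
echo capRepeats('loooooonnnnnngggggg 1000000', 2), "\n"; // loonngg 1000000
```

Because the capture group holds the first character of the run, a mixed-case run such as "Tttt" keeps the case of its first letter.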