search_simplify

Versions
4.7 – 7
search_simplify($text)

Simplifies a string according to indexing rules.

2 functions call search_simplify()

search_index_split in modules/search/search.module
Splits a string into tokens for indexing.
search_parse_query in modules/search/search.module
Parses a search query into SQL conditions.

Code

modules/search/search.module, line 336

<?php
function search_simplify($text) {
  // Decode entities to UTF-8
  $text = decode_entities($text);

  // Lowercase
  $text = drupal_strtolower($text);

  // Call an external processor for word handling.
  search_preprocess($text);

  // Simple CJK handling
  if (variable_get('overlap_cjk', TRUE)) {
    $text = preg_replace_callback('/['. PREG_CLASS_CJK .']+/u', 'search_expand_cjk', $text);
  }

  // To improve searching for numerical data such as dates, IP addresses
  // or version numbers, we consider a group of numerical characters
  // separated only by punctuation characters to be one piece.
  // This also means that searching for e.g. '20/03/1984' also returns
  // results with '20-03-1984' in them.
  // Readable regexp: ([number]+)[punctuation]+(?=[number])
  $text = preg_replace('/(['. PREG_CLASS_NUMBERS .']+)['. PREG_CLASS_PUNCTUATION .']+(?=['. PREG_CLASS_NUMBERS .'])/u', '\1', $text);

  // The dot, underscore and dash are simply removed. This allows meaningful
  // search behaviour with acronyms and URLs.
  $text = preg_replace('/[._-]+/', '', $text);

  // With the exception of the rules above, we consider all punctuation,
  // marks, spacers, etc, to be a word boundary.
  $text = preg_replace('/['. PREG_CLASS_SEARCH_EXCLUDE .']+/u', ' ', $text);

  return $text;
}
?>
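
The three regular-expression steps above can be tried outside Drupal with a small standalone sketch. This is not the module's code: the names sketch_simplify, SKETCH_CLASS_NUMBERS and SKETCH_CLASS_PUNCTUATION are hypothetical, and the character classes are simplified stand-ins (assumptions) for PREG_CLASS_NUMBERS, PREG_CLASS_PUNCTUATION and PREG_CLASS_SEARCH_EXCLUDE, which cover far more Unicode ranges in the real module. Entity decoding, lowercasing, preprocessing and CJK handling are left out.

```php
<?php
// Hypothetical ASCII-only stand-ins for Drupal's PREG_CLASS_NUMBERS and
// PREG_CLASS_PUNCTUATION; the real constants span many Unicode ranges.
define('SKETCH_CLASS_NUMBERS', '0-9');
define('SKETCH_CLASS_PUNCTUATION', '\/.,:;_\-');

/**
 * Sketch of the last three steps of search_simplify() (assumed classes).
 */
function sketch_simplify($text) {
  // Join groups of digits that are separated only by punctuation, so that
  // '20/03/1984' and '20-03-1984' end up as the same token.
  $text = preg_replace('/([' . SKETCH_CLASS_NUMBERS . ']+)[' . SKETCH_CLASS_PUNCTUATION . ']+(?=[' . SKETCH_CLASS_NUMBERS . '])/', '\1', $text);

  // Remove dots, underscores and dashes (acronyms, URLs).
  $text = preg_replace('/[._-]+/', '', $text);

  // Assumption: anything that is not a letter or a number is a word
  // boundary. The real module uses PREG_CLASS_SEARCH_EXCLUDE here.
  $text = preg_replace('/[^\p{L}\p{N}]+/u', ' ', $text);

  return $text;
}

echo sketch_simplify('20/03/1984'), "\n";             // 20031984
echo sketch_simplify('20-03-1984'), "\n";             // 20031984
echo sketch_simplify('U.S.A.'), "\n";                 // USA
echo sketch_simplify('www.example.com/node_1'), "\n"; // wwwexamplecom node1
```

Note that the order of the steps matters: the dot, underscore and dash are removed before the word-boundary step, otherwise 'U.S.A.' would be split into three one-letter words instead of being indexed as a single token.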

All source code and documentation on this site is released under the terms of the GNU General Public License, version 2 and later. Drupal is a registered trademark of Dries Buytaert.