I am trying to create a product search for my site, where a user can search for products in multiple languages and (hopefully) get fuzzy search results if there is no exact match.
- I have a
pro_search
table that has columnsid, pro_id, en, de, es, fr, it
. - The
pro_id
column refers to the id of the products in their own table. - The
en, de, es, fr, it
columns have the translated meta of each product in various languages. - The meta is just keywords seperated by spaces
$term
is the search term.$lang
refers to the users chosen language
So first I do a basic 'LIKE' SQL query to see if there are matches, if there are no results from this I query all the products and create an array sorted by their similarity using the similar_text()
function
For example I search 'shirt' this is fine if the meta for this product just includes the word 'shirt', but if the meta includes 'blue branded tshirt' this is being more descriptive and gives the user a chance to search by brand but means that the search will more than likely go fuzzy rather than be found with a LIKE
SQL query.
This is kind of working but I was wondering how this could be improved, is there a better way of searching or how do people normally do it? Should I be splitting the meta into each individual keywords and try to see how many words match rather than matching the term to the whole meta?
$ids = [];
$params = ['%'.$term.'%'];
$sql = "SELECT * FROM pro_search WHERE $lang LIKE ?";
$stmt = DB::run($sql,$params);
$count = $stmt->rowCount();
if($count > 0){
// product search
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){
$id = $row["pro_id"];
array_push($ids,$id);
}
show_products($ids);
}else{
// product fuzzy search
$sql = "SELECT * FROM pro_search";
$stmt = DB::run($sql);
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)){
$id = $row["pro_id"];
$result = $row[$lang];
similar_text($term,$result,$similarity);
$similar_array[$similarity][] = $id;
}
$closest_match = array_keys($similar_array);
rsort($closest_match);
$match_count = count($closest_match);
for($i=0; $i<$match_count; $i++){
foreach($similar_array[$closest_match[$i]] as $id){
array_push($ids,$id);
}
}
show_products($ids);
}
I have asked similar questions before and people have pointed me to different ways of comparing the term against the meta (such as levenshtein), but everything I've seen has been comparing two simple words (like apples and oranges) and this just isn't good enough for a real life application with thousands of products and a user could search for literally anything (as in $term='literally anything';
)
Key Questions:
- Should my meta have just the product name or multiple relevant keywords (too many keywords means an individual word is less similar to the whole)?
- If I have multiple keywords in the meta should I be taking each individual keyword and comparing it against the search term?
- Also would it be possible to maybe have negative keywords for individual products.