2
votes

I've created a .net MVC application in C# that lists organisations, there are currently 6000 records in the database (SQLServer). The fields in the organisation table are:

  • Title (Alcohol support group)
  • ContactPerson (James Bond)
  • Content (We provide support to people with alcohol problems)
  • Keywords (Alcohol, Addiction, Alcoholic)

The current search is done using linq, e.g. :

iList<Organisation> orglist = myOrgs.Where(x => x.Title.Contains('abc') || 
                                                x.ContactPerson.Contains('abc') || 
                                                x.Details.Contains('abc') || 
                                                x.Keywords.Contains('abc'))
                                    .OrderBy(x => x.Title).ToList();

The results are then ordered by title. Which is not suitable.

If someone searches for 'Alcohol Support' I want the above result at the top of the list.

I would like the results to be ranked on the following:

  1. Full sentence matches in organisation titles.
  2. All search terms in organisation titles.
  3. Any search terms in organisation titles.
  4. Any search terms in organisation keywords.
  5. Full sentence matches in organisation content.

Looking for advice on the best way to implement this, or if anyone knows of any algorithms/libraries out there that do this already?

** Update ** Im looking at a simpler solution now, see this link:

Sorting collection based on keywords with Linq

1
The results aren't sorted by anything, they are returned in the order they were found in the source list. Your code will have to specify the sort order based on a field or ranking method with OrderBy. If no ranking field exists, your code will have to provide the comparer used to order the results. - Panagiotis Kanavos
You need a custom IComparer : docs.microsoft.com/en-us/dotnet/api/… - jdweng
If you want the database to rank results you'll have to use the database's Full-Text-Search features for that. Performance will be a lot faster than wildcard searches across multiple fields in all table rows. SQL Server for example offers FTS with ranking, accelerated by free text search indexes. This example how to perform various queries and filder by rank - Panagiotis Kanavos
BTW what database are you using? - Panagiotis Kanavos
@PanagiotisKanavos thanks, its a SQL Server DB - Dez79

1 Answers

0
votes

Summary :

  • Requirement's

    • R01 | titles | Full matches |in order
    • R02 | titles | Full matches |in any order
    • R03 | titles | Any matches |
    • R04 | keywords | Any matches |
    • R05 | content | Full matches |
  • For each Requirement We will do SQL Call

  • Each SQl Call we will return Only rows id's

  • Then we Group the Id's in order

  • We will do a final SQL call


Step 01 : R01

  • Here we will use EF

    db.Orgs.Where(w => w.Title.Contains(search_query))
    .Select(s => s.Id).ToList();
    

This use of linq2Sql Contains will be translated to sql WHERE IN

Step 02 : R02

  • here we will use plain sql WHERE + LIKE + AND

    Select Id From Orgs where 
    Title LIKE '%' + @param0 +'%' 
    and Title LIKE '%' + @param1 +'%' 
    

Step 03 : R03

  • here we will use plain sql WHERE + OR + And

    Select Id From Orgs 
    where Title LIKE '%' + @param0 +'%'  
    or  Title LIKE '%' + @param1 +'%' 
    

Step 04 : R04

  • here we will use plain sql WHERE + OR + And

    Select Id From Orgs 
    Where Keywords LIKE '%' + @param0 +'%' 
    or  Keywords LIKE '%' + @param1 +'%' 
    

Step 05 : R05

  • Here we will use EF

        db.Orgs
        .Where(w => w.Content.Contains(search_query)).
        Select(s => s.Id).ToList();
    

This use of linq2Sql Contains will be translated to sql WHERE IN

Step 06 - Group The Id's and ignore duplicable one's

  • using the row id's form step 1 to 5

we will sort the id's base on order retrieval

        var ids = new Dictionary<int, int>();

        foreach (var id in Ids1)
        {
            int val;
            if (!ids.TryGetValue(id, out val))
            {
                ids.Add(id, ids.Count());
            }

        };
        .
        .

Step 07 - Re-order

         ids.OrderByDescending(o => o.Value)
        .Select(s => s.Key) .ToArray();

Step 08 - Now we wi use The Oredred Id's to Get The data

  • here we will use plain sql ORDER BY + CASE WHEN THEN ELSE END

    Select * from Orgs 
    where  Id in ( 2 , 1 )  
    ORDER BY CASE id  
    WHEN 2 THEN 0 
    WHEN 1 THEN 1 
    ELSE 2 END
    

Step 09 Full Code


    using System;
    using System.Collections.Generic;
    using System.Linq;

    namespace ConsoleApp9
    {
    class Program
    {
    static void search(string search_query)
    {
    //////////////////////////////////////////////////
    var terms = search_query.Split(' ');
    //////////////////////////////////////////////////
    var Ids1 = db.Orgs.
    Where(w => w.Title.Contains(search_query))
    .Select(s => s.Id).ToList();

    var Ids2 = db.Database
    .SqlQuery<int>(getWhere("Title", "AND"), terms)
    .ToList();

    var Ids3 = db.Database
    .SqlQuery<int>(getWhere("Title", "OR"), terms)
    .ToList();

    var Ids4 = db.Database
    .SqlQuery<int>(getWhere("Keywords", "OR"), terms)
    .ToList();

    var Ids5 = db.Orgs
    .Where(w => w.Content.Contains(search_query))
    .Select(s => s.Id).ToList();

    var ordered_ids = reorderList(Ids1, Ids2, Ids3, Ids4, Ids5);

    var OrderedData = db.Database.SqlQuery<Org>(getOrdered(ordered_ids)).ToList();

    //////////////////////////////////////////////////

    foreach (var item in OrderedData)
    {
        Console.WriteLine($"{item.Id} - {item.Title} - {item.ContactPerson } - {item.Keywords } - {item.Content  }");

    }

    //////////////////////////////////////////////////
    Console.ReadLine();
    //////////////////////////////////////////////////
    string getWhere(string _column, string _oprator)
    {
        var val = "Select Id From Orgs where ";
        for (int i = 0; i < terms.Length; i++)
        {
            if (i > 0) val += @" " + _oprator + " ";
            val += @" " + _column + " LIKE '%' + {" + i + "} +'%'  ";
        }
        return val;
    }
    //////////////////////////////////////////////////
    string getOrdered(int[] _ids_ordered)
    {
        var val = "Select * From Orgs where ";
        val += " Id in ";
        for (int i = 0; i < _ids_ordered.Length; i++)
        {
            if (i == 0) val += "( ";
            if (i > 0) val += " , ";
            val += _ids_ordered[i];
            if (i == terms.Length - 1) val += " ) ";
        }
        val += " ORDER BY CASE id ";
        for (int i = 0; i < _ids_ordered.Length; i++)
        {
            val += " WHEN " + _ids_ordered[i] + " THEN " + i;
        }
        val += " ELSE " + _ids_ordered.Length + " END ";

        return val;
    }
    //////////////////////////////////////////////////
    int[] reorderList(List<int> _Ids1, List<int> _Ids2,
        List<int> _Ids3, List<int> _Ids4, List<int> _Ids5)
    {
        var idsDic = new Dictionary<int, int>();

        var idsArr = new List<int>[5] { Ids1, Ids2, Ids3, Ids4, Ids5 };
        for (int i = 0; i < 5; i++)
        {
            idsArr[i].ForEach(id =>
            {
                if (!idsDic.TryGetValue(id, out int val))
                    idsDic.Add(id, idsDic.Count());
            });
        };
        var o_ids = idsDic.OrderByDescending(o => o.Value)
                .Select(s => s.Key).ToArray();
        return o_ids;
    }
    }

    static Model1 db = new Model1();

    static void Main(string[] args)
    {
    string search_quer = "Alcohol Support";
    Console.WriteLine($"searching for {search_quer}");
    search("Alcohol Support");
    }
    }


    }

Note 01 : Sql Injection

  • What is Sql Injection

    SQL injection is a code injection technique, used to attack data-driven applications, in which nefarious SQL statements are inserted into an entry field for execution

Note 01.01 :The Problem

Note 01.02 : Microsoft documentation

  • From Microsoft documentation :How to: Directly Execute SQL Queries | Microsoft Docs

    The parameters are expressed in the query text by using the same curly notation used by Console.WriteLine() and String.Format(). In fact, String.Format() is actually called on the query string you provide, substituting the curly braced parameters with generated parameter names such as @p0, @p1 …, @p(n). enter image description here

Note 01.03 : In this project

  • When using EF 6.2

    var sql2 = " Select Id From Orgs where ";
    for (int i = 0; i < terms.Length; i++)
    {
        if (i > 0) sql2 += @" and ";
        sql2 += @" Title LIKE '%' + {" + i + "} +'%'  ";
    }
    
  • Will generate :

    Select Id From Orgs where  
    Title LIKE '%' + {0} +'%'   
    and  Title LIKE '%' + {1} +'%'  
    
  • In sqlserver using SQL Server Profiler

    exec sp_executesql N' Select Id From Orgs where  
    Title LIKE ''%'' + @p0 +''%''   and  Title 
    LIKE ''%'' + @p1 +''%''  ',N'@p0 nvarchar(7)
    ,@p1 nvarchar(7)',@p0=N'Alcohol',@p1=N'Support'
    

Note 01.04 : Another format

  • we can also use SqlParameter class

    var sql4 = " Select Id From Orgs where  ";
    var sql4_parameter = new List<SqlParameter>();
    for (int i = 0; i < terms.Length; i++)
    {
        if (i > 0) sql4 += @" or ";
        sql4 += @" Keywords LIKE '%' + @param" + i + " +'%'  ";
        sql4_parameter.Add(new SqlParameter("param" + i, terms[i]));
    }
    
  • and here is sql

     Select Id From Orgs 
     Where Keywords LIKE '%' + @param0 +'%' 
     or  Keywords LIKE '%' + @param1 +'%'