
I am looking for a way to custom-sort my Lucene.Net results so that null values (the field does not exist on the document) are placed at the bottom regardless of the sort direction (ascending or descending).

The following data sums up the situation and the wanted results:

data in index    wanted sort result
data             desc    asc
----             ----    ----
100              400     100
400              300     300
null             100     400
300              null    null

My situation is that I have some products, and not all of them have a price. When sorting ascending, I want the cheapest products first, not the products with no price (which is what the default behavior gives). The products with no price should still be in the result, but at the end, since they are the least relevant when sorting on price.

I've searched quite a bit with Google and haven't really found an answer on how to implement custom sorting in Lucene.Net 3.0.3.

The best example I've found is this answer, which seems to point in the direction I'm looking for. But the answer is old, and the ScoreDocComparator it refers to seems to be deprecated in the original source, and therefore also in the current version 3.0.3 of Lucene.Net. The original project refers to FieldComparator as the replacement, but this seems far more complex to implement than ScoreDocComparator (a lot of methods need to be implemented or overridden, and many could benefit from inheritance instead of duplicated implementations), so I doubt whether this is the right path to take.

Ideally I want to implement something generic for int/long fields that takes the field name into account, like the SortField object does, since I expect to have more fields in the future that would benefit from this custom sorting behavior.

I would think that the implementation is done somewhere around the usage of Sort/SortField class, so my ending usage code could be something like:

var sort = new Sort(new MyNullLastSortField("Price", SortField.INT, reverse));

But maybe that is also the wrong way? SortField has a constructor which takes a FieldComparatorSource as a parameter, but I can't seem to wrap my head around how this is constructed and implemented, and where the actual data values from the index flow in and out.

Any help pointing me in the right direction (preferably with sample code) is much appreciated.

My fallback solution (not preferred) would be to add two fields to the index that are only used for sorting, handling the null values manually at insert time by setting them to -1 in the field used for descending sorts and to 9999999 in the field used for ascending sorts. I would then sort normally on whichever of the two price fields matches the direction.
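
A rough sketch of that fallback, kept independent of Lucene.Net so it can be checked on its own. The names PriceSortKeys, ForAscending and ForDescending are made up for illustration, the sentinels are the ones mentioned above, and the LINQ sort only stands in for sorting on the two index fields:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: computes the values for the two sort-only fields.
public static class PriceSortKeys
{
    public const int AscendingSentinel = 9999999; // assumed to exceed every real price
    public const int DescendingSentinel = -1;     // assumed to be below every real price

    // Value to store in the field used for ascending sorts.
    public static int ForAscending(int? price) => price ?? AscendingSentinel;

    // Value to store in the field used for descending sorts.
    public static int ForDescending(int? price) => price ?? DescendingSentinel;

    // Plain LINQ stand-in for the index sort, only to check the ordering.
    public static List<int?> Sort(IEnumerable<int?> prices, bool descending)
    {
        return descending
            ? prices.OrderByDescending(p => ForDescending(p)).ToList()
            : prices.OrderBy(p => ForAscending(p)).ToList();
    }
}
```

This only works as long as no real price reaches either sentinel, which is one reason I don't like it.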

Have you found a solution? I probably have the time to get into it... OOTB NumericFields are not nullable. Do you have some sentinel value or a custom field type? A version of ValueComparitor (and an associated FieldComparatorSource) would be needed. - AndyPook

1 Answer


Curiosity got the best of me. Here's a solution (with caveats)

The full source is at https://github.com/AndyPook/SO_CustomSort-40744865

Extension method to add nullable ints. NumericField uses an encoding to store values, which I didn't want to get into, so I've just used a sentinel value.

public static class NumericFieldExtensions
{
    public static NumericField SetIntValue(this NumericField f, int? value)
    {
        if (value.HasValue)
            f.SetIntValue(value.Value);
        else
            f.SetIntValue(int.MinValue);

        return f;
    }
}

A custom comparator which "understands" the sentinel. It's just a copy of Lucene's IntComparator, which is sealed, hence the copy. Look for int.MinValue to see the differences.

public class NullableIntComparator : FieldComparator
{
    private int[] values;
    private int[] currentReaderValues;
    private string field;
    private IntParser parser;
    private int bottom; // Value of bottom of queue
    private bool reversed;

    public NullableIntComparator(int numHits, string field, Parser parser, bool reversed)
    {
        values = new int[numHits];
        this.field = field;
        this.parser = (IntParser)parser;
        this.reversed = reversed;
    }

    public override int Compare(int slot1, int slot2)
    {
        // TODO: there are sneaky non-branch ways to compute
        // -1/+1/0 sign
        // Cannot return values[slot1] - values[slot2] because that
        // may overflow
        int v1 = values[slot1];
        int v2 = values[slot2];

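        // Two missing values tie; otherwise the sentinel sorts last.
        // The sign flips when reversed because Lucene negates the result.
        if (v1 == v2)
            return 0;
        if (v1 == int.MinValue)
            return reversed ? -1 : 1;
        if (v2 == int.MinValue)
            return reversed ? 1 : -1;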

        if (v1 > v2)
        {
            return 1;
        }
        else if (v1 < v2)
        {
            return -1;
        }
        else
        {
            return 0;
        }
    }

    public override int CompareBottom(int doc)
    {
        // TODO: there are sneaky non-branch ways to compute
        // -1/+1/0 sign
        // Cannot return bottom - values[slot2] because that
        // may overflow
        int v2 = currentReaderValues[doc];

        // Two missing values tie; otherwise the sentinel sorts last.
        if (bottom == v2)
            return 0;
        if (bottom == int.MinValue)
            return reversed ? -1 : 1;
        if (v2 == int.MinValue)
            return reversed ? 1 : -1;

        if (bottom > v2)
        {
            return 1;
        }
        else if (bottom < v2)
        {
            return -1;
        }
        else
        {
            return 0;
        }
    }

    public override void Copy(int slot, int doc)
    {
        values[slot] = currentReaderValues[doc];
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        currentReaderValues = FieldCache_Fields.DEFAULT.GetInts(reader, field, parser);
    }

    public override void SetBottom(int bottom)
    {
        this.bottom = values[bottom];
    }

    public override IComparable this[int slot] => values[slot];
}
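
Why the sign flips on reversed: as far as I can tell from the 3.0.3 source, the hit queue multiplies the comparator result by -1 for a reversed sort, so the sentinel has to compare as "smaller" in that case to still end up last. Here is a small standalone sketch of that idea, without Lucene. NullLastDemo and its Sort method are hypothetical stand-ins for the collector, not Lucene.Net API, and the comparison adds an explicit tie check:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullLastDemo
{
    // Mirrors the sentinel handling of NullableIntComparator.Compare on plain ints.
    public static int CompareNullLast(int v1, int v2, bool reversed)
    {
        if (v1 == v2) return 0;
        if (v1 == int.MinValue) return reversed ? -1 : 1;
        if (v2 == int.MinValue) return reversed ? 1 : -1;
        return v1 > v2 ? 1 : -1;
    }

    // Stand-in for the collector: negate the comparison when reversed.
    public static List<int?> Sort(IEnumerable<int?> data, bool reversed)
    {
        int reverseMul = reversed ? -1 : 1;
        // Missing values become the sentinel, as in the extension method.
        List<int> list = data.Select(d => d ?? int.MinValue).ToList();
        list.Sort((a, b) => reverseMul * CompareNullLast(a, b, reversed));
        return list.Select(v => v == int.MinValue ? (int?)null : v).ToList();
    }
}
```

With the data from the question, Sort(..., false) gives 100, 300, 400, null and Sort(..., true) gives 400, 300, 100, null.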

Lastly, a FieldComparatorSource to define the custom sort.

public class NullableIntFieldCompatitorSource : FieldComparatorSource
{
    public override FieldComparator NewComparator(string fieldname, int numHits, int sortPos, bool reversed)
    {
        return new NullableIntComparator(numHits, fieldname, FieldCache_Fields.NUMERIC_UTILS_INT_PARSER, reversed);
    }
}

Some tests. See how the Sort is created for how this plugs together.

    private class DataDoc
    {
        public int ID { get; set; }
        public int? Data { get; set; }
    }

    private IEnumerable<DataDoc> Search(Sort sort)
    {
        var result = searcher.Search(new MatchAllDocsQuery(), null, 99, sort);

        foreach (var topdoc in result.ScoreDocs)
        {
            var doc = searcher.Doc(topdoc.Doc);
            int id = int.Parse(doc.GetFieldable("id").StringValue);
            int data = int.Parse(doc.GetFieldable("data").StringValue);

            yield return new DataDoc
            {
                ID = id,
                Data = data == int.MinValue ? (int?)null : data
            };
        }
    }

    [Fact]
    public void SortAscending()
    {
        var sort = new Sort(new SortField("data", new NullableIntFieldCompatitorSource()));

        var result = Search(sort).ToList();

        Assert.Equal(4, result.Count);
        Assert.Equal(new int?[] { 100, 300, 400, null }, result.Select(x => x.Data));
    }


    [Fact]
    public void SortDecending()
    {
        var sort = new Sort(new SortField("data", new NullableIntFieldCompatitorSource(), true));

        var result = Search(sort).ToList();

        Assert.Equal(4, result.Count);
        Assert.Equal(new int?[] { 400, 300, 100, null }, result.Select(x => x.Data));
    }

Note

  • Each doc MUST contain a "data" field with a valid int. You can't just omit the field; a missing value has to be stored as the sentinel.
  • You'll need to make the NullableIntFieldCompatitorSource more sophisticated so that it returns the correct comparator for your field names.
  • You'll need to create comparators for the other numeric types. See https://github.com/apache/lucenenet/blob/3.0.3/src/core/Search/FieldComparator.cs
  • If you don't want to use sentinel values, you'll need to get into NumericField and figure out how to encode null. But that'll mean getting into several other classes.
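
For the other numeric types, the sentinel comparison itself doesn't depend on the type, so it could be shared. A minimal sketch, assuming each type reserves one value as its sentinel. SentinelCompare.NullLast is a hypothetical helper, not part of Lucene.Net, and each comparator would still need its own typed arrays and FieldCache call:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the "missing value sorts last" decision for any numeric type.
public static class SentinelCompare
{
    public static int NullLast<T>(T v1, T v2, T sentinel, bool reversed)
        where T : struct, IComparable<T>
    {
        bool missing1 = EqualityComparer<T>.Default.Equals(v1, sentinel);
        bool missing2 = EqualityComparer<T>.Default.Equals(v2, sentinel);

        if (missing1 && missing2) return 0;        // two missing values tie
        if (missing1) return reversed ? -1 : 1;    // flipped when reversed, as in the int version
        if (missing2) return reversed ? 1 : -1;

        return Math.Sign(v1.CompareTo(v2));        // normalise to -1/0/+1
    }
}
```

A long comparator could then call SentinelCompare.NullLast(v1, v2, long.MinValue, reversed) from both Compare and CompareBottom.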