2
votes

My Lucene index has latitude and longitude fields indexed as follows:

    doc.Add(new Field("latitude", latitude.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.Add(new Field("longitude", longitude.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));

I want to retrieve the set of documents from this index whose latitude and longitude values fall within a given range.

Latitude and longitude can be negative. How do I correctly store signed decimal numbers in Lucene? Would the approach below give correct results, or is there another way to do this?

    Term lowerLatitude = new Term("latitude", bounds.South.ToString());
    Term upperLatitude = new Term("latitude", bounds.North.ToString());
    RangeQuery latitudeRangeQuery = new RangeQuery(lowerLatitude, upperLatitude, true);
    findLocationQuery.Add(latitudeRangeQuery, BooleanClause.Occur.SHOULD);

    Term lowerLongitude = new Term("longitude", bounds.West.ToString());
    Term upperLongitude = new Term("longitude", bounds.East.ToString());
    RangeQuery longitudeRangeQuery = new RangeQuery(lowerLongitude, upperLongitude, true);
    findLocationQuery.Add(longitudeRangeQuery, BooleanClause.Occur.SHOULD);

Also, I would like to know how Lucene's ConstantScoreRangeQuery is better than the RangeQuery class.

I am facing another problem in this context. One of the documents in the index has the following 3 cities:

  • Lyons, IL
  • Oak Brook, IL
  • San Francisco, CA

If I give "Lyons, IL" as input, this record comes up. But if I give "San Francisco, CA" as input, it does not.

However, if I store the cities for this document in the following order:

  • San Francisco, CA
  • Lyons, IL
  • Oak Brook, IL

and give "San Francisco, CA" as input, then this record shows up in the search results.

What I want is that if I type any of the 3 cities as input, I get this document in the search results.

Please help me achieve this.

Thanks.

3
This is really 3 separate questions. Why don't you split it? - itsadok
Here. I did the first step for you: stackoverflow.com/questions/1054719 - itsadok

3 Answers

3
votes

Following up on skaffman's suggestion, you can use the same tile coordinate system used by all the popular map apps. Choose whatever zoom level is granular enough for your needs, and don't forget to pad with leading zeros.
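As a rough sketch of the tile idea, assuming the standard Web Mercator tile scheme used by common web maps: the helper below turns a latitude/longitude pair into tile indices at a chosen zoom level and pads them so that string order matches numeric order. The class and method names (`TileKey`, `ToTile`, `Pad`) are made up for illustration and are not part of Lucene.

```csharp
using System;
using System.Globalization;

public static class TileKey
{
    // Web Mercator is only defined up to roughly +/-85.05 degrees of latitude.
    private const double MaxLatitude = 85.05112878;

    // Converts a latitude/longitude in degrees to tile indices at the given zoom.
    // Assumes 0 <= zoom <= 30 so that 1 << zoom fits in an int.
    public static void ToTile(double lat, double lon, int zoom, out int x, out int y)
    {
        int n = 1 << zoom; // number of tiles along each axis
        lat = Math.Max(-MaxLatitude, Math.Min(MaxLatitude, lat));
        double latRad = lat * Math.PI / 180.0;

        x = (int)Math.Floor((lon + 180.0) / 360.0 * n);
        y = (int)Math.Floor(
            (1.0 - Math.Log(Math.Tan(latRad) + 1.0 / Math.Cos(latRad)) / Math.PI) / 2.0 * n);

        // Keep edge values such as lon = 180 inside the valid index range.
        x = Math.Max(0, Math.Min(n - 1, x));
        y = Math.Max(0, Math.Min(n - 1, y));
    }

    // Left-pads a tile index with zeros to the widest index at this zoom,
    // so that lexicographic order of the strings equals numeric order.
    public static string Pad(int tileIndex, int zoom)
    {
        int width = ((1 << zoom) - 1).ToString(CultureInfo.InvariantCulture).Length;
        return tileIndex.ToString(CultureInfo.InvariantCulture).PadLeft(width, '0');
    }
}
```

The padded x and y strings would be indexed as untokenized fields, and the corners of the bounding box converted the same way to get the lower and upper terms of each range. Note that tile y grows southward, so the northern edge of the box gives the lower y term.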

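Regarding RangeQuery, it rewrites itself into a BooleanQuery with one clause per matching term, so it is slower than ConstantScoreRangeQuery and it limits the range of values: once the range covers more terms than the maximum clause count (1024 by default), it throws a TooManyClauses exception.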

Regarding the city-state problem, we can only speculate. But the first things to check are that the indexed terms and the parsed query are what you expect them to be.

1
votes

I think the best way is to convert/normalize the coordinates as suggested in the previous post. This article does exactly this. It's actually quite nice object-oriented code.

Regarding your second problem, I would assume you have some sort of Analyzer problem. Are you using the same Analyzer for indexing and querying? Which tokenizers do you use?

I recommend using Luke to inspect your generated index and see which tokens are actually searchable.

--Hardy

0
votes

One option here is to convert the coordinates into a system that doesn't have negative numbers. For example, I had a similar problem in a Google Maps web app for the UK, and I stored UK Easting/Northing fields (non-negative values of up to 7 digits) in Lucene alongside the lat/long values. By formatting these eastings/northings with left-padded zeros, I could run Lucene range queries on them.

Is there a similar coordinate system for the US?
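Where no national grid is at hand, a plain offset gives the same effect. The following is a minimal sketch, assuming six decimal places of precision is enough: it shifts latitude by 90 and longitude by 180 so the values are never negative, scales them to integers, and pads to a fixed width. `GeoTerm` and its methods are hypothetical names, not a Lucene API.

```csharp
using System;
using System.Globalization;

public static class GeoTerm
{
    // Assumed precision: six decimal places of a degree.
    private const int Scale = 1000000;

    // Shifts the value so it is non-negative, converts it to a fixed-point
    // integer and pads it to 9 digits (360 * 1,000,000 needs at most 9).
    private static string Encode(double degrees, double offset)
    {
        long shifted = (long)Math.Round((degrees + offset) * Scale);
        return shifted.ToString("D9", CultureInfo.InvariantCulture);
    }

    // Latitude ranges over [-90, 90], so adding 90 makes it non-negative.
    public static string EncodeLatitude(double lat)
    {
        return Encode(lat, 90.0);
    }

    // Longitude ranges over [-180, 180], so adding 180 makes it non-negative.
    public static string EncodeLongitude(double lon)
    {
        return Encode(lon, 180.0);
    }
}
```

The encoded strings would be indexed in place of `latitude.ToString()` and `longitude.ToString()`, and the bounds of the search box encoded with the same methods before building the range terms. Because every term has the same width and no sign, string comparison inside the range query agrees with numeric comparison.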