I recently upgraded my search code from Lucene.Net 2.9.4 to 3.0.3. The spatial packages changed between these versions, and I updated my code accordingly. One drawback I have noticed since the upgrade is much slower indexing. Through a process of elimination, I narrowed the slowdown down to the new spatial code that indexes the lat/long coordinates:
```csharp
public void AddLocation(double lat, double lng)
{
    try
    {
        // Cache key for this coordinate pair; the invariant culture keeps '.' as the
        // decimal separator so distinct pairs cannot produce the same key.
        string latLongKey = lat.ToString(System.Globalization.CultureInfo.InvariantCulture) + ","
                          + lng.ToString(System.Globalization.CultureInfo.InvariantCulture);
        Shape shape;
        if (HasSpatialShapes(latLongKey))
        {
            // Reuse the shape already created for these coordinates.
            shape = SpatialShapes[latLongKey];
        }
        else
        {
            if (this.Strategy is BBoxStrategy)
            {
                // BBoxStrategy indexes rectangles, so the point becomes a zero-area
                // rectangle (arguments: minX, maxX, minY, maxY).
                shape = Context.MakeRectangle(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat), DistanceUtils.NormLatDEG(lat));
            }
            else
            {
                shape = Context.MakePoint(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat));
            }
            AddSpatialShapes(latLongKey, shape);
        }
        // Potentially more than one shape in this field is supported by some
        // strategies; see the javadocs of the SpatialStrategy impl to see.
        AbstractField[] shapeFields = Strategy.CreateIndexableFields(shape);
        foreach (AbstractField f in shapeFields)
        {
            _document.Add(f);
        }
        // Add the raw lat/long values to the index too.
        _document.Add(GetField("latitude", NumericUtils.DoubleToPrefixCoded(lat), Field.Index.NOT_ANALYZED, Field.Store.YES, 0f, false));
        _document.Add(GetField("longitude", NumericUtils.DoubleToPrefixCoded(lng), Field.Index.NOT_ANALYZED, Field.Store.YES, 0f, false));
    }
    catch (Exception e)
    {
        RollingFileLogger.Instance.LogException(ServiceConstants.SERVICE_INDEXER_CONST, "Document", string.Format("AddLocation({0},{1})", lat, lng), e, null);
        throw; // rethrow without resetting the original stack trace
    }
}
```
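If the cache key is built with culture-sensitive `ToString()` calls, two different coordinate pairs can map to the same key on a machine whose culture uses a comma as the decimal separator, and the shape cache would then hand back the wrong shape. The sketch below shows that collision using a hand-built `NumberFormatInfo` standing in for such a culture, next to a key built with `CultureInfo.InvariantCulture`. `CultureKey` and `InvariantKey` are hypothetical helpers for illustration only, not part of the code above.

```csharp
using System;
using System.Globalization;

// Stand-in for a culture such as de-DE, whose decimal separator is a comma.
var commaDecimal = new NumberFormatInfo { NumberDecimalSeparator = "," };

// Hypothetical: the key as lat.ToString() + "," + lng.ToString() would build it under that culture.
string CultureKey(double lat, double lng, IFormatProvider provider) =>
    lat.ToString(provider) + "," + lng.ToString(provider);

// Hypothetical: a culture-independent key that always uses '.' as the decimal separator.
string InvariantKey(double lat, double lng) =>
    lat.ToString(CultureInfo.InvariantCulture) + "," + lng.ToString(CultureInfo.InvariantCulture);

Console.WriteLine(CultureKey(1.5, 2, commaDecimal)); // 1,5,2
Console.WriteLine(CultureKey(1, 5.2, commaDecimal)); // 1,5,2  (same key, different point)
Console.WriteLine(InvariantKey(1.5, 2));             // 1.5,2
Console.WriteLine(InvariantKey(1, 5.2));             // 1,5.2
```

This does not affect indexing speed; it only matters for the correctness of the shape cache when the indexer runs under a non-English culture.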
With 2.9.4, I could index about 300,000 rows of data with lat/lng points in about 11 minutes. With the new spatial package it takes upwards of 5 hours (I killed the test before it finished, so I don't have an exact timing). Here is the spatial context and strategy I am using:
```csharp
// Lazily created, shared spatial context (double-checked locking).
public static SpatialContext SpatialContext
{
    get
    {
        if (null == _spatialContext)
        {
            lock (_lockObject)
            {
                if (null == _spatialContext) _spatialContext = SpatialContext.GEO;
            }
        }
        return _spatialContext;
    }
}

// Lazily created, shared prefix-tree strategy for the "geoField" field.
public static SpatialStrategy SpatialStrategy
{
    get
    {
        if (null == _spatialStrategy)
        {
            lock (_lockObject)
            {
                if (null == _spatialStrategy)
                {
                    // Maximum geohash length, i.e. the number of levels in the prefix tree.
                    int maxLength = 9;
                    GeohashPrefixTree geohashPrefixTree = new GeohashPrefixTree(SpatialContext, maxLength);
                    _spatialStrategy = new RecursivePrefixTreeStrategy(geohashPrefixTree, "geoField");
                }
            }
        }
        return _spatialStrategy;
    }
}
```
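To see what `CreateIndexableFields()` has to produce for each document, it may help to look at what a `GeohashPrefixTree` represents: a point falls into exactly one cell per level, and each cell is identified by a geohash prefix. My understanding (not verified against the Lucene.Net 3.0.3 source) is that `RecursivePrefixTreeStrategy` indexes roughly one term per level for a point, from level 1 up to `maxLength`. The sketch below is a standalone geohash encoder written purely for illustration; `EncodeGeohash` and `GeohashPrefixes` are hypothetical helpers, not Lucene.Net or Spatial4n APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Standard geohash encoding: alternately bisect longitude and latitude,
// emitting one bit per bisection and one base-32 character per 5 bits.
static string EncodeGeohash(double lat, double lng, int precision)
{
    const string Base32 = "0123456789bcdefghjkmnpqrstuvwxyz";
    double latMin = -90, latMax = 90, lngMin = -180, lngMax = 180;
    var sb = new StringBuilder(precision);
    bool lngBit = true; // interleaving starts with a longitude bit
    int bits = 0, ch = 0;
    while (sb.Length < precision)
    {
        if (lngBit)
        {
            double mid = (lngMin + lngMax) / 2;
            if (lng >= mid) { ch = (ch << 1) | 1; lngMin = mid; }
            else { ch <<= 1; lngMax = mid; }
        }
        else
        {
            double mid = (latMin + latMax) / 2;
            if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
            else { ch <<= 1; latMax = mid; }
        }
        lngBit = !lngBit;
        if (++bits == 5)
        {
            sb.Append(Base32[ch]); // 5 bits -> one base-32 character
            bits = 0;
            ch = 0;
        }
    }
    return sb.ToString();
}

// One prefix per tree level, coarsest cell (shortest prefix) first.
static List<string> GeohashPrefixes(double lat, double lng, int maxLevels)
{
    string full = EncodeGeohash(lat, lng, maxLevels);
    var prefixes = new List<string>(maxLevels);
    for (int level = 1; level <= maxLevels; level++)
        prefixes.Add(full.Substring(0, level));
    return prefixes;
}

Console.WriteLine(string.Join(" ", GeohashPrefixes(57.64911, 10.40744, 9)));
// u u4 u4p u4pr u4pru u4pruy u4pruyd u4pruydq u4pruydqq
```

With `maxLength = 9` that is nine cells per point, and four with `maxLength = 4`. Since lowering the value to 4 did not noticeably change indexing time, the number of terms per point alone may not account for the slowdown.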
Is there something I am doing wrong in my indexing approach? I cache the shapes created from the lat/lng points, since I don't need a new shape for the same coordinates. The CreateIndexableFields() method appears to take most of the time during indexing. I tried caching the fields it generates so I could reuse them, but I can't create a new TokenStream instance from a cached field to use in a new Document (in Lucene.Net 3.0.3 the TokenStream constructor is protected). I also lowered maxLength (the maximum number of prefix-tree levels) from 9 to 4 in the spatial strategy, but saw no improvement in indexing times. Any feedback would be greatly appreciated.
user AT lucenenet.apache.org – I4V