0
votes

My question is related to Solr and facet queries.

I am new to Solr and am using it with tweet records. My aim is to plot the number of tweets originating from each unique Point(Latitude, Longitude). I am getting individual facet counts for each unique Point ("pgeom"), but the returned values are garbled because they are a hash representation of the stored Point data. How can this hash be converted back to a usable Point? Please see the details below.

Existing fields in the dataset:

pgeom : Geospatial point. Example: "pgeom":"POINT(13.13735209 -4.2170403)"

lon : Longitude. Example: "lon":13.13735209

lat : Latitude. Example: "lat":-4.2170403

Example of query parameters:

Here I am trying to get individual tweet counts using a facet field query on all three fields: "lat", "lon" and "pgeom".

?q=*:*&facet=true&fl=lat,lon,pgeom&facet.field=pgeom&facet.field=lat&facet.field=lon

JSON result:

The "pgeom" facet query returns hash values of the ingested Points with their associated counts, whereas "lat" and "lon" return individual tweet counts for each latitude and longitude. I would like to use this "pgeom" hash to represent tweets from a location on a Google map.

geospatial point:

"pgeom":[
    "s",5931,
    "sfju",361,
    "sx",336,
        .. and so on

longitude:

"lon":[
    "9.6017436",361,
    "6.807174",195,
    "9.28786844",167,
    "5.4770747",169,
    "9.03439492",112,
         .. and so on

latitude:

"lat":[
    "4.450025",361,
    "9.420721",195,
    "1.29138702",167,
    "8.6851517",169,
    "0.97996991",157,
        .. and so on

Response Header:

"responseHeader":{
"status":0,
"QTime":990,
"params":{
  "facet":"true",
  "fl":"lat,lon,pgeom",
  "indent":"on",
  "start":"200",
  "q":"*:*",
  "facet.field":["lat",
    "lon",
    "pgeom"],
  "wt":"json",
  "rows":"200"}},

Response:

"response":{"numFound":2034074,"start":200,"docs":[
  {
    "pgeom":"POINT(13.13735209 -4.2170403)",
    "lon":13.13735209,
    "lat":-4.2170403},
  {
    "pgeom":"POINT(18.284989 -8.731565)",
    "lon":18.284989,
    "lat":-8.731565},
  {
        .. and so on

How can I convert values like "s", "sxp", "sfju" to a readable/usable format, say Point(12.041015625, 42.01171875) for "sfju"?

Thanks a lot for your time. lalan


2 Answers

1
votes

The answer to your specific question is to index full-length geohashes at the precision you desire. No matter what your programming language of choice is, I'm sure you can find a library or code snippet to convert back and forth. Index the geohash as a string and facet on it.
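As a rough illustration of producing a full-length geohash string to index, here is a minimal pure-Python sketch of the standard geohash encoding (interleaved longitude/latitude bits, base-32 alphabet). The name `encode_geohash` is a hypothetical helper, not part of Solr or of any particular library; in practice an existing geohash library would do the same job.

```python
# Hypothetical helper: standard geohash encoding, written out for illustration.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def encode_geohash(lat, lon, precision=12):
    """Return the geohash of (lat, lon) with `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0        # bits collected for the current character
    nbits = 0        # how many bits are in `value`
    is_lon = True    # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = value * 2 + 1
                lon_lo = mid
            else:
                value = value * 2
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = value * 2 + 1
                lat_lo = mid
            else:
                value = value * 2
                lat_hi = mid
        is_lon = not is_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value = 0
            nbits = 0
    return "".join(chars)

# One of the documents from the question: lat=-4.2170403, lon=13.13735209
full_hash = encode_geohash(-4.2170403, 13.13735209, precision=12)
```

The resulting string would be stored in a plain string field next to the original point, so that faceting on it returns complete geohashes rather than short prefix tokens.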

You are then faced with how to plot what could be a ridiculous number of points on a map in a scalable manner. You'll have to use spatial clustering / heat-map. See http://wiki.apache.org/solr/SpatialClustering
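One crude way to reduce the number of plotted points, sketched here under the assumption that full-length geohashes and their counts are already available as a dictionary, is to sum the counts of all cells sharing a shorter prefix. This is a simple grid aggregation and not necessarily the approach described on the linked page; `coarsen` and the sample data are hypothetical.

```python
from collections import Counter

def coarsen(facet_counts, prefix_len):
    """Sum counts of geohash cells that share the same first `prefix_len` characters."""
    totals = Counter()
    for cell, count in facet_counts.items():
        totals[cell[:prefix_len]] += count
    return dict(totals)

# Made-up counts, for illustration only
sample = {"sfju1": 3, "sfju9": 4, "sx000": 5}
clusters = coarsen(sample, 4)
```

A shorter prefix gives larger cells and fewer markers, so the prefix length can be tied to the map's zoom level.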

1
votes

This answer is based on David's input and a follow-up discussion with my colleagues. We found that the Solr field containing the geo-location, in our case "pgeom", has to be configured to use a PrefixTree-based class, as explained on the following page:

Solr Spatial Search - PrefixTree

The Solr field, in this case "pgeom", is configured to use the "location_rpt" type, which uses PrefixTree (class="solr.SpatialRecursivePrefixTreeFieldType"):

<field name="pgeom"  type="location_rpt"  indexed="true" stored="true"  multiValued="true" />

Once we have the list that contains all of the "pgeom" facet results, each of the geohash values can be decoded into an individual (lat, lon) pair using one of the libraries listed under 'External Links' on Geohash. I have used one of the unlisted libraries, python-geohash:

>>> import geohash
>>> print('geohash for 42.5, -4.0:', geohash.encode(42.5, -4.0))
geohash for 42.5, -4.0: ezt1ubzk3npz
>>> print('coordinate for geohash s', geohash.decode('s'))
coordinate for geohash s (22.5, 22.5)
>>> print('coordinate for geohash sfju', geohash.decode('sfju'))
coordinate for geohash sfju (12.041015625, 42.01171875)
>>>

Cross-check geohash decoding quickly: Example1 Example2
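For anyone who wants to process the whole facet list at once, the sketch below pairs up the flat `[value, count, value, count, ...]` list that Solr returns for "pgeom" and decodes each geohash to the centre of its cell. To keep it self-contained it uses a small hand-written decoder instead of python-geohash; `decode_geohash` and `facet_points` are hypothetical names. The result is in (lat, lon) order, whereas the stored WKT value is POINT(lon lat).

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def decode_geohash(cell):
    """Return the (lat, lon) centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate, starting with longitude
    for ch in cell:
        value = BASE32.index(ch)
        for shift in range(4, -1, -1):  # 5 bits per character, high bit first
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return ((lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2)

def facet_points(flat_facets):
    """Turn ["s", 5931, "sfju", 361, ...] into [(lat, lon, count), ...]."""
    cells = flat_facets[0::2]
    counts = flat_facets[1::2]
    return [decode_geohash(cell) + (count,) for cell, count in zip(cells, counts)]

# First entries of the "pgeom" facet from the question
points = facet_points(["s", 5931, "sfju", 361, "sx", 336])
```

Short values such as "s" decode to the centre of a very large cell, so only the longer hashes are precise enough to place a marker on a map.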

Also, a new find was the facet.limit parameter, which limits the number of facet values returned per field in the response.
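A minimal sketch of a query string that uses facet.limit, built with Python's standard library. The base URL and the core name `tweets` are assumptions for illustration, and the limit of 100 is an arbitrary choice; `rows=0` simply suppresses the document list when only the facet counts are wanted.

```python
from urllib.parse import urlencode

# Assumed location of the Solr core; adjust to the real host and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/tweets/select"

def build_facet_query(field, limit):
    """Return a query string asking for at most `limit` facet values of `field`."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),             # facet counts only, no documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", str(limit)),
        ("wt", "json"),
    ]
    return urlencode(params)

query = build_facet_query("pgeom", 100)
url = SOLR_SELECT_URL + "?" + query
```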

Thanks a lot David. :)