2 votes

I am interested in performing big data geospatial analysis on Apache Spark. My data is stored in Azure Data Lake, and I am restricted to using Azure Databricks. Is there any way to install GeoMesa on Databricks? Moreover, I would like to use the Python API; what should I do?

Any help is much appreciated!!

3
Discussions like this might be easier to have on Gitter or one of the GeoMesa email lists. See github.com/locationtech/geomesa#join-the-community for more info! – GeoMesaJim

3 Answers

2 votes

You can install the GeoMesa library directly on your Databricks cluster.

1) Select the Libraries option; a new window will open.


2) Select the Maven option and click the 'Search Packages' button.

3) Search for the required library, select the library/JAR version, and click 'Select'.
That's it; the library/JAR is resolved from the Maven repository.

After the library is installed, restart your cluster, then import the required classes in your Databricks notebook.
I hope this helps. Happy coding!
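If you would rather script the install than click through the UI, the same Maven library can be attached via the Databricks Libraries REST API (`POST /api/2.0/libraries/install`). A minimal sketch of the request payload, assuming the `geomesa-spark-jts` artifact; the cluster ID is a placeholder, and the coordinates/version are illustrative, so check Maven Central for the build matching your cluster's Scala and Spark versions:

```python
# Sketch of a Databricks Libraries API install payload.
# The cluster ID and the GeoMesa coordinates below are placeholders --
# pick the artifact matching your cluster's Scala version.
install_payload = {
    "cluster_id": "<your-cluster-id>",
    "libraries": [
        {
            "maven": {
                "coordinates": "org.locationtech.geomesa:geomesa-spark-jts_2.12:3.4.1",
            }
        }
    ],
}

# This payload would be POSTed to /api/2.0/libraries/install on your
# workspace URL, e.g. with the requests library and a personal access token.
```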

1 vote

As a starting point, without knowing more details, you should be able to use the GeoMesa FileSystem data store against files stored in WASB.
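For reference, the FileSystem data store is configured through a small parameter map. A sketch of what it might look like pointed at a WASB path — the account, container, and path are placeholders, and `fs.encoding` assumes your files are Parquet:

```python
# Hypothetical GeoMesa FileSystem data store parameters for data held in
# Azure Blob Storage (WASB). Account, container, and path are placeholders.
fsds_params = {
    "fs.path": "wasbs://mycontainer@myaccount.blob.core.windows.net/geomesa/",
    "fs.encoding": "parquet",  # assumes Parquet-encoded files
}

# In a notebook these parameters would typically be handed to Spark, e.g.
# spark.read.format("geomesa").options(**fsds_params)... (sketch only).
```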

0 votes

Running GeoMesa within Databricks is not straightforward:

  • GeoMesa’s artifacts are published on Maven Central, but require dependencies that are only available on third-party repositories, which is cumbersome given Databricks’ library import mechanism.
  • GeoMesa conflicts with an older version of the scala-logging library present in the Databricks runtime (the infamous "JAR hell" problem).
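One way to sidestep the logging clash is to exclude the conflicting dependency when attaching the Maven library, since the Databricks Maven library spec accepts an exclusions list. A sketch, assuming the conflict comes from `com.typesafe.scala-logging`; the coordinates and versions are illustrative, so verify the actual conflicting artifact on your runtime:

```python
# Sketch of a Maven library spec with an exclusion, as accepted by the
# Databricks Libraries API (and the UI's advanced Maven options).
# Coordinates are illustrative placeholders.
maven_library = {
    "maven": {
        "coordinates": "org.locationtech.geomesa:geomesa-spark-jts_2.12:3.4.1",
        "exclusions": ["com.typesafe.scala-logging:scala-logging_2.12"],
    }
}
```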

Reference: Use GeoMesa in Databricks

Hope this helps.