
I have so far been using the Search API in my App Engine project (Java, Eclipse). But since the Search API cannot do partial or misspelled matches (among other things), I am trying to switch to LAE (i.e. Lucene for App Engine), based on the advice in an answer here on SO. Could someone please settle the following concerns?
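
For context, "partial or misspelled matches" means prefix matching and edit-distance matching, which Lucene exposes as `PrefixQuery` and `FuzzyQuery` (the latter allows at most 2 edits). The sketch below is only a plain-Java illustration of the idea using Levenshtein distance; it is not LAE or Lucene code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of prefix ("partial") and edit-distance
// ("misspelled") matching. Not the LAE or Lucene API.
public class FuzzyMatchSketch {

    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions or substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the terms that start with the query (partial match) or are
    // within maxEdits edits of it (misspelled match).
    static List<String> match(List<String> terms, String query, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(query) || levenshtein(term, query) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("engine", "engineer", "lucene", "search");
        System.out.println(match(terms, "engin", 1));   // [engine, engineer]
        System.out.println(match(terms, "lucenne", 1)); // [lucene]
    }
}
```

A real Lucene index does this against its term dictionary (via automata) rather than by scanning every term, so treat the loop above as a description of the matching rule, not of how it is executed.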

  1. How many documents can a Lucene index have? How many indexes can a project have? For the Search API: there is no limit to the number of documents in an index, or to the number of indexes you can use. However, the total size of all the documents in a single index cannot be more than 10GB.
  2. Where do I go to see my Lucene data? For the Search API: App Engine admin console >> Data >> Text Search
  3. Can I do id-only queries, and do I save anything by doing so? For the Search API they are a lot cheaper:

    Query query = Query.newBuilder()
        .setOptions(QueryOptions.newBuilder()
            .setLimit(RESULT_SIZE)
            .setReturningIdsOnly(true)
            .build())
        .build(req.getSearchTerm());

  4. How does App Engine charge for Lucene? I know it's not a GAE service, but it certainly cannot be free. So how am I being charged? For the Search API: https://cloud.google.com/appengine/docs/python/search/#Python_Search_API_quotas

  5. Can I set my own unique ID per document in Lucene, which can serve as a key for the document? For the Search API:

    Document.newBuilder().setId(myUniqueId);

Please at the very least address the five points above. From my perspective this is a tough question, but there seems to be no authoritative reference for this crucial comparison (or any reference, really). So why not have one here on SO? This is where coders come for answers.

Do you understand that if you want to use Lucene, you'll need to install it yourself, on your own machine or on Compute Engine? There is no standard Lucene service. Google Cloud does have a click-to-deploy Elasticsearch solution, but it's just a basic install. - Igor Artamonov
@IgorArtamonov What about code.google.com/p/luceneappengine? Do I have to install it on my own machine? I thought I would simply have to import it as a project into my Java build path, the same way I use other JARs and projects in my App Engine code. - Katedral Pillon
Hm, that's an interesting project. As far as I can see it should work on GAE, though I'm not sure about performance. It's better to rephrase the question around this project rather than Lucene in the abstract. - Igor Artamonov
I have edited the question to reference that project from the get-go. - Katedral Pillon
Questions 2-4 are meaningless, as it's a non-standard service. The other two are better split into two separate questions. - Igor Artamonov


1) How many documents can a Lucene index have? How many indexes can a project have? For Search Api: There is no limit to the number of documents in an index, or the number of indexes you can use. However, the total size of all the documents in a single index cannot be more than 10GB.

As far as I know, LAE stores everything in the Datastore and uses Memcache for a performance boost. So you should not run into any limits on index size or number of documents.
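
One way a Datastore-backed index can avoid a fixed per-index size cap is to split each index file across many entities, each kept below the Datastore's per-entity size limit (about 1 MB). The sketch below is a hypothetical illustration of that idea only; the chunk size and class name are assumptions, and it has not been checked that LAE's internals work exactly this way.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: splitting one index file into entity-sized chunks.
// NOT LAE's actual code; the chunk size is an assumption chosen to stay
// below the Datastore's ~1 MB per-entity limit.
public class ChunkedFileSketch {

    static final int DEFAULT_CHUNK_SIZE = 1_000_000; // assumed bytes per entity

    // Split the file contents into consecutive chunks of at most chunkSize
    // bytes; each chunk would be stored as its own entity.
    static List<byte[]> split(byte[] file, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < file.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, file.length);
            chunks.add(Arrays.copyOfRange(file, offset, end));
        }
        return chunks;
    }

    // Reassemble the file by concatenating the chunks in order.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = new byte[2_500_000]; // a 2.5 MB "segment file"
        List<byte[]> chunks = split(file, DEFAULT_CHUNK_SIZE);
        System.out.println(chunks.size());                     // 3
        System.out.println(Arrays.equals(file, join(chunks))); // true
    }
}
```

Under this scheme a larger index just means more entities, which is consistent with the answer's point that there is no fixed index-size limit; the practical limits become Datastore cost and read latency.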

2) Where do I go to see my Lucene data? For Search API: app engine page / Data / Text Search

You are out of luck there: right now there is no integration that lets you view LAE index data in real time. You'll have to build that yourself.

3) Can I do id-only queries and do I save anything for doing so? For Search API it's a lot cheaper:

4) How does App Engine charge for Lucene? I know it's not a GAE service, but it certainly cannot be free. So how am I being charged?

You can't think of LAE's pricing in the same terms. The Search API is a GAE feature that is billed separately because it consumes its own kind of resources; with LAE, it's all the same to GAE, it's completely transparent. You won't be charged per query; you'll be charged for whatever resources LAE consumes (for example, Datastore operations) to get that information out of the Datastore.
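
To make "you pay for resources, not queries" concrete, here is a hypothetical back-of-envelope model. It assumes each query reads some number of index chunks from the Datastore and that a fraction of those reads is served from Memcache instead; it only counts Datastore entity reads (writes and instance hours are ignored), and the unit price is a parameter you would take from the current pricing page, not a real figure.

```java
// Hypothetical back-of-envelope model: with LAE you pay for the underlying
// resources (here only Datastore entity reads), not per search query.
// All inputs are assumptions; pricePerRead is not a real Google Cloud price.
public class LaeCostSketch {

    // queries:         number of searches run
    // chunksPerQuery:  index chunks one search needs to read (assumed)
    // memcacheHitRate: fraction of those chunks served from Memcache (assumed)
    // pricePerRead:    caller-supplied price of one Datastore entity read
    static double estimateReadCost(long queries, int chunksPerQuery,
                                   double memcacheHitRate, double pricePerRead) {
        double datastoreReads = queries * chunksPerQuery * (1.0 - memcacheHitRate);
        return datastoreReads * pricePerRead;
    }

    public static void main(String[] args) {
        // Same number of queries, different cache behaviour, different bill.
        System.out.println(estimateReadCost(10_000, 5, 0.0, 1.0)); // 50000.0
        System.out.println(estimateReadCost(10_000, 5, 0.9, 1.0));
    }
}
```

Both calls model the same 10,000 queries; only the assumed cache hit rate differs, so the estimated bill changes while the query count does not.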

5) Can I set my own unique ID per document in Lucene, which can serve as a key for the document?

You can assign an ID field to documents, but you won't get the benefits you get in the Search API, for the reasons mentioned above.
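
In Lucene itself, the usual pattern is to store the ID in an untokenized field (a `StringField` in Lucene 4 and later) and to replace documents with `IndexWriter.updateDocument(new Term("id", myUniqueId), doc)`, which deletes every document containing that term and then adds the new one. Lucene does not enforce uniqueness on its own: calling `addDocument` twice with the same ID gives you two documents. Whether LAE adds anything on top of this is not confirmed here. The plain-Java toy below only mimics those semantics; the class is hypothetical and is not a Lucene or LAE API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy index mimicking Lucene's behaviour around an "id" field:
// addDocument() never checks uniqueness; updateDocument() deletes every doc
// whose field matches the value and then adds the new one.
// Not the Lucene or LAE API.
public class IdFieldSketch {

    private final List<Map<String, String>> docs = new ArrayList<>();

    // Like IndexWriter.addDocument: duplicates are allowed.
    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Like IndexWriter.updateDocument(new Term(field, value), doc):
    // delete all docs with field == value, then add the new doc.
    void updateDocument(String field, String value, Map<String, String> doc) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(doc);
    }

    // Counts documents whose "id" field equals the given value.
    int countById(String id) {
        int n = 0;
        for (Map<String, String> d : docs) {
            if (id.equals(d.get("id"))) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        IdFieldSketch index = new IdFieldSketch();
        index.addDocument(Map.of("id", "doc-1", "body", "first"));
        index.addDocument(Map.of("id", "doc-1", "body", "duplicate")); // no uniqueness check
        System.out.println(index.countById("doc-1")); // 2
        index.updateDocument("id", "doc-1", Map.of("id", "doc-1", "body", "replaced"));
        System.out.println(index.countById("doc-1")); // 1
    }
}
```

So an ID field can act as a key only if you consistently write through `updateDocument`; unlike `Document.newBuilder().setId(...)` in the Search API, nothing stops a second `addDocument` from creating a duplicate.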