3
votes

I see that Google App Engine has now added text search: https://developers.google.com/appengine/docs/python/search/overview

Does this include searching for sub-strings within strings?

I ask because I had previously written some code that allowed substring search on fields like names and phone numbers. For example, you could search for "San" and it would find results like "Mike DaSantos". This was awesome for things like auto-complete.

I ran into cost problems, though, because of the tremendous number of write operations this required. Each field I indexed this way needed roughly n(n+1)/2 write operations, i.e. O(n^2) for a string of length n, because it involved one write operation for each substring of the string. This added up to a few dollars of App Engine costs when it came to indexing phone numbers, names, e-mail addresses, and addresses for 6000 customers.
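To make the count concrete, here is a minimal sketch (not my original indexing code; `all_substrings` is a name made up for this illustration) that enumerates every non-empty contiguous substring of a string and compares the total against n(n+1)/2:

```python
def all_substrings(text):
    """Return every non-empty contiguous substring of text, duplicates included."""
    n = len(text)
    # One substring for every (start, end) pair with start < end.
    return [text[i:j] for i in range(n) for j in range(i + 1, n + 1)]


name = "Mike DaSantos"
subs = all_substrings(name)
n = len(name)

# Both numbers agree: the list length equals n(n+1)/2.
print(len(subs), n * (n + 1) // 2)
print("San" in subs)
```

With one write per substring, a single 13-character name already means dozens of writes, which is where the cost came from.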

I'm wondering whether the Search API could provide this functionality at lower cost?

Thanks so much!


2 Answers

4
votes

No, it doesn't.

The only "wildcard" we can search with is for plurals.

~"car"  # searches for "car" and "cars"

What it can do, though, is save multiple tokens in the same field. See their example in TextSearchServlet:

  StringTokenizer tokenizer = new StringTokenizer(tagStr, ",");
  while (tokenizer.hasMoreTokens()) {
    docBuilder.addField(Field.newBuilder().setName("tag")
        .setAtom(tokenizer.nextToken()));
  }

So you could query a "nametag" field, for example, and, assuming you had tokenized the name into it so that "San" is one of the stored tokens, get "Mike DaSantos" back:

  Results<ScoredDocument> results = getIndex().search("nametag:San"); 

I am not crystal clear on the costs and quotas here.

1
votes

By the way, you shouldn't need n(n+1)/2 write operations for your own substring search solution.

You should only need 1.

That is, instead of creating n(n+1)/2 objects, you create ONE object with up to n(n+1)/2 list elements in an ndb.StringProperty(repeated=True).
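A rough sketch of that idea, under some assumptions: the `Customer` model, its property names, and the helpers `substring_tokens`, `save_customer`, and `find_customers` are all hypothetical, and the tokens are lower-cased so matching is case-insensitive. The ndb part only takes effect where the App Engine SDK is importable; whether this actually lowers the bill depends on how index writes are charged, which I have not verified.

```python
def substring_tokens(text):
    """Distinct lower-cased substrings of text, sorted for stable output."""
    lowered = text.lower()
    n = len(lowered)
    return sorted({lowered[i:j] for i in range(n) for j in range(i + 1, n + 1)})


try:
    from google.appengine.ext import ndb
except ImportError:
    ndb = None  # Not running with the App Engine SDK available.

if ndb is not None:
    class Customer(ndb.Model):  # hypothetical model for illustration
        name = ndb.StringProperty()
        # All substrings live in one repeated property on one entity.
        name_substrings = ndb.StringProperty(repeated=True)

    def save_customer(name):
        # A single put() stores the entity together with its substring list.
        return Customer(name=name, name_substrings=substring_tokens(name)).put()

    def find_customers(fragment, limit=20):
        # Equality on a repeated property matches if any element is equal.
        query = Customer.query(Customer.name_substrings == fragment.lower())
        return query.fetch(limit)
```

Under these assumptions, find_customers("San") would return the entity saved for "Mike DaSantos", because "san" is one of its stored tokens.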