0
votes

I want to manipulate documents and change the token values for certain fields by prepending a value to each token. I do bulk updates through DIH and also post documents through SolrJ. My replication factor is 2, so replication should also work. The value I want to prepend is in the document as a separate field. I would like to know where I can intercept the document before indexing so that I can manipulate it. One option I can think of is overriding DirectUpdateHandler2. Is this the right place?

I could do it by processing the document externally and passing it to Solr, but I want to do it inside Solr.

The document fields are:

  1. city:mumbai
  2. RestaurantName:Talk About
  3. Keywords:Cofee, Chines, South Indian, Bar

I want to index the keywords as:

  1. mumbai_cofee
  2. mumbai_Chines
  3. mumbai_South Indian
  4. mumbai_Bar
1
Do you have a pattern you would like to apply this to? I would suggest adding a "PatternReplaceFilterFactory" to solve the problem. - Abhijit Bashetti
@AbhijitBashetti - that won't work because you can't pull in data from a different field within a Solr field analysis. - frances
The JDBC driver is probably DIH's most popular data source. If you're using that, you can accomplish this in your embedded SQL queries: for example, use CONCAT(city,"_",RestaurantName) AS restaurant, CONCAT(city,"_",Keyword) AS keyword in the SQL query within your Solr data import config. - frances
Is there a reason why you don't set up city as a filter/facet field instead of combining it with all the individual terms? Then your query could look something like: fq=city:mumbai&q=keyword:bar. If this meets your needs, it seems like an easier way to index the data and leaves you more flexible in how you query. - frances
I have an XML file as the data source, so one option is to use XSLT to transform the data so that each field carries the city name as its first token, and then write a filter to do the manipulation. But with this approach I have to make sure that the city I prepend to the value comes out as a separate token, which might be difficult because each field has different tokenizer logic. I am trying to use the Terms component for index browsing: if a city is selected, the user should be able to see all the terms available for that city in that particular field. Hope this helps. - Avaya Sahu

1 Answer

0
votes

The right place is an Update Request Processor (URP). Make sure you plug it into solrconfig.xml for all the update handlers you are using (including DIH), and that single URP will cover all updates.

In the Java code of your URP you can easily get the value of one field and prepend it to the values of another field, and so on. This happens before the document is indexed.
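
As a rough illustration of the transformation itself, here is a minimal sketch in plain Java with no Solr dependency, so it can be run on its own. The class name `CityPrefixer` and the method `prependCity` are hypothetical, and it assumes the keywords arrive as one comma-separated string, as in the question. In a real URP this logic would presumably sit inside an overridden `processAdd`, reading the `city` and `Keywords` values from the incoming `SolrInputDocument` and writing the result back before passing the command to the next processor; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: holds only the string manipulation
// that a custom update request processor could apply to each document.
class CityPrefixer {

    // Splits a comma-separated keyword string and prepends "<city>_" to each
    // keyword. If city is null or empty, keywords are returned unprefixed.
    static List<String> prependCity(String city, String keywords) {
        List<String> result = new ArrayList<>();
        if (keywords == null || keywords.trim().isEmpty()) {
            return result;
        }
        boolean hasCity = city != null && !city.isEmpty();
        for (String keyword : keywords.trim().split("\\s*,\\s*")) {
            if (keyword.isEmpty()) {
                continue; // skip empty entries such as a leading comma
            }
            result.add(hasCity ? city + "_" + keyword : keyword);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample values taken from the question.
        System.out.println(prependCity("mumbai", "Cofee, Chines, South Indian, Bar"));
    }
}
```

Each element of the returned list could then be added as a separate value of a multi-valued field, so that the field's own analyzer still decides how a value such as "mumbai_South Indian" is tokenized.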