Solr/Jetty confusion - how to get persistent service?

Question

I'm on Ubuntu 12.04, using jetty (9_M4), solr (4.0.0) through django-haystack (2.0beta) installed in a django 1.4.2 site.

I've had to jump through a number of hoops to get this up and running, as there is very little documentation for getting solr 4.0 running in Ubuntu with django-haystack. But how hard could it be?

My main confusion is between what Jetty is doing, and what Solr is doing.

So, I installed Jetty via this tutorial making a small adjustment to the init file as I note in the comment on that tutorial. Jetty is now running, I can see it in browser, even after a reboot.

Great.

I move on to installing Solr via this tutorial, again with adjustments. Instead of:

cp -R apache-solr-4.0.0/example/solr /opt

I use:

cp -R apache-solr-4.0.0/example/* /opt/solr/

and therefore add the following to /etc/default/jetty:

JAVA_OPTIONS="-Dsolr.solr.home=/opt/solr/solr $JAVA_OPTIONS"

I can't exactly remember why I did that, but there was a reason at the time. I stop using that tutorial at that point, as I don't understand the solr concept of core very well, and I'm already flustered at how annoyingly difficult this is.

(For context, when I set up django-haystack 2.0 with solr 3.5 about 6 months ago it was terrifyingly easy and didn't require a separate jetty installation - all up took me about two hours)

Anyway, I go back to my Django installation, create the schema.xml, make the stopwords-en.txt changes, copy it across to /opt/solr/solr/collection1/conf.

I edit /opt/solr/solr/collection1/conf/solrconfig.xml to remove the reference to updateLog, since every attempt I made to add the _version_ field to schema.xml failed dismally with some sort of character error. See here (lucene solr-user mailing list) and here (django-haystack github) for more info on this.

Finally, I cd into /opt/solr and run it:

sudo java -jar start.jar

Ba-da-boom! I get some results (when I go to my django site and use the search I've set up). Fantastic. This is really great. Now I just need to make the starting of solr persistent.

I create an /etc/init/solr that looks like this:

description     "Solr Search Server"

# Make sure the file system and network devices have started before
# we begin the daemon
start on (filesystem and net-device-up IFACE!=lo)

# Stop the event daemon on system shutdown
stop on shutdown

# Respawn the process on unexpected termination
respawn

# The meat and potatoes
exec /usr/bin/java -jar /opt/solr/start.jar >> /var/log/solr.log 2>&1

I restart the server and nothing - I can see solr running, but I'm not getting any results in my django search.

I remove the init file and try running from the cli again - yep, sweet.

So, my questions are:

1. What the hell have I done wrong?
2. How do I get solr to start at boot, respawn if it dies accidentally, AND produce results through my Django/haystack interface?
3. Why do I need jetty and solr running simultaneously, and what is the relationship of /opt/jetty/webapps/solr.war to my /opt/solr? Am I causing conflicts?
4. Why was this so easy with solr 3.5 and so difficult now? I ask this honestly - I don't want a list of excuses or explanations from solr developers - I want to know how I could get solr 3.5 running in two hours with such a limited understanding, and why I now need a comprehensively deeper understanding of jetty/solr architecture and cli/shell script hacking to get it to run.

Alexandre Rafalovitch · Accepted Answer · 2013-01-23T13:17:29

I can't promise to cover everything, but (numbers do not match questions):

1) Jetty is a web-server. Solr runs as a (web) application inside that web server, however:

2) Jetty can also run as an embedded web server, which is how the Solr download works. When you do java -jar start.jar, that runs Jetty with everything preconfigured, in which case you do not need a standalone Jetty. I suggest starting with the embedded Jetty, then switching to an external one. However, if only your local app talks to the local Solr server, you may be able to get quite far without needing a full Jetty.

3) You don't need all the stuff you find in the example directory - it has multiple configurations and support files and is somewhat nested (which is confusing).

4) To start, you need two things: a running Solr and your configuration directory.

5) The easiest way to get Solr running is to put the whole distribution directory (I know - large) somewhere (e.g. /opt/solr).

6) Your configuration directory is very simple. All you need is two files to start, three if you are picky about names:

- (wherever, but make sure Solr can read/write there)
-- solr.xml (if you are picky about the collection name, otherwise you can skip it)
-- collection1/ (that's the default name, you can change that in solr.xml)
-- collection1/conf/ (this is the configuration directory, Solr will add a data directory on the same level once you start it right)
-- collection1/conf/schema.xml
-- collection1/conf/solrconfig.xml
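The layout in point 6 can be sketched as a small shell function. This is only an illustration: the function name make_solr_home and both of its arguments are made up here, and it assumes you already have a schema.xml and solrconfig.xml sitting in some source directory.

```shell
#!/bin/sh
# Sketch: build a minimal Solr home directory (the layout from point 6).
# Usage: make_solr_home <solr_home> <conf_src>
# Both arguments are hypothetical paths chosen by you.
make_solr_home() {
    solr_home=$1   # where the Solr home should live (Solr must be able to read/write here)
    conf_src=$2    # directory holding your schema.xml and solrconfig.xml

    # collection1 is the default core name in Solr 4
    mkdir -p "$solr_home/collection1/conf" || return 1

    # The two files Solr needs in order to start
    cp "$conf_src/schema.xml" "$conf_src/solrconfig.xml" \
       "$solr_home/collection1/conf/" || return 1
}
```

For example, make_solr_home /opt/solr/solr ~/solr-conf would recreate the directory used in the question, assuming ~/solr-conf is where the two files were generated.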

7) Then, you need to be in the example directory and run java -Dsolr.solr.home= -jar start.jar, with solr.solr.home set to the configuration directory from point 6. This will get all the pieces up and running on port 8983. Solr 4 has a pretty new admin interface, so visit it with your browser, maybe do the tutorial, etc.
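A rough sketch of that launch step as a shell function. The name start_solr and its two arguments are hypothetical. The cd matters because, when solr.solr.home is not set, Solr falls back to a solr/ directory relative to the current working directory, so starting start.jar from somewhere else can leave you with a running Solr that cannot see your core.

```shell
#!/bin/sh
# Sketch: start Solr's bundled Jetty from the right directory.
# Usage: start_solr <jetty_dir> <solr_home>
# Both arguments are hypothetical; pass your own paths.
start_solr() {
    jetty_dir=$1   # directory that contains start.jar (the "example" directory)
    solr_home=$2   # the configuration directory from point 6

    # Fail early with a readable message instead of a Java stack trace
    [ -f "$jetty_dir/start.jar" ] || { echo "no start.jar in $jetty_dir" >&2; return 1; }
    [ -d "$solr_home" ] || { echo "no Solr home at $solr_home" >&2; return 1; }

    # Run from the Jetty directory and tell Solr explicitly where its home is.
    # exec replaces the current shell with the java process.
    cd "$jetty_dir" && exec java -Dsolr.solr.home="$solr_home" -jar start.jar
}
```

With the paths from the question this would be start_solr /opt/solr /opt/solr/solr, after which the admin interface should answer on port 8983.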

If you need help with minimal functioning schema/solrconfig files, ask separately, but you cannot just use ones from the example directory as it has all the other file references in the fieldType analysers (though you could just comment those lines out).
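For the start-at-boot part, here is a hedged sketch that writes an Upstart job based on the one in the question, with two changes: a chdir stanza so the process runs from the directory holding start.jar, and an explicit solr.solr.home. The function name write_solr_job and its arguments are made up for illustration, and I have not tested this on 12.04. Note that Upstart only picks up job files ending in .conf.

```shell
#!/bin/sh
# Sketch: write an Upstart job that starts Solr from its own directory.
# Usage: write_solr_job <job_file> <jetty_dir> <solr_home>
# All three arguments are hypothetical; the stanzas other than chdir and
# -Dsolr.solr.home are copied from the job in the question.
write_solr_job() {
    cat > "$1" <<EOF
description     "Solr Search Server"

# Make sure the file system and network devices have started before
# we begin the daemon
start on (filesystem and net-device-up IFACE!=lo)

# Stop the daemon on system shutdown
stop on shutdown

# Respawn the process on unexpected termination
respawn

# Run from the directory that holds start.jar
chdir $2

exec /usr/bin/java -Dsolr.solr.home=$3 -jar start.jar >> /var/log/solr.log 2>&1
EOF
}
```

As root, something like write_solr_job /etc/init/solr.conf /opt/solr /opt/solr/solr followed by start solr would be the way to try it. If search results come back from the command line but not under Upstart, the working directory is the first thing I would check.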
