I'm about to write a small web-scraping program in Clojure / ClojureScript. It's quite a simple command-line app (for Linux), which visits a webpage, filters the results and prints them to the console.
However, this raises a few questions - not least because I come from a JS/Node.js background and Clojure is quite new to me.
(1) First of all: Is this a good task for a Clojure program that will be delivered for the JVM as a .jar file? Starting the JVM is slow, but the program needs to start and stop quickly, since it's for everyday use. I guess there are ways to keep one JVM running in the background, waiting to execute jar files on demand. (?)
(2) The other approach would be to use ClojureScript and compile it to Node-friendly JavaScript. This would certainly solve the startup problem from the previous paragraph. But I'm not sure if it's necessary.
(3) The other question is which library to use, which of course depends on the previous points. Is there a good Clojure/ClojureScript library for this purpose, basically for querying the DOM with CSS selectors? In JS I would use jsdom, which reads HTML strings and builds an in-memory DOM from them. What are the equivalents in the Clojure world?
(4) A plus would certainly be a library that deals with common web-scraping tasks, such as handling information that is spread over several numbered pages (e.g. the results of a search engine).
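To make (4) concrete, here is a rough sketch in JavaScript (the language I know) of the kind of bookkeeping I'd like a library to take over. The `pageUrls` helper, the `page` query parameter and the example.com URL are all made up for illustration; real sites number their pages differently.

```javascript
// Hypothetical helper: builds the URLs for the first `count` result pages.
// Assumes the site selects a page through a `page` query parameter,
// which is only an assumption for this sketch.
function pageUrls(baseUrl, count) {
  const urls = [];
  for (let n = 1; n <= count; n++) {
    const url = new URL(baseUrl);            // URL is a global in Node.js
    url.searchParams.set('page', String(n)); // add or overwrite ?page=n
    urls.push(url.toString());
  }
  return urls;
}
```

Each of these URLs would then be fetched and filtered in the same way as a single page.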
Does anyone have some hints for me?