I am trying to convert some PDF files into answer units with Watson's Document Conversion service. These files are all zipped up into one big .zip file, which is uploaded to my Bluemix server running a Node.js application. The application unzips the files in memory and tries to send each one in turn to the conversion service:
```javascript
var Readable = require('stream').Readable;
var watson = require('watson-developer-cloud');

var document_conversion = watson.document_conversion(dcCredentials);

function createCollection(res, solrClient, docs)
{
    // docs is an array of objects describing the PDF files
    docs.forEach(function (doc)
    {
        console.log("Converting: %s", doc.filename);

        // make a stream of this PDF file
        var rs = new Readable({ read: function () {} }); // no-op read(): all data is pushed up front
        rs.push(doc.data); // add the PDF file contents (a string here) to the stream
        rs.push(null);     // end-of-stream marker

        document_conversion.convert(
        {
            file: rs,
            conversion_target: "ANSWER_UNITS"
        },
        function (err, response)
        {
            if (err)
            {
                console.log("Error converting doc: ", err);
                // ... etc.
            }
        });
    });
}
```
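As written, the loop starts every conversion immediately instead of sending the files one at a time. If the intent really is to send each file in turn, a minimal callback-based sketch follows. `convertInTurn` and its `convertDoc` parameter are hypothetical names; `convertDoc(doc, callback)` stands in for a wrapper around `document_conversion.convert`, and nothing here is part of the Watson SDK.

```javascript
// Send docs strictly one after another: the next conversion starts only
// after the previous callback has fired. Stops at the first error.
// `convertDoc(doc, cb)` is a hypothetical callback-style conversion function.
function convertInTurn(docs, convertDoc, done) {
  const results = [];

  function next(i) {
    if (i >= docs.length) return done(null, results); // all docs converted

    convertDoc(docs[i], (err, response) => {
      if (err) return done(err, results); // report the failure and stop
      results.push(response);
      next(i + 1); // only now move on to the next doc
    });
  }

  next(0);
}
```

With the SDK, `convertDoc` could be a function that builds the stream for `doc` and calls `document_conversion.convert({ file: rs, conversion_target: "ANSWER_UNITS" }, cb)`.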
Every time, the conversion service returns an HTTP 400 error with the description "Error in the web application".
After scratching my head for two days over this rather unhelpful error message, I have pretty much decided that the conversion service can't tell what type of file is being sent, because there is no filename associated with the stream. This is only a guess on my part, but I can't test the theory because I don't know how to supply that information to the service without writing each file to disk and reading it back.
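To make the hypothesis concrete: in a multipart/form-data upload, each file part can carry its own filename and Content-Type, and that metadata can be attached to in-memory data without touching the disk. The sketch below assumes Node.js 18+ (for the global `FormData` and `Blob`), which is newer than the runtime described here; `buildUpload` and the filename `report.pdf` are hypothetical, the field name `file` simply mirrors the option used above, and whether the Watson service or SDK actually reads this metadata is not confirmed.

```javascript
// Build a multipart form for one in-memory PDF, attaching an explicit
// filename and Content-Type to the file part (no temporary file needed).
// Requires Node.js 18+ for the global FormData and Blob classes.
function buildUpload(filename, data) {
  const form = new FormData();
  const blob = new Blob([data], { type: "application/pdf" }); // sets the part's Content-Type
  form.append("file", blob, filename); // third argument sets the part's filename
  return form;
}
```

A form built this way can be passed as the `body` of a `fetch` POST, which generates the multipart boundary and `Content-Type` header automatically.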
Can anyone help?