
Without Docker, the scripts are able to parse the PDF files using Tika.

However, when I try the same thing with Docker, I get an error saying the Tika server could not be started (the error is shown at the end). After some reading I tried the steps listed below, but the error persists.

Can someone please help?

The Dockerfile is attached at the end, together with a listing of the Docker containers that are running:

  1. docker pull apache/tika
  2. docker run -d -p 9998:9998 apache/tika
  3. cat Dockerfile (listing in the end)
  4. docker build -t docker_parser .
  5. docker run docker_parser

  6. docker ps -a


    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                     PORTS                    NAMES

    8ff9fd3d0a84        docker_parser       "python ./scripts/..."   2 days ago          Exited (0) 4 minutes ago                            adoring_mestorf

    fdf132926c61        apache/tika         "/bin/sh -c 'java ..."   2 days ago          Up 6 minutes               0.0.0.0:9998->9998/tcp   optimistic_ride
  7. Dockerfile:

    FROM python:3

    RUN pip3 install --upgrade pip requests
    RUN pip3 install python-docx tika numpy pandas

    RUN mkdir scripts
    RUN mkdir pdfs
    RUN mkdir output

    ADD runner.py /scripts/
    ADD header_parser.py /scripts/
    ADD keyword_parser.py /scripts/

    ADD *.pdf /pdfs/

    CMD [ "python", "./scripts/runner.py" ]

  8. Error in the code:

    sentence_parser Oops! Error Type: occured. Details: Unable to start Tika server. Error Type: at line: 156

The Apache Tika Server is written in Java, do you have that in your docker image too? (Looks not...) - Gagravarr

1 Answer


Looks like you haven't specified a link between the containers, so tika-python isn't able to connect to port 9998. You could add Java to the docker_parser image and let that container host the Tika server itself; otherwise you'll need to link the two containers.
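To confirm that this is a connectivity problem, a small check like the one below can be run from inside the docker_parser container. Inside a container, `localhost` refers to that container itself, so the port published on the host (0.0.0.0:9998) is not visible there under that name. This is only a sketch: the function name `tika_reachable` is made up for illustration, and it assumes the server answers on its `/version` endpoint.

```python
import http.client
import urllib.error
import urllib.request


def tika_reachable(endpoint, timeout=3.0):
    """Return True if an HTTP server answers 200 on <endpoint>/version."""
    url = endpoint.rstrip("/") + "/version"
    try:
        # Tika Server returns its version string on GET /version.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, unknown host, timeout, or a non-HTTP reply.
        return False
```

Calling `tika_reachable("http://localhost:9998")` inside the parser container would be expected to return False with the setup in the question, while the same call on the Docker host should succeed as long as the apache/tika container is up.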

If you want to keep the two images separate, you can either use the --link option of the Docker CLI at run time, or create a network (docker network create) and attach both containers to it (docker network connect). I normally use docker-compose to make this kind of thing easier and specify the links there.
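Once the containers share a network, tika-python still has to be told where the server is, otherwise it falls back to starting its own server locally, which needs Java. A rough sketch of how that could look in the parser script follows. The hostname `tika` is an assumption (it would be whatever name or alias the Tika container has on the shared network), and the helper names `configure_tika_client` and `parse_pdf` are made up for this example. As far as I know, tika-python reads `TIKA_SERVER_ENDPOINT` and `TIKA_CLIENT_ONLY` from the environment when it is imported, so they are set before the import here.

```python
import os

# Assumption: the Tika container is reachable as "tika" on the shared
# Docker network (e.g. it was started with --name tika on that network).
DEFAULT_ENDPOINT = "http://tika:9998"


def configure_tika_client(endpoint=None):
    """Set the environment so tika-python acts as a REST client only."""
    endpoint = endpoint or os.environ.get("TIKA_SERVER_ENDPOINT", DEFAULT_ENDPOINT)
    # These must be in place before the first `import tika`.
    os.environ["TIKA_SERVER_ENDPOINT"] = endpoint
    os.environ["TIKA_CLIENT_ONLY"] = "True"
    return endpoint


def parse_pdf(path):
    """Parse one file through the remote Tika server."""
    endpoint = configure_tika_client()
    from tika import parser  # imported late, after the environment is set
    return parser.from_file(path, serverEndpoint=endpoint)
```

The same two variables can also be passed with `-e` on `docker run` instead of being set in code, which keeps the script unchanged between the Docker and non-Docker setups.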