4
votes

I following the tutorial steps as show in https://docs.aws.amazon.com/glue/latest/dg/dev-endpoint-tutorial-local-notebook.html

There's no issue connection between local zepplin to AWS Glue. However when I run test command on zepplin it gives me error

%pyspark
import sys
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Log from spark server

ERROR [2019-09-24 12:45:09,757] ({pool-2-thread-8} Job.java[run]:188) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
    ... 11 more
ERROR [2019-09-24 12:45:09,774] ({pool-2-thread-8} RemoteScheduler.java[getStatus]:281) - Unknown status
java.lang.IllegalArgumentException: No enum constant org.apache.zeppelin.scheduler.Job.Status.UNKNOWN
    at java.lang.Enum.valueOf(Enum.java:238)
    at org.apache.zeppelin.scheduler.Job$Status.valueOf(Job.java:51)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:271)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:342)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
ERROR [2019-09-24 12:45:09,775] ({pool-2-thread-8} NotebookServer.java[afterStatusChange]:2056) - Error
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:401)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
    at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
    ... 11 more
 WARN [2019-09-24 12:45:09,775] ({pool-2-thread-8} NotebookServer.java[afterStatusChange]:2064) - Job 20190924-110806_1690051421 is finished, status: ERROR, exception: org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException, result: org.apache.thrift.transport.TTransportException
 INFO [2019-09-24 12:45:09,783] ({pool-2-thread-8} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1569294486122_1120959405 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreterexisting_process1435285773

Any idea where's going wrong?

5

5 Answers

7
votes

I solved it by selecting older version Spark 2.2 when creating dev endpoint

enter image description here

5
votes

AWS seems to have updated their endpoints to support the newer versions of the Zepplin notebook and the newer glue versions--but they have not updated their AWS Glue dev endpoint documentation to state as such. AWS Glue dev endpoint documentation

If you're running Zepplin version 0.7.3 you need to initialize your endpoint with Glue Version Spark 2.2 (Glue Version 0.9)--as Conandor stated.

If however you want to run the recent Zepplin version 0.8.2 it will work with development endpoint Spark 2.4 Python 3 (Glue Version 1.0).

1
votes

I am getting the same exact issue, and have not resolved it. However I did go through some troubleshooting steps:

  • Restart zeppelin with:
sudo bash
./zeppelin-daemon stop
lsof -t -i tcp:8080 | xargs kill # Had to use this when zeppelin wouldn't die
./zeppelin-daemon start
  • Reinstall zeppelin entirely, ensure previous config is not loaded

  • Check java version. Zeppelin only supports Java 8 with:

java -version
  • Check zeppelin version. AWS Glue only supports 0.7.3

If you happen to find the solution, PLEASE POST IT!!!!!

I can't comment on your answer but if that solves the issue, you're a God. You saved us hours of dev time (running these jobs in python scripts, waiting for cold start times).

0
votes

If you faced with TTransportException that is means that Zeppelin is not able to connect to a metastore.

You need to specify "Catalog options".

Glue

Now you can use Spark 2.4, Python 3 and Zeppelin 0.8

0
votes

This cost me hours of time - for me it was another process that was tying up my tunnel port 9907. I used lsof -i :9907 and kill -9 PID to kill the process on 9907, then reconnected to the dev endpoint and it started behaving as expected.