2
votes

My Apache Spark application handles giant RDDs and writes event logs that I view through the History Server. How can I export these logs and import them on another computer so I can view them in that machine's History Server UI?

2
AFAIK, the Spark History Server just reads log files dumped in a specific directory (e.g. on HDFS). No need to "export" anything. Ah, also, there is no purge mechanism... you've got to script it yourself. - Samson Scharfrichter
I have the log files stored in my directory "/tmp/spark-events", but when I transfer them to another computer and start the History Server there, the logs do not appear in the web interface. What I want to know is how to show the logs in the web interface of another computer. - Bruno

2 Answers

1
votes

My cluster runs Windows 10, and for some reason on this OS the log files don't load unless they were generated on the machine itself. On another OS such as Ubuntu, I was able to view the History Server's logs in the browser.

0
votes

While an application is running, Spark writes events to the directory set by spark.eventLog.dir (e.g. on HDFS: hdfs://namenode/shared/spark-logs), as configured in spark-defaults.conf.

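For reference, a minimal spark-defaults.conf that enables event logging could look like this (the paths below are example values, not defaults; point them wherever you want the logs to live):

```properties
# Enable writing event logs and choose where the application writes them (example path).
spark.eventLog.enabled           true
spark.eventLog.dir               file:/tmp/spark-events
# The History Server reads from this directory; it must match the one above.
spark.history.fs.logDirectory    file:/tmp/spark-events
```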
These are then read by the spark history server based on the spark.history.fs.logDirectory setting. Both these log directories need to be the same and spark history server process should have permissions to read those files. So these would be json files in the event log directory for each application. These you can access using appropriate filesystem commands.
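Since the event logs are plain files, "exporting" them is just a file copy; what matters is that on the target machine the files end up in whatever directory its spark.history.fs.logDirectory points to. A rough sketch (all paths and the application ID are made-up examples; between real hosts you would use scp or rsync instead of cp):

```shell
# Simulate moving Spark event logs from one machine to another with plain file copies.
SRC=/tmp/spark-events            # spark.eventLog.dir on the machine that ran the app (assumed path)
DEST=/tmp/spark-events-copy      # stands in for spark.history.fs.logDirectory on the second machine

mkdir -p "$SRC" "$DEST"

# An event log is a single JSON-lines file per application, named after its app ID.
# This fabricated one-line file just stands in for a real log:
echo '{"Event":"SparkListenerApplicationStart"}' > "$SRC/app-20240101000000-0000"

# On real hosts: scp "$SRC"/app-* otherhost:/tmp/spark-events/
cp "$SRC"/app-* "$DEST/"

ls "$DEST"
```

After the copy, starting the History Server on the target machine with sbin/start-history-server.sh (from the Spark distribution) and browsing to port 18080 should list the copied application, provided that machine's spark.history.fs.logDirectory points at the destination directory and the files are readable by the server process.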