I would like to know whether there is a way to convert the information returned by the Spark API function RDD.toDebugString() into a more structured format, so that it can be used to generate a graphical representation automatically, for example with Graphviz.

There seems to be some related activity on this: https://issues.apache.org/jira/browse/SPARK-1015

However, I would like to convert the output of toDebugString() into a structured format first, and decide later which graph format to use for the representation.
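If only the printed text is available, one rough option is to parse it. The following is a heuristic sketch in Python (standard re module, no Spark needed), and it rests on an assumption: that the output follows the layout recent Spark versions print, where a narrow parent appears in the same column as its child and a shuffle parent is indented further and marked with "+-". That layout is not a documented, stable format. The names parse_debug_string and to_dot, and the sample text, are hypothetical and written by hand for illustration. The parser returns plain nodes and edges; to_dot is just one possible renderer (Graphviz DOT).

```python
import re

# Prefix drawn by toDebugString(): spaces and '|' for indentation, an optional
# '+-' marking a shuffle dependency, and an optional "(numPartitions) " block.
LINE_RE = re.compile(
    r"^(?P<prefix>[ |]*)(?P<shuffle>\+-)?(?:\((?P<parts>\d+)\) )?(?P<desc>.*)$")
# An RDD description such as "MapPartitionsRDD[1] at map at <console>:24 []";
# the trailing brackets hold the storage level (empty if not persisted).
DESC_RE = re.compile(
    r"(?P<cls>\w+)\[(?P<id>\d+)\] at (?P<site>.*?) \[(?P<storage>[^\]]*)\]\s*$")


def parse_debug_string(text):
    """Parse toDebugString() output into (nodes, edges).

    nodes: {rdd_id: {"class", "site", "storage", "partitions"}}
    edges: [(upstream_id, downstream_id, "shuffle" or "narrow")]
    """
    if isinstance(text, bytes):  # PySpark's toDebugString() may return bytes
        text = text.decode("utf-8")
    nodes, edges = {}, []
    stack = []  # (column where the description starts, rdd_id) of open ancestors
    for line in text.splitlines():
        m = LINE_RE.match(line)
        d = DESC_RE.search(m.group("desc"))
        if d is None:  # skip detail lines such as "CachedPartitions: ..."
            continue
        rdd_id, col = int(d.group("id")), m.start("desc")
        shuffle = m.group("shuffle") is not None
        # Heuristic: a narrow dependency is printed in its child's column,
        # a shuffle dependency further to the right.
        while stack and (stack[-1][0] >= col if shuffle else stack[-1][0] > col):
            stack.pop()
        if stack:
            edges.append((rdd_id, stack[-1][1], "shuffle" if shuffle else "narrow"))
        nodes[rdd_id] = {
            "class": d.group("cls"),
            "site": d.group("site"),
            "storage": d.group("storage"),
            # Only the first line and shuffle lines print a partition count.
            "partitions": int(m.group("parts")) if m.group("parts") else None,
        }
        stack.append((col, rdd_id))
    return nodes, edges


def to_dot(nodes, edges):
    """Render the parsed lineage as Graphviz DOT; shuffle edges are dashed."""
    lines = ["digraph lineage {", "  node [shape=box];"]
    for rdd_id, n in sorted(nodes.items()):
        label = f"{n['class']}[{rdd_id}]\\n{n['site']}".replace('"', '\\"')
        lines.append(f'  rdd{rdd_id} [label="{label}"];')
    for src, dst, kind in edges:
        style = ' [style=dashed, label="shuffle"]' if kind == "shuffle" else ""
        lines.append(f"  rdd{src} -> rdd{dst}{style};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written sample in the usual toDebugString() layout.
    sample = """(2) ShuffledRDD[3] at reduceByKey at <console>:24 []
 +-(2) MapPartitionsRDD[2] at map at <console>:24 []
    |  MapPartitionsRDD[1] at flatMap at <console>:24 []
    |  ParallelCollectionRDD[0] at parallelize at <console>:24 []"""
    print(to_dot(*parse_debug_string(sample)))
```

A known limitation of this approach: when an RDD has several narrow parents (a union, for instance), Spark prints them one under another in the same column, so the text cannot distinguish them from a chain, and this parser reports them as a chain.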

That issue won't be fixed. - eliasah

2 Answers

A more detailed, formatted visual representation is available in the Spark UI, which runs on port 4040 by default (the job and stage pages include a DAG visualization). Here is a screenshot showing the details:

[Screenshot of the Spark UI]

Internally, toDebugString() recursively walks the RDD's dependencies (its parent RDDs, their parents, and so on) and builds a printable string from them.

Rather than trying to make toDebugString() return more structured output, read its implementation (which works on structured data, namely the RDD's dependencies) and adapt it to store the data in whatever form suits you.
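The idea can be sketched as follows. In Scala, the walk would recurse over rdd.dependencies, follow each dependency's rdd, and check whether the dependency is a ShuffleDependency, which is what toDebugString itself does. So that the sketch runs without a Spark cluster, it is written in Python against a hypothetical FakeRDD stand-in (not a Spark class); the lineage function name and the JSON layout are assumptions as well.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeRDD:
    """Hypothetical stand-in for an RDD (not a Spark class).

    deps holds (parent, is_shuffle) pairs, mirroring Spark's RDD.dependencies,
    where each Dependency points at a parent RDD and may be a ShuffleDependency.
    """
    rdd_id: int
    cls: str
    deps: list = field(default_factory=list)


def lineage(rdd):
    """Recursively walk the dependencies and return a JSON-friendly dict."""
    nodes, edges, seen = [], [], set()

    def visit(r):
        if r.rdd_id in seen:  # shared ancestors are recorded only once
            return
        seen.add(r.rdd_id)
        nodes.append({"id": r.rdd_id, "class": r.cls})
        for parent, is_shuffle in r.deps:
            edges.append({"from": parent.rdd_id, "to": r.rdd_id,
                          "kind": "shuffle" if is_shuffle else "narrow"})
            visit(parent)

    visit(rdd)
    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    # A small lineage with a union and a shuffle, built by hand for illustration.
    src = FakeRDD(0, "ParallelCollectionRDD")
    other = FakeRDD(1, "ParallelCollectionRDD")
    both = FakeRDD(2, "UnionRDD", [(src, False), (other, False)])
    pairs = FakeRDD(3, "MapPartitionsRDD", [(both, False)])
    counts = FakeRDD(4, "ShuffledRDD", [(pairs, True)])
    print(json.dumps(lineage(counts), indent=2))
```

Walking the structure directly keeps multiple parents (such as the two inputs of a union) unambiguous, which printed text cannot guarantee. The resulting dict can be serialized with json.dumps and converted to DOT, GraphML, or any other graph format later.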

You don't have to wait for the JIRA issue to be resolved; just do it yourself :)