I am new to Apache Flink and to distributed processing in general. I have already gone through the Flink quick setup guide and understand the basics of MapFunctions, but I couldn't find a concrete example of XML processing. I have read about Hadoop's XmlInputFormat, but I don't understand how to use it.
I have a large (100 MB) XML file in the following format:
<Class>
<student>.....</student>
<student>.....</student>
.
.
.
<student>.....</student>
</Class>
The Flink job would read the file from HDFS and process it, basically iterating over all the student elements.
I want to know, in layman's terms, how I can process the XML and create a list of student objects.
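To make the goal concrete, here is a minimal sketch of what I mean by "iterate through all the student elements", written for a single machine with the JDK's built-in StAX parser and no Flink at all. The `Student` class and its `rawText` field are hypothetical placeholders, since the real contents of each student element are not shown above, and the sketch assumes each student element contains only text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StudentXmlSketch {

    // Hypothetical holder: the real fields of a <student> are not known here.
    static final class Student {
        final String rawText;

        Student(String rawText) {
            this.rawText = rawText;
        }
    }

    // Streams through the XML and collects one Student per <student> element.
    static List<Student> parseStudents(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<Student> students = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "student".equals(reader.getLocalName())) {
                    // Assumption: <student> holds text only; getElementText()
                    // throws if it meets a nested child element.
                    students.add(new Student(reader.getElementText().trim()));
                }
            }
        } finally {
            reader.close();
        }
        return students;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the 100 MB file.
        String xml = "<Class><student>Alice</student><student>Bob</student></Class>";
        List<Student> students = parseStudents(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        for (Student s : students) {
            System.out.println(s.rawText);
        }
    }
}
```

What I cannot work out is how to get the same result when Flink reads the file from HDFS.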
A simple, beginner-friendly explanation would be much appreciated.