
XML data arrives in text files. We ingest it through Flume and Kafka into HDFS and save it in .txt file format.

Existing use case: XML files are ingested through Flume → Kafka → Flume interceptor (to check whether the XML is valid against the schema) → valid or invalid Kafka topic → HDFS sink (valid and invalid locations), saved as .txt files.

New use case:

I need to read from the valid Kafka topic and write my own Flume interceptor that converts the XML data to Avro format and sends it to the HDFS sink (the valid HDFS location). The final output needs to be in Avro file format.

Any help would be appreciated.

Thanks in advance.


2 Answers


You might be interested in converting your XML to Avro using the Apache Avro Java API: http://avro.apache.org/docs/1.8.2/gettingstartedjava.html

Once the conversion works, you can use that code in your Flume interceptor and write the Avro files. You will also need an Avro schema, which you can create from your XSD schema.
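As a rough illustration of the first half of that conversion, here is a minimal sketch that uses only the JDK to pull the fields out of an XML event body. The class name XmlFieldExtractor, the method extractFields, and the flat <order> sample document are all hypothetical; it assumes each record is a root element with simple child elements, which may not match your XSD. Inside the interceptor you would call something like this on the event body and then copy each entry into an Avro record built from your schema, as described in the guide linked above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; it is not part of Flume or Avro.
final class XmlFieldExtractor {

    // Parses one XML document and returns its direct child elements as
    // name -> text pairs, in document order. Assumes a flat record layout.
    static Map<String, String> extractFields(byte[] xmlBody) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Enable the JAXP secure-processing feature as a precaution,
            // since the payload comes from an external topic.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Element root = builder.parse(new ByteArrayInputStream(xmlBody))
                                  .getDocumentElement();

            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments between elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            // Malformed XML ends up here; the caller decides how to handle it.
            throw new IllegalArgumentException("Could not parse XML body", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><id>42</id><customer>Ann</customer></order>";
        System.out.println(extractFields(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Every value comes back as a string here, so any numeric or nested fields in your Avro schema would still need their own type conversion before you populate the record.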

I did something similar in a Spring XD stream.

Hope this helps.