I have two CSV files in the same storage directory.
First CSV file:
id name age
1 Hi 20
2 Hello 21
Second CSV file:
id name age country
3 hi1 20 India
When I read them through Spark with:
spark.read.format("csv").option("inferschema","true").load("<location>")
I can see all the data, and country is NULL for ids 1 and 2 as expected, but the header row of each file also comes through as a data row.
Current Output:
_c0 | _c1  | _c2       | _c3 | _c4
id  | name | country   | age | lastname
3   | dfg  | US        | 45  | HI
4   | ghj  | US1       | 33  | Hello
id  | name | country   | age | null
1   | asd  | India     | 21  | null
2   | sdf  | Australia | 20  | null
How can I get a single DataFrame in Spark whose header is the union of all the columns, with each file's data under the matching columns?
Expected Output:
id  | name | country   | age | lastname
3   | dfg  | US        | 45  | HI
4   | ghj  | US1       | 33  | Hello
1   | asd  | India     | 21  | null
2   | sdf  | Australia | 20  | null
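
One possible approach, sketched here in PySpark under a few assumptions: read each file separately with the `header` option enabled, so the first row becomes the column names, then combine the results by column name. The function name `read_csvs_with_union_schema` and its parameters are hypothetical, the files are assumed to be comma-delimited, and `allowMissingColumns` on `unionByName` needs Spark 3.1 or later. Columns that share a name are also assumed to infer to compatible types in every file.

```python
from functools import reduce


def read_csvs_with_union_schema(spark, paths):
    """Read each CSV with its own header, then union the frames by column name.

    `spark` is an existing SparkSession and `paths` is a list of file paths;
    both names are hypothetical and chosen only for this sketch.
    """
    # header=true tells Spark to treat the first row of each file as column
    # names instead of returning it as data under _c0, _c1, ...
    frames = [
        spark.read.option("header", "true").option("inferSchema", "true").csv(path)
        for path in paths
    ]
    # unionByName matches columns by name rather than by position;
    # allowMissingColumns=True (Spark 3.1+) fills absent columns with null.
    return reduce(
        lambda left, right: left.unionByName(right, allowMissingColumns=True),
        frames,
    )
```

It could be called as `read_csvs_with_union_schema(spark, ["<location>/file1.csv", "<location>/file2.csv"])`, where the two file names are placeholders. Reading the files one by one is what lets each header be honored; a single `load("<location>")` over files with different column counts has no way to align columns by name.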