I have two dataframes. One has the columns X, year, month and measure, plus the columns X1 and X2, which correspond to the first day and the second day. The first dataframe is:
  X year month           measure X1 X2
1 1 2014    12  Max.TemperatureF 64 42
2 2 2014    12 Mean.TemperatureF 52 38
3 3 2014    12  Min.TemperatureF 39 33
The second dataframe contains only the days:
X3 X4 X5 X6 X7
 1 51 43 42 45
 2 44 37 34 42
 3 37 30 26 38
I want to join the two dataframes in PySpark and obtain:
  X year month           measure X1 X2 X3 X4 X5 X6
1 1 2014    12  Max.TemperatureF 64 42  1 51 43 42
2 2 2014    12 Mean.TemperatureF 52 38  2 44 37 34
3 3 2014    12  Min.TemperatureF 39 33  3 37 30 26
I have joined them, but one dataframe ends up stacked below the other instead of the rows staying side by side:
from functools import reduce
from pyspark.sql import DataFrame

def unionAll(*dfs):
    # unionAll stacks the rows of the dataframes; it does not add columns
    return reduce(DataFrame.unionAll, dfs)

td = unionAll(*[weather1, weather2])
  X year month           measure X1 X2
1 1 2014    12  Max.TemperatureF 64 42
2 2 2014    12 Mean.TemperatureF 52 38
3 3 2014    12  Min.TemperatureF 39 33
X3 X4 X5 X6
 1 51 43 42 45
 2 44 37 34 42
 3 37 30 26 38
So this is the wrong kind of join.
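I think what I actually need is a positional (column-wise) join. Below is a rough sketch of one approach I am considering: add a row index to each dataframe and join on that index. The helper column name row_idx is my own, and this assumes both dataframes are already in the same row order.

from pyspark.sql import functions as F
from pyspark.sql.window import Window

def with_row_index(df, col_name="row_idx"):
    # monotonically_increasing_id() is increasing but not consecutive,
    # so row_number() over that ordering turns it into 1, 2, 3, ...
    w = Window.orderBy(F.monotonically_increasing_id())
    return df.withColumn(col_name, F.row_number().over(w))

td = (
    with_row_index(weather1)
    .join(with_row_index(weather2), on="row_idx", how="inner")
    .drop("row_idx")
)
td.show()

Is this a reliable way to keep the rows aligned, or is there a better way to do this join?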