I am doing join of two data frames having same name and values.
data is:
+----------------+------------------+
|id |name |
+----------------+------------------+
|2 |G2 |
|1 |H2 |
|2 |H2 |
|1 |G2 |
+----------------+------------------+
Code is:
res = data.alias(
"C1"
).join(
data.alias(
"C2"
),
on='id',
how="inner"
).select(
F.col("C1.{0}".format(name)).alias("C1"),
F.col("C2.{0}".format(name)).alias("C2"),
'id'
)
I am getting the result as:
+----+----+----------------+
|C1 |C2 |id |
+----+----+----------------+
|G2 |H2 |2 |
|G2 |G2 |2 |
|H2 |G2 |1 |
|H2 |H2 |1 |
|H2 |H2 |2 |
|H2 |G2 |2 |
|G2 |G2 |1 |
|G2 |H2 |1 |
+----+----+----------------+
Whereas with inner join I expected the result as:
+----------------+------------------+
|id |C1 | C2 |
+----------------+------------------+
|2 |G2 | G2 |
|1 |H2 | H2 |
|2 |H2 | H2 |
|1 |G2 | G2 |
+----------------+------------------+
Why am i getting the result as union with 8 rows and not the above 4 rows.