0 votes

I have two DataFrames, df1 and df2, which I am joining on columns col1 and col2. However, the datatype of col1 in df1 is string, while col2 in df2 is int. When I try to join like below,

val df3 = df1.join(df2, df1("col1") === df2("col2"), "inner").select(df2("col2"))

the join does not match anything and returns an empty DataFrame. Is it possible to get the proper output without changing the type of col2 in df2?


1 Answer

2 votes
  val dDF1 = List("1", "2", "3").toDF("col1")
  val dDF2 = List(1, 2).toDF("col2")

  val res1DF = dDF1.join(dDF2, dDF1.col("col1") === dDF2.col("col2").cast("string"), "inner")
      .select(dDF2.col("col2"))
  res1DF.printSchema()
  res1DF.show(false)
  //      root
  //      |-- col2: integer (nullable = false)
  //
  //      +----+
  //      |col2|
  //      +----+
  //      |1   |
  //      |2   |
  //      +----+
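The cast can also go the other direction: convert col1 to int inside the join condition, again without modifying either DataFrame's schema. A minimal sketch, assuming a local SparkSession (the object and session names are illustrative, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

object JoinCastExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-cast")
      .getOrCreate()
    import spark.implicits._

    val dDF1 = List("1", "2", "3").toDF("col1")
    val dDF2 = List(1, 2).toDF("col2")

    // Cast the string column to int only inside the join condition;
    // dDF1 and dDF2 themselves keep their original schemas.
    val res2DF = dDF1.join(dDF2, dDF1.col("col1").cast("int") === dDF2.col("col2"), "inner")
      .select(dDF2.col("col2"))

    res2DF.show(false)
    spark.stop()
  }
}
```

Note that casting a non-numeric string to int yields null, and null never satisfies an equality predicate, so malformed rows silently drop out of the join rather than failing.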

To inspect the schema of a DataFrame:

val sch1 = dDF1.schema
sch1: org.apache.spark.sql.types.StructType = StructType(StructField(col1,StringType,true))
// public StructField(String name,
//               DataType dataType,
//               boolean nullable,
//               Metadata metadata)
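Since the schema is a StructType of StructFields, you can also look up a single column's type programmatically, e.g. to decide whether a cast is needed before joining. A small sketch (the variable names are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType}

object SchemaInspectExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-inspect")
      .getOrCreate()
    import spark.implicits._

    val dDF1 = List("1", "2", "3").toDF("col1")
    val dDF2 = List(1, 2).toDF("col2")

    // StructType.apply(name) returns the StructField for that column.
    val col1Type = dDF1.schema("col1").dataType // StringType
    val col2Type = dDF2.schema("col2").dataType // IntegerType

    // Only add the cast when the two sides actually disagree.
    val needsCast = col1Type == StringType && col2Type == IntegerType
    println(s"col1: $col1Type, col2: $col2Type, cast needed: $needsCast")
    spark.stop()
  }
}
```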