I have a dataframe df1 of the format
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| A    | z    | m    |
| B    | w    | n    |
| C    | x    | o    |
| A    | z    | n    |
| A    | p    | o    |
+------+------+------+
and another dataframe df2 of the format
+------+------+
| Col1 | Col2 |
+------+------+
| 0-A  | 0-z  |
| 1-B  | 3-w  |
| 2-C  | 1-x  |
|      | 2-p  |
+------+------+
I am trying to replace the values in Col1 and Col2 of df1 with values from df2 using Spark Java.
The end dataframe df3 should look like this.
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 0-A  | 0-z  | m    |
| 1-B  | 3-w  | n    |
| 2-C  | 1-x  | o    |
| 0-A  | 0-z  | n    |
| 0-A  | 2-p  | o    |
+------+------+------+
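To pin down the transformation I'm after, here is the same lookup expressed in plain Java, ignoring Spark entirely: build a map from the bare key (the part after the dash in df2's values) to the prefixed value, then replace each df1 cell when a mapping exists. The class and method names are just for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplaceSketch {
    // Build a lookup from the bare key ("A") to the prefixed value ("0-A").
    // Values without a dash (or empty cells) are skipped.
    static Map<String, String> lookup(List<String> prefixed) {
        Map<String, String> m = new HashMap<>();
        for (String v : prefixed) {
            int dash = v.indexOf('-');
            if (dash >= 0) {
                m.put(v.substring(dash + 1), v);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> col1 = lookup(List.of("0-A", "1-B", "2-C"));
        // "A" in df1.Col1 should be replaced by "0-A" from df2.Col1.
        System.out.println(col1.get("A")); // 0-A
        // A value with no mapping is left unchanged.
        System.out.println(col1.getOrDefault("D", "D")); // D
    }
}
```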
Is there any way I can achieve this in Spark Java DataFrame syntax?
The initial idea I had was to do the following:
String pattern1 = "(\\p{L}+)$"; // capture the trailing letters, e.g. "A" from "0-A"
df1 = df1.join(df2, df1.col("Col1").equalTo(regexp_extract(df2.col("Col1"), pattern1, 1)), "leftsemi");
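The regex part of this idea can at least be checked outside Spark: `(\p{L}+)$` should pull the trailing letter-run out of df2's prefixed values, which is what `regexp_extract(..., pattern1, 1)` would compare against df1's raw values. A small plain-Java check of that assumption (class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuffixKey {
    private static final Pattern TRAILING_LETTERS = Pattern.compile("(\\p{L}+)$");

    // Extract the trailing letters that should match df1's bare value,
    // e.g. "0-A" -> "A"; returns null when nothing matches (e.g. empty cell).
    static String suffixKey(String prefixed) {
        Matcher m = TRAILING_LETTERS.matcher(prefixed);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(suffixKey("0-A")); // A
        System.out.println(suffixKey("3-w")); // w
    }
}
```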