0
votes

I need to select all not nulls column from Hive table and insert them into Hbase. For example, consider the below table:

Name      Place         Department  Experience
==============================================
Ram      | Ramgarh      |  Sales      |  14
Lakshman | Lakshmanpur  |Operations   | 
Sita     | Sitapur      |             |  14
Ravan    |              |             |  25

I have to write all the not null columns from above table to Hbase. So I wrote a logic to get not null columns in one column of dataframe as below. Name column is mandatory there.

Name        Place       Department  Experience      Not_null_columns
================================================================================
Ram         Ramgarh     Sales        14            Name, Place, Department, Experience
Lakshman    Lakshmanpur Operations                 Name, Place, Department
Sita        Sitapur                  14            Name, Place, Experience
Ravan                                25            Name, Experience

Now my requirement is to create a column in dataframe with all values of not null columns in a single column as provided below.

Name      Place        Department   Experience    Not_null_columns_values
Ram       Ramgarh      Sales        14           Name: Ram, Place: Ramgarh, Department: Sales, Experince: 14
Lakshman  Lakshmanpur  Operations                Name:    Lakshman, Place: Lakshmanpur, Department: Operations
Sita      Sitapur                   14           Name:    Sita, Place: Sitapur, Experience: 14
Ravan                               25           Name:    Ravan, Experience: 25

Once I get above df I will write it to Hbase with Name as key and last column as value.

Please let me know if there could have been a better approach to do this.

1

1 Answers

1
votes

Try this-

Load the test data provided

    val data =
      """
        |Name    |  Place    |     Department | Experience
        |
        |Ram      | Ramgarh      |  Sales      |  14
        |
        |Lakshman | Lakshmanpur  |Operations   |
        |
        |Sita     | Sitapur      |             |  14
        |
        |Ravan   |              |              |  25
      """.stripMargin

    val stringDS = data.split(System.lineSeparator())
      .map(_.split("\\|").map(_.replaceAll("""^[ \t]+|[ \t]+$""", "")).mkString(","))
      .toSeq.toDS()
    val df = spark.read
      .option("sep", ",")
      .option("inferSchema", "true")
      .option("header", "true")
//      .option("nullValue", "null")
      .csv(stringDS)

    df.show(false)
    df.printSchema()
    /**
      * +--------+-----------+----------+----------+
      * |Name    |Place      |Department|Experience|
      * +--------+-----------+----------+----------+
      * |Ram     |Ramgarh    |Sales     |14        |
      * |Lakshman|Lakshmanpur|Operations|null      |
      * |Sita    |Sitapur    |null      |14        |
      * |Ravan   |null       |null      |25        |
      * +--------+-----------+----------+----------+
      *
      * root
      * |-- Name: string (nullable = true)
      * |-- Place: string (nullable = true)
      * |-- Department: string (nullable = true)
      * |-- Experience: integer (nullable = true)
      */

convert struct and then json

    val x = df.withColumn("Not_null_columns_values",
      to_json(struct(df.columns.map(col): _*)))
    x.show(false)
    x.printSchema()

    /**
      * +--------+-----------+----------+----------+---------------------------------------------------------------------+
      * |Name    |Place      |Department|Experience|Not_null_columns_values                                              |
      * +--------+-----------+----------+----------+---------------------------------------------------------------------+
      * |Ram     |Ramgarh    |Sales     |14        |{"Name":"Ram","Place":"Ramgarh","Department":"Sales","Experience":14}|
      * |Lakshman|Lakshmanpur|Operations|null      |{"Name":"Lakshman","Place":"Lakshmanpur","Department":"Operations"}  |
      * |Sita    |Sitapur    |null      |14        |{"Name":"Sita","Place":"Sitapur","Experience":14}                    |
      * |Ravan   |null       |null      |25        |{"Name":"Ravan","Experience":25}                                     |
      * +--------+-----------+----------+----------+---------------------------------------------------------------------+
      */