I want to map the Timestamp fields in my Dataset, which have values like 2018-08-17T19:58:46.000+0000, to a format like 2018-08-17 19:58:46.000, i.e. yyyy-MM-dd HH:mm:ss.SSS, and some columns to yyyy-MM-dd.
For example, I have a dataset DS1 with columns Id, lastModif, created:
+------------------+----------------------------+----------------------------+
|Id |lastModif |created |
+------------------+----------------------------+----------------------------+
|abc1 |2019-01-14T19:51:55.000+0000|2019-02-07T20:37:53.000+0000|
|AQA2 |2019-02-05T19:26:36.000+0000|2019-02-07T20:40:06.000+0000|
+------------------+----------------------------+----------------------------+
From the above DS1, I need the lastModif column mapped to the format yyyy-MM-dd HH:mm:ss.SSS and the created column mapped to yyyy-MM-dd.
I have similar datasets, DS2 and DS3, with different column mappings.
I keep a properties file from which the mapping is read, with the column names as keys and the timestamp formats as values.
In the code I keep a list of the mapped columns and a list of the non-mapped columns, and then select the columns:
String cols = "Id,created,lastModif";
String[] colArr = cols.split(",");
String mappedCols = "lastModif,created"; //hardcoding as of now.
List<String> mappedColList = Arrays.asList(mappedCols.split(","));
String nonMappedCols = getNonMappingCols(colArr, mappedCols.split(",")).toLowerCase();
List<String> nonMapped = Arrays.asList(nonMappedCols.split(","));
//column-mapping logic
filtered = tempDS.selectExpr(convertListToSeq(nonMapped),unix_timestamp($"lastModif","yyyy-MM-dd HH:mm:ss.SSS").cast("timestamp").as("lastModif"));
filtered.show(false);
// Converts a java.util.List into a Scala Seq, as expected by the Scala-facing Spark APIs.
public static Seq<String> convertListToSeq(List<String> inputList)
{
    return JavaConverters.asScalaIteratorConverter(inputList.iterator()).asScala().toSeq();
}
// Returns the names in cols that do not appear in mapped, joined with commas.
private static String getNonMappingCols(String[] cols, String[] mapped)
{
    List<String> mappedList = Arrays.asList(mapped);
    List<String> nonMappedCols = new ArrayList<>();
    for (String col : cols)
    {
        if (!mappedList.contains(col)) {
            nonMappedCols.add(col);
        }
    }
    // String.join yields "" when every column is mapped, so no substring call is needed.
    return String.join(",", nonMappedCols);
}
How do I map the column to the required timestamp format?
Also, in the line tempDS.selectExpr(convertListToSeq(nonMapped),unix_timestamp($"lastModif","yyyy-MM-dd HH:mm:ss.SSS").cast("timestamp").as("lastModif")); the $"lastModif" is not recognized in Java.
Secondly, this approach is static, i.e. it hardcodes the mapped column. How do I map the columns from my List<String> mappedColList?
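To make the goal concrete, here is a minimal plain-Java sketch of the properties-driven mapping, without Spark. The class name ColumnFormatMapping, the helpers loadMapping and applyMapping, and the idea of modelling a row as a Map are all assumptions for illustration; the input pattern is also an assumption based on the sample values. In a Spark job the same loop over the mapped columns would presumably call something like withColumn with to_timestamp and date_format from org.apache.spark.sql.functions, but that part is not shown or tested here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class ColumnFormatMapping {

    // Assumed layout of the incoming values, e.g. 2019-01-14T19:51:55.000+0000.
    static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    // Parses "column=outputPattern" pairs; a real job would read the properties file instead.
    static Properties loadMapping(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns a copy of the row in which every mapped column is re-rendered with
    // its configured pattern; columns without a mapping are copied unchanged.
    static Map<String, String> applyMapping(Map<String, String> row, Properties mapping) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            String pattern = mapping.getProperty(cell.getKey());
            if (pattern == null) {
                out.put(cell.getKey(), cell.getValue());
            } else {
                OffsetDateTime ts = OffsetDateTime.parse(cell.getValue(), INPUT);
                out.put(cell.getKey(), DateTimeFormatter.ofPattern(pattern).format(ts));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties mapping = loadMapping(
                "lastModif=yyyy-MM-dd HH:mm:ss.SSS\ncreated=yyyy-MM-dd\n");
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Id", "abc1");
        row.put("lastModif", "2019-01-14T19:51:55.000+0000");
        row.put("created", "2019-02-07T20:37:53.000+0000");
        // prints {Id=abc1, lastModif=2019-01-14 19:51:55.000, created=2019-02-07}
        System.out.println(applyMapping(row, mapping));
    }
}
```

The point of the sketch is that nothing is hardcoded per column: adding a mapping for DS2 or DS3 would only mean a different set of key/value pairs.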
new Column("lastModif") instead of $"lastModif"? – W Almir
org.apache.spark.sql.functions.unix_timestamp(tempDS.col("lastModif"),"yyyy-MM-dd HH:mm:ss.SSS") ...the compiler error went away, but my data is a string of the form yyyy-MM-ddTHH:mm:ss.SSS+Z, e.g. 2019-02-07T20:37:53.000+0000, and it is getting parsed to null. – aiman
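The null in the last comment is most likely a pattern mismatch: the pattern passed to unix_timestamp describes the desired output (space separator, no offset), while the data has a literal T and a +0000 offset. The plain-Java sketch below illustrates the same mismatch with java.time; the class TimestampPatternCheck and its methods are hypothetical names, and the assumption is that Spark accepts an equivalent input pattern (yyyy-MM-dd'T'HH:mm:ss.SSSZ), which should be checked against the Spark version in use. Note also that unix_timestamp returns whole seconds, so it would drop the milliseconds in any case.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class TimestampPatternCheck {

    // Pattern that describes the input as it actually looks (assumption from the sample data).
    static final String INPUT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSZ";

    // Tries to parse the value with the given pattern; empty when the text does not match.
    static Optional<OffsetDateTime> tryParse(String value, String pattern) {
        try {
            return Optional.of(OffsetDateTime.parse(value, DateTimeFormatter.ofPattern(pattern)));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    // Parses with the input pattern, then renders with the requested output pattern.
    static String reformat(String value, String outputPattern) {
        OffsetDateTime ts = OffsetDateTime.parse(value, DateTimeFormatter.ofPattern(INPUT_PATTERN));
        return DateTimeFormatter.ofPattern(outputPattern).format(ts);
    }

    public static void main(String[] args) {
        String sample = "2019-02-07T20:37:53.000+0000";
        // The output pattern cannot parse the input: prints false
        System.out.println(tryParse(sample, "yyyy-MM-dd HH:mm:ss.SSS").isPresent());
        // The input pattern can: prints true
        System.out.println(tryParse(sample, INPUT_PATTERN).isPresent());
        // prints 2019-02-07 20:37:53.000
        System.out.println(reformat(sample, "yyyy-MM-dd HH:mm:ss.SSS"));
        // prints 2019-02-07
        System.out.println(reformat(sample, "yyyy-MM-dd"));
    }
}
```

In other words, parsing and formatting are two separate steps, each with its own pattern: one that matches the stored string, one that matches the desired output.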