3
votes

I have a dataset

+----------+--------+------------+
|        id|    date|      errors|
+----------+--------+------------+
|         1|20170319|      error1|
|         1|20170319|      error2|
|         1|20170319|      error2|
|         1|20170319|      error1|
|         2|20170319|        err6|
|         1|20170319|      error2|
+----------+--------+------------+

I need the error counts per day.

Expected output:

+--------+------------+-----+
|    date|      errors|count|
+--------+------------+-----+
|20170319|      error1|    2|
|20170319|      error2|    3|
|20170319|        err6|    1|
+--------+------------+-----+
    val dataset = spark.read.json(path)
    val c = dataset.groupBy("date").count()

// how do I proceed to count the errors per day?

I tried windowing over date in Spark Scala SQL but could not find a productive approach. Do I need to convert to an RDD and find another way?

1
Try changing groupBy("date") to groupBy("date", "errors") – Nick
Yes, that worked. Thanks! – HDev007

1 Answer

1
votes

You just need to group by both date and errors:

    val c = dataset.groupBy("date", "errors").count()
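
For context, here is a minimal self-contained sketch. It assumes a local SparkSession and a hypothetical JSON file path containing the id, date, and errors fields shown in the question:

    import org.apache.spark.sql.SparkSession

    // assumes a local session; in spark-shell the `spark` value already exists
    val spark = SparkSession.builder()
      .appName("error-counts")
      .master("local[*]")
      .getOrCreate()

    // hypothetical path; point it at the JSON shown in the question
    val path = "/path/to/errors.json"
    val dataset = spark.read.json(path)

    // group by both columns, then count rows per (date, errors) pair
    val c = dataset.groupBy("date", "errors").count()
    c.show()
    // +--------+------+-----+
    // |    date|errors|count|
    // +--------+------+-----+
    // |20170319|error1|    2|
    // |20170319|error2|    3|
    // |20170319|  err6|    1|
    // +--------+------+-----+
    // (row order may differ; groupBy output is not ordered)

No window functions or RDD conversion are needed; grouping on both columns and counting gives the per-day, per-error tallies directly.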