My PySpark DataFrame looks like this:

model,DAYS
MarutiDesire,15
MarutiErtiga,30
Suzukicelerio,45
I10lxi,60
Verna,55

The output I am trying to get: when DAYS is less than 30, the remark should be ECONOMICAL; between 30 and 60, AVERAGE; and when greater than 60, LOWPROFIT.


The code I tried, but it is giving incorrect output:

dataset1.selectExpr("*", "CASE WHEN DAYS <=30 THEN 'ECONOMICAL' WHEN DAYS>30 AND LESS THEN 60 THEN 'AVERAGE' ELSE 'LOWPROFIT' END REASON").show()

Kindly share your suggestions. Is there a better way to do this in PySpark?


1 Answer

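The idiomatic way to do this is with `when`/`otherwise` from `pyspark.sql.functions` rather than a raw SQL CASE string. In the session below, `df` holds the sample data from the question; a minimal sketch of constructing it, assuming an active SparkSession bound to the usual `spark` variable:

>>> df = spark.createDataFrame(
...     [("MarutiDesire", 15), ("MarutiErtiga", 30),
...      ("Suzukicelerio", 45), ("I10lxi", 60), ("Verna", 55)],
...     ["model", "DAYS"])
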
>>> from pyspark.sql.functions import *
>>> df.show()
+-------------+----+
|        model|DAYS|
+-------------+----+
| MarutiDesire|  15|
| MarutiErtiga|  30|
|Suzukicelerio|  45|
|       I10lxi|  60|
|        Verna|  55|
+-------------+----+

>>> df.withColumn(
...     "REMARKS",
...     when(col("DAYS") < 30, lit("ECONOMICAL"))
...     .when((col("DAYS") >= 30) & (col("DAYS") < 60), lit("AVERAGE"))
...     .otherwise(lit("LOWPROFIT"))).show()
+-------------+----+----------+
|        model|DAYS|   REMARKS|
+-------------+----+----------+
| MarutiDesire|  15|ECONOMICAL|
| MarutiErtiga|  30|   AVERAGE|
|Suzukicelerio|  45|   AVERAGE|
|       I10lxi|  60| LOWPROFIT|
|        Verna|  55|   AVERAGE|
+-------------+----+----------+
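
For completeness, the `selectExpr` approach from the question also works once the invalid `AND LESS THEN 60` is rewritten as a second comparison on DAYS; a minimal sketch, assuming the same boundary handling as the `when`/`otherwise` version above:

>>> df.selectExpr(
...     "*",
...     "CASE WHEN DAYS < 30 THEN 'ECONOMICAL' "
...     "WHEN DAYS >= 30 AND DAYS < 60 THEN 'AVERAGE' "
...     "ELSE 'LOWPROFIT' END AS REMARKS").show()

Both forms compile to the same CASE expression, so the choice is mostly about whether you prefer SQL strings or the Column API.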