3 votes

I am trying to get the max alphabetical value from a Spark DataFrame as a whole. I am not interested in which row or column it came from; I just want the single max value within the DataFrame.

This is what it looks like:

id  conditionName
1   C
2   b
3   A
4   A
5   A
expected result is:

+---+-------------+
| id|conditionName|
+---+-------------+
|  3|            A|
|  4|            A|
|  5|            A|
+---+-------------+

because 'A' is the first letter of the alphabet.

df = df.withColumn("conditionName", col("conditionName").cast("String"))
       .groupBy("id", "conditionName").max("conditionName");
df.show(false);

Exception: "conditionName" is not a numeric column. Aggregation function can only be applied on a numeric column.;

I need the max alphabetical character from the entire DataFrame. What should I use to get the desired result?

Thanks in advance!

What is the expected output? And what is your definition of "max value from Alphabet"? - mtoto
I'm sorry, I will edit it. Thanks for the reminder! - ten hova
Still unclear: is it because "A" is the first letter of the alphabet, or because it has the highest count? - mtoto
Because "A" is the first letter of the alphabet :) - ten hova

1 Answer

1 vote

You can sort your DataFrame by your string column, grab the first value, and use it to filter your original data:

from pyspark.sql.functions import lower, first

# we need lower() because ordering strings is case sensitive
first_letter = df.orderBy(lower(df["conditionName"])) \
                 .groupBy() \
                 .agg(first("conditionName").alias("conditionName")) \
                 .collect()[0][0]

df.filter(df["conditionName"] == first_letter).show()
#+---+-------------+
#| id|conditionName|
#+---+-------------+
#|  3|            A|
#|  4|            A|
#|  5|            A|
#+---+-------------+
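The underlying idea, independent of Spark, is simply: take the case-insensitive minimum of the string column, then keep the rows that match it. A minimal plain-Python sketch of that logic, using sample rows that mirror the question's DataFrame:

```python
# Sample rows mirroring the question's DataFrame: (id, conditionName)
rows = [(1, "C"), (2, "b"), (3, "A"), (4, "A"), (5, "A")]

# Case-insensitive minimum of the string column ("a" here,
# since lowercase comparison makes 'A' and 'a' equivalent)
first_letter = min(value.lower() for _, value in rows)

# Keep every row whose lowercased value equals that minimum
result = [(i, value) for i, value in rows if value.lower() == first_letter]
print(result)  # [(3, 'A'), (4, 'A'), (5, 'A')]
```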

Or more elegantly using Spark SQL (the multi-line query needs a triple-quoted string in Python):

df.registerTempTable("table")
sqlContext.sql("""
    SELECT *
    FROM table
    WHERE lower(conditionName) = (SELECT min(lower(conditionName))
                                  FROM table)
""").show()