
I have data like this:

Ram,500
Sam,400
Test,100
Ram,800
Sam,700
Test,300
Ram,900
Sam,800
Test,400

What is the shortest way to find the median for each name in the data above? My result should be something like this:

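The median is the value at position (n+1)/2 in the sorted data, where n is the number of data values in the sample. If (n+1)/2 is not a whole number, the median is the mean of the two values on either side of that position.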

Test 300
Sam 700
Ram 800
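Read as a 1-based position in the sorted data, the (n+1)/2 rule translates almost directly into code. The helper below is a hypothetical sketch (median_by_position is not from the question): it sorts the values, computes the position, and averages the values just below and just above it, so the same code covers odd and even n. For Test's values 100, 300 and 400, n = 3, so the position is 2 and the median is 300.

```python
import math

def median_by_position(values):
    """Median via the 1-based position (n + 1) / 2 in the sorted values."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data")
    pos = (n + 1) / 2                # e.g. 2.0 when n = 3, 2.5 when n = 4
    below = s[math.floor(pos) - 1]   # value at the position, or just below it
    above = s[math.ceil(pos) - 1]    # value at the position, or just above it
    return (below + above) / 2       # both are the same value when n is odd

print(median_by_position([100, 300, 400]))  # Test's values
print(median_by_position([4, 5, 6, 7]))
```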
If you're just looking for median algorithms, try this one. - John Mee

6 Answers

40
votes

Since Python 3.4, the standard library includes the statistics module, so you can use the function statistics.median:

>>> from statistics import median
>>> median([1, 3, 5])
3
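Applied to the question's data, statistics.median can be combined with a dict of lists keyed by name. This sketch assumes the data arrives as CSV text with one name,value pair per line; it is embedded as a string here for illustration, and with a real file you would pass open(path, newline='') to csv.reader instead.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Inline stand-in for the question's data; a real CSV file could be opened instead.
raw = """Ram,500
Sam,400
Test,100
Ram,800
Sam,700
Test,300
Ram,900
Sam,800
Test,400
"""

groups = defaultdict(list)
for name, value in csv.reader(io.StringIO(raw)):
    groups[name].append(int(value))  # csv yields strings, so convert to int

medians = {name: median(values) for name, values in groups.items()}
for name, m in medians.items():
    print(name, m)
```

This prints one line per name (Ram 800, Sam 700, Test 300).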
31
votes

It's a little unclear how your data is actually represented, so I've assumed it is a list of tuples:

from collections import defaultdict

data = [('Ram', 500), ('Sam', 400), ('Test', 100), ('Ram', 800), ('Sam', 700),
        ('Test', 300), ('Ram', 900), ('Sam', 800), ('Test', 400)]

def median(mylist):
    sorts = sorted(mylist)
    length = len(sorts)
    if not length % 2:
        # even count: average the two middle values (// keeps the index an int)
        return (sorts[length // 2] + sorts[length // 2 - 1]) / 2.0
    # odd count: return the single middle value
    return sorts[length // 2]

# group the values by name: {'Ram': [500, 800, 900], ...}
data_dict = defaultdict(list)
for name, value in data:
    data_dict[name].append(value)

print([(key, median(val)) for key, val in data_dict.items()])
print(median([5, 2, 4, 3, 1]))
print(median([5, 2, 4, 3, 1, 6]))
# output (Python 3.7+, where dicts keep insertion order):
[('Ram', 800), ('Sam', 700), ('Test', 300)]
3
3.5

The function median returns the median of a list. If there is an even number of entries, it returns the mean of the two middle entries (this is the standard convention).

I've used defaultdict to build a dict that maps each name to the list of its values, which is a more useful representation of your data.
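An alternative to defaultdict, not part of this answer, is itertools.groupby from the standard library. groupby only merges adjacent items with equal keys, so the list has to be sorted by name first. A sketch assuming the same list-of-tuples data, using statistics.median (Python 3.4+) in place of the hand-written function:

```python
from itertools import groupby
from operator import itemgetter
from statistics import median

data = [('Ram', 500), ('Sam', 400), ('Test', 100), ('Ram', 800), ('Sam', 700),
        ('Test', 300), ('Ram', 900), ('Sam', 800), ('Test', 400)]

# groupby only groups *adjacent* equal keys, so sort by name first
by_name = sorted(data, key=itemgetter(0))

medians = {
    name: median([value for _, value in group])
    for name, group in groupby(by_name, key=itemgetter(0))
}
print(medians)
```

The extra sort costs O(n log n), whereas the defaultdict version groups in a single pass; for data this small either is fine.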

4
votes

Check this out:

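def median(lst):
    # number of middle elements: 1 for an odd-length list, 2 for an even-length one
    even = (0 if len(lst) % 2 else 1) + 1
    # index of the first middle element (// keeps it an int in Python 3)
    half = (len(lst) - 1) // 2
    # average the middle slice (a single element when the length is odd)
    return sum(sorted(lst)[half:half + even]) / float(even)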

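Note:

sorted(lst) produces a sorted copy of lst, so the input list is not modified;

sum([1]) == 1, so for an odd-length list the one-element slice sums to the middle value itself, and dividing by 1.0 leaves it unchanged.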
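To check that the slice trick agrees with the usual definition, a quick randomized comparison against statistics.median (Python 3.4+) is cheap. The function below is this answer's approach written for Python 3, with slice_median and n_mid as hypothetical names chosen for readability:

```python
import random
from statistics import median as stats_median

def slice_median(lst):
    n_mid = 1 if len(lst) % 2 else 2   # how many middle elements to average
    half = (len(lst) - 1) // 2         # index of the first middle element
    return sum(sorted(lst)[half:half + n_mid]) / n_mid

# Compare against the standard library on random lists of every length 1..49.
for n in range(1, 50):
    xs = [random.randint(-100, 100) for _ in range(n)]
    assert slice_median(xs) == stats_median(xs), xs
print("slice_median matches statistics.median")
```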

0
votes

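Easiest way to get the median of a list with an odd number of items (for an even-length list this returns the upper of the two middle values, not their mean):

x = [1, 3, 2]
print("The median of x is:", sorted(x)[len(x) // 2])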
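For an even-length list, sorted(x)[len(x)//2] picks the upper of the two middle values. The statistics module (Python 3.4+) provides both one-sided variants, median_high and median_low, which makes that behavior easy to confirm:

```python
from statistics import median, median_high, median_low

x = [1, 3, 2, 4]                      # even length: the middle values are 2 and 3
upper = sorted(x)[len(x) // 2]        # the one-liner above
lower = sorted(x)[(len(x) - 1) // 2]  # its lower-middle counterpart

print(upper, median_high(x))  # 3 3
print(lower, median_low(x))   # 2 2
print(median(x))              # 2.5
```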
0
votes

I started with user3100512's answer and quickly realized it doesn't work for an even number of items. I added some conditionals to it to compute the median.

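def median(x):
    s = sorted(x)        # sort once and reuse the result
    mid = len(s) // 2    # // keeps the index an int in Python 3
    if len(s) % 2 != 0:
        return s[mid]    # odd count: the middle element
    # even count: the mean of the two middle elements
    midavg = (s[mid] + s[mid - 1]) / 2.0
    return midavg

median([4, 5, 6, 7])

returns 5.5, the mean of the two middle values 5 and 6.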
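Every approach in this thread sorts the whole list, which costs O(n log n). If that matters for large inputs, a selection algorithm can find the middle element(s) in average O(n) time. The sketch below uses quickselect with a random pivot; it is a swapped-in technique rather than anything from the answers, quickselect and quickselect_median are hypothetical names, and the worst case is still O(n^2) with unlucky pivots.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-based) of items; average O(n)."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        lows = [v for v in items if v < pivot]
        highs = [v for v in items if v > pivot]
        n_pivots = len(items) - len(lows) - len(highs)  # copies of the pivot
        if k < len(lows):
            items = lows                    # answer lies among the smaller values
        elif k < len(lows) + n_pivots:
            return pivot                    # answer is the pivot itself
        else:
            k -= len(lows) + n_pivots       # skip smaller values and pivots
            items = highs

def quickselect_median(x):
    n = len(x)
    if n == 0:
        raise ValueError("median of empty list")
    if n % 2:
        return quickselect(x, n // 2)       # odd count: the middle element
    # even count: mean of the two middle elements
    return (quickselect(x, n // 2 - 1) + quickselect(x, n // 2)) / 2.0

print(quickselect_median([4, 5, 6, 7]))     # 5.5
```

Each step copies the remaining values into new lists, trading memory for simplicity; an in-place partition would avoid that.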