Why does Python take longer to process a sorted list than an unsorted list?

Question

Example:

import cProfile, random, copy
def foo(lIn): return [i*i for i in lIn]
lIn = [random.random() for i in range(1000000)]
lIn1 = copy.copy(lIn)
lIn2 = sorted(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn2)')

Result:

3 function calls in 0.075 seconds

Ordered by: standard name


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.075    0.075 <string>:1(<module>)
        1    0.070    0.070    0.070    0.070 test.py:716(foo)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

3 function calls in 0.143 seconds

Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.143    0.143 <string>:1(<module>)
        1    0.137    0.137    0.137    0.137 test.py:716(foo)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

It doesn't really seem to have anything to do with the sort. You can do random.shuffle(lIn1) instead of the sort and cProfile.run('foo(lIn1)') and you'll get the same result. — sneep
Maybe the first list is still in cache? And you are using lIn, not lIn1 in the first test call. — Graipher

sneep sneep · Accepted Answer · 2018-03-21T09:46:13

Not really an answer yet, but the comment margin is a bit too small for this.

Since random.shuffle() yields the same result, I decided to implement my own shuffle function and vary the number of swaps it performs. (In the example below, that is the argument to xrange, 300000.)

def my_shuffle(array):
    for _ in xrange(300000):
        rand1 = random.randint(0, 999999)
        rand2 = random.randint(0, 999999)
        array[rand1], array[rand2] = array[rand2], array[rand1]

The other code is pretty much unmodified:

import cProfile, random, copy
def foo(lIn): return [i*i for i in lIn]
lIn = [random.random()*100000 for i in range(1000000)]
lIn1 = copy.copy(lIn)
my_shuffle(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn1)')

The results I got for the second cProfile depended on the number of times I shuffled:

swaps       time (s)
10000       0.062
100000      0.082
200000      0.099
400000      0.122
800000      0.137
8000000     0.141
10000000    0.141
100000000   0.248

It looks like the more you scramble the list, the longer the operation takes, up to a certain point. (I don't trust the last result: it took so long that I did some other light work in the background, and I don't really want to rerun it.)
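For anyone who wants to repeat the sweep without editing the swap count by hand, here is a rough sketch of how it could be scripted. It assumes Python 3 (so range instead of xrange) and uses timeit rather than cProfile, so the absolute numbers will not match the ones above. The names swap_random_pairs, square_all and time_after_swaps are made up for this illustration, and the swap counts in the last lines are only examples.

```python
import random
import timeit


def swap_random_pairs(values, n_swaps, rng):
    """Swap n_swaps randomly chosen pairs in place (same idea as my_shuffle)."""
    last = len(values) - 1
    for _ in range(n_swaps):
        i = rng.randint(0, last)
        j = rng.randint(0, last)
        values[i], values[j] = values[j], values[i]


def square_all(values):
    # Same work as foo() in the question.
    return [v * v for v in values]


def time_after_swaps(size, swap_counts, repeats=3, seed=0):
    """Return {n_swaps: best wall-clock seconds for square_all} per swap count."""
    rng = random.Random(seed)
    base = [rng.random() * 100000 for _ in range(size)]
    results = {}
    for n_swaps in swap_counts:
        scrambled = list(base)  # shallow copy: same float objects, new order
        swap_random_pairs(scrambled, n_swaps, rng)
        timings = timeit.repeat(lambda: square_all(scrambled), number=1, repeat=repeats)
        results[n_swaps] = min(timings)
    return results


if __name__ == "__main__":
    # Hypothetical swap counts; adjust to taste.
    sweep = time_after_swaps(1000000, [0, 10000, 100000, 800000])
    for n_swaps, seconds in sweep.items():
        print(n_swaps, round(seconds, 3))
```

Taking the minimum over a few repeats is only there to reduce noise from other processes, which is probably what distorted the last measurement above. Each run starts from a fresh copy of the same base list, so the swap counts do not accumulate between rows.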
