Efficient ways to duplicate array/list in Python

Question

Note: I'm a Ruby developer trying to find my way in Python.

When I wanted to figure out why some scripts use mylist[:] instead of list(mylist) to duplicate lists, I made a quick benchmark of the various methods to duplicate range(10) (see code below).

EDIT: I updated the tests to make use of Python's timeit as suggested below. This makes it impossible to directly compare it to Ruby, because timeit doesn't account for the looping while Ruby's Benchmark does, so Ruby code is for reference only.

Python 2.7.2

Array duplicating. Tests run 50000000 times
list(a)     18.7599430084
copy(a)     59.1787488461
a[:]         9.58828091621
a[0:len(a)] 14.9832749367

For reference, I wrote the same script in Ruby too:

Ruby 1.9.2p0

Array duplicating. Tests 50000000 times
                      user     system      total        real
Array.new(a)     14.590000   0.030000  14.620000 ( 14.693033)
Array[*a]        18.840000   0.060000  18.900000 ( 19.156352)
a.take(a.size)    8.780000   0.020000   8.800000 (  8.805700)
a.clone          16.310000   0.040000  16.350000 ( 16.384711)
a[0,a.size]       8.950000   0.020000   8.970000 (  8.990514)

Question 1: what is mylist[:] doing differently that it is 25 % faster than even mylist[0:len(mylist)]. Does it copy in memory directly or what?

Question 2: edit: updated benchmarks don't show huge differences in Python and Ruby anymore. was: Did I implement the tests in some obviously inefficient way, so that Ruby code is so much faster than Python?

Now the code listings:

Python:

import timeit

COUNT = 50000000

print "Array duplicating. Tests run", COUNT, "times"

setup = 'a = range(10); import copy'

print "list(a)\t\t", timeit.timeit(stmt='list(a)', setup=setup, number=COUNT)
print "copy(a)\t\t", timeit.timeit(stmt='copy.copy(a)', setup=setup, number=COUNT)
print "a[:]\t\t", timeit.timeit(stmt='a[:]', setup=setup, number=COUNT)
print "a[0:len(a)]\t", timeit.timeit(stmt='a[0:len(a)]', setup=setup, number=COUNT)

Ruby:

require 'benchmark'

a = (0...10).to_a

COUNT = 50_000_000

puts "Array duplicating. Tests #{COUNT} times"

Benchmark.bm(16) do |x|
  x.report("Array.new(a)")   {COUNT.times{ Array.new(a) }}
  x.report("Array[*a]")   {COUNT.times{ Array[*a] }}
  x.report("a.take(a.size)")   {COUNT.times{ a.take(a.size) }}
  x.report("a.clone")    {COUNT.times{ a.clone }}
  x.report("a[0,a.size]"){COUNT.times{ a[0,a.size] }}
end

Use the python timeit module to measure python execution times. I doubt it'll make things (much) faster but it'll avoid all the usual timing traps. — Martijn Pieters♦
As for the time difference in alist[:] versus alist[0:len(alist)]; the latter creates python int objects, something the former method doesn't need to deal with. — Martijn Pieters♦
@MartijnPieters -- The latter also needs to look up the global len (and call it) each time — mgilson
Array(a) does not duplicate an array. When given an array it just calls to_ary on it, which returns self. You should also use Ruby's Benchmark library instead of doing your timing manually. — Andrew Marshall

Ashwini Chaudhary Ashwini Chaudhary · Accepted Answer · 2012-10-24T11:10:39

Use the timeit module in python for testing timings.

from copy import *

a=range(1000)

def cop():
    b=copy(a)

def func1():
    b=list(a)

def slice():
    b=a[:]

def slice_len():
    b=a[0:len(a)]



if __name__=="__main__":
    import timeit
    print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop")
    print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1")
    print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice")
    print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len")

Results:

copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277                   #winner
a[0:len(a)] 10.5431251526

It's surely the extra steps involved in a[0:len(a)] are the reason for it's slowness.

Here's the byte code comparison of the two:

In [19]: dis.dis(func1)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)
             15 SLICE+0             
             16 STORE_FAST               1 (b)
             19 LOAD_CONST               0 (None)
             22 RETURN_VALUE        

In [20]: dis.dis(func2)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)    #same up to here
             15 LOAD_CONST               2 (0)    #loads 0
             18 LOAD_GLOBAL              1 (len) # loads the builtin len(),
                                                 # so it might take some lookup time
             21 LOAD_FAST                0 (a)
             24 CALL_FUNCTION            1         
             27 SLICE+3             
             28 STORE_FAST               1 (b)
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE

Efficient ways to duplicate array/list in Python

2 Answers