Surprisingly, I find `startswith` is slower than `in`:
In [10]: s="ABCD"*10
In [11]: %timeit s.startswith("XYZ")
1000000 loops, best of 3: 307 ns per loop
In [12]: %timeit "XYZ" in s
10000000 loops, best of 3: 81.7 ns per loop
As we all know, the `in` operation needs to search the whole string, while `startswith` only needs to check the first few characters, so `startswith` should be more efficient.
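For anyone reproducing this outside IPython, the comparison can be sketched with the standard `timeit` module. The helper name `best_ns` and the loop counts are my own choices, not part of the session above, and the absolute numbers will differ by machine and Python version:

```python
import timeit

s = "ABCD" * 10  # same short string as in the session above


def best_ns(stmt, number=100_000, repeat=3):
    """Best per-call time in nanoseconds over `repeat` runs (hypothetical helper)."""
    timer = timeit.Timer(stmt, globals={"s": s})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


results = {
    "startswith": best_ns('s.startswith("XYZ")'),
    "in": best_ns('"XYZ" in s'),
}
for name, ns in results.items():
    print(f"{name}: {ns:.1f} ns per call")
```

Taking the minimum over several runs mirrors the "best of 3" figure that `%timeit` reports in the session.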
When `s` is big enough, `startswith` is faster:
In [13]: s="ABCD"*200
In [14]: %timeit s.startswith("XYZ")
1000000 loops, best of 3: 306 ns per loop
In [15]: %timeit "XYZ" in s
1000000 loops, best of 3: 666 ns per loop
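To see roughly where the crossover happens, the same measurement can be repeated over several string lengths. This is only a sketch: the sizes and the helper `per_call_ns` are assumptions of mine, and since `"XYZ"` never occurs in `s`, the `in` test is expected to scan the whole string each time while `startswith` should stay roughly flat:

```python
import timeit


def per_call_ns(stmt, s, number=50_000):
    """Best per-call time in nanoseconds for `stmt` evaluated against `s`."""
    runs = timeit.repeat(stmt, globals={"s": s}, number=number, repeat=3)
    return min(runs) / number * 1e9


rows = []
for repeats in (10, 50, 200, 1000):  # illustrative sizes, not from the question
    s = "ABCD" * repeats
    rows.append((
        len(s),
        per_call_ns('s.startswith("XYZ")', s),
        per_call_ns('"XYZ" in s', s),
    ))

for length, t_startswith, t_in in rows:
    print(f"len={length:5d}  startswith={t_startswith:7.1f} ns  in={t_in:7.1f} ns")
```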
So it seems that calling `startswith` has some overhead, which makes it slower when the string is small.
Then I tried to figure out what the overhead of the `startswith` call is.
First, I used an `f` variable to reduce the cost of the dot operation, as mentioned in this answer. Here we can see `startswith` is still slower:
In [16]: f=s.startswith
In [17]: %timeit f("XYZ")
1000000 loops, best of 3: 270 ns per loop
Further, I tested the cost of an empty function call:
In [18]: def func(a): pass
In [19]: %timeit func("XYZ")
10000000 loops, best of 3: 106 ns per loop
Setting aside the cost of the dot operation and the function call, the time of `startswith` is about (270-106)=164ns, but the `in` operation takes only 81.7ns. It seems there is still some overhead in `startswith`. What is it?
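The subtraction above can be reproduced in one script. This is a rough sketch only: it assumes that the cost of calling an empty Python function is a fair stand-in for the call overhead of a built-in method, which is not necessarily true, so the difference should be read as an estimate rather than an exact figure:

```python
import timeit

s = "ABCD" * 10
f = s.startswith  # bound method, looked up once


def func(a):  # empty function, as in the session above
    pass


namespace = {"s": s, "f": f, "func": func}


def per_call_ns(stmt, number=100_000):
    """Best per-call time in nanoseconds (hypothetical helper)."""
    runs = timeit.repeat(stmt, globals=namespace, number=number, repeat=3)
    return min(runs) / number * 1e9


t_method = per_call_ns('f("XYZ")')
t_empty = per_call_ns('func("XYZ")')
t_in = per_call_ns('"XYZ" in s')

print(f"f('XYZ')           : {t_method:.1f} ns")
print(f"func('XYZ')        : {t_empty:.1f} ns")
print(f"difference         : {t_method - t_empty:.1f} ns")
print(f"'XYZ' in s         : {t_in:.1f} ns")
```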
Adding the test results comparing `startswith` and `__contains__`, as suggested by poke and lvc:
In [28]: %timeit s.startswith("XYZ")
1000000 loops, best of 3: 314 ns per loop
In [29]: %timeit s.__contains__("XYZ")
1000000 loops, best of 3: 192 ns per loop
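One way to see why the operator and the two method calls are not dispatched the same way is to compare their bytecode with the standard `dis` module. The sketch below only lists opcode names; the exact names vary between CPython versions, so none are hard-coded here:

```python
import dis


def opnames(expr):
    """Opcode names for a single expression compiled in 'eval' mode."""
    code = compile(expr, "<expr>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


in_ops = opnames('"XYZ" in s')
method_ops = opnames('s.startswith("XYZ")')
dunder_ops = opnames('s.__contains__("XYZ")')

print("in            :", in_ops)
print("startswith    :", method_ops)
print("__contains__  :", dunder_ops)
```

On the CPython versions I am aware of, the `in` expression compiles to a dedicated containment or comparison opcode with no call instruction, whereas both method forms need an attribute lookup followed by a call.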
[...] `s.__contains__("XYZ")` as this will take the same route as `s.startswith("XYZ")` then (using the `in` operator will short-cut the member access). However, `startswith` is still slower for me then. – poke
[...] `__contains__` being fully typed in C, while `startswith` does actual argument parsing and stuff (you can also pass a tuple). – poke
[...] `s.startswith("XYZ")` report 153ns, and `s.__contains__("XYZ")` reports 169ns. As @poke says, using `in` will use completely different lookup rules than the method call - it can be looked up directly from a function pointer at the C level, while the method lookup does two dictionary searches and then has to do a Python-level function call. Timing those things separately can give you some idea of the difference, but isn't necessarily exact. On your numbers, subtracting both those overheads makes the time for `startswith` negative! – lvc
[...] `%timeit "XYZ" == s[0:3]` which gives me `10000000 loops, best of 3: 94 ns per loop` while `%timeit "XYZ" in s` gives `10000000 loops, best of 3: 59.2 ns per loop`. Tested with Python 3.4.3. (It seems that in my case, slicing gives "some" overhead, as `%timeit "XYZ" in s[0:3]` results in `10000000 loops, best of 3: 101 ns per loop`) – Marandil
`s` is a repetition of the string `"ABCD"`, so the whole string has to be searched in order to come to the conclusion that `"XYZ"` is not contained within. – poke
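The argument handling mentioned in the comments can be illustrated with the documented forms of `str.startswith`: it accepts either a single prefix or a tuple of prefixes, plus optional start and end positions. The variable names below are only for illustration:

```python
s = "ABCD" * 10

# A tuple of prefixes: true if any one of them matches.
any_prefix = s.startswith(("XYZ", "AB"))

# Optional start / end positions restrict where the prefix test is applied.
from_offset = s.startswith("CD", 2)        # compares against s[2:]
within_window = s.startswith("AB", 4, 6)   # compares against s[4:6]

# The slice comparison from the comments gives the same answer as
# startswith for a single prefix.
by_slice = "XYZ" == s[0:3]
by_method = s.startswith("XYZ")

print(any_prefix, from_offset, within_window, by_slice, by_method)
```

None of this flexibility exists for `in`, which always takes exactly one operand on each side. That is consistent with the suggestion that `startswith` spends extra time sorting out its arguments.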