Is there a function that searches a sequence of elements for a subsequence? I am looking for an analogue of StringPosition
for List
s. In my current application I am working with integer lists, but I'd be interested in a general FindSequence[list, pattern, n]
function which will find the first n
occurrences of pattern
in list
.
Here's a toy example:
Generate some data:
In[1]:= $HistoryLength = 0
Out[1]= 0
In[2]:= Timing[digits = First[RealDigits[\[Pi], 2, 10000000]];]
Out[2]= {26.5, Null}
Let's convert it to a string so we can compare to StringPosition
. This is very slow an memory hungry. (The memory is freed when the evaluation finishes.)
In[3]:= Timing[str = StringJoin[ToString /@ digits];]
Out[3]= {43.813, Null}
I am looking for this subsequence:
In[4]:= patt = {1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 1, 1};
In[5]:= strpatt = StringJoin[ToString /@ patt];
Searching the string is very fast:
In[6]:= StringPosition[str, strpatt] // Timing
Out[6]= {1.047, {{5737922, 5737943}}}
This is a simple implementation of searching for numerical arrays. It's slower than StringPosition
:
In[7]:= Timing[
corr = ListCorrelate[patt, digits];
Select[Flatten@Position[corr, patt.patt],
digits[[# ;; # + Length[patt] - 1]] === patt &]
]
Out[7]= {2.234, {5737922}}
Summary:
- Is there a builtin that searches lists for subsequences?
- If there isn't, what is a fast and elegant implementation for numeric lists (my practical problem)?
- What about generic lists that can contain anything? (There are two possibilities here: "static" patterns only such as
{1,0,1}
, or general ones like{1,_,1}
, though these latter ones may introduce complications.)
I expect this will have many solutions, some fast, some more elegant, some more general :-)
Related questions:
- A fast implementation in Mathematica for Position2D (2D case of the same thing)
- What is the best way to find the period of a (repeating) list in Mathematica?
Interesting reading:
EDIT:
I just found the undocumented LongestCommonSubsequencePositions
. LongestCommonSubsequencePositions[a, b]
will find the longest common subsequence of the lists a
and b
, and return position of its first occurrence only in both a
and b
. (The documented LongestCommonSubsequence
, which I was not aware of, will only return the subsequence itself, not its position.)
It is slower than the alternatives above, but it works on general lists that can contain any expression.
In[57]:= LongestCommonSubsequencePositions[digits, patt] // Timing
Out[57]= {5.25, {{5737922, 5737943}, {1, 22}}}