6
votes

The 6.3.6 Vectors section in the Scheme R5RS standard states the following about vectors:

Vectors are heterogenous structures whose elements are indexed by integers. A vector typically occupies less space than a list of the same length, and the average time required to access a randomly chosen element is typically less for the vector than for the list.

This description of vectors is a bit diffuse.

I'd like to know what this actually means in terms of the vector-ref and list-ref operations and their complexity. Both procedures returns the k-th element of a vector and a list. Is the vector operation O(1) and is the list operation O(n)? How are vectors different than lists? Where can I find more information about this?

Right now I'm using association lists as a data structure for storing key/value pairs for easy lookup. If the keys are integers it would perhaps be better to use vectors to store the values.

2

2 Answers

4
votes

The very specific details of vector-ref and list-ref are implementation-dependent, meaning: each Scheme interpreter can implement the specification as it sees fit, so an answer for your question can not be generalized to all interpreters conforming to R5RS, it depends on the actual interpreter you're using.

But yes, in any decent implementation is a safe bet to assume that the vector-ref operation is O(1), and that the list-ref operation is probably O(n). Why? because a vector, under the hood, should be implemented using a data structure native to the implementation language, that allows O(1) access to an element given its index (say, a primitive array) - therefore making the implementation of vector-ref straightforward. Whereas lists in Lisp are created by linking cons cells, and finding an element at any given index entails traversing all the elements before it in the list - hence O(n) complexity.

As a side note - yes, using vectors would be a faster alternative than using association lists of key/value pairs, as long as the keys are integers and the number of elements to be indexed is known beforehand (a Scheme vector can not grow its size after its creation). For the general case (keys other than integers, variable size) check if your interpreter supports hash tables, or use an external library that provides them (say, SRFI 69).

1
votes

A list is constructed from cons cells. From the R5RS list section:

The objects in the car fields of successive pairs of a list are the elements of the list. For example, a two-element list is a pair whose car is the first element and whose cdr is a pair whose car is the second element and whose cdr is the empty list. The length of a list is the number of elements, which is the same as the number of pairs.

For example, the list (a b c) is equivalent to the following series of pairs: (a . (b . (c . ())))

And could be represented in memory by the following "nodes":

[p] --> [p] --> [p] --> null
 |       |       |
 |==> a  |==> b  |==> c

With each node [] containing a pointer p to the value (it's car), and another pointer to the next element (it's cdr).

This allows the list to grow to an unlimited length, but requires a ref operation to start at the front of the list and traverse k elements in order to find the requested one. As you stated, this is O(n).


By contrast, a vector is basically an array of values which could be internally represented as an array of pointers. For example, the vector #(a b c) might be represented as:

[p p p]
 | | |
 | | |==> c
 | |
 | |==> b
 |
 |==> a

Where the array [] contains a series of three pointers, and each pointer is assigned to a value in the vector. So internally you could reference the third element of the vector v using the notation v[3]. Since you do not need to traverse the previous elements, vector-ref is an O(1) operation.

The main disadvantage is that vectors are of fixed size, so if you need to add more elements than the vector can hold, you have to allocate a new vector and copy the old elements to this new vector. This can potentially be an expensive operation if your application does this on a regular basis.


There are many resources online - this article on Scheme Data Structures goes into more detail and provides some examples, although it is much more focused on lists.


All that said, if your keys are (or can become) integers and you either have a fixed number of elements or can manage with a reasonable amount of vector reallocations - for example, you load the vector at startup and then perform mostly reads - a vector may be an attractive alternative to an association list.