3
votes

Does an algorithm exist for converting a linear index to a list of subscripts with support for negative strides?

Background

Environments, such as MATLAB, Julia, and others, and libraries, such as NumPy, provide support for strided arrays (aka ndarrays). Strided arrays are backed by linear memory (e.g., a single underlying buffer), which stands in contrast to nested arrays, where each nested array corresponds to a dimension. For example, consider the following 2x2 matrix

[ 1 2
  3 4 ]

To implement as an array of arrays

A = [ [ 1, 2 ], [ 3, 4 ] ]

where (using zero-based indexing)

a01 = A[0][1] = 2

We can represent the same 2x2 matrix as a strided array as follows (assuming row-major)

A = [ 1, 2,
      3, 4 ]

where

a01 = A[ 2*0 + 1*1 ] = 2

In general, for a strided NxM matrix, the element (i,j) may be accessed via

function get( i, j ) {
    return buffer[ si*i + sj*j ];
}

where buffer is the underlying data buffer and si and sj correspond to the strides along the i and j dimensions, respectively. Assuming a row-major strided array, for the 2x2 matrix above, si = 2 and sj = 1 (omitting element byte length).

In general, the strides may be computed from the array shape as follows:

function shape2strides( shape, order ) {
    var out = new Array( shape.length );
    var s = 1;
    var i;
    if ( order === 'column-major' ) {
        for ( i = 0; i < shape.length; i++ ) {
            out[ i ] = shape[ i ];
            s *= shape[ i ];
        }
        return out;
    } else { // row-major
        for ( i = shape.length-1; i >= 0; i-- ) {
            out[ i ] = shape[ i ];
            s *= shape[ i ];
        }
    }
}

To facilitate working with strided arrays, environments/libraries often provide convenience functions which allow easy conversion between linear indices and subscripts. For example, in MATLAB, to convert from subscripts to a linear index

idx = sub2ind( size( A ), i, j )

Similarly, to convert from a linear index to subscripts in MATLAB

s = ind2sub( size( A ), idx )

Julia also has sub2ind and ind2sub. In NumPy, you can use ravel_multi_index and unravel_index.

In addition to data locality, strided arrays are convenient because they allow creating array "views" by manipulating whether a stride is negative or positive. When a stride is negative, instead of iterating from left-to-right, we iterate from right-to-left along that dimension. To support this iteration behavior, we need to determine where, in the underlying data buffer, is the first indexed element. By convention, we'll refer to this index as the "offset", which can be computed as follows

function strides2offset( shape, strides ) {
    var offset = 0;
    var i;
    for ( i = 0; i < shape.length; i++ ) {
        if ( strides[ i ] < 0 ) {
            offset -= strides[i] * ( shape[i]-1 ); // increments the offset
        }
    }
    return offset;
}

Once we have the offset, we need to modify our get( i, j ) function as follows

function get( i, j ) {
    return buffer[ offset + si*i + sj*j ];
}

For a 2x2 matrix A with strides 2,1, the offset is 0, thus returning the original get function above. When the strides are 2,-1, the offset is 1; for -2,1, the offset is 2; for -2,-1, the offset is 3. Accordingly, we can generate the following matrix views (assuming row-major)

Dims: 2x2

Strides: 2,1
Offset: 0

A = [ 1, 2,
      3, 4 ]

Strides: 2,-1
Offset: 1

A = [ 2, 1,
      4, 3 ]

Strides: -2,1
Offset: 2

A = [ 3, 4,
      1, 2 ]

Strides: -2,-1
Offset: 3

A = [ 4, 3,
      2, 1 ]

The above views highlight one of the advantages of strided arrays: O(1) operations. For example, to flip a matrix left-to-right, we need only flip the sign of the stride of the second dimension (assuming row-major). To flip up-to-down, we flip the sign of the stride of the first dimension (assuming row-major). To flip left-to-right, up-to-down, we flip the sign of both strides. All the aforementioned operations do not involve touching the underlying data buffer; we simply change strided array meta data.

sub2ind

Converting from subscripts to a linear index is straightforward, even when accounting for negative strides (i.e., strided array views). For example, for a strided array of arbitrary dimensions,

function sub2ind( ...subscripts ) {
    var sub;
    var idx;
    var s;
    var n;

    idx = offset;
    for ( n = 0; n < shape.length; n++ ) {
        sub = subscripts[ n ];
        s = strides[ n ];
        if ( s < 0 && offset === 0 ) { // assume want "view" index
            idx -= sub * s; // always increments `idx`
        } else { // assume want underlying data buffer index
            idx += sub * s; // may increment or decrement `idx`
        }
    }
    return idx;
}

Here, we allow for returning a linear index from the perspective of the view or from the perspective of the underlying data buffer. When the "offset" is 0, we assume we are always returning a linear index into the view (which may not correspond to the linear index in the underlying data buffer). In other words, for a 2x2 matrix view, (0,0) => 0, (0,1) => 1, (1,0) => 2, (1,1) => 3, always. Which makes sense from the standpoint that, when working with a view, this mapping is in accordance with intuition. When I want A(0,0), I expect the element to be located at the "first" linear index, even if that is not where that element is actually stored in underlying data buffer.

You can prove to yourself that sub2ind returns the same index for any offset as detailed above when extending element lookup to negative strides.

For example implementations, see Julia, NumPy, and stdlib.

ind2sub

The question being asked here is how to implement the reverse of sub2ind, with support for negative strides.

For positive strides (and, thus, an offset of 0), we can use modulo arithmetic to recover the subscripts. For example, consider the equation for resolving a linear index for a NxMxL strided array.

idx = offset + si*i + sj*j + sk*k

where, assuming row-major, si = nj*nk, sj = nk, sk = 1 and ni, nj, nk are the dimension sizes N, M, L, respectively. Substituting values,

idx = 0 + (nj*nk)*i + nk*j + k

which can be rearranged

idx = nk*(nj*i + j) + k

If we take the modulo of both sides using nk,

idx % nk = k

Knowing k, let's rearrange the initial equation

(idx - k) = nk*(nj*i + j)
(idx - k)/nk = nj*i + j

If we take the modulo of both sides using nj,

((idx - k)/nk) % nj = j

Knowing j, let's rearrange the initial equation to solve for i

(((idx - k)/nk) - j)/nj = i

The above algorithm generalizes to any number of dimensions and is straightforward to implement (see also Julia and NumPy).

function ind2sub( idx, order ) {
    var out = new Array( shape.length );
    var s;
    var i;
    if ( order === 'column-major' ) {
        for ( i = 0; i < shape.length; i++ ) {
            s = idx % shape[ i ];
            idx -= s;
            idx /= shape[ i ];
            out[ i ] = s;
        }
    } else { // row-major
        for ( i = shape.length-1; i >= 0; i-- ) {
            s = idx % shape[ i ];
            idx -= s;
            idx /= shape[ i ];
            out[ i ] = s;
        }
    }
    return out;
}

The above algorithm using modulo arithmetic, however, does not support negative strides. Were we to use the same procedure above to solve for subscripts i,j,k, we would begin with the equation

idx = offset + nk*(nj*i + j) + k

which could be simplified to

idx-offset = nk*(nj*i + j) + k

The problem here, of course, is that idx-offset can be negative and effectively shifts the range of possible i,j,k values (i should be on the half-open interval [0,N); j on the interval [0,M); and k on the interval [0,L)).

This then prompts the question as to whether an algorithm exists for converting a linear index to subscripts with support for negative strides. Or in other words, is there an algorithm which, given a linear index into an underlying data buffer, can return the corresponding view subscripts?

Implementations in other languages/libraries (such as Julia and NumPy) seem to only provide support for the offset = 0 case. I am looking for something more general, which can apply to strided array views, as well.

Any pointers to existing implementations/algorithms would be greatly appreciated.

2
Sorry my eyes glazed over at that wall of text, but I think you're looking for numpy.lib.stride_tricks.as_strided. Maybe. Whole-number multiples of dimensional strides work. Negative strides won't work I don't think think, but as_strided makes a view, and you can make a view of that view by using fancy indexing - view[::-1] - Daniel F
@DanielF Thanks for your comment, but not what I am looking for. As stated in the OP, I am interested in an algorithm which generalizes to negative strides. Ideally, this algorithm should be language/library independent. What you suggest is strongly tied to NumPy. - kgryte
Ahh I see. I think. You may want to tag some lower-level languages than numpy then, as those memory-level algorithms will generally be implemented in some low to mid-level language like C or FORTRAN - Daniel F
@DanielF Yeah, numpy was what SO recommended, so went with that. I may update the tags tomorrow or the day after. Thanks for the suggestion. - kgryte

2 Answers

0
votes

(edit - I may be dealing with the easier nd-index to flat index case, while you are focused on the reverse. It's too late to explore that task now - I'll revisit this in the morning.)


If the offset is right, I think the same formula, for converting n-d index, to flat indexing works with negative as well as positive strides:

Compare indexing of a (3,4) array with its double flip:

In [32]: x = np.arange(12).reshape(3,4)
In [33]: y = x[::-1, ::-1]
In [34]: x.strides
Out[34]: (16, 4)
In [35]: y.strides
Out[35]: (-16, -4)

Data buffer 'start' can be found with __array_interface__:

In [36]: x.__array_interface__['data']
Out[36]: (166934688, False)
In [37]: y.__array_interface__['data']
Out[37]: (166934732, False)
In [38]: 732-688
Out[38]: 44

There are 48 bytes in the buffer, but the offset for y is 44, the start of the x[2,3] element (11 in this example).

Now test the flat indexing for an x element:

In [39]: x[1,2]
Out[39]: 6            # value
In [40]: x[1:2, 2:3].__array_interface__['data']    # a view
Out[40]: (166934712, False)
In [41]: 688+(1*16)+(2*4)    # offset + si*i + sj*j 
Out[41]: 712

Now do the same for y:

In [42]: y[1:2, 2:3].__array_interface__['data']
Out[42]: (166934708, False)
In [43]: 732+(1*-16)+(2*-4)
Out[43]: 708
0
votes

I've written a piece of code which I think solves your problem. There is more than the one function you've asked for because the other functions slightly differ from yours so I've put all of them here to avoid potential inconsistencies.

The following code is not written in any particular language. However, you can find there some elements of C syntax.

function calcStrides(shape[]) {
    strides[]; # Result array
    currentStride = 1;
    for(i = shape.size; 0 < i;) {
        --i;
        if(0 < shape[i]) {
            strides[i] = currentStride;
            currentStride *= shape[i];
        } else {
            strides[i] = -currentStride;
            currentStride *= -shape[i];
        }
    }
    return strides;
}

function calcOffset(shape[], strides[]) {
    offset = 0;
    for(i = 0; i < shape.size; ++i) {
        if(shape[i] < 0) {
            offset += strides[i] * (shape[i] + 1);
        }
    }
    return offset;
}

function sub2ind(strides[], offset, subs[]) {
    ind = offset;
    for(i = 0; i < strides.size; ++i) {
        ind += strides[i] * subs[i];
    }
    return ind;
};

function ind2sub(shape[], strides[], ind) {
    subs[]; # Result array
    for(i = 0; i < shape.size; ++i) {
        if(0 < strides[i]) {
            subs[i] = ind / strides[i];
            ind -= subs[i] * strides[i];
        } else {
            absSub = ind / -strides[i];
            subs[i] = -shape[i] - 1 - absSub;
            ind -= absSub * -strides[i];
        }
    }
    return subs;
}