1
votes

Here is the code and the related documentation (http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris). I am confused by the expression data.target[[10, 25, 50]]: why does it use double brackets [[ ]]? If anyone could clarify, that would be great.

from sklearn.datasets import load_iris
data = load_iris()
print data.target[[10, 25, 50]]
print list(data.target_names)

Thanks in advance, Lin

2
@TigerhawkT3 It looks like the elements at indices [10, 25, 50] are being selected from the data.target array. - OneCricketeer
@TigerhawkT3, thanks, upvoted. I tried print type(data.target) and it returns <type 'numpy.ndarray'>. I don't understand what you mean by a list containing a single element; could you elaborate? Thanks. - Lin Ma
@cricket_007, thanks, upvoted. But why the double [[]]? - Lin Ma
You are confused about NumPy arrays, then, not Python lists. - OneCricketeer
Thanks @cricket_007, upvoted. Then what do the double brackets mean in [[10, 25, 50]]? - Lin Ma

2 Answers

1
votes

Your confusion is understandable: this isn't "standard" Python by any means.

data.target in this case is an ndarray from numpy:

In [1]: from sklearn.datasets import load_iris
   ...: data = load_iris()
   ...: print data.target[[10, 25, 50]]
   ...: print list(data.target_names)
[0 0 1]
['setosa', 'versicolor', 'virginica']

In [2]: print type(data.target)
<type 'numpy.ndarray'>

numpy's ndarray implementation allows you to create a new array by providing a list of indices of the items you want. For example:

In [13]: data.target
Out[13]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [14]: data.target[1]
Out[14]: 0

In [15]: data.target[[1,2,3]]
Out[15]: array([0, 0, 0])

In [16]: print type(data.target[[1,2,3]])
<type 'numpy.ndarray'>

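It does this by implementing __getitem__: the outer brackets are the ordinary subscript operator, and the inner brackets are a plain Python list passed to it as the index.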
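To make the role of the two bracket pairs concrete, here is a minimal sketch (written for Python 3, unlike the transcripts above). ShowIndex is a hypothetical helper class, not part of NumPy; it only returns whatever object the subscript operator hands to __getitem__. The target array is a stand-in built to match the layout of data.target shown above, so scikit-learn is not needed to run it.

```python
import numpy as np


class ShowIndex(object):
    """Hypothetical helper: returns the object passed inside the brackets."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()
single = probe[10]            # __getitem__ receives the int 10
fancy = probe[[10, 25, 50]]   # __getitem__ receives the list [10, 25, 50]
print(type(single), type(fancy))

# Stand-in for data.target: 50 zeros, 50 ones, 50 twos.
target = np.array([0] * 50 + [1] * 50 + [2] * 50)

# The outer [] is the subscript; the inner [] builds the list of indices.
picked = target[[10, 25, 50]]
print(picked)
```

So data.target[[10, 25, 50]] is an ordinary subscript whose argument happens to be a list, and NumPy interprets that list as "give me the elements at these positions".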

For more information, see "Indexing" in the NumPy array documentation.

1
votes

This retrieves elements from a NumPy array A using "integer indexing" syntax (as opposed to the usual single-integer or slice subscripts): a list of integers B selects the elements of A at those indices. The output is a new NumPy array with the same shape as the list B, and each output element is the value of A at the corresponding index, e.g.:

>>> import numpy
>>> a = numpy.array([0,1,4,9,16,25,36,49,64,81])
>>> a[[1,4,4,1,5,6,6,5]]
  array([ 1, 16, 16,  1, 25, 36, 36, 25])

Integer indexing can be applied to more than one dimension, with one index list per dimension, e.g.:

>>> b = numpy.array([[0,1,4,9,16],[25,36,49,64,81]]) # 2D array
>>> b[[0,1,0,1,1,0],[0,1,4,3,2,3]]   # row and column integer indices
  array([ 0, 36, 16, 64, 49,  9])

or, the same example with 2D index lists, which changes the output shape:

>>> b[[[0,1,0],[1,1,0]],[[0,1,4],[3,2,3]]] # "row" and "column" 2D integer arrays
  array([[ 0, 36, 16],
         [64, 49,  9]])
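As a rough illustration of how the row and column lists are combined, the sketch below (Python 3) checks that the indices are paired position by position, so the result matches an explicit loop over zip(rows, cols). The variable names rows, cols, rows2d and cols2d are introduced here only for illustration.

```python
import numpy as np

b = np.array([[0, 1, 4, 9, 16], [25, 36, 49, 64, 81]])
rows = [0, 1, 0, 1, 1, 0]
cols = [0, 1, 4, 3, 2, 3]

# One index list per dimension: element k of the result is b[rows[k], cols[k]].
fancy = b[rows, cols]
paired = np.array([b[r, c] for r, c in zip(rows, cols)])
print(np.array_equal(fancy, paired))

# With 2D index arrays the pairing is the same, but the result
# takes the shape of the index arrays instead of being flat.
rows2d = np.array([[0, 1, 0], [1, 1, 0]])
cols2d = np.array([[0, 1, 4], [3, 2, 3]])
fancy2d = b[rows2d, cols2d]
print(fancy2d.shape)
```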

Also note that you can perform "integer indexing" with a NumPy array instead of a list, e.g.:

>>> a[numpy.array([0,3,2,4,1])]
  array([ 0,  9,  4, 16,  1])
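Relating this back to the double brackets in the question, the following sketch (Python 3) contrasts a list index with a comma-separated index on a 2D array. It reuses the values of b from the example above; the names element and selected_rows are made up for the illustration.

```python
import numpy as np

b = np.array([[0, 1, 4, 9, 16], [25, 36, 49, 64, 81]])

# Single brackets with a comma: one index per dimension (row 0, column 1).
element = b[0, 1]

# Double brackets: a list of indices along the first dimension (rows 0 and 1).
selected_rows = b[[0, 1]]

print(element)
print(selected_rows.shape)
```

The two spellings differ by only one pair of brackets but mean different things: the first picks a single element, the second picks whole rows.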