In the following situations, [] and .loc behave the same:
- Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
- Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
- Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex, because label-based slices include the end point. See details here.)
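As a quick check of the three cases above, here is a minimal sketch. The frame and its column values are made up for illustration, and it assumes a default RangeIndex.

```python
import pandas as pd

# Hypothetical 4-row frame with a default RangeIndex (labels 0..3).
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [10, 11, 12, 13], "C": [20, 21, 22, 23]})

# Single column: both forms return the same Series.
same_column = df["A"].equals(df.loc[:, "A"])

# List of columns: both forms return the same DataFrame.
same_columns = df[["A", "B"]].equals(df.loc[:, ["A", "B"]])

# Row slice: [] is positional here, so it matches iloc (end point excluded).
same_rows = df[1:3].equals(df.iloc[1:3])

# loc slices by label and includes the end point.
positional_labels = list(df[1:3].index)
label_based_labels = list(df.loc[1:3].index)

print(same_column, same_columns, same_rows)
print(positional_labels, label_based_labels)
```

The last two lists differ by one row, which is the off-by-one to watch for when moving a slice from [] or iloc to loc.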
However, [] does not work in the following situations, where you need .loc:
- Selecting a single row: df.loc[row_label]
- Selecting a list of rows: df.loc[[row_label1, row_label2]]
- Slicing columns: df.loc[:, 'A':'C']
None of these three can be done with [].
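A small sketch of these three selections, assuming a made-up frame with string row labels so that rows and columns are easy to tell apart:

```python
import pandas as pd

# Hypothetical frame; the labels "x", "y", "z" are illustrative only.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=["x", "y", "z"],
)

one_row = df.loc["x"]             # single row, returned as a Series
some_rows = df.loc[["x", "z"]]    # list of rows, returned as a DataFrame
col_slice = df.loc[:, "A":"C"]    # column slice, end label "C" included

# [] looks a scalar key up among the columns, so a row label is not found.
try:
    df["x"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True
```

Here df["x"] fails with a KeyError because "x" is a row label, not a column name.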
More importantly, if your selection involves both rows and columns, then assignment becomes problematic.
df[1:3]['A'] = 5
This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame. This triggers a SettingWithCopyWarning. The correct way of making this assignment is:
df.loc[1:2, 'A'] = 5
Note the slice is 1:2 here, not 1:3: with a RangeIndex, .loc slices by label and includes the end point, so df.loc[1:3, 'A'] would also change row 3.
With .loc, you are guaranteed to modify the original DataFrame. It also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).
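To make the difference concrete, here is a hedged sketch with a made-up frame; make_frame is a hypothetical helper. The chained form runs with warnings silenced and nothing is asserted about its effect, since that depends on the pandas version and settings. The .loc form uses the label slice 1:2 because .loc includes the end point.

```python
import warnings
import pandas as pd

def make_frame():
    # Hypothetical frame with a default RangeIndex (labels 0..3).
    return pd.DataFrame({"A": [0, 1, 2, 3], "B": [10, 11, 12, 13]})

# Chained form: two separate indexing steps.
chained = make_frame()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # silence the warning for this demo only
    chained[1:3]["A"] = 5
# Whether `chained` changed depends on the pandas version and settings,
# so nothing is asserted about it here.

# Single .loc call: rows and column are addressed in one step.
direct = make_frame()
direct.loc[1:2, "A"] = 5              # labels 1 and 2, end point included
result = direct["A"].tolist()
print(result)
```

The single .loc call is the form whose outcome you can rely on: rows 1 and 2 of column A become 5 and everything else is untouched.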
Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.
Note: Getting columns with [] vs . is a completely different topic. The . accessor is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces, they cannot start with a digit...). It cannot be used when the names conflict with Series/DataFrame methods. It also cannot be used for non-existing columns (i.e. the assignment df.a = 1 won't create a column if there is no column a). Other than that, . and [] are the same.
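The limitations of attribute access can be seen in a short sketch. The column names below are hypothetical, chosen to hit each case in turn.

```python
import pandas as pd

# Hypothetical column names: a valid identifier, a name with a space,
# and a name that collides with a DataFrame method.
df = pd.DataFrame({"a": [1, 2], "my col": [3, 4], "sum": [5, 6]})

same = df.a.equals(df["a"])       # valid identifier: both forms agree
spaced = df["my col"].tolist()    # `df.my col` is a syntax error, so use []
method = df.sum                   # the DataFrame method, not the column
column = df["sum"].tolist()       # [] still reaches the column

df.b = 1                          # sets a plain Python attribute, not a column
has_b_column = "b" in df.columns
print(same, callable(method), has_b_column)
```

After df.b = 1 the frame still has only its three original columns; df["b"] = 1 is the form that actually adds one.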
df.col1? All three of these are essentially equivalent for the very simple case of selecting a column. .loc will let you do much more than select a column. Possible duplicate of stackoverflow.com/questions/31593201/… – juanpa.arrivillaga
df.sum, what happens? (spoiler alert, nothing useful, although df.sum() still works luckily) So 3rd way should be seen as a shortcut that is fine, but need to be careful with – JohnE