31
votes

There are two functions in the R core library.

  • row.names: Get and Set Row Names for Data Frames
  • rownames: Retrieve or set the row names of a matrix-like object.

However, the docs for row.names state: "For a data frame, ‘rownames’ and ‘colnames’ eventually call ‘row.names’ and ‘names’ respectively, but the latter are preferred." Why is row.names preferred? Wouldn't it be easier to ignore row.names and just call rownames?

That link doesn't help at all. - Rich Scriven
@RichardScriven If this question gets a good answer, perhaps that other question should be closed as a duplicate of this one. - Matthew Lundberg
One piece of the puzzle, I think, is in the word "eventually." Since rownames eventually calls row.names for a data.frame, it would be more efficient to cut out the middle man and go straight to the source. I think another piece is that this documentation focuses on data.frames. - lmo
Note that a "data.frame" has an explicit "row.names" attribute and not a "rownames" one. Also, row.names is a generic function that gets this specific attribute of the object, and methods can be created for objects similar to "data.frame". - alexis_laz
Looks like cross-compatibility to me. names(iris) and colnames(iris) both work. I suspect the authors were kind enough to make sure that old-school programmers coming from S or early R could still use the old functionality, while new-school users can use the new functions. So the language looks kind of Frankenstein after a while, but it's a good thing not to have to remember which function goes with which data type. - Pierre L

1 Answer

24
votes

row.names() is an S3 generic function whereas rownames() is a lower level non-generic function. rownames() is in effect the default method for row.names() that is applied to any object in the absence of a more specific method.

If you are operating on a data frame x, then it is more efficient to use row.names(x) because there is a specific row.names() method for data frames. The row.names() method for data frames simply extracts the "row.names" attribute that is already stored in x. By contrast, because of the definition of rownames() and the inter-relationships between the functions, rownames(x) has to extract all the dimension names of x, then drop the column names, then combine with names(x), then drop names(x) again. This process even involves a call to row.names(x) as an intermediate step. This will all usually happen so quickly that you don't notice it, but just extracting the attribute is obviously more efficient.
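The relationship described above can be checked with a minimal sketch. The data frame x and its row names here are made up for illustration, and the comment about how dimnames() is assembled reflects my reading of base R rather than anything guaranteed by the documentation:

```r
# Hypothetical example data frame with explicit row names.
x <- data.frame(a = 1:3, b = c("u", "v", "w"),
                row.names = c("r1", "r2", "r3"))

# The data frame method reads the stored "row.names" attribute.
via_generic <- row.names(x)
via_attr    <- attr(x, "row.names")

# rownames() goes through dimnames(), which for a data frame is
# (as far as I can tell) built from row.names(x) and names(x);
# the row component is then picked out of that list.
via_rownames <- rownames(x)
via_dimnames <- dimnames(x)[[1L]]

# All four routes should agree on the same character vector.
print(via_generic)
print(identical(via_generic, via_attr))
print(identical(via_generic, via_rownames))
print(identical(via_generic, via_dimnames))
```

The results are the same either way; the difference is only in how many intermediate steps are taken to get there.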

If you don't want to bother distinguishing the two functions, then it would be logical to just use the generic version row.names() all the time, since it always dispatches the appropriate method. For example, if x is a matrix, then row.names(x) just passes cleanly through to rownames(x) because there is no more specific method for that class of object.
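As a small sketch of the matrix case (the matrix m and its dimnames are invented for this example), the generic and the non-generic function should return the same thing:

```r
# Hypothetical matrix with explicit row and column names.
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("a", "b"), c("c1", "c2")))

# row.names() is the generic; with no matrix-specific method it
# falls back to the default, which hands off to rownames().
generic_result <- row.names(m)
direct_result  <- rownames(m)

print(generic_result)
print(identical(generic_result, direct_result))
```

So using row.names() everywhere costs nothing for matrices and takes the shorter path for data frames.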