326
votes

Suppose I have a vector nested one or two levels deep in a data frame. Is there a quick-and-dirty way to access the last value without using the length() function? Something à la Perl's $# special variable?

So I would like something like:

dat$vec1$vec2[$#]

instead of

dat$vec1$vec2[length(dat$vec1$vec2)]
11
I am by no means an R expert, but a quick Google search turned up this: <stat.ucl.ac.be/ISdidactique/Rhelp/library/pastecs/html/…> There appears to be a "last" function. - benefactual
MATLAB has the notation "myvariable(end-k)", where k is an integer less than the length of the vector, which returns the (length(myvariable)-k)th element. That would be nice to have in R. - EngrStudent

11 Answers

423
votes

I use the tail function:

tail(vector, n=1)

The nice thing with tail is that it works on dataframes too, unlike the x[length(x)] idiom.
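A minimal sketch of that difference, using a small made-up vector and data frame (the names v and df are hypothetical, chosen only for illustration):

```r
# Illustrative data; `v` and `df` are hypothetical names, not from the question.
v  <- c(10, 20, 30)
df <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))

last_v   <- tail(v, n = 1)   # last element of the vector
last_row <- tail(df, n = 1)  # last row of the data frame (still a data frame)

# length() of a data frame counts its columns, so this picks the last
# column rather than the last row.
last_col <- df[length(df)]

print(last_v)
print(last_row)
print(names(last_col))
```

So the length idiom is not wrong on a data frame, it just answers a different question (last column instead of last row).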

233
votes

To answer this from a performance-oriented rather than an aesthetic point of view, I've put all of the above suggestions through a benchmark. To be precise, I've considered the suggestions

  • x[length(x)]
  • mylast(x), where mylast is a C++ function implemented through Rcpp,
  • tail(x, n=1)
  • dplyr::last(x)
  • x[end(x)[1]]
  • rev(x)[1]

and applied them to random vectors of various sizes (10^3, 10^4, 10^5, 10^6, and 10^7). Before we look at the numbers, I think it should be clear that anything that becomes noticeably slower with greater input size (i.e., anything that is not O(1)) is not an option. Here's the code that I used:

Rcpp::cppFunction('double mylast(NumericVector x) { int n = x.size(); return x[n-1]; }')
options(width=100)
for (n in c(1e3,1e4,1e5,1e6,1e7)) {
  x <- runif(n);
  print(microbenchmark::microbenchmark(x[length(x)],
                                       mylast(x),
                                       tail(x, n=1),
                                       dplyr::last(x),
                                       x[end(x)[1]],
                                       rev(x)[1]))}

It gives me

Unit: nanoseconds
           expr   min      lq     mean  median      uq   max neval
   x[length(x)]   171   291.5   388.91   337.5   390.0  3233   100
      mylast(x)  1291  1832.0  2329.11  2063.0  2276.0 19053   100
 tail(x, n = 1)  7718  9589.5 11236.27 10683.0 12149.0 32711   100
 dplyr::last(x) 16341 19049.5 22080.23 21673.0 23485.5 70047   100
   x[end(x)[1]]  7688 10434.0 13288.05 11889.5 13166.5 78536   100
      rev(x)[1]  7829  8951.5 10995.59  9883.0 10890.0 45763   100
Unit: nanoseconds
           expr   min      lq     mean  median      uq    max neval
   x[length(x)]   204   323.0   475.76   386.5   459.5   6029   100
      mylast(x)  1469  2102.5  2708.50  2462.0  2995.0   9723   100
 tail(x, n = 1)  7671  9504.5 12470.82 10986.5 12748.0  62320   100
 dplyr::last(x) 15703 19933.5 26352.66 22469.5 25356.5 126314   100
   x[end(x)[1]] 13766 18800.5 27137.17 21677.5 26207.5  95982   100
      rev(x)[1] 52785 58624.0 78640.93 60213.0 72778.0 851113   100
Unit: nanoseconds
           expr     min        lq       mean    median        uq     max neval
   x[length(x)]     214     346.0     583.40     529.5     720.0    1512   100
      mylast(x)    1393    2126.0    4872.60    4905.5    7338.0    9806   100
 tail(x, n = 1)    8343   10384.0   19558.05   18121.0   25417.0   69608   100
 dplyr::last(x)   16065   22960.0   36671.13   37212.0   48071.5   75946   100
   x[end(x)[1]]  360176  404965.5  432528.84  424798.0  450996.0  710501   100
      rev(x)[1] 1060547 1140149.0 1189297.38 1180997.5 1225849.0 1383479   100
Unit: nanoseconds
           expr     min        lq        mean    median         uq      max neval
   x[length(x)]     327     584.0     1150.75     996.5     1652.5     3974   100
      mylast(x)    2060    3128.5     7541.51    8899.0     9958.0    16175   100
 tail(x, n = 1)   10484   16936.0    30250.11   34030.0    39355.0    52689   100
 dplyr::last(x)   19133   47444.5    55280.09   61205.5    66312.5   105851   100
   x[end(x)[1]] 1110956 2298408.0  3670360.45 2334753.0  4475915.0 19235341   100
      rev(x)[1] 6536063 7969103.0 11004418.46 9973664.5 12340089.5 28447454   100
Unit: nanoseconds
           expr      min         lq         mean      median          uq       max neval
   x[length(x)]      327      722.0      1644.16      1133.5      2055.5     13724   100
      mylast(x)     1962     3727.5      9578.21      9951.5     12887.5     41773   100
 tail(x, n = 1)     9829    21038.0     36623.67     43710.0     48883.0     66289   100
 dplyr::last(x)    21832    35269.0     60523.40     63726.0     75539.5    200064   100
   x[end(x)[1]] 21008128 23004594.5  37356132.43  30006737.0  47839917.0 105430564   100
      rev(x)[1] 74317382 92985054.0 108618154.55 102328667.5 112443834.0 187925942   100

This immediately rules out anything involving rev or end since they're clearly not O(1) (and the resulting expressions are evaluated in a non-lazy fashion). tail and dplyr::last are not far from being O(1) but they're also considerably slower than mylast(x) and x[length(x)]. Since mylast(x) is slower than x[length(x)] and provides no benefits (rather, it's custom and does not handle an empty vector gracefully), I think the answer is clear: Please use x[length(x)].
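On the empty-vector point, here is a small sketch of how the two base R idioms behave on a zero-length input. The variable names are made up for illustration, and the Rcpp function is deliberately not called here.

```r
# Hypothetical inputs for illustration.
x     <- c(1.5, 2.5, 3.5)
empty <- numeric(0)

last_of_x     <- x[length(x)]          # last element of a non-empty vector
last_of_empty <- empty[length(empty)]  # index 0 selects nothing, no error
tail_of_empty <- tail(empty, n = 1)    # likewise a zero-length result

print(last_of_x)
print(length(last_of_empty))
print(length(tail_of_empty))
```

Both idioms return a zero-length numeric vector instead of failing, so callers that need a scalar should check the length first.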

126
votes

If you're looking for something as nice as Python's x[-1] notation, I think you're out of luck. The standard idiom is

x[length(x)]  

but it's easy enough to write a function to do this:

last <- function(x) { return( x[length(x)] ) }

This missing feature in R annoys me too!
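For anyone coming from Python, a short sketch of why x[-1] is not the same thing in R: a negative index removes elements instead of counting from the end. The vector below is made up for illustration.

```r
x <- c(10, 20, 30)

# A negative index drops that position; it does not count from the end.
without_first <- x[-1]

# The helper from above, written without the explicit return().
last <- function(x) x[length(x)]
last_value <- last(x)

print(without_first)
print(last_value)
```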

50
votes

Combining lindelof's and Gregg Lind's ideas:

last <- function(x) { tail(x, n = 1) }

Working at the prompt, I usually omit the n=, i.e. tail(x, 1).

Unlike last from the pastecs package, head and tail (from utils) work not only on vectors but also on data frames etc., and also can return data "without first/last n elements", e.g.

but.last <- function(x) { head(x, n = -1) }

(Note that you have to use head for this, instead of tail.)
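A quick sketch of how a negative n behaves in head and tail, on a small made-up vector:

```r
x <- 1:5

all_but_last  <- head(x, n = -1)  # drops the last element
all_but_first <- tail(x, n = -1)  # drops the first element
only_last     <- tail(x, n = 1)   # keeps just the last element

print(all_but_last)
print(all_but_first)
print(only_last)
```

With a negative n, head drops elements from the end and tail drops them from the start, which is why but.last needs head.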

20
votes

The dplyr package includes a function last():

library(dplyr)
last(mtcars$mpg)
# [1] 21.4

18
votes

I just benchmarked these two approaches on a data frame with 663,552 rows using the following code:

system.time(
  resultsByLevel$subject <- sapply(resultsByLevel$variable, function(x) {
    s <- strsplit(x, ".", fixed=TRUE)[[1]]
    s[length(s)]
  })
  )

 user  system elapsed 
  3.722   0.000   3.594 

and

system.time(
  resultsByLevel$subject <- sapply(resultsByLevel$variable, function(x) {
    s <- strsplit(x, ".", fixed=TRUE)[[1]]
    tail(s, n=1)
  })
  )

   user  system elapsed 
 28.174   0.000  27.662 

So, assuming you're working with vectors, indexing with s[length(s)] is significantly faster than tail(s, n=1): about 3.6 seconds elapsed versus about 27.7 seconds in this test.

12
votes

Another way is to take the first element of the reversed vector:

rev(dat$vec1$vec2)[1]

11
votes

I have another method for finding the last element in a vector. Say the vector is a.

> a<-c(1:100,555)
> end(a)      #For a plain vector: the index of the last position, followed by 1
[1] 101   1
> a[end(a)[1]]   #Gives last element in a vector
[1] 555

There you go!

11
votes

The data.table package includes a last function:

library(data.table)
last(c(1:10))
# [1] 10

8
votes

What about

> a <- c(1:100,555)
> a[NROW(a)]
[1] 555

3
votes

The xts package provides a last function:

library(xts)
a <- 1:100
last(a)
[1] 100