Update (to read first)
If you are really interested only in the row indexes, a straightforward use of split and range may be all you need. The following assumes that the row names in your dataset are sequentially numbered, but it could probably be adapted to other cases.
irisFirstLast <- sapply(split(iris, iris$Species),
                        function(x) range(as.numeric(rownames(x))))
irisFirstLast ## Just the indices
# setosa versicolor virginica
# [1,] 1 51 101
# [2,] 50 100 150
iris[irisFirstLast[1, ], ] ## `1` would represent "first"
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 51 7.0 3.2 4.7 1.4 versicolor
# 101 6.3 3.3 6.0 2.5 virginica
iris[irisFirstLast, ] ## using the whole matrix returns both first and last
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
# 50 5.0 3.3 1.4 0.2 setosa
# 51 7.0 3.2 4.7 1.4 versicolor
# 100 5.7 2.8 4.1 1.3 versicolor
# 101 6.3 3.3 6.0 2.5 virginica
# 150 5.9 3.0 5.1 1.8 virginica
d <- datasets::Puromycin
dFirstLast <- sapply(split(d, d$state),
                     function(x) range(as.numeric(rownames(x))))
dFirstLast
# treated untreated
# [1,] 1 13
# [2,] 12 23
d[dFirstLast[2, ], ] ## `2` would represent `last`
# conc rate state
# 12 1.1 200 treated
# 23 1.1 160 untreated
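One possible adaptation for data whose row names are not sequentially numbered is to split the row positions instead of the data frame, so the row names are never consulted. This is only a sketch: the helper name firstLastPos and the shuffled copy of iris are made up for illustration.

```r
# Hypothetical helper: first and last row *position* of each group.
# Positions come from seq_len(nrow(data)), so row names do not matter.
firstLastPos <- function(data, group) {
  sapply(split(seq_len(nrow(data)), group), range)
}

# A shuffled copy of iris, whose row names are no longer in order
set.seed(1)
shuffled <- iris[sample(nrow(iris)), ]

pos <- firstLastPos(shuffled, shuffled$Species)
pos                    ## 2 x 3 matrix: row 1 = first, row 2 = last
shuffled[pos[1, ], ]   ## first occurrence of each species, by position
```

Because the result holds positions rather than row names, it can be used for subsetting in the same way as irisFirstLast above.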
If you are working with named rows, the general approach is the same, but you have to pick out the first and last row names yourself, because range would compare the names alphabetically rather than by position. Here's the general pattern:
datasetFirstLast <- sapply(split(dataset, dataset$groupingvariable),
                           function(x) c(rownames(x)[1],
                                         rownames(x)[nrow(x)]))
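As a concrete instance of that pattern, here is a small sketch using the built-in mtcars dataset, whose row names are car models, grouped by the cyl column. The choice of dataset and the name carsFirstLast are only for illustration.

```r
# mtcars uses car names as row names; group by number of cylinders
carsFirstLast <- sapply(split(mtcars, mtcars$cyl),
                        function(x) rownames(x)[c(1, nrow(x))])
carsFirstLast                  ## character matrix: row 1 = first, row 2 = last
mtcars[carsFirstLast[1, ], ]   ## first car in each cylinder group
mtcars[carsFirstLast[2, ], ]   ## last car in each cylinder group
```

The result is a character matrix, and a data frame can be subset by row name, so the indexing works just as in the numeric case.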
Initial answer (edited)
If you're interested in extracting the rows rather than needing the row number for other purposes, you can also explore data.table. Here are some examples:
library(data.table)
DT <- data.table(iris, key="Species")
DT[J(unique(Species)), mult = "first"]
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.1 3.5 1.4 0.2
# 2: versicolor 7.0 3.2 4.7 1.4
# 3: virginica 6.3 3.3 6.0 2.5
DT[J(unique(Species)), mult = "last"]
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.0 3.3 1.4 0.2
# 2: versicolor 5.7 2.8 4.1 1.3
# 3: virginica 5.9 3.0 5.1 1.8
DT[, .SD[c(1,.N)], by=Species]
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1: setosa 5.1 3.5 1.4 0.2
# 2: setosa 5.0 3.3 1.4 0.2
# 3: versicolor 7.0 3.2 4.7 1.4
# 4: versicolor 5.7 2.8 4.1 1.3
# 5: virginica 6.3 3.3 6.0 2.5
# 6: virginica 5.9 3.0 5.1 1.8
This last approach is pretty convenient. For instance, if you wanted the first three rows and last three rows of each group, you can use: DT[, .SD[c(1:3, (.N-2):.N)], by=Species]
(Just for reference: .N represents the number of rows per group.)
Other useful approaches include:
DT[, tail(.SD, 2), by = Species] ## last two rows of each group
DT[, head(.SD, 4), by = Species] ## first four rows of each group
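If you need the row numbers rather than the rows themselves, data.table's .I symbol (the row locations within the whole table) can be combined with .N in the same way. A minimal sketch, assuming the data.table package is installed; the column name row is an arbitrary choice:

```r
library(data.table)

DT <- data.table(iris, key = "Species")

# .I holds each row's position in DT; take the first and last per group
idx <- DT[, .(row = .I[c(1L, .N)]), by = Species]
idx            ## one row per (Species, position) pair
DT[idx$row]    ## the corresponding first and last rows
```

These positions refer to DT after it has been sorted by its key, which matches the original order here only because iris is already ordered by Species.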
FIRST. and LAST. are not operators; they are automatic SAS data step variables defined to indicate column value changes during BY statement processing. – BellevueBob
diff(), too ... – Ben Bolker