8
votes

How do I search for a string in a data.frame? As a minimal example, how do I find the locations (columns and rows) of 'horse' in this data.frame?

> df = data.frame(animal=c('goat','horse','horse','two', 'five'), level=c('five','one','three',30,'horse'), length=c(10, 20, 30, 'horse', 'eight'))
> df
  animal level length
1   goat  five     10
2  horse   one     20
3  horse three     30
4    two    30  horse
5   five horse  eight

... so row 4 and 5 have the wrong order. Any output that would allow me to identify that 'horse' has shifted to the level column in row 5 and to the length column in row 4 is good. Maybe:

> magic_function(df, 'horse')
col       row
'animal', 2
'animal', 3
'length', 4
'level',  5

Here's what I want to use this for: I have a very large data frame (around 60 columns, 20.000 rows) in which some columns are messed up for some rows. It's too large to eyeball in order to identify the different ways that order can be wrong, so searching would be nice. I will use this info to move data to the correct columns for these rows.

4
What order is correct, and what outcome do you want?Tim Biegeleisen
@TimBiegeleisen I updated the question. By "location" I meant columns and rows (a data frame is 2D). I'm not asking about how to change order - that I know how to do. It was just to present the context of my problem. I guess there's no value in pasting my actual 60 columns since everybody would then ask for a minimal example ;-) But in my minimal example, let's say that the order of the first three rows is correct and the data is on the wrong columns in the row 4 and 5.Jonas Lindeløv
re "hat 'horse' has shifted to the level column in row 5": there's no horsein row 5 of your example. I quit there.Tensibai
@RonakShah, I've updated with an example output.Jonas Lindeløv
@Tensibai, thanks for noticing. I've updated the example.Jonas Lindeløv

4 Answers

14
votes

What about:

which(df == "horse", arr.ind = TRUE)
#      row col
# [1,]   2   1
# [2,]   3   1
# [3,]   5   2
# [4,]   4   3
5
votes

Another way around:

l <- sapply(colnames(df), function(x) grep("horse", df[,x]))

$animal
[1] 2 3

$level
[1] 5

$length
[1] 4

If you want the output to be matrix:

sapply(l,'[',1:max(lengths(l)))

     animal level length
[1,]      2     5      4
[2,]      3    NA     NA
1
votes

Another way to do it is the following:

library(data.table)
library(zoo)
library(dplyr)
library(timeDate)
library(reshape2)
data frame name = tbl_account

first,Transpose it :

temp = t(tbl_Account)

Then, put it in to a list :

temp = list(temp)

This essentially puts every single observation in a data frame in to one massive string, allowing you to search the whole data frame in one go.

then do the searching :

temp[[1]][grep("Horse",temp[[1]])] #brings back the actual value occurrences
grep("Horse", temp[[1]]) # brings back the position of the element in a list it occurs in 

hope this helps :)

0
votes

We can get the indices where the value is equal to horse. Divide it by number of rows (nrow) to get the column indices and by columns (ncol) to get the row indices.

We use colnames to get column names instead of indices.

data.frame(col = colnames(df)[floor(which(df == "horse") / (nrow(df) + 1)) + 1], 
           row = floor(which(df == "horse") / ncol(df)) + 1)

#   col   row
#1 animal   1
#2 animal   2
#3  level   4
#4 length   5