
I have some sentences, and I want to split each one into its words so that every sentence becomes one row vector. However, for the shorter sentences the words are repeated to fill the row out to the length of the longest sentence, which I do not want. No matter how long a sentence is, I want its row to contain that sentence's words only once.

sentence <- c("case sweden", "meeting minutes ht board meeting st march now also attachment added agenda today s board meeting", "draft meeting minutes board meeting final meeting minutes ht board meeting rd april")
sentence <- cbind(sentence)
word_table <- do.call(rbind, strsplit(as.character(sentence), " "))
test <- cbind(sentence, word_table)

This is what I get now: [image]

And this is what I want: [image]

That is, without the repeated words.

Data frames are data structures with the same number of entries per row; a list-based structure might be more efficient? - user5219763
Yeah, either a list structure or a "long" data frame, with the string ID in one column and the words in the second column. - Frank
For example, for the third sentence, which is the longest, read.table is creating one extra row, so for three sentences there are now 4 rows in total, which is not expected :( - BiMo
Aha, I see. Yes, it is working now. Thanks @rawr - BiMo
Thank you very much, guys. Stack Overflow is really wonderful; discussing with you all solved my problem in the shortest time span. :) - BiMo
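
The list and "long" data frame ideas from the comments above could look roughly like the following. This is only a sketch: the shortened input and the names `words` and `long` are illustrative assumptions, not something the commenters posted.

```r
# Shortened, illustrative input (not the full sentences from the question)
sentence <- c("case sweden",
              "meeting minutes ht board meeting",
              "draft meeting minutes")

# List structure: one character vector per sentence, no padding, no recycling
words <- strsplit(sentence, " ", fixed = TRUE)

# "Long" data frame: one row per word, with the sentence index as the ID column
long <- data.frame(
  id   = rep(seq_along(words), lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)
```

Both forms keep every word exactly once per occurrence in its sentence, so nothing has to be filled in for the shorter sentences.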

1 Answer


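The solution from rawr is to read the sentences with read.table and fill = TRUE, which adds blank fields to the shorter rows instead of repeating words: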

sentence <- c("case sweden", "meeting minutes ht board meeting st march now also attachment added agenda today s board meeting", "draft meeting minutes board meeting final meeting minutes ht board meeting rd april")
dd <- read.table(text = paste(sentence, collapse = '\n'), fill = TRUE)
test <- cbind(sentence, dd)

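Or, stripping any newline characters inside the sentences first, so that each sentence stays on a single input line: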

cc <- read.table(text = paste(gsub('\n', '', sentence), collapse = '\n'), fill = TRUE)
test1 <- cbind(sentence, cc)
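
For comparison, here is a sketch that stays close to the code in the question. `rbind` recycles shorter vectors up to the length of the longest one, which is where the repeated words came from; padding each vector with `NA` first avoids that. This is not part of rawr's solution, and the shortened input and the names `padded` and `test2` are assumptions made for illustration.

```r
# Shortened, illustrative input (not the full sentences from the question)
sentence <- c("case sweden",
              "meeting minutes ht board meeting",
              "draft meeting minutes")

words <- strsplit(sentence, " ", fixed = TRUE)
n_max <- max(lengths(words))

# `length<-` extends each vector to n_max and fills the new slots with NA,
# so rbind has nothing left to recycle
padded <- lapply(words, `length<-`, n_max)
word_table <- do.call(rbind, padded)

test2 <- data.frame(sentence, word_table, stringsAsFactors = FALSE)
```

Unlike the read.table version, the unused cells here are `NA` rather than empty strings, and the text is never parsed as a file, so characters such as quotes or `#` in a sentence are not treated specially.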

Thanks.