I am having trouble figuring out how to read the first two lines of each document in a quanteda corpus in R. The first two lines of each document contain the headline of a news article that I want to analyze. I want to search only the headlines (not the rest of each text) for the word 'abortion'.
Here is my code for creating the corpus:
library(quanteda); library(readtext)
myCorp <- corpus(readtext(file = '~/R/win-library/3.3/quanteda/Abortion/1972/*'))
I have tried using readLines in a for loop:
for (mycorp in myCorp) {
  titles <- readLines(mycorp, n = 2)
  write.table(mycorp, "1972_text_P.txt", sep = "\n\n", append = TRUE)
  write.table(titles, "1972_text_P.txt", append = TRUE)
}
This fails with the following error:
Error in readLines(mycorp, n = 2) : 'con' is not a connection
I have intentionally not created a DFM because I want to keep the 465 files as single documents in the corpus. How can I get the headlines from the article texts? Or, ideally, how would I search only the first two lines of each document for a keyword (abortion) and create a file that contains only those headlines with the keyword in them? Thanks for any and all help with this.
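To make the goal concrete, here is a rough sketch of the behavior I am after, written in base R on a toy character vector. It assumes the document texts can be pulled out of the corpus as a named character vector (I believe texts(myCorp) does this in older quanteda versions and as.character(myCorp) in quanteda 3 and later, but I have not confirmed this for my setup). The names txts, first_two_lines, and out_file, as well as the sample texts, are made up for illustration.

```r
# Toy stand-in for the document texts. With a real corpus this vector would
# presumably come from texts(myCorp) or as.character(myCorp) (assumption).
txts <- c(
  doc1 = "Abortion Ruling Expected\nCourt to Decide Soon\nBody text here.",
  doc2 = "City Budget Passes\nCouncil Votes 5-2\nBody mentions abortion once."
)

# Keep only the first n lines of one document and join them into a headline.
first_two_lines <- function(x, n = 2) {
  lines <- strsplit(x, "\r?\n")[[1]]
  paste(head(lines, n), collapse = " ")
}
headlines <- vapply(txts, first_two_lines, character(1))

# Case-insensitive keyword search restricted to the headlines.
hits <- headlines[grepl("abortion", headlines, ignore.case = TRUE)]

# Write the matching headlines, one per line (hypothetical output location).
out_file <- file.path(tempdir(), "1972_headlines_abortion.txt")
writeLines(paste(names(hits), hits, sep = ": "), out_file)
```

In this toy example only doc1 should be kept, because doc2 mentions the keyword in its body but not in its first two lines. The corpus itself is left untouched, so the 465 files would stay as single documents.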