
Greater Goal

I want to extract package names (and, later, more details) from the package bibliography that R generates with knitr's write_bib(). The goal is a table with columns for the most relevant information on the packages used in my analyses (e.g. package name, version, maintainer, citation).

From the example bibliography entry below, I want to get the string R-base:

"@Manual{R-base, title = {R: A Language and Environment for Statistical Computing}, author = {{R Core Team}}, organization = {R Foundation for Statistical Computing}, address = {Vienna, Austria}, year = {2020}, url = {https://www.R-project.org/}, }"

Extracting the package name, i.e. the substring between { and the comma, works with the regex "(?<=\\{).*(?=,)", which returns R-base.
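A minimal sketch of how this pattern behaves on single lines of an entry (the field line is typed by hand, not taken from write_bib() output). The match starts right after the first {. Because .* is greedy, it then runs to the last comma on the line. A negated class [^,]* stops at the first comma instead.

```r
library(stringr)

pattern <- "(?<=\\{).*(?=,)"

# Header line: the text between "{" and the trailing "," is the key
key <- str_extract("@Manual{R-base,", pattern)            # "R-base"

# Field line: greedy ".*" runs to the LAST comma on the line
greedy <- str_extract("  address = {Vienna, Austria},", pattern)   # "Vienna, Austria}"

# A negated character class stops at the FIRST comma instead
first_comma <- str_extract("  address = {Vienna, Austria},",
                           "(?<=\\{)[^,]*(?=,)")          # "Vienna"

print(c(key, greedy, first_comma))
```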

Current problem

Outside of a loop, the code below produces the desired output, R-base:

  library(stringr)

  teststring <- "@Manual{R-base,
  title = {R: A Language and Environment for Statistical Computing},
  author = {{R Core Team}},
  organization = {R Foundation for Statistical Computing},
  address = {Vienna, Austria},
  year = {2020},
  url = {https://www.R-project.org/},
  }"

  str_extract(teststring, "(?<=\\{).*(?=,)")
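Why this works on a single multi-line string: in stringr (ICU regular expressions), . does not match a newline by default. The leftmost match therefore cannot leave the first line. A sketch with a shortened copy of the entry. Turning on regex(..., dotall = TRUE) shows what happens when . is allowed to cross lines.

```r
library(stringr)

# Shortened copy of the entry as ONE string with embedded newlines
teststring <- "@Manual{R-base,
  title = {R: A Language and Environment for Statistical Computing},
  url = {https://www.R-project.org/},
  }"

pattern <- "(?<=\\{).*(?=,)"

# Default: "." does not match "\n", so the match stays on the first line
default_match <- str_extract(teststring, pattern)

# dotall = TRUE: ".*" runs across lines up to the last comma in the string
dotall_match <- str_extract(teststring, regex(pattern, dotall = TRUE))

print(default_match)
cat(dotall_match, "\n")
```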

However, when I do exactly the same thing inside a for loop, I get multiple matches from str_extract():

library(knitr)
library(stringr)

bibliography <- write_bib()
for (entry in bibliography[1]) {
  # currently for testing purposes just for the first entry in bibliography
  print(typeof(entry))
  print(str_extract(entry, "(?<=\\{).*(?=,)"))
}

[1] "character"
[1] "R-base"                                                  
[2] "R: A Language and Environment for Statistical Computing}"
[3] "{R Core Team}}"                                          
[4] "R Foundation for Statistical Computing}"                 
[5] "Vienna, Austria}"                                        
[6] "2020}"                                                   
[7] "https://www.R-project.org/}"                             
[8] NA                                                        
[9] NA   

This is strange to me. I included typeof() as a check, and it reports "character", so the entry in the loop should be identical to teststring.
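One way to see the difference. This is a sketch that assumes each write_bib() element is a character vector with one string per line of the BibTeX entry, which the nine results above suggest. typeof() reports "character" both for a single string and for a vector of nine strings. length() tells them apart. Because str_extract() is vectorised, it returns one result per element. The entry below is typed by hand to mirror the lines behind that output.

```r
library(stringr)

# One string with embedded newlines, as in the teststring case
single <- "@Manual{R-base,\n  year = {2020},\n}"

# Hand-built stand-in for one write_bib() element (assumption): one string per line
entry <- c(
  "@Manual{R-base,",
  "  title = {R: A Language and Environment for Statistical Computing},",
  "  author = {{R Core Team}},",
  "  organization = {R Foundation for Statistical Computing},",
  "  address = {Vienna, Austria},",
  "  year = {2020},",
  "  url = {https://www.R-project.org/},",
  "}",
  ""
)

print(c(typeof(single), typeof(entry)))   # both "character"
print(c(length(single), length(entry)))   # 1 versus 9

# Vectorised: one result per line, NA where a line has no "{...," match
res <- str_extract(entry, "(?<=\\{).*(?=,)")
print(res)
```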

Edit

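Found the solution: it has to do with the structure of the list returned by write_bib(). Each element is a character vector with one string per line of the BibTeX entry, so str_extract() returns one result per line. Specifying the first element (the line that holds the key) solves it: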

name <- str_extract(entry[1],"(?<=\\{).*(?=,)")
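A sketch of the fix on a hand-built, shortened entry, using the same one-string-per-line assumption. Besides entry[1], another option is to anchor the pattern on the @type{ header with str_match() and keep only the captured key. Field lines then yield NA and can be dropped. The anchored pattern is an assumption of this sketch, not part of the original question.

```r
library(stringr)

# Hand-built stand-in for one write_bib() element (assumption): one string per line
entry <- c(
  "@Manual{R-base,",
  "  title = {R: A Language and Environment for Statistical Computing},",
  "  year = {2020},",
  "}"
)

# The fix from the edit: only look at the first line, which holds the key
key <- str_extract(entry[1], "(?<=\\{).*(?=,)")

# Alternative: match only lines starting with "@<type>{" and capture up to the first comma
m <- str_match(entry, "^@\\w+\\{([^,]+),")   # column 2 holds the captured key or NA
key2 <- m[!is.na(m[, 2]), 2]

print(c(key, key2))
```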
Where is the library located? How did you install it? Also, why did you use a loop here? – Wiktor Stribiżew
Because I want to iterate over all packages I have loaded in my script. I forgot one line in the code above; I will edit it. – SysRIP
Please try res <- sapply(bibliography, function(x) str_extract(paste(x, collapse = "\n"), "(?<=\\{).*(?=,)")), then names(res) <- NULL, and check res. – Wiktor Stribiżew

1 Answer


You can use

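library(stringr)
library(knitr)

bibliography <- write_bib()
# Collapse each entry's lines into one string; "." does not match "\n",
# so the first match stays on the first line, which holds the key
res <- sapply(bibliography, function(x) str_extract(paste(x, collapse = "\n"), "(?<=\\{).*(?=,)"))
names(res) <- NULL
res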

I get

[1] "R-base"    "R-knitr"   "knitr2015" "knitr2014"
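For the Greater Goal (a table with name, version and maintainer), here is a sketch that skips the bibliography entirely. It reads installed package metadata with utils::packageVersion() and utils::maintainer(). The package list is hypothetical: it uses two base packages so that it runs anywhere. In a real script, .packages() would give the attached packages, which is also write_bib()'s default. The key column only mirrors the R- prefix seen in keys such as R-base and R-knitr.

```r
# Hypothetical package list; in a real script this could be .packages()
pkgs <- c("stats", "utils")

pkg_table <- data.frame(
  key = paste0("R-", pkgs),  # mirrors the "R-" prefix in keys such as R-base
  package = pkgs,
  version = vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
                   character(1), USE.NAMES = FALSE),
  maintainer = vapply(pkgs, utils::maintainer, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(pkg_table)
```

A citation column could be added the same way. It is left out here because the text form of citation() for each package was not checked.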