3
votes

An updated version of my question. Below is code to produce two Word documents. The first document contains a series of table titles, each with an accompanying bookmark. The second document contains an actual table.

What I'd like to be able to do is to determine what the table title in the second document should be based on what is specified in the first document. I believe the mechanics of this might involve finding the relevant bookmark in the first document, moving up a line to where the actual title is, and then copying the title, so that it can be used in the second document.

library(officer)
library(magrittr)
library(flextable)

read_docx() %>%

body_add_par(value = "Fred Table", style = "table title") %>%
body_add_par("") %>%
body_bookmark(id = "FredBMK") %>%
body_add_par("") %>%

body_add_par(value = "Sally Table", style = "table title") %>%
body_add_par("") %>%
body_bookmark(id = "SallyBMK") %>%
body_add_par("") %>%               

body_add_par(value = "George Table", style = "table title") %>%
body_add_par("") %>%
body_bookmark(id = "GeorgeBMK") %>%
body_add_par("") %>%                               

body_add_par(value = "Sample Data from the mtcars Dataset", style = "table title") %>%
body_add_par("") %>%
body_bookmark(id = "mtcarsBMK") %>%
body_add_par("") %>%                                               

body_add_par(value = "Susan Table", style = "table title") %>%
body_add_par("") %>%
body_bookmark(id = "SusanBMK") %>%
body_add_par("") %>%                               

print(target = "Test Report Skeleton.docx")


read_docx() %>%
body_add_par(value = "Table Title (Corresponding to mtcarsBMK) from Other Document Goes Here", style = "table title") %>%
body_add_par("") %>%
body_add_flextable(flextable(mtcars[1:12, 1:3])) %>%
print(target = "Test Target Table.docx")

Original Question:

I'm using the R officer package to generate Word documents. Imagine a scenario where text starts out synchronized in two Word documents. One is a larger report and the other is a table that is generated and then automatically inserted into the report. The title of the table starts out the same in both documents. Now suppose a medical writer manually alters the title of the table in the report. I'd like to be able to detect that and then automatically update the title in the table document so it matches what is in the report.

The officer package documentation shows how to replace text within a single document with a user specified text string. It's not clear to me though if it could be used to do what I'm trying to accomplish. Neither is it clear to me that it can't be done within officer.

Below is some code that makes two Word documents. One represents a report where changes have been made to a table title. The other represents the original table for which the title needs to be updated to match the report. The difference is minor: one title has a word in all caps and the other does not.

My hope is that it will be clear to someone how to detect the change in the first document and then to update the title in the second document.

library(officer)
library(magrittr)

read_docx() %>%
body_add_par(value = "AWESOME Table", style = "table title") %>%
body_add_par("") %>%
body_bookmark(id = "AwesomeBMK") %>%
body_add_par("(Awesome table appears here immediately after AwesomeBMK bookmark)") %>%
print(target = "Awesome Report.docx")

read_docx() %>%
body_add_par(value = "Awesome Table", style = "table title") %>%
body_add_par("") %>%
body_bookmark(id = "AwesomeBMK") %>%
body_add_par("(Awesome table appears here immediately after AwesomeBMK bookmark)") %>%
print(target = "Awesome Table.docx")
2
For me it's not clear how your team intends to work with this. Do you create the docx for everyone? Is the table the only content in the documents, or do they add more? And why do you want to change the table title anyway? I also see that you want to use body_bookmark(id = "SusanBMK") as some kind of marker, which seems like a good idea, but couldn't people accidentally delete it? – Johannes Stötzer
Who makes what and when is not yet fully determined. That likely will become clear when trying to implement in a production environment. People could delete the bookmarks. The plan is to take steps to mitigate this. I have been working on a larger body of code, trying to produce a sort of proof of concept. The current problem is my one sticking point. I had hoped someone would immediately know the answer and would be willing to share it with me. I think I'll need to find time to investigate this more fully on my own. – Paul

2 Answers

1
votes

I am not entirely sure which document is supposed to be changed; a flowchart would help me follow your workflow.

For my solution, I first create your documents and save their paths so they can be read back later (in practice you would read them from your own directory). Then I read each docx with docx_summary(), compare the two summaries, and look for the cell that differs. This code handles only a single change, but extending it to several should be doable. Lastly, I use the officer functions to replace the text.

library(officer)
library(magrittr)

# print() returns the path of the file it wrote, so doc_copy and
# doc_original hold file paths that can be passed back to read_docx().
doc_copy <- read_docx() %>%
  body_add_par(value = "AWESOME Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "AwesomeBMK") %>%
  body_add_par("(Awesome table appears here immediately after AwesomeBMK bookmark)") %>%
  print(target = "Awesome Report.docx")

doc_original <- read_docx() %>%
  body_add_par(value = "Awesome Table", style = "table title") %>%
  body_add_par("") %>%
  body_bookmark(id = "AwesomeBMK") %>%
  body_add_par("(Awesome table appears here immediately after AwesomeBMK bookmark)") %>%
  print(target = "Awesome Table.docx")


# Detect the change: compare the two summaries cell by cell
doc_copy_summary <- read_docx(doc_copy) %>%
  docx_summary()

doc_original_summary <- read_docx(doc_original) %>%
  docx_summary()

test <- as.data.frame(doc_copy_summary == doc_original_summary)

# Row/column position of the cell that differs (which() skips NA comparisons)
diff_pos <- which(test == FALSE, arr.ind = TRUE)
old <- doc_original_summary[diff_pos]
change <- doc_copy_summary[diff_pos]

# Insert text: move the cursor to the old title and replace that paragraph
my_doc <- read_docx(doc_original) %>%
  cursor_reach(keyword = old) %>%
  body_add_par(value = change, pos = "on") %>%
  print(target = "Change Awesome Table.docx")
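If more than one paragraph has changed, the same comparison can be done row by row on the text column instead of pulling out a single cell. The following is only a sketch: the two data frames are hand-made stand-ins for docx_summary() output (the "Second Table" rows are invented for illustration), and it assumes both documents contain the same paragraphs in the same order.

```r
# Hypothetical stand-ins for docx_summary() output: one row per paragraph.
# Assumption: both documents have the same paragraphs in the same order.
report_summary <- data.frame(
  doc_index = 1:4,
  text = c("AWESOME Table", "", "Second TABLE", ""),
  stringsAsFactors = FALSE
)
table_summary <- data.frame(
  doc_index = 1:4,
  text = c("Awesome Table", "", "Second Table", ""),
  stringsAsFactors = FALSE
)

# Rows whose text differs between the two documents
changed <- which(report_summary$text != table_summary$text)

# One row per change: where it is, the old text, and the new text
changes <- data.frame(
  doc_index = table_summary$doc_index[changed],
  old = table_summary$text[changed],
  new = report_summary$text[changed],
  stringsAsFactors = FALSE
)

changes
```

Each old/new pair could then be applied in a loop with cursor_reach() and body_add_par(pos = "on"), as in the single-change code. Keep in mind that cursor_reach() treats its keyword as a regular expression, so titles containing special characters may need escaping.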
0
votes

Below is what I believe to be a solution. My knowledge of XML is in its infancy. Think this is working though.

First part of the code makes a Word file. Second part makes accessible the XML underlying that file. Third part reads the relevant part of the XML. Fourth part captures table and figure titles that are immediately followed by a bookmark. Fifth part captures bookmarks that are immediately preceded by a table or figure title. There are a table title and a bookmark in the Word file/XML that are unmatched. The table title is unmatched because there is no bookmark immediately after. The bookmark is unmatched because there is no table or figure title immediately before. The last part links the table/figure title to its corresponding bookmark.

Had planned to provide the XML here as well. Decided against it though because the XML for any Word document is very verbose and it would have taken forever to format it.

People who attempt to run the code will not have the Word document template I used containing the Table Title 1 and Figure Title 1 styles. I believe that a suitable Word template can easily be devised though with one's own version of a style for table titles and figure titles.

Hopefully someday this will prove helpful to someone.

#### Make Word file ####

library(officer)
library(magrittr)
library(xml2)

read_docx("Report Template Blank.docx") %>%            

body_remove() %>%
body_add_par(value = "Fred Table", style = "Table Title 1") %>%
body_add_par("") %>%
body_bookmark(id = "FredtblBMK") %>%
body_add_par("") %>%

body_add_par(value = "Fred Figure", style = "Figure Title 1") %>%
body_add_par("") %>%
body_bookmark(id = "FredfigBMK") %>%
body_add_par("") %>%

body_add_par(value = "Sally Table", style = "Table Title 1") %>%
body_add_par("") %>%
body_bookmark(id = "SallytblBMK") %>%
body_add_par("") %>%

body_add_par(value = "Sally Figure", style = "Figure Title 1") %>%
body_add_par("") %>%
body_bookmark(id = "SallyfigBMK") %>%
body_add_par("") %>%

body_add_par(value = "Unmatched Table", style = "Table Title 1") %>%
body_add_par("") %>%

body_add_par("Some text separating the unmatched title and unmatched bookmark.") %>%
body_add_par("") %>%

body_bookmark(id = "UnmatchedBMK") %>%
body_add_par("") %>%

print(target = "Test Report Skeleton.docx")

#### Make XML underlying Word document accessible ####

file.copy("Test Report Skeleton.docx", "Test Report Skeleton.zip", overwrite = TRUE)
unzip("Test Report Skeleton.zip", exdir = "Test Report Skeleton XML")

#### Read XML ####

doc <-  read_xml("./Test Report Skeleton XML/word/document.xml")

#### Find qualifying table and figure titles ####

xml_tbl <-
xml_find_all(
doc,
"//w:p[w:pPr/w:pStyle[@w:val='TableTitle1' or @w:val='FigureTitle1'] and
./following-sibling::w:p[1][./w:bookmarkStart]]"
) %>%
xml_text()

#### Find qualifying bookmarks ####

xml_bmk <-
xml_find_all(
doc,
"//w:p[./w:bookmarkStart and
./preceding-sibling::w:p[1][./w:pPr/w:pStyle[@w:val='TableTitle1' or @w:val='FigureTitle1']]]
/w:bookmarkStart"
) %>%
xml_attr("name")

xml_tbl_bmk <- data.frame(title = xml_tbl, bookmark = xml_bmk)

#### Show results ####

xml_tbl_bmk

         title    bookmark
1   Fred Table  FredtblBMK
2  Fred Figure  FredfigBMK
3  Sally Table SallytblBMK
4 Sally Figure SallyfigBMK
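
To fetch the title for one particular bookmark, the same XPath idea can be turned around: start from the bookmark and step back to the paragraph just before it. Below is a minimal sketch of that. The XML string is a hand-written stand-in for word/document.xml (a real file is far more verbose), and title_for_bookmark() is a hypothetical helper name, not something provided by officer or xml2.

```r
library(xml2)

# Hypothetical fragment standing in for word/document.xml
# (real files carry many more namespaces and attributes).
xml_string <- paste0(
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body>',
  '<w:p><w:pPr><w:pStyle w:val="TableTitle1"/></w:pPr><w:r><w:t>Fred Table</w:t></w:r></w:p>',
  '<w:p><w:bookmarkStart w:id="1" w:name="FredtblBMK"/><w:bookmarkEnd w:id="1"/></w:p>',
  '<w:p><w:r><w:t>Plain text, not a title.</w:t></w:r></w:p>',
  '<w:p><w:bookmarkStart w:id="2" w:name="UnmatchedBMK"/><w:bookmarkEnd w:id="2"/></w:p>',
  '</w:body></w:document>'
)
doc <- read_xml(xml_string)

# Hypothetical helper: return the text of the title paragraph that sits
# immediately before the paragraph holding the named bookmark, or NA if
# the bookmark is missing or is not preceded by a title-styled paragraph.
title_for_bookmark <- function(doc, bookmark) {
  xpath <- sprintf(
    paste0(
      "//w:p[w:bookmarkStart[@w:name='%s']]",
      "/preceding-sibling::w:p[1]",
      "[w:pPr/w:pStyle[@w:val='TableTitle1' or @w:val='FigureTitle1']]"
    ),
    bookmark
  )
  xml_text(xml_find_first(doc, xpath))
}

title_for_bookmark(doc, "FredtblBMK")
title_for_bookmark(doc, "UnmatchedBMK")
```

Looking up each bookmark on its own also avoids relying on the title vector and the bookmark vector having the same length and order, which data.frame(title = xml_tbl, bookmark = xml_bmk) assumes. The returned title could then be passed to body_add_par(value = ..., style = "table title") when building the table document.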