1
votes

I am using the RevoScaleR package in MS Visual Studio, and I'm stuck on a step.

I have one XDF file with a column called "Total_Admits_Pred". I have another XDF file with a column called "Total_Admits".

Both XDF files have the same number of rows. I would like to combine the two XDF files into a single XDF file with both of these columns. How could I do that?

Thanks!

Thomas

2 Answers

3
votes

You can add columns to an existing xdf file with rxDataStep:

xdf1 <- RxXdfData("file1.xdf")  # dataset containing Total_Admits_Pred
xdf2 <- RxXdfData("file2.xdf")  # dataset containing Total_Admits

rxDataStep(xdf1, xdf2, varsToKeep="Total_Admits_Pred", append="cols")

This will result in file2.xdf containing all its pre-existing columns, plus Total_Admits_Pred.
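To confirm the append worked, you can list the variables in file2.xdf afterwards. This is a minimal sketch, assuming the same file2.xdf path as above and that RevoScaleR is loaded. It uses rxGetVarNames and rxGetInfo from RevoScaleR.

```r
library(RevoScaleR)

xdf2 <- RxXdfData("file2.xdf")  # the file that rxDataStep appended to

# Both columns should now be listed among the variables of file2.xdf
vars <- rxGetVarNames(xdf2)
stopifnot(all(c("Total_Admits", "Total_Admits_Pred") %in% vars))

# Print the row count and a per-variable summary
rxGetInfo(xdf2, getVarInfo = TRUE)
```

If the stopifnot call raises an error, the column was not appended, and it is worth checking that both files really have the same number of rows.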

Another way, if you only need the two columns in memory, is to use the dplyrXdf package, which lets you pull a column out of an xdf data source with $:

devtools::install_github("RevolutionAnalytics/dplyrXdf")
library(dplyrXdf)

df <- data.frame(Total_Admits_Pred=xdf1$Total_Admits_Pred,
                 Total_Admits=xdf2$Total_Admits)

This creates an in-memory data frame with just the two columns you want, rather than a new xdf file. The advantage of this, over the other answer, is that it reads only those two columns into memory.

1
votes

You would do something like this:

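# With no outFile argument, rxImport returns the data as an in-memory
# data frame, so both files need to fit in memory.
xdf_df1 <- rxImport("<path/to/xdf1>")
xdf_df2 <- rxImport("<path/to/xdf2>")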

xdfOut <- RxXdfData("<path/to/merged/xdf>") # Should not already exist

# This assumes that xdf2 was the one containing "Total_Admits_Pred"
# and that xdf1 contained "Total_Admits", you'll have to adjust this
# based on your data.
xdf_df1[["Total_Admits_Pred"]] <- xdf_df2$Total_Admits_Pred 

# Verify the Data Frame is correct
head(xdf_df1)

# Export it
rxDataStep(inData = xdf_df1, outFile = xdfOut)
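
If the files are too large to import as data frames, rxMerge with type = "oneToOne" might be an option, since it works from the xdf files on disk. The following is only a sketch under the same assumption as above (xdf1 holds "Total_Admits", xdf2 holds "Total_Admits_Pred"). The placeholder paths are the same ones used above, and the varsToKeep1 / varsToKeep2 arguments are used here on the assumption that you want only those two columns in the output.

```r
library(RevoScaleR)

xdf1   <- RxXdfData("<path/to/xdf1>")        # contains Total_Admits
xdf2   <- RxXdfData("<path/to/xdf2>")        # contains Total_Admits_Pred
xdfOut <- RxXdfData("<path/to/merged/xdf>")  # should not already exist

# "oneToOne" appends the two inputs horizontally, row i with row i,
# so both files must have the same number of rows in the same order.
rxMerge(inData1 = xdf1, inData2 = xdf2, outFile = xdfOut,
        type = "oneToOne",
        varsToKeep1 = "Total_Admits",
        varsToKeep2 = "Total_Admits_Pred")
```

Unlike append="cols", this leaves both input files unchanged and writes the combined columns to a third file.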