1
votes

I am using the dcast function to reshape a data frame in R. It works on a small data frame, but my full data set is large, so I converted it to an ffdf data frame, and now I am unable to use dcast on it. Are there any alternatives? Below is the example I used for the small data frame, followed by what I want to do with the ffdf data frame:

    hhpsample <- read.csv("C:/Users/PK5016573/Desktop/hdsample.csv")
    View(hhpsample)

    hd <- dcast(hhpsample, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)

This works, but the same call on the ffdf version does not:

    hhp <- read.csv.ffdf(file = "C:/Users/PK5016573/Desktop/hdsample.csv")

    hd <- dcast(hhp, MemberID ~ Year + Specialty + ProcedureGroup + Vendor + PlaceSvc + PCP + PrimaryConditionGroup + CharlsonIndex)

This gives me an error. Please help.

Thanks in advance, Pavan Kancharala

1
Please provide a reproducible example. - akrun
Hi akrun, please download the data from heritagehealthprize.com/c/hhp/data. After downloading, sort it in Excel and keep the data for only two MemberIDs, then try the first example. After that, take all the data and try the second code; you will see the error. - Naga Pavan
Is it HHP_release1? - akrun
Yes, the claims dataset in HHP_release3. - Naga Pavan
The objective of Stack Overflow is that you provide a reproducible example and that others can help you where you are stuck. Not the other way around. - user1600826

1 Answer

0
votes

I found an answer to this question, though it may not work on data with many factor levels:

    # Reshape the data by Year and PrimaryConditionGroup, one chunk at a time
    library(reshape2)
    library(ffbase)

    # Cast one in-memory chunk to wide format, summing the column of ones
    reshapefunction <- function(x) {
      dcast(x, MemberID ~ Year + PrimaryConditionGroup,
            value.var = "rep.x..each...2668990.",
            fun.aggregate = sum)
    }

    # Split the ffdf by MemberID and apply the function chunk by chunk;
    # BATCHBYTES limits the number of bytes processed in one chunk
    PrimaryConditionGroup <- ffdfdply(x = hhp, split = hhp$MemberID,
                                      FUN = reshapefunction,
                                      BATCHBYTES = 100000000, trace = TRUE)

    View(PrimaryConditionGroup)

All the data was taken from the Kaggle competition. I added one more column, "rep.x..each...2668990.", which contains 1 in every row and is used for aggregation.
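
To make the role of that column of ones clearer, here is a minimal in-memory sketch of the same cast on a tiny data frame. The toy data and the shorter column name `n` are made up for illustration and are not part of the competition data; the point is only that summing a column of ones per cell gives a count of claims per member for each Year and PrimaryConditionGroup combination.

```r
library(reshape2)

# Hypothetical toy data standing in for the claims table
claims <- data.frame(
  MemberID = c(1, 1, 1, 2, 2),
  Year = c("Y1", "Y1", "Y2", "Y1", "Y2"),
  PrimaryConditionGroup = c("AMI", "AMI", "COPD", "COPD", "COPD"),
  stringsAsFactors = FALSE
)

# Helper column with 1 in every row (hypothetical name `n`)
claims$n <- rep(1, nrow(claims))

# One row per member, one column per Year_PrimaryConditionGroup combination;
# summing the ones counts the claims that fall into each cell
wide <- dcast(claims, MemberID ~ Year + PrimaryConditionGroup,
              value.var = "n", fun.aggregate = sum)
print(wide)
```

Only combinations that actually occur in the data passed to dcast become columns, so different chunks of a large data set can end up with different sets of columns. That may be one reason the chunked approach struggles on heavily factored data.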