I am replicating R code for a Bayesian analysis, but I get an error that I have not been able to solve, even after reading other questions here. I use the same dataset and the same variables (from OECD). Can anyone tell me why it does not work? My code is this:

rm(list=ls())
# Name of variables to be extracted
v.resp=c("pv1math") # Response Variable
v.treat=c("IC02Q01","IC02Q02","IC02Q03") # Treatment variable(s)
# Student Confoundings
v.student.conf=c("Age", "Gender", "isced_0", "IMMIG", "HEDRES", "WEALTH", "ESCS","FAMSTRUC","hisced","hisei","HOMEPOS", "TIMEINT")
# School Confoundings
v.school.conf=c("CLSIZE","SCMATEDU","STRATIO","SMRATIO","PublicPrivate")

## LOAD DATA
library(foreign) # provides read.dta
dat <- read.dta("name.dta")
## Weighted sample with weights in the w vector
w=dat$W_FSTUWT

## Subset data in R

dat=dat[c(v.resp,v.treat,v.student.conf,v.school.conf)]
names(dat)[names(dat)==v.resp]="y"
w=w[complete.cases(dat)]
w=w/sum(w)
nw=function(w) w/sum(w)
dat=dat[complete.cases(dat),]
dim(dat)

When I run the line dat=dat[c(v.resp,v.treat,v.student.conf,v.school.conf)] I get this error: Error in `[.data.frame`(dat, c(v.resp, v.treat, v.student.conf, v.school.conf)) : undefined columns selected

I have 25,000 observations and 900 variables, but I want to subset my data to 21 variables and the observations that are complete for them (fewer than 25,000 for sure). I tried putting a comma between ) and ], but that did not help: when I then run the other lines, I lose all the data.

I also ran this code from the "Quick-R" website, but I got the same error message:

# select variables v1, v2, v3
myvars <- c("v1", "v2", "v3")
newdata <- mydata[myvars] 

I would like to understand why it does not work. I am copying and pasting this code from a paper that used it for the same dataset. Thank you.

Could you run any(c(v.resp,v.treat,v.student.conf,v.school.conf) %in% names(dat))? If it returns FALSE, then none of those column names exist in the data frame. Perhaps read.dta simply does not transfer the column names correctly - Vandenman
Thank you! I tried it and I no longer get that error message. There are fewer observations than before, but I still have 900 variables when I should have 21 (those I want to keep) - sicecon
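
The check from the comment can be made more specific. Below is a minimal sketch, assuming a toy data frame whose column names are hypothetical stand-ins for the real file. setdiff() lists exactly which requested names are absent, and a case-insensitive match() is one possible way to recover names that differ only in letter case. The names wanted, not_found, idx, resolved and sub are made up for illustration.

```r
# Toy stand-in for the real data; these column names are hypothetical
dat <- data.frame(PV1MATH = c(500, 480), AGE = c(15.5, 15.8), ESCS = c(0.2, -0.1))
wanted <- c("pv1math", "Age", "ESCS")

# Requested names that are not columns of dat (exact, case-sensitive comparison)
not_found <- setdiff(wanted, names(dat))
print(not_found)

# Case-insensitive lookup: position of each wanted name within names(dat)
idx <- match(tolower(wanted), tolower(names(dat)))
resolved <- names(dat)[idx]
print(resolved)

# Subset only if every wanted name could be resolved to a real column
stopifnot(!anyNA(idx))
sub <- dat[, resolved, drop = FALSE]
dim(sub)
```

If not_found is empty for the real data, the column names are not the cause and the problem lies elsewhere.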

1 Answer

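The message says: undefined columns selected. That is exactly the situation here: at least one of the names in your vector is not a column of dat. When you use [ ] on a data frame with a single index and no comma, the index selects columns, so your original line was already a column selection. It fails only because a requested name is missing (column names are case-sensitive, so "Age" and "AGE" are different columns). Placing a comma before the closing ], as in dat[c(...),], turns the vector into a row selection instead: the error goes away, but all 900 columns are kept and the rows are matched against the row names, which would explain why the data disappears once complete.cases is applied. First find the missing names with setdiff(c(v.resp,v.treat,v.student.conf,v.school.conf), names(dat)) and correct them. If you then want to state rows and columns explicitly, the comma goes before the column vector:

dat=dat[,c(v.resp,v.treat,v.student.conf,v.school.conf)]

The only difference from your line is the comma right after the opening [, which leaves the row selection empty (all rows are kept).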
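
To see what each form of [ ] does on a data frame, here is a small sketch on a toy data frame. The names toy, cols, by_list, by_matrix, by_rows and res are invented for illustration and are not part of the original code.

```r
toy <- data.frame(a = 1:3, b = 4:6, c = 7:9)
cols <- c("a", "c")

by_list   <- toy[cols]    # single index, no comma: selects columns
by_matrix <- toy[, cols]  # empty row slot, column names after the comma
by_rows   <- toy[cols, ]  # vector before the comma: matched against row names

dim(by_list)    # 3 rows, 2 columns
dim(by_matrix)  # 3 rows, 2 columns
dim(by_rows)    # 2 rows, 3 columns, all NA: no row is named "a" or "c"

# A name that is not a column reproduces the error from the question
res <- tryCatch(toy[c("a", "z")], error = function(e) conditionMessage(e))
print(res)
```

The last call fails with or without a leading comma, so the comma alone cannot repair a misspelled or missing column name.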