I am estimating a firm-bankruptcy model that involves 11 factors. I have data from 1900 to 2000, and my goal is to estimate the model with proc logistic on the 1900-1950 period and then test its performance on the 1951-2000 data. Proc logistic runs fine, but the estimated coefficients have the same names as the factors used in the model. Suppose the dataset that contains all my observations is called myData, and the dataset that contains the estimated coefficients, which I obtain with the outest= option in proc logistic, is called factorEstimates. Both datasets have the variables factor1, factor2, ..., factorN. I want to create a dataset outOfSampleResults by doing something like the following:

    data outOfSampleResults;
        set myData factorEstimates;
        newVar = factor1*factor1;
    run;

Here the first mention of factor1 refers to the variable in myData and the second to the one in factorEstimates. How can I tell SAS which dataset to read this variable from, given that it is common to both datasets in the set statement? Alternatively, how could I quickly rename factor1, factor2, ..., factorN to factor1Estimate, factor2Estimate, ..., factorNEstimate in the factorEstimates dataset, so as to avoid the common variable name issue altogether?

Are you aware that the code you've written will simply concatenate both datasets and then produce newVar as the square of factor1? Do you wish to merge/join the myData and factorEstimates datasets instead? – mjsqu
Read this page: support.sas.com/documentation/cdl/en/basess/58133/HTML/default/… and take note of the sections where the 'RENAME=' data set option is used. – mjsqu
No, I need a way to quickly rename all the columns in the factorEstimates dataset. Perhaps proc sql? – Phillip Champlin
If the variables are named factor1-factor12, then you can do a mass rename via: rename factor1-factor12=new_factor1-new_factor12; – Reeza

1 Answer

Two quick ways to get predictions from a model that has already been developed:

  1. Use the score statement in proc logistic: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect066.htm

  2. Include the data in your original proc logistic, but use a new dependent variable that is missing for the observations you want to predict. Those observations are left out of the fit but still receive predicted values.

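    /* Set the response to missing for the hold-out years. */
    data stacked;
        set all;
        if year > 1950 then predicted = .;
        else predicted = y;
    run;

    /* p holds the predicted probability for every row. */
    proc logistic data=stacked;
        model predicted = factor1 - factor12;
        output out=out_predicted predicted=p;
    run;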
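
The first option (the score statement) is not shown in code above, so here is a minimal sketch of how it might look. It is only an illustration under assumptions: the dataset name myData and the variables factor1-factor11 come from the question, year and y are borrowed from the code above, and the names train and test, as well as the 0/1 coding of y with 1 meaning bankruptcy, are hypothetical.

```sas
/* Assumption: myData holds year, a 0/1 response y, and factor1-factor11. */
/* train and test are hypothetical dataset names used for illustration.   */
data train test;
    set myData;
    if year <= 1950 then output train;   /* estimation sample: 1900-1950  */
    else output test;                    /* hold-out sample: 1951-2000    */
run;

proc logistic data=train;
    /* event='1' assumes y=1 marks a bankruptcy */
    model y(event='1') = factor1-factor11;
    /* Apply the fitted coefficients to the hold-out rows */
    score data=test out=outOfSampleResults;
run;
```

With this approach the coefficients never have to be merged back onto the data, so the name clash between myData and factorEstimates does not arise. The scored dataset should contain the original variables plus predicted-probability columns whose names start with P_.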