
In my dataset, variables that represent ranges, such as Female_Age_Band, are coded as 15-20, 20-25, 25-30, and so on. The problem is that wherever the data is unavailable, the observation is labelled "Unavailable", which makes SAS read the field as character. I believe this will make it difficult to use the variable in a logistic regression. Further, there are also categorical fields with, say, three distinct indicators (0, 1 and 2), and these fields carry the "Unavailable" label too. I can't simply replace it with zero, because zero might be a valid value.

Can someone help with a solution?

You can read it in as character and recode manually, or replace every "Unavailable" in the data with a blank (Find/Replace All). - Reeza

1 Answer


The problem is bigger than "Unavailable", because "15-20" will also be read as character. But you don't want to replace each band with its midpoint, as that would make things quite odd: you don't have XXX 17.5-year-old people.

What I would do is use a data step to recode the age as uniformly distributed within each age band, and recode "Unavailable" as . (the SAS numeric missing value).

(Sorry, I just got a new computer and am waiting for a new SAS install, so I can't show code right now).
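
For illustration, here is a minimal sketch of the data step described above. It is untested against the asker's data and rests on several assumptions: the input dataset is called have, the output want, the band variable holds strings of the form "low-high", and Indicator stands in for one of the 0/1/2 categorical fields. All of those names, the new variables Female_Age and Indicator_Num, and the seed value are hypothetical.

```sas
/* Assumed input: dataset HAVE with character variables           */
/*   Female_Age_Band : "15-20", "20-25", ... or "Unavailable"     */
/*   Indicator       : "0", "1", "2" or "Unavailable"             */
/* Dataset and variable names are hypothetical.                   */
data want;
    set have;

    /* Fix the random-number seed so the recoding is reproducible. */
    call streaminit(12345);

    /* Variables created by assignment are reset to missing (.) on */
    /* every iteration, so "Unavailable" rows keep Female_Age = .  */
    if not missing(Female_Age_Band)
       and upcase(strip(Female_Age_Band)) ne 'UNAVAILABLE' then do;
        band_lo = input(scan(Female_Age_Band, 1, '-'), best12.);
        band_hi = input(scan(Female_Age_Band, 2, '-'), best12.);

        /* Uniform draw between the lower and upper band limits. */
        Female_Age = band_lo + (band_hi - band_lo) * rand('uniform');
    end;

    /* The ?? modifier suppresses invalid-data messages, so        */
    /* "Unavailable" quietly becomes a numeric missing value while */
    /* a genuine 0 stays 0.                                        */
    Indicator_Num = input(Indicator, ?? best12.);

    drop band_lo band_hi;
run;
```

Because missing is distinct from zero in SAS, this keeps a valid 0 apart from "Unavailable". Note that PROC LOGISTIC by default excludes observations with a missing value in any model variable, so the rows recoded to . will not contribute to the fit.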