First off, I'm a first-time poster, so please bear with me. I've searched for answers both here and elsewhere, but have yet to find what I'm looking for. I'm quite new to SAS (and programming), so it is quite possible I've searched for the wrong things.
Anyhow. I work in research, currently as a data manager for a large longitudinal questionnaire study on work and health, in which the same participants have been surveyed across five waves of data collection. We want to make our data easier to share and use, so we'd like to create a teaching dataset from our current data. The teaching dataset currently includes 2000 randomly selected individuals and 463 variables; this is only a subset of the scales and some of the background information from the master set.
My problem is that one of the criteria that must be met before we can distribute the set is that every person has to be, and stay, anonymous. Therefore we must introduce random errors into the dataset. I have already grouped many of the background variables (income, age, education, etc.), but I want every variable to include at least some random error. I can't figure out how to do this. Most variables look like this:
Health_1 Health_n
1 2
4 2
5 5
. 1
1 1
Most variables can take values between 1 and 5 (or missing). I've been thinking about replacing values systematically (i.e., every 1 becomes 2, every 2 becomes 3, etc.), but that would give a bad end result since many analyses would turn out weird. Instead, for every variable, I would like to randomly change, for example, 50 of the 2000 observations to any value the variable can assume (an integer from 1 to 5, or missing).
Any suggestions? I guess I could change every n'th observation of variable y to x, but that won't be random. And I would like to handle all variables at once instead of writing code for every single variable.
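To make the idea concrete, here is a rough, untested sketch of the kind of thing I'm imagining, in case it helps. The dataset names (teach, teach_noisy), the seed, and the Health_: prefix list are placeholders I made up for illustration. Note that it picks each cell with probability 50/2000 rather than changing exactly 50 observations per variable, and the new value can happen to equal the old one.

```sas
/* Sketch only: teach and teach_noisy are hypothetical dataset names. */
data teach_noisy;
    set teach;

    /* All variables whose names start with Health_ (assumed to be the
       1-5 items); the list would need adjusting for other scales. */
    array items {*} Health_:;

    /* Arbitrary seed so the result can be reproduced. */
    call streaminit(12345);

    do _i = 1 to dim(items);
        /* Each cell is selected with probability 50/2000, so about 50
           observations per variable are changed on average. */
        if rand('uniform') < 50/2000 then do;
            _r = ceil(6 * rand('uniform'));    /* 1..6, equally likely */
            if _r = 6 then items{_i} = .;      /* 6 stands for missing */
            else items{_i} = _r;               /* otherwise 1..5       */
        end;
    end;

    drop _i _r;   /* remove the helper variables */
run;
```

Using an array over a variable list is what would let this run over many variables without separate code for each one, but I don't know whether this is a sensible way to do it.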