
I have data tracking a certain eye phenomenon. Some patients have it in both eyes, and some have it in only one eye. This is what some of the data looks like:

EyeID   PatientID   STATUS  Gender
1   1   1   M
2   1   0   M
3   2   1   M
4   3   0   M
5   3   1   M
6   4   1   M
7   4   0   M
8   5   1   F
9   6   1   F
10  6   0   F
11  7   1   F
12  8   1   F
13  8   0   F
14  9   1   F

As you can see from the data above, there are 9 patients in total, and all of them have the phenomenon in at least one eye.

I need to count the number of patients with this eye phenomenon. To get the total number of patients in the dataset, I used:

PROC FREQ data=new nlevels;
tables PatientID;
run;

To count the number of patients with this eye phenomena, I used:

PROC SORT data=new out=new1 nodupkey;
by Patientid Status;
run;

proc freq data=new1 nlevels;
tables Status;
run;

This gave the correct number of patients with the phenomenon (9), but not the correct number without it (which should be 0).


I now need to calculate the gender distribution of this phenomenon. I used:

proc freq data=new1;
tables gender*Status/chisq;
run;


In the cross table, too, the number of patients who have the phenomenon is correct (9), but the number without it is not (it should be 0). Does anyone have thoughts on how to run this chi-square so that a patient counts as positive if they have the phenomenon in at least one eye?

Thanks!

Can you post an example dataset? – Joe
@Joe I have posted an example of what the data looks like in the screenshot above. Do you need more? – ybao
The above does not replicate the issue you have. Please post complete data, not in image form, that is sufficient to replicate the issue, including expected results and actual results from your code. – Joe
@Joe I have updated the post according to your suggestions. Thanks – ybao

1 Answer


PROC FREQ is doing what you told it to: counting the status=0 cases. After PROC SORT NODUPKEY by PatientID Status, any patient with one affected eye and one unaffected eye keeps two rows, one with status=0 and one with status=1, so that patient is counted in both columns.

In general, you are using blunt tools here when a more precise tool would fit better. PROC SORT NODUPKEY, for example, is overkill, and it doesn't really do what you want anyway.
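If you would rather keep the PROC SORT approach, NODUPKEY can work, but only when the row you want to keep comes first within each patient. A minimal sketch, assuming one row per eye with status coded 0/1: sort each patient's rows by descending status, then deduplicate on patientID alone, which keeps the first (highest-status) row. The dataset names `sorted` and `patient_sort` are hypothetical, and the data are the question's rows plus one hypothetical unaffected patient (10).

```sas
/* Eye-level data from the question, written compactly (@@ reads several */
/* observations per line); patient 10 is a hypothetical unaffected case. */
data have;
  input eyeID patientID status gender $ @@;
  datalines;
1 1 1 M  2 1 0 M  3 2 1 M  4 3 0 M  5 3 1 M
6 4 1 M  7 4 0 M  8 5 1 F  9 6 1 F  10 6 0 F
11 7 1 F  12 8 1 F  13 8 0 F  14 9 1 F  15 10 0 M
;
run;

/* Put each patient's status=1 eye (if any) first. */
proc sort data=have out=sorted;
  by patientID descending status;
run;

/* NODUPKEY on patientID alone keeps the first row of each patient */
/* (EQUALS is the default, so input order is preserved), i.e. the  */
/* row with the highest status.                                    */
proc sort data=sorted out=patient_sort nodupkey;
  by patientID;
run;

proc freq data=patient_sort;
  tables status;
run;
```

With this data, PROC FREQ should report 9 patients with status=1 and 1 with status=0. The catch is that this relies on sort order to encode the "any eye" rule, which is easy to break later; an explicit patient-level flag states the rule directly.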

To build a patient-level has/doesn't-have dataset, let's do a few things. First, I add one more row, a patient who doesn't have the condition in either eye, so we can see that case handled.

data have;
  input eyeID patientID status gender $;
  datalines;
1   1   1   M
2   1   0   M
3   2   1   M
4   3   0   M
5   3   1   M
6   4   1   M
7   4   0   M
8   5   1   F
9   6   1   F
10  6   0   F
11  7   1   F
12  8   1   F
13  8   0   F
14  9   1   F
15 10   0   M
;;;;
run;

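Now we use a DATA step. The data are eye-level, but we want a patient-level dataset at the end, so we create a new patient-level status that is 1 if either eye has status=1. BY-group processing requires the data to be sorted by patientID, which it already is here.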

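data patient_level;
  set have;
  by patientID;                                  /* requires HAVE sorted by patientID */
  retain patient_status;                         /* carry the flag across a patient's rows */
  if first.patientID then patient_status = 0;    /* reset for each new patient */
  patient_status = (patient_status or status);   /* 1 if any eye so far has status=1 */
  if last.patientID then output;                 /* write one row per patient */
  keep patientID patient_status gender;
run;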
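An alternative, if you're comfortable with PROC SQL, is to collapse with MAX(status) per patient: since status is 0/1, the maximum is 1 exactly when at least one eye has the condition. A sketch, assuming gender is recorded identically on every row for a patient (otherwise grouping by it would split that patient into two rows); `patient_level_sql` is a hypothetical name, and the data are the same eye-level rows as above, repeated so the block runs on its own.

```sas
/* Same eye-level data as above, written compactly. */
data have;
  input eyeID patientID status gender $ @@;
  datalines;
1 1 1 M  2 1 0 M  3 2 1 M  4 3 0 M  5 3 1 M
6 4 1 M  7 4 0 M  8 5 1 F  9 6 1 F  10 6 0 F
11 7 1 F  12 8 1 F  13 8 0 F  14 9 1 F  15 10 0 M
;
run;

/* One row per patient; positive if any eye has status=1.      */
/* Assumes gender does not vary within a patient.               */
proc sql;
  create table patient_level_sql as
  select patientID,
         gender,
         max(status) as patient_status
  from have
  group by patientID, gender;
quit;

proc freq data=patient_level_sql;
  tables gender*patient_status / chisq;
run;
```

With this data, the crosstab should show F: 0 without, 5 with; M: 1 without, 4 with. Unlike the DATA step, PROC SQL does not need the input presorted by patientID.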

Now we can run your second PROC FREQ on it. Also note that you now have a clean dataset with one row per patient.

title "Patients with/without condition in any eye";
proc freq data=patient_level;
  tables patient_status;
run;
title;

You may also be able to do your chi-square analysis, though I'm not a statistician and won't dip my toe into whether it is an appropriate analysis. It's likely better than your first attempt, anyway, since it classifies each patient by whether they have the condition in at least one eye. If you need to know the number of affected eyes, you will need a different indicator.

title "Crosstab of gender by patient having/not having condition";
proc freq data=patient_level;
  tables gender*patient_Status/chisq;
run;
title;

If every patient in your actual data has the condition, of course, a chi-square analysis is unlikely to be appropriate, since there are no patients without it to compare against.