Our university requires us to perform the old-school chi-square test using PROC FREQ (I am aware of the options in PROC UNIVARIATE).

I have computed the theoretical values for an exponential distribution with beta=15 (and laboriously written them down), and I have generated 10000 random values from an exponential distribution with beta=15.

I first try to enter the observed frequencies of my random values (one per interval) using the DATALINES statement:

data expofaktiska;
    input number count;
    datalines;
    1 2910
    2 2040
    3 1400
    4 1020
    5 732
    6 531
    7 377
    8 305
    9 210
    10 144
    11 106
    12 66
    13 40
    14 45
    15 29
    16 16
    17 12
    18 8
    19 8
    20 3
    21 2
    22 0
    23 1
    24 2
    25 0
    26 2
;
run;

This seems to work.

I then try to compare these values to the theoretical values, using the chi-square test in PROC FREQ (the one we are supposed to use), as follows:

proc freq data=expofaktiska;
weight count;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;

I get the following error:

ERROR: The number of TESTP values does not equal the number of levels. For the table of number,
       there are 24 levels and 26 TESTP values.

This may be because two intervals contain 0 observations. I don't really see a way around this.

Also, I don't get the chi-square test in the results viewer, nor the "test probability"; I only get the frequency and cumulative frequency of the random values.

What am I doing wrong? Do the theoretical and actual distributions need to be in the same form (probabilities vs. frequencies)?

We are using SAS 9.4

Thanks in advance!

/Magnus

You would have got the same results with x=floor(15*rand("Exponential")) followed by proc freq; table x;
The syntax is TESTP=(values) | SAS-data-set. If you use TESTP=SAS-data-set, you would not have to laboriously write down the values. - data _null_

1 Answer

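You need the ZEROS option on the WEIGHT statement. By default, PROC FREQ ignores observations whose weight is zero, so levels 22 and 25 are dropped and only 24 levels remain to match against your 26 TESTP values.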

data expofaktiska;
    input number count;
    datalines;
    1 2910
    2 2040
    3 1400
    4 1020
    5 732
    6 531
    7 377
    8 305
    9 210
    10 144
    11 106
    12 66
    13 40
    14 45
    15 29
    16 16
    17 12
    18 8
    19 8
    20 3
    21 2
    22 0
    23 1
    24 2
    25 0
    26 2
;
run;
proc freq data=expofaktiska;
weight count / zeros;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;
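
For reference, here is a sketch of what the goodness-of-fit test computes once all 26 levels are kept. It assumes the intervals have width 5, which is what the first TESTP value implies (0.28347 = 1 - e^{-5/15}). The symbols O_i, p_i and n are my own notation, not something taken from the question.

```latex
% Theoretical probability of interval i under an exponential distribution
% with mean beta = 15 (assumed interval width: 5)
p_i = e^{-5(i-1)/\beta} - e^{-5i/\beta}, \qquad \beta = 15, \quad i = 1, \dots, 26

% Pearson chi-square statistic, with O_i the observed count in interval i
\chi^2 = \sum_{i=1}^{26} \frac{(O_i - n\,p_i)^2}{n\,p_i}, \qquad n = \sum_{i=1}^{26} O_i
```

Note that an interval with O_i = 0 still contributes n p_i to the sum, which is why the empty levels have to stay in the table rather than be dropped. With 26 levels the statistic is compared against a chi-square distribution with 25 degrees of freedom.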