1
votes

I am trying to create a bar chart that sums up values in a field, but only for each unique identifier. For instance, for my data:

Condition   CT_ID   Enrollment  Company
I           5127    24          H
J           5127    24          H
P           5127    24          H
I           5127    24          O
J           5127    24          O
P           5127    24          O
L           27668   387         C
R           27668   387         C
D           38190   650         D
Q           38190   650         D
F           38785   30          A
E           39682   30          B
M           41818   17          I
O           44093   188         G
A           54850   18          K
G           59183               F
C           59891   266         J
G           61142   48          F
H           61425   28          L
K           61449               N
A           61793   12          E
N           61793   12          E
B           61910   120         M
B           61917   120         M
B           61961   130         M

Or, since I really want to eventually summarize these data by Condition, let me just show the above data resorted by Condition instead of [CT_ID].

Condition   CT_ID   Enrollment  Company
A           54850   18          K
A           61793   12          E
B           61910   120         M
B           61917   120         M
B           61961   130         M
C           59891   266         J
D           38190   650         D
E           39682   30          B
F           38785   30          A
G           59183               F
G           61142   48          F
H           61425   28          L
I           5127    24          H
I           5127    24          O
J           5127    24          H
J           5127    24          O
K           61449               N
L           27668   387         C
M           41818   17          I
N           61793   12          E
O           44093   188         G
P           5127    24          H
P           5127    24          O
Q           38190   650         D
R           27668   387         C

The rows are duplicated by different values in Condition and Company. CT_ID is the identifier for the record.

I want to sum up my Enrollment column, but I will be overcounting. So I thought I could create a custom expression like :

Sum(Avg([Enrollment]) OVER ([CT_ID]))

But this is somehow not a valid expression. Where am I going wrong?

For my bar chart, I'd like to have the Condition as the category axis and the Enrollment as the value axis. The below table shows how the Enrollment value should be calculated. Focus on Conditions P, I, and J.

Condition  Enrollment
A          30
B          370
C          266
D          650
E          30
F          30
G          48
H          28
I          24
J          24
K   
L          387
M          17
N          12
O          188
P          24
Q          650
R          387

My current solution uses a rank function and only puts the enrollment info in the first row for each unique CT_ID, but that is just plain wrong when I start filtering data. For instance, imagine in my dataset above I only had an Enrollment value for the first row of CT_ID 5127. If I filter out Condition "I" (the one in the first row), now the Enrollment value is zero!

Any help you can provide would be greatly appreciated! I'm no expert on OVER expressions, so hopefully there is an easy solution!

1
Nested aggregations aren't allowed in versions prior to 7.0. Are you using something other than 7.0 or 7.5? - S3S

1 Answers

1
votes

This is too long to comment...

Can you provide some expected results? It looks as though each CT_ID can only have one value, so SUM of them distinctively would just be any one of the values... right? If not, what determines a duplicate?

Here is how you can accomplish what you were attempting in your code above--but from your test data isn't just going to give you the value of Enrollment since there each CT_ID seemingly only has one value (duplicated).

Sum([Enrollment]) over ([CT_ID]) / Count() OVER ([CT_ID])

You could also just use First()

First([Enrollment]) OVER ([CT_ID])

EDIT

Since you have some duplicates and some not... let's just split out the duplicates into the average of the number of duplicates there are. Insert this calculated column:

Max([Enrollment]) over (Intersect([CT_ID],[Condition])) / Count([Enrollment]) over (Intersect([CT_ID],[Condition]))

Then use this column in place of Enrollment in what ever expression you want to ignore duplicates, where a duplicate is the same Condition, CT_ID, and Enrollment value.

For example... the rows for Condition = P and CT_ID = 5127 will have 12 instead of 24.