0
votes

I am attempting to get the average of the lower third of the data that has wages sorted by wages ascending. I have tried to just return the TOPN using a FILTER to not include BLANK() wage values. I then need to just select the column I care about in the AVERAGE calculation. So I wrote something like the following where [withSalaryJobCount] is a calculated measure that is only the count of rows that have a non-BLANK annualSalary column:

entryWages:= AVERAGE(
        SELECTCOLUMNS(
            CALCULATE(
                TOPN(
                    [withSalaryJobCount]- [withSalaryJobCount]/3,
                    'table',
                    'table'[salaryAnnual],
                    ASC
                    ),
                FILTER(table, [salaryAnnual] <> BLANK())
            ),[entryWages]
            "bottomThird",
            [salaryAnnual]
        )
    )

This is failing with an error that:

The AVERAGE function only accepts a column reference as an argument

Original Question: I have a set of SQL calculations that give me the percentile wages as well as what we term the entry and experience level wages. The list of wages are entered into a table sorted by their value with an IDENTITY column. A much simplified query to insert and calculate the percentile, entry, and experienced wages is listed below:

CREATE TABLE #t1 (
    id int identity,
    salaryannual decimal(18,2)
    )
INSERT INTO #t1
SELECT salaryannual
FROM table a
ORDER BY salaryannual

SELECT  
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID>=minID AND ID<=minID+(ct/3)) entryLevelSalary,  
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID>=maxID-(ct/3) AND ID<=maxID) experiencedSalary,
    (select AVG(CAST(salaryannual AS BIGINT)) from #t1 where ID = minID + (ct/2+1)/2 or ID = minID + (ct/2+1)/2 + (ct/2+1)%2) q1, 
    (select AVG(CAST(salaryannual AS BIGINT)) from #t1 where ID = minID + (ct+1)/2 or ID = minID + (ct+1)/2  + (ct+1)%2 ) median, 
    (select AVG(CAST(salaryannual AS BIGINT)) from #t1 where ID = minID + ct+1 - ((ct/2+1)/2 + (ct/2+1)%2) or ID = minID + ct+1 -((ct/2+1)/2) ) q3,
    (SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID>=minID AND ID<=maxID) avgSal
FROM
(
    SELECT COUNT(*) ct, MIN(ID) minID, MAX(ID) maxID
    FROM #t1
) uniqueIDs

Converting the percentile calculation is of the form:

pct25Wages:= Calculate(PERCENTILE.INC('table'[salaryAnnual], .25), FILTER([withSalaryCount] > 6))

The FILTER is used because we have a minimum requirement that there be at least 7 entries with a salary going forward.

My question is how to convert the entry/experience into a DAX/measure query?

(SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID>=minID AND ID<=minID+(ct/3)) entryLevelSalary,  
(SELECT AVG(CAST(salaryannual AS BIGINT)) FROM #t1 WHERE ID>=maxID-(ct/3) AND ID<=maxID) experiencedSalary,

I have tried using a STDDEV and AVG wage calculation like below but it does not give expected results and looking at it I can see it would not work as I expected anyway:

entryWages:= [avgWages] + 3 * [StdDevWage]
1

1 Answers

0
votes

So, after much banging my head against a wall this is how I accomplished this. First I needed to rank my rows in the set by the [salaryAnnual] column. But, since many entries can have the same salary I also tweaked the calculation a bit by using the uniqueID assigned to the row:

RANKX(
    'TABLE',
    'TABLE'[salaryAnnual] + ('TABLE'[ID] / 1000000000),
    ,
    ASC
    )

Then I used this value to give me the bottom third of rows with salary (measure [withSalaryJobCount]):

TOPN(
    1 + ( [withSalaryJobCount]/3),
    'TABLE',
    RANKX(
        'TABLE',
        'TABLE'[salaryAnnual] + ('TABLE'[ID] / 1000000000),
        ,
        ASC
        ),
    ASC
    )

Finally I needed to only get the [salaryAnnual] column that was not null and only pull out the [salaryAnnual] column from the calculated table to do the average of:

AVERAGEX(
        SELECTCOLUMNS(
            CALCULATETABLE(
                TOPN(
                    1 + ( [withSalaryJobCount]/3),
                    'TABLE',
                    RANKX(
                        'TABLE',
                        'TABLE'[salaryAnnual] + ('TABLE'[ID] / 1000000000),
                        ,
                        ASC
                        ),
                    ASC
                    ),
                FILTER(TABLE, [salaryAnnual] <> BLANK())
            ),
            "bottomThird",
            'TABLE'[salaryAnnual]
        ), [bottomThird]
    )