2
votes

I'm working on applicant pipeline data and need to get a count of applicants who made it to each phase of the pipeline/funnel. If an applicant skips a phase, I need to need to count them in the phase anyway. Here's an example of how that data might look for one applicant:

Stage name | Entered on
Application Review | 9/7/2018
Recruiter Screen | 9/10/2018
Phone Interview | blank
Interview | 9/17/2018
Interview 2 | 9/20/2018
Offer | blank

this is what the table looks like:

CREATE TABLE application_stages (
application_id bigint,
stage_id bigint,
entered_on timestamp without time zone,
exited_on timestamp without time zone,
stage_name character varying
);

In this example, I want to count Application Review through Interview 2 (including the skipped/blank Phone Interview phase), but not the Offer. How would I write the above in SQL? (Data is stored in Amazon Redshift. Using SQL workbench to query.)

Also, please let me know if there is anything else I can add to my question to make the issue/solution clearer.

2
Welcome to stackoverflow. You should be providing text data rather than screen shot. Also it would be good if you add scheme of table.Red Boy
Appreciate the guidance! I'll update.WilsonS

2 Answers

0
votes

You can hardcode the stages of the pipeline in event_list table like this:

id | stage_name
1 | first stage 
2 | second stage 
3 | third stage 
4 | fourth stage 

UPD: The deeper is the stage of the funnel, the higher is its ID. This way, you can compare them, i.e. third stage is deeper than second stage because 3>2. Thus, if you need to find people that reached the 2nd stage it includes people that have events with id=2 OR events with id>2, i.e. events deeper in the funnel.

If the second stage is missed and the third stage is recorded for some person you can still count that person as "reached second stage" by joining your event data to this table by stage_name and counting the number of records with id>=2, like

select count(distinct user_id)
from event_data t1
join event_list t2
using (stage_name)
where t2.id>=2

Alternatively, you can left join your event table to event_list and fill the gaps using lag function that returns the value of the previous row (i.e. assigning the timestamp of first stage to the second stage in the case above)

0
votes

Here is the SQL I ended up with. Thanks for the ideas, @AlexYes!

select stage_name,  
application_stages.application_id, entered_on, 
case when entered_on is NULL then lead(entered_on,1) 
ignore nulls
over 
(PARTITION BY application_stages.application_id order by case stage_name 
when 'Application Review' then 1 
when 'Recruiter Screen' then 2 
when 'Phone Interview' then 3
when 'Interview' then 4
when 'Interview 2' then 5
when 'Offer' then 6
when 'Hired' then 7 end) else entered_on end as for_count, exited_on
from application_stages

I realize that the above SQL doesn't give me the counts but I am doing the counts in Tableau. Happy to have the format above in case I need to do other calculations on the new "for_count" field.