Pivot with case and group

Question

I have an inelegant solution to pivoting a SELECT query in MySQL using CASE functions that I have used before, but am unable to see where I might be going wrong as I am not getting the results I would expect. In an attempt to convey my intentions:

I am trying to track the progress of time-stamps for a unique reference that I can easily pull in a SELECT query, but need to pivot them to rows. I have done that with the following query:

SELECT c.ID, d.lead_id,
CASE WHEN s.summary LIKE '%submitted and status set to "Pending FA Approval"%' THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END AS a,
CASE WHEN s.summary LIKE '%("Pending FA Approval" => "Not Approved, Please Revise")%' THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END AS b,
CASE WHEN s.summary LIKE '%("Not Approved, Please Revise" => "Pending FA Approval")%' THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END AS c,
CASE WHEN s.summary LIKE '%("Pending FA Approval" => "Complete, Pending Payment")%' THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END AS d,
CASE WHEN s.summary LIKE '%("Complete, Pending Payment" => "Paid, Complete")%' THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END AS e,
CASE WHEN s.summary LIKE '%("Paid, Complete" => "Processing")%' THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END AS f,
CASE WHEN s.summary LIKE '%("Paid, Complete" => "Processed")%' THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END AS g
FROM lead_detail d
JOIN stream s ON s.object_id = d.lead_id
JOIN chapter c ON c.ID = d.`value`
WHERE d.field_number = 41
AND s.created >= "2017-02-15"
AND s.action = "status-change"
AND s.summary NOT LIKE "%Voided%"
AND c.ID = 549

This returns the following:

 ID     |   lead_id |   a                           |   b       |   c       |   d                           |   e       |   f       |   g
 549    |   14512   |   February 15, 2017 [08:53 PM]|   NULL    |   NULL    |   NULL                        |   NULL    |   NULL    |   NULL
 549    |   14512   |   NULL                        |   NULL    |   NULL    |   February 15, 2017 [08:54 PM]|   NULL    |   NULL    |   NULL

The problem I have is that when I aggregate by appending GROUP BY c.ID to the end of my query, I get:

 ID     |   lead_id |   a                           |   b       |   c       |   d       |   e       |   f       |   g
 549    |   14512   |   February 15, 2017 [08:53 PM]|   NULL    |   NULL    |   NULL    |   NULL    |   NULL    |   NULL

instead of what I was looking for to be:

 ID |   lead_id |   a                           |   b       |   c       |   d                           |   e       |   f       |   g
549 |   14512   |   February 15, 2017 [08:53 PM]|   NULL    |   NULL    |   February 15, 2017 [08:54 PM]|   NULL    |   NULL    |   NULL

This is a relative sample, while other examples I am working with have different values sporadically throughout the CASE statements a-g. I have look at many sources and it seems that GROUP aggregates don't place well with NULL, but I can't find a good specific example or explanation to allow me to get what I need: One row, with the result pulled from the CASE statement in separate columns, where the returned value is NULL (or '') until replaced when a date entry is created. This also needs to be done in a select query without the convenience of just reorganizing the data into better suited relational tables.

Also, if it matters, this example needs to be resolved as explained, but know that it will also be joined by UNION to additional c.ID values.

Any help or guidance would be greatly appreciated, even if it is just to point out I am going about this entirely incorrectly.

tl;dr: you can use subrequests to find values of a, etc; you can try using aggregate functions like max on a, b, ... columns in order to find some non-null date. — yeputons

Tim Biegeleisen Tim Biegeleisen · Accepted Answer · 2017-02-16T06:18:09

Adding GROUP c.ID to your current query alone does not make any logical sense, and would not even run in most databases, because as soon as you specify GROUP BY, you are telling the database to aggregate groups of records. Hence, every column you select must either be the group itself (i.e. it's OK to select c.ID) or must involve some aggregate function, such as MAX() or SUM().

SELECT c.ID,
       d.lead_id,
       MAX(CASE WHEN s.summary LIKE '%submitted and status set to "Pending FA Approval"%'
                THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END) AS a,
       MAX(CASE WHEN s.summary LIKE '%("Pending FA Approval" => "Not Approved, Please Revise")%'
                THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END) AS b,
       MAX(CASE WHEN s.summary LIKE '%("Not Approved, Please Revise" => "Pending FA Approval")%'
                THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END) AS c,
       MAX(CASE WHEN s.summary LIKE '%("Pending FA Approval" => "Complete, Pending Payment")%'
                THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END) AS d,
       MAX(CASE WHEN s.summary LIKE '%("Complete, Pending Payment" => "Paid, Complete")%'
                THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END) AS e,
       MAX(CASE WHEN s.summary LIKE '%("Paid, Complete" => "Processing")%'
                THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END) AS f,
       MAX(CASE WHEN s.summary LIKE '%("Paid, Complete" => "Processed")%'
                THEN DATE_FORMAT(CONVERT_TZ (s.created,'+00:00','-06:00'), '%M %e, %Y [%h:%i %p]') END) AS g
FROM lead_detail d
INNER JOIN stream s
    ON s.object_id = d.lead_id
INNER JOIN chapter c
    ON c.ID = d.`value`
WHERE d.field_number = 41           AND
      s.created >= "2017-02-15"     AND
      s.action = "status-change"    AND
      s.summary NOT LIKE "%Voided%" AND
      c.ID = 549
GROUP BY c.ID

Here is a brief diagram showing how MAX() makes the pivot work

id | col
1  | 2         <-- the max of col when id=1 is 2, because NULLs are ignored
1  | NULL
1  | NULL
2  | NULL
2  | 5         <-- the max of col when id=2 is 5, because NULLs are ignored
2  | NULL

Pivot with case and group

1 Answers