1
votes

I am attempting to retrieve a list of organizations from our database based on the organizations' city and state, grouped by an attribute org_name. These organizations have multiple nested arrays as other attributes. I have to unnest the LOCATIONS aggregated array in order to query the proper locations. Since I am grouping by organization name, I aggregate the remaining desired attributes with ARRAY_CONCAT_AGG for every array and then run a user defined function called dedup to remove any repeated values since all desired attributes must be grouped or aggregated. This works perfectly on a small table, but as I scale up I receive the following error.

Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 116% of limit. Top memory consumer(s): aggregate functions and GROUP BY clauses: 95% other/unattributed: 5%.

The code is as follows:

CREATE TEMP FUNCTION
  dedup(val ANY TYPE) AS ((
    SELECT
      ARRAY_AGG(t)
    FROM (
      SELECT
        DISTINCT *
      FROM
        UNNEST(val) v) t ));

SELECT
  org_name,
  dedup(ARRAY_CONCAT_AGG(sizes)) AS sizes,
  dedup(ARRAY_CONCAT_AGG(org_nums)) AS org_nums,
  dedup(ARRAY_CONCAT_AGG(location)) AS location,
  dedup(ARRAY_CONCAT_AGG(types_of_professionals)) AS types_of_professionals,
  dedup(ARRAY_CONCAT_AGG(specialties)) AS specialties
FROM
  `merging_npi_medicare.practices`,
  UNNEST(location) AS loc
WHERE
  loc.professional_city = "HARRISBURG"
  AND loc.professional_state = "PA"
GROUP BY
  org_name

My expected results would be a table of unique organizations with an array of sizes, phone numbers, locations, types of professionals, and specialties. However, I receive the memory exceeded error when I run this code. I understand that the mistake must be with my many aggregated arrays and UDF function calls but I simply do not know how else to achieve what I would like.

The following link demonstrates how I would like the results to look. It was achieved by running the same query as above just with an extra parameter that only looked for organizations with names that included PINNACLE.

https://i.stack.imgur.com/8nJpb.png

Any and all help would be appreciated!

1
Sample data would really help.Gordon Linoff

1 Answers

1
votes

I have to speculate a bit on what the data looks like. But does rephrasing it like this fix the problem?

select org_name,
       (select array_agg(s.size)
        from `merging_npi_medicare.practices` p cross join
             unnest(sizes) s
        where p.org_name = o.org_name
       ) as sizes
from (select distinct org_name
      from `merging_npi_medicare.practices` cross join
           unnest(location) AS loc
      where loc.professional_city = 'HARRISBURG' and
            loc.professional_state = 'PA'
     ) o;