5
votes

I have a table that looks like this:

enter image description here

I am looking for a table that gives a frequency count of the elements in the fields l_0, l_1, l_2, l_3.

For example the output should look like this:

| author_id  | year | l_o.name         | l_0.count| l1.name    | l1.count | l2.name             | l2.count| l3.name            | l3.count|
| 2164089123 | 1987 | biology          | 3        | botany     | 3        |                     |         |                    |         |
| 2595831531 | 1987 | computer science | 2        | simulation | 2        | computer simulation | 2       | mathematical model | 2       |

Edit:

In some cases the array field might have more than one type of element. For example l_0 could be ['biology', 'biology', 'geometry', 'geometry']. In that case the output for fields l_0, l_1, l_2, l_3 would be a nested repeated field with all the elements in l_0.name and all the corresponding counts in the l_0.count.

1
it is still not clear what exactly the content of the each repeated field! is it always the same element per field or it can be multiple? and if multiple - how result should look like?Mikhail Berlyant
It is not always the same element per field. For multiple elements the count of the objects should be a repeated nested object with element name and the element count.Syed Arefinul Haque

1 Answers

8
votes

This should work, assuming you want to count on a per-array basis:

SELECT
  author_id,
  year,
  (SELECT AS STRUCT ANY_VALUE(l_0) AS name, COUNT(*) AS count
   FROM UNNEST(l_0) AS l_0) AS l_0,
  (SELECT AS STRUCT ANY_VALUE(l_1) AS name, COUNT(*) AS count
   FROM UNNEST(l_1) AS l_1) AS l_1,
  (SELECT AS STRUCT ANY_VALUE(l_2) AS name, COUNT(*) AS count
   FROM UNNEST(l_2) AS l_2) AS l_2,
  (SELECT AS STRUCT ANY_VALUE(l_3) AS name, COUNT(*) AS count
   FROM UNNEST(l_3) AS l_3) AS l_3
FROM YourTable;

To avoid so much repetition, you can make use of a SQL UDF:

CREATE TEMP FUNCTION GetNameAndCount(elements ARRAY<STRING>) AS (
  (SELECT AS STRUCT ANY_VALUE(elem) AS name, COUNT(*) AS count
   FROM UNNEST(elements) AS elem)
);

SELECT
  author_id,
  year,
  GetNameAndCount(l_0) AS l_0,
  GetNameAndCount(l_1) AS l_1,
  GetNameAndCount(l_2) AS l_2,
  GetNameAndCount(l_3) AS l_3
FROM YourTable;

If you potentially need to account for multiple different names within an array, you can have the UDF return an array of them with associated counts instead:

CREATE TEMP FUNCTION GetNamesAndCounts(elements ARRAY<STRING>) AS (
  ARRAY(
    SELECT AS STRUCT elem AS name, COUNT(*) AS count
    FROM UNNEST(elements) AS elem
    GROUP BY elem
    ORDER BY count
  )
);

SELECT
  author_id,
  year,
  GetNamesAndCounts(l_0) AS l_0,
  GetNamesAndCounts(l_1) AS l_1,
  GetNamesAndCounts(l_2) AS l_2,
  GetNamesAndCounts(l_3) AS l_3
FROM YourTable;

Note that if you want to perform counting across rows, however, you'll need to unnest the arrays and perform the GROUP BY at the outer level, but it doesn't look like this is your intention based on the question.