I am querying the github public dataset on BigQuery. Currently, my best query for what I need looks like the following.
SELECT type, created_at, repository_name FROM [githubarchive:github.timeline]
WHERE
(created_at CONTAINS '2012-')
AND repository_owner="twitter"
ORDER BY created_at, repository_name;
This gives me all the events ("type") from the repository_owner twitter (or any other user) for all the repositories ("repository_name") that this user owns, but in a single column.
However, what I really want is to have all the events ("type") in columns, one column for each repository ("repository_name"), more or less like this:
bootstrap commons twui
WatchEvent PushEvent PushEvent
WatchEvent WatchEvent PushEvent
The timestamp ("created_at") is only relevant as an ordering mechanism. The columns does not need have to be equally long, and the events on a single row does not need to happening at the same time.
I will use this to put the events into the R package TraMineR to do sequence analysis.
How can I accomplish this?