
I need to restructure a nested dataset into a flat one.

My dataset looks like this:

UserID   Test
A        X
A        Y
A        Z
B        Y
B        Z

Each user has completed a different number of tests, and their order is arbitrary. X, Y and Z stand for standardized test names (strings).

I need it to look like this:

UserID   X   Y   Z
A        1   1   1
B        0   1   1

1 - completed the test; 0 - did not complete the test.

In other words, I need to perform some kind of tokenization but from multiple columns (not a single string).

I would appreciate any advice on how to accomplish this. Thanks!

I tried reshaping the dataset to wide format using UserID as the identifier. The result has as many columns as the largest number of tests completed by any user, which is fine, but identical tests are misaligned across users. For example, the first column contains X for user A, because user A completed test X first, but Y for user B, because user B did not complete test X (or did not complete the tests in the same order).

Something is strange with your formatting. Try posting the output of dput instead. I removed the snippets, because those are only for running JavaScript, but I don't know whether putting your data in snippets would have ruined the formatting. – camille

1 Answer


An option would be

library(dplyr)
library(tidyr)

# df1 is assumed to hold the long data with columns UserID and Test
df1 %>%
  distinct(UserID, Test) %>%       # keep one row per user/test pair
  mutate(done = 1L) %>%            # 1 = the user completed this test
  spread(Test, done, fill = 0L)    # one column per test; 0 where a test is missing
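
For a reproducible check, here is a minimal sketch that rebuilds the five sample rows from the question and reshapes them into one 0/1 column per test. The data frame name df1 and the helper column done are assumptions made for illustration. It uses pivot_wider(), which in current tidyr supersedes spread(); passing a scalar to values_fill assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question; the name df1 is an assumption
df1 <- data.frame(
  UserID = c("A", "A", "A", "B", "B"),
  Test   = c("X", "Y", "Z", "Y", "Z"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  distinct(UserID, Test) %>%            # guard against repeated user/test rows
  mutate(done = 1L) %>%                 # indicator: 1 = test completed
  pivot_wider(
    names_from  = Test,                 # each distinct test name becomes a column
    values_from = done,
    values_fill = 0L                    # tests a user never took become 0
  )

print(wide)
```

Because the columns are keyed on the test name rather than on the order of completion, the same test always lands in the same column for every user, which avoids the misalignment described in the question. If only a quick indicator matrix is needed, base R's `+(table(df1$UserID, df1$Test) > 0)` should give an equivalent result without extra packages.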