Sorting multiple data and removing duplicates in the first column in R

Question

Here is a part of the sample but considering I have more columns in the output.

df <- tribble(
    ~Name1, ~id1,   ~place1,
    "john, Saboor", 12, 1,
    "Rose,Keely",   13, 2,
    "Kate, Kaomeri",    17, 1,
    "Ellis,Jorey",  19, 2,
    "john, Saboor", 20, 3,
    "john, Saboor", 23, 5,
    "Rose,Keely",   18, 4,
    "Harry,komi",   33, 1,
    "Harry,komi",   32, 4,
    "john, Saboor", 20, 3
    
)

The logic is that in Name1, we have several people, but these people have a different id1 and a different place1. To get the output of interest, assuming "john, Saboor", he repeated 4 times in Name1. I want to keep him for the first time in the column1 and then remove all his information in the Name1.

Next, I want to consider it as Name2, Name3, Name 4, Name5.... and keeping his information in the next columns. As you see he is repeated 4 times, then you see his name in Name1, Name2, Name3 and Name4, id4 and place4. Below, please see the output of interest.

I would like to see NA as blank

Name1   id1 place1  Name2   id2 place2  Name3   id3 place3  Name4   id4 place4
john, Saboor    12  1   NA  NA  NA  NA  NA  NA  NA  NA  NA
Rose,Keely  13  2   NA  NA  NA  NA  NA  NA  NA  NA  NA
Kate, Kaomeri   17  1   NA  NA  NA  NA  NA  NA  NA  NA  NA
Ellis,Jorey 19  2   NA  NA  NA  NA  NA  NA  NA  NA  NA
NA  NA  NA  john, Saboor    20  3   NA  NA  NA  NA  NA  NA
NA  NA  NA  NA  NA  NA  john, Saboor    23  5   NA  NA  NA
NA  NA  NA  Rose,Keely  18  4   NA  NA  NA  NA  NA  NA
Harry,komi  33  1   NA  NA  NA  NA  NA  NA  NA  NA  NA
NA  NA  NA  Harry,komi  32  4   NA  NA  NA  NA  NA  NA
NA  NA  NA  NA  NA  NA  NA  NA  NA  john, Saboor    20  3

Ronak Shah Ronak Shah · Accepted Answer · 2020-10-24T12:00:24

Using dplyr and tidyr libraries, you can do :

library(dplyr)
library(tidyr)

df %>%
  #Change the columns to character
  mutate(across(.fns = as.character), 
  #Create a row number column
         row = row_number()) %>%
  #Get data in long format
  pivot_longer(cols = -row) %>%
  #Remove number from each of the column name
  mutate(name = gsub('\\d+', '', name),
  #Create a unique group number for each unique name
         grp = match(value, unique(value[name == 'Name']))) %>%
  #Fill `NA` values by previous non-NA value
  fill(grp) %>%
  #For each grp and column name
  group_by(grp, name) %>%
  #Create an incremental column number
  mutate(name = paste0(name, row_number())) %>%
  #Get data in wide format
  pivot_wider(values_fill = '') %>%
  ungroup %>%
  #Remove unwanted columns
  select(-row, -grp) %>%
  data.frame()

Sorting multiple data and removing duplicates in the first column in R

1 Answers