0
votes

For my thesis, I am trying to use several variables from two types of surveys (the British Election Studies (BES) and the British Social Attitudes Survey (BSA)) and combine them into one dataset. Currently, I have two datasets, one with BES data, which looks like this (in simplified version):

| year | class   | education | gender | age |
| ---- | -----   | --------- | ------ | --- |
| 1992 | working | A-levels  | female | 32  |
| 1992 | middle  | GCSE      | male   | 49  |
| 1997 | lower   | Undergrad | female | 24  |
| 1997 | middle  | GCSE      | male   | 29  |

The BSA data looks like this (again, simplified):

| year | class   | education | gender | age |
| ---- | -----   | --------- | ------ | --- |
| 1992 | middle  | A-levels  | male   | 22  |
| 1993 | working | GCSE      | female | 45  |
| 1994 | upper   | Postgrad  | female | 38  |
| 1994 | middle  | GCSE      | male   | 59  |

Basically, what I am trying to do is combine the two into one dataframe that looks like this:

| year | class   | education | gender | age |
| ---- | -----   | --------- | ------ | --- |
| 1992 | working | A-levels  | female | 32  |
| 1992 | middle  | GCSE      | male   | 49  |
| 1992 | middle  | A-levels  | male   | 22  |
| 1993 | working | GCSE      | female | 45  |
| 1994 | upper   | Postgrad  | female | 38  |
| 1994 | middle  | GCSE      | male   | 59  |
| 1997 | lower   | Undergrad | female | 24  |
| 1997 | middle  | GCSE      | male   | 29  |

I have googled a lot about joins and merging, but I can't figure it out in a way that works correctly. From what I understand, I believe I should join "by" the year variable, but is that correct? And how can I prevent it taking up a lot of memory to perform the computation (the actual datasets are about 30k for the BES and 130k for the BSA)? Is there a solution using either dplyr or data.tables in R?

Any help is much appreciated!!!

1

1 Answers

2
votes

This is not a "merge" (or join) operation, it's just row-concatenation. In R, that's done with rbind (which works for matrix and data.frame using different methods). (For perspective, there's also cbind, which concatenates by columns. Not applicable here.)

base R

rbind(BES, BSA)
#   year   class education gender age
# 1 1992 working  A-levels female  32
# 2 1992  middle      GCSE   male  49
# 3 1997   lower Undergrad female  24
# 4 1997  middle      GCSE   male  29
# 5 1992  middle  A-levels   male  22
# 6 1993 working      GCSE female  45
# 7 1994   upper  Postgrad female  38
# 8 1994  middle      GCSE   male  59

other dialects

dplyr::bind_rows(BES, BSA)
data.table::rbindlist(list(BES, BSA))