For my thesis, I am trying to combine several variables from two surveys, the British Election Studies (BES) and the British Social Attitudes Survey (BSA), into one dataset. Currently, I have two datasets. The BES data looks like this (simplified):

| year | class | education | gender | age |
| ---- | ----- | --------- | ------ | --- |
| 1992 | working | A-levels | female | 32 |
| 1992 | middle | GCSE | male | 49 |
| 1997 | lower | Undergrad | female | 24 |
| 1997 | middle | GCSE | male | 29 |

The BSA data looks like this (again, simplified):

| year | class | education | gender | age |
| ---- | ----- | --------- | ------ | --- |
| 1992 | middle | A-levels | male | 22 |
| 1993 | working | GCSE | female | 45 |
| 1994 | upper | Postgrad | female | 38 |
| 1994 | middle | GCSE | male | 59 |

What I am trying to do is combine the two into one data frame, with the rows of both surveys stacked and ordered by year, like this:

| year | class | education | gender | age |
| ---- | ----- | --------- | ------ | --- |
| 1992 | working | A-levels | female | 32 |
| 1992 | middle | GCSE | male | 49 |
| 1992 | middle | A-levels | male | 22 |
| 1993 | working | GCSE | female | 45 |
| 1994 | upper | Postgrad | female | 38 |
| 1994 | middle | GCSE | male | 59 |
| 1997 | lower | Undergrad | female | 24 |
| 1997 | middle | GCSE | male | 29 |

I have googled a lot about joins and merging, but I can't get any of them to produce the right result. From what I understand, I should join "by" the year variable, but is that correct? And how can I keep the operation from using a lot of memory (the actual datasets have about 30k rows for the BES and 130k for the BSA)? Is there a solution using either dplyr or data.table in R?
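
For reference, the desired output above looks like a row-wise stack of the two tables rather than a key-based join. A join on year alone would pair every BES row with every BSA row from the same year, which is probably where the memory problem comes from. Below is a minimal sketch of the stacking approach, assuming both data frames have the same column names and compatible column types. The object names `bes` and `bsa` and the extra `survey` column are illustrative and not part of the real data.

```r
library(dplyr)
library(data.table)

# Toy versions of the simplified tables above (hypothetical object names).
bes <- data.frame(
  year      = c(1992, 1992, 1997, 1997),
  class     = c("working", "middle", "lower", "middle"),
  education = c("A-levels", "GCSE", "Undergrad", "GCSE"),
  gender    = c("female", "male", "female", "male"),
  age       = c(32, 49, 24, 29)
)
bsa <- data.frame(
  year      = c(1992, 1993, 1994, 1994),
  class     = c("middle", "working", "upper", "middle"),
  education = c("A-levels", "GCSE", "Postgrad", "GCSE"),
  gender    = c("male", "female", "female", "male"),
  age       = c(22, 45, 38, 59)
)

# dplyr: stack the rows, record the source survey (optional), sort by year.
combined <- bind_rows(BES = bes, BSA = bsa, .id = "survey") %>%
  arrange(year)

# data.table: the same idea; setorder() sorts the result by reference.
combined_dt <- rbindlist(list(BES = bes, BSA = bsa),
                         use.names = TRUE, idcol = "survey")
setorder(combined_dt, year)
```

With this approach the result has one row per input row (about 160k for the real data), so memory use should grow with the combined row count and not with the number of matching pairs. Dropping `.id` / `idcol` gives exactly the five columns shown in the target table.
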
Any help is much appreciated!!!