I am preparing my variables for a regression analysis, but I get an error when I create the data table below. The table should show the number of times each member participates in the debate per year (n_parent_Edu), with the other relevant variables alongside. All of the variables seem fine except days_in_house. Here is my code:
library(data.table)
df1 <- data.table(df1)
mp_by_year <- df1[, list(n_parent_Edu = sum(parent_Edu),
                         isFemale = unique(isFemale),
                         party = unique(party),
                         days_in_house = unique(days_in_house)),
                  by = list(member_id, year)]
When I run this code without the days_in_house variable (i.e. with just the isFemale, parent_Edu, member_id, year and party variables), it works fine and produces a new data frame. However, when I add this variable, I get the error below. The variable looks like this:
days_in_house
1647
6383
463
3528
462
3639
16
1738
16
187
3732
...and so on. I get the following error when I add this variable to the data table:
"Supplied 2 items for column 3 of group 242 which has 5 rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code."
My other variables appear as follows:
isFemale
0
1
0
0
0
0
1
party
Conervative
Labour
Liberal Democrats
Conservative
Conervative
Labour
membership_id
463
283
352
287
27
372
year
1997
1997
1997
1997
1997
df1[, uniqueN(isFemale), list(member_id, year)][V1 != <expected_value>]
– smingerson
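The check suggested in the comment can be sketched on a small made-up table. The data below is hypothetical (it is not the asker's df1), and the use of max() at the end is only one assumed way to collapse days_in_house to a single value per group; whether that is the right summary depends on what the variable means. The idea is that unique() returns more than one value whenever a member has several distinct days_in_house values within the same year, which is the situation the error message complains about.

```r
library(data.table)

# Hypothetical toy data: member 2 has two different days_in_house
# values within 1997, so unique() would return 2 items for that group.
toy <- data.table(
  member_id     = c(1, 1, 2, 2, 2),
  year          = c(1997, 1997, 1997, 1997, 1997),
  parent_Edu    = c(1, 0, 1, 1, 0),
  days_in_house = c(463, 463, 16, 16, 187)
)

# Count distinct values per (member_id, year) group.
# An unnamed j expression ends up in a column called V1.
check <- toy[, uniqueN(days_in_house), by = list(member_id, year)]

# Groups where the variable is not constant within the group.
bad_groups <- check[V1 != 1]

# One possible fix (an assumption, not necessarily what is wanted here):
# reduce the variable to a single value per group, e.g. the maximum.
mp_by_year <- toy[, list(n_parent_Edu  = sum(parent_Edu),
                         days_in_house = max(days_in_house)),
                  by = list(member_id, year)]
```

In this toy example bad_groups contains only the row for member 2 in 1997, and mp_by_year has one row per member and year. Running the same uniqueN() check on the real table should show which groups need attention before deciding how to summarise them.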