
I've created a data.frame by pre-populating it with NA values:

date_base <- rep(NA, df_length)
x <- rep(NA, df_length)
y <- rep(NA, df_length)

df1 <- data.frame(date_base,x,y)

I then cycle through some data and populate each row individually, hence the need to pre-populate the data.frame. (Yes, I could use rbind, but I thought this way would be easier.)

The first column is to contain dates which are formatted as 'yyyy-mm-dd'. These are obtained from another data.frame 'in_data' which is pulled from a database. I ensure they are dates using the as.Date function.

in_data$date_base <- as.Date(as.character(in_data$date_base),"%Y-%m-%d")

For each row I simply set:

df1$date_base <- end_date

Where end_date is a value from in_data$date_base. I've checked the data type of end_date and it's Date[1]

However, once the data.frame is populated, df1$date_base holds numeric representations of the dates (14487, 14517, 14548) instead of 2009-08-31, 2009-09-30, 2009-10-31.

If, instead of creating the data.frame with NAs, I pre-populate it with dates like:

date_base <- rep(as.Date(as.character('1970-01-01'),"%Y-%m-%d"), length(unique_dates) * versions_len)

then the resulting dates in df1 keep the 'yyyy-mm-dd' format.

Why does populating the data frame with NA values have this effect? Is it bad practice to pre-populate a data.frame? If so, what is best practice?

Thanks for your help.

1 Answer

I think it could be as simple as making the first column a Date from the start, as follows:

date_base <- as.Date(rep(NA, df_length))

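A bare NA is logical, so rep(NA, df_length) creates a logical column rather than a Date column. Assigning a Date into a logical vector element by element drops the Date class and leaves only the underlying number of days since 1970-01-01. Here is a link to a previous question which describes this in more detail: NA in data.table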
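
To make the difference concrete, here is a minimal sketch that reproduces the behaviour and the fix. It assumes the rows are filled by index (the loop body is not shown in the question, so the `[1]` assignment is my guess at it), and `df_length`, `df2`, and the sample date are placeholder values chosen for illustration.

```r
df_length <- 3                       # assumed size, for illustration only
end_date  <- as.Date("2009-08-31")   # sample value standing in for in_data$date_base

# Column pre-populated with plain NA: the column is logical
df1 <- data.frame(date_base = rep(NA, df_length),
                  x = rep(NA, df_length),
                  y = rep(NA, df_length))
df1$date_base[1] <- end_date         # assumed per-row assignment
class(df1$date_base)                 # "numeric": the Date class has been dropped

# Column pre-populated with NA dates: the column is a Date
df2 <- data.frame(date_base = as.Date(rep(NA, df_length)),
                  x = rep(NA, df_length),
                  y = rep(NA, df_length))
df2$date_base[1] <- end_date
class(df2$date_base)                 # "Date"

# A column that has already turned numeric can be converted back
recovered <- as.Date(df1$date_base, origin = "1970-01-01")
```

The same idea applies to the other columns: if x and y are meant to be numeric or character, pre-populating them with NA_real_ or NA_character_ instead of a bare NA should keep their type stable as well.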