0
votes

I am trying to create a replicable piece of data extraction and I'm having an issue with organising the columns.

I have a large database of aggregates, and there are various checks performed against them.

There are around 500 aggregates, each with their own code, and a series of checks which boils down to A + B - C = 0.

I have a number of vectors, which are aggregategroup1 <- c(aggregate1, aggregate2, aggregate3), repeated nearly 100 times. The way the vectors are set out, means that the above sum would be applied like "aggregate 1 + aggregate 2 - aggregate 3".

The problem is that the aggregate codes are just numbers/letters and so when I am extracting the data, they aren't laying out in the table in the order I want- - namely the order they are in in the vector. It might come out like "Date, Aggregate 2, Aggregate 3, Aggregate 1", so then I can't apply the check consistently.

The code I am using so far the following. SQLconn is a connection from R to a SQL database where the data is housed.

  dplyr::tbl("AGGREGATE_RESULT_LATEST") %>%
  dplyr::select(one_of("AGGREGATE_NAME", "REPORTING_DATE", "ADJUSTED_POSITION")) %>% 
  dplyr::filter(
    AGGREGATE_NAME%in%aggregate1,
    REPORTING_DATE %in%databaseConnectR::as_oracle_date(range2)
  ) %>%
  collect() %>% 
  pivot_wider(names_from = "AGGREGATE_NAME", values_from = "ADJUSTED_POSITION" ) %>%
  mutate("Check" = (aggregate1check[2] + aggregate1check[4] - aggregate1check[3]  )) 

The table extracted up until the "collect" line

Aggregate Name Date Position
Code 1 2021 123
Code 2 2021 123
Code 3 2021 123

And then After the pivot/mutate lines:

Date Code1 Code3 Code 2
2021 123 123 123

I just want to be able to reorder the columns, according to the order that they are present in the original vector, which would be code1, code2, code3, so that the mutate command can work consistently and I don't need to check the columns have come out correctly.

They are not coming out in the "wrong" order consistently either so I can't just adjust for that.

I tried adding the following subset as a pipeline but that didn't work, I also tried the same thing but turned the vector "aggregate1" into a list but that didn't work either.

%>% 
  pivot_wider(names_from = "AGGREGATE_NAME", values_from = "ADJUSTED_POSITION" ) %>%
  subset("REPORTING_DATE", aggregate1) %>% 
  mutate("Check" = (GT_32D_check[2] + GT_32D_check[4] - GT_32D_check[3]  )) 
1

1 Answers

0
votes

I have replaced the subset line with a relocate function.

I created a new vector which is the same as "aggregate1", but with "REPORTING_DATE" included at the start. I was then able to reorder the columns using the following line

 pivot_wider(names_from = "AGGREGATE_NAME", values_from = "ADJUSTED_POSITION" ) %>%
  relocate(aggregate1list) %>% 
  mutate("Check" = (aggregate1[2] + aggregate1[3]) - aggregate1[4] )