
I am new to R and I've been stuck on this. In the data set below I created a new list column called 'amountOfTxn_array' that holds three numeric values in sequential order: the transaction amounts for January through March. My objective is to create new variables from this list column by iterating over each element of 'amountOfTxn_array'.

> head(myData_05_Array)
Index   accountID   amountOfTxn_array
1:00    8887    c(36.44, 75.00,185.24)
2:00    13462   c(639.45,656.10,237.00)
3:00    47249   c(0,  24, 2012)
4:00    49528   c(1189.20,2326.26,1695.89)
5:00    57201   c(24.67, 0.00, 0.00)
6:00    57206   c(0.00, 661.98,2957.68)

> str(myData_05_Array)
Classes ‘data.table’ and 'data.frame': 3176 obs. of 4 variables:
 $ accountID         : int 8887 13462 47249 49528 57201 57206 58522 79073 80465 81032 ...
 $ amountOfTxn_200501: num 36.4 639.5 0 1189.2 24.7 ...
 $ amountOfTxn_200502: num 75 656 24 2326 0 ...
 $ amountOfTxn_200503: num 185 237 2012 1696 0 ...
 $ amountOfTxn_array :List of 3176

Below is example code for creating a new variable: I would like to tag 1 if a value in the array is greater than 100 and 0 otherwise. When I run it, I get the error "Error: (list) object cannot be coerced to type 'double'". May I ask for a solution? I would highly appreciate any response. Thanks!

> for(i in 1:3)
+ {  
+   if(myData_05_Array$amountOfTxn_array[i] > 100){
+     myData_05_Array$testArray[i] <- 1
+   }
+   else{
+     myData_05_Array$testArray[i] <- 0
+   }
+ }

Error: (list) object cannot be coerced to type 'double'

What I am expecting as the output is as follows:

amountOfTxn_testArray
c(0, 0, 1)
c(1, 1, 1)
c(0, 0, 0)
c(1, 1, 1)
c(0, 0, 0)
c(0, 1, 1)

I think the problem is that myData_05_Array$amountOfTxn_array[i] takes the i-th row/entry of amountOfTxn_array, which is a vector, so you end up comparing a vector with 100, which throws an error. Maybe you could split this column into 3 and compare every element. - wolf_wue
Can you show the output of str(myData_05_Array)? - Janna Maas
Also, in general, you can use any() to check a condition within, e.g., a vector: any(c(1:10) > 5), so you don't have to use a loop. - Janna Maas
@wolf_wue Actually, amountOfTxn_array was created from the 3 original columns representing Jan, Feb and Mar. I have 24 columns in my dataset that represent monthly data for two years. Doing calculations for 24 columns is quite cumbersome, which is why I am exploring other options, like combining these 24 columns into one list column of numeric vectors. Here, I am exploring only 3 months of data. I have tried getting the difference between the 3rd and 2nd positions by index, and it works fine. The next thing I would like to do is apply some logical operations to each of the data elements. - DataRockStarian
Hi @JannaMaas, please see the printout of str(myData_05_Array) above. - DataRockStarian

1 Answer


"Doing calculations for 24 columns is quite cumbersome"

Aha! Welcome to the dplyr world:

library(dplyr)
# generate dummy data (named df because the calls below operate on df)
df <- read.table(text='Index   accountID   Jan Feb March
1:00    8887    36.44 75.00 185.24
2:00    13462   639.45 656.10 237.00
3:00    47249   0 24 2012
4:00    49528   1189.20 2326.26 1695.89
5:00    57201   24.67 0.00 0.00
6:00    57206   0.00 661.98 2957.68', header=TRUE, stringsAsFactors=FALSE) 

mutate column by column index

# the dot (.) refers to the column currently being transformed

df %>% mutate_at(3:5, ~ as.numeric(. > 100))

mutate columns by predefined names

changeVars <- c("Jan", "Feb", "March")
df %>% mutate_at(changeVars, ~ as.numeric(. > 100))

mutate columns if some condition is met

df %>% mutate_if(is.double, ~ as.numeric(. > 100))
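
On dplyr 1.0.0 or later, the scoped verbs above are superseded by across(). Here is a minimal sketch, assuming that version is installed. The names dfSmall, monthCols and flagged are illustrative only, and the data is just the first three rows of the dummy data:

```r
library(dplyr)  # across() and all_of() assume dplyr >= 1.0.0

# Small stand-in for the dummy data (first three rows, Index column omitted).
dfSmall <- data.frame(
  accountID = c(8887L, 13462L, 47249L),
  Jan       = c(36.44, 639.45, 0),
  Feb       = c(75.00, 656.10, 24),
  March     = c(185.24, 237.00, 2012)
)

# Hypothetical vector of the monthly column names to transform.
monthCols <- c("Jan", "Feb", "March")

# Flag every monthly amount above 100 with 1 and everything else with 0;
# accountID is not selected, so it is left untouched.
flagged <- dfSmall %>%
  mutate(across(all_of(monthCols), ~ as.numeric(.x > 100)))

print(flagged)
```

With 24 monthly columns, only monthCols would need to change. This variant should flag the same cells as the scoped verbs above.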

output:

 Index accountID Jan Feb March
1  1:00      8887   0   0     1
2  2:00     13462   1   1     1
3  3:00     47249   0   0     1
4  4:00     49528   1   1     1
5  5:00     57201   0   0     0
6  6:00     57206   0   1     1
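
As for the list column in the question itself: a likely cause of the "cannot be coerced to type 'double'" error is that single brackets on a list return a list of length one, while double brackets extract the numeric vector inside it. Below is a minimal base R sketch. It uses a plain list built from the first three rows in place of the real myData_05_Array$amountOfTxn_array, and testArray is an illustrative name:

```r
# Stand-in for the list column: one numeric vector of three amounts per account.
amountOfTxn_array <- list(
  c(36.44, 75.00, 185.24),
  c(639.45, 656.10, 237.00),
  c(0, 24, 2012)
)

# `[i]` keeps the list wrapper, so the result is still a list.
print(class(amountOfTxn_array[1]))

# `[[i]]` extracts the numeric vector itself, which can be compared with 100.
print(class(amountOfTxn_array[[1]]))

# Apply the test to every element; each result is a 0/1 vector of length 3.
testArray <- lapply(amountOfTxn_array, function(x) as.numeric(x > 100))

print(testArray)
```

The same lapply() call should work on the real list column, without an explicit for loop, and it keeps one 0/1 vector per account.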