R - Nested for loops with multiple conditions using grepl and identical

Question

I am trying to mark all fruit with a "1" if it is only supplied by one country or a "0" otherwise.

I have two tables of data:

Table 1:

Fruit - Each row has a different fruit in it e.g. Apple, Banana, Peach,etc...

Country - Each row has the fruits main country of supply in 2-digit iso format e.g. US, UK, NO, etc...

SourceUnique - This is the column I want to fill with "1" in rows with fruit that are only supplied by one country and "0" otherwise.

Table 2:

Country - Each row has the suppliers country in 2-digit iso format like the last table.

Supplies - Each row has a list of fruits that the supplier delivers e.g. row 1 is "Apple, Banana", row 2 is "Pineapple, Peach, Pear, Apple", etc...

Both tables are imported from CSV files then my code is as follows:

Table1$SourceUnique=rep(1,length(Table1$Country))

for(i in 1:length(Table1$Country)){
  for(k in 1:length(Table2$Country)){
    if(grepl(Table1$Fruit[i], Table2$Supplies[k])==TRUE && identical(Table1$Country[i], Table2$Country[k])==FALSE){
      Table1$SourceUnique[i]=0
    }
  }
}

I get no errors but the SourceUnique column does not fill correctly. I get 1's and 0's with some correct and others not. After lots of searching and messing around I have accepted that I have no idea and need help, so any advice or solutions would be fantastic.

Thanks.

Edit for more info:

Some fruits have many suppliers from the same country and Table2$Supplies is messy with other words in it annoyingly.

Example data:

Table1$Country <- c("UK","US","NO")
Table1$Fruit <- c("Apple","Banana","Pear")

Table2$Country <- c("UK","US","UK")
Table2$Supplies <- c("Apple,Pear","Banana,Pear","Banana and Apple")

Edit Again:

grepl and identical work in my code when I run them separately with numbers. I can't understand why they do not work in my loops... In theory my code loops through "Supplies", searches the two criteria and returns a 0 when both criteria are satisfied. It then moves on to the next i ("fruit") and repeats. Maybe the && is my problem? it seems correct from my knowledge.

An Excel solution would also work for my purpose but I am not experienced enough with Excel to know where to start with that.

Hi, welcome to SO. Please consider reading up on How to Ask and how to produce a reproducible example. It makes it easier for others to help you. — Heroka
Sorry about that, I tried creating a small table of data with HTML but it didn't seem to work so I tried to be as descriptive as possible. My data is investors not fruit so I can't share the CSV files... Two tables with 3 random rows would work. — GoBuildAngus
A new column in Table 1 with 3 rows which are 1, 0 and 0. Apple is 1 as it is only supplied by the UK. The other two are 0's as Bananas are supplied by the UK and US and Pears are supplied by Norway mainly but we can see that they are supplied by the US and the UK also. — GoBuildAngus
Is it feasible to extract the relevant values from your real equivalent of Supplies? In your example data this could be done by splitting at the occurrence of the regex "\\s*(and|,)\\s*". — Mikko Marttila

jkt jkt · Accepted Answer · 2016-06-20T12:11:46

Perhaps you can simplify the problem by counting the occurrences of each fruit in table 2 for each country:

for (i in Table1$Fruit){
  as.integer(rowSums(table(grepl(i,Table2$Supplies),Table2$Country))[2]==1)
}

This gives you 1 for those fruits that only occur once per country in table 2 and 0 otherwise.

R - Nested for loops with multiple conditions using grepl and identical

2 Answers