I am trying to mark each fruit with a "1" if it is supplied by only one country, or a "0" otherwise.
I have two tables of data:
Table 1:
Fruit - Each row has a different fruit in it, e.g. Apple, Banana, Peach, etc.
Country - Each row has the fruit's main country of supply as a two-letter ISO code, e.g. US, UK, NO, etc.
SourceUnique - This is the column I want to fill with "1" in rows where the fruit is supplied by only one country and "0" otherwise.
Table 2:
Country - Each row has the supplier's country as a two-letter ISO code, like the first table.
Supplies - Each row has a list of the fruits that the supplier delivers, e.g. row 1 is "Apple, Banana", row 2 is "Pineapple, Peach, Pear, Apple", etc.
Both tables are imported from CSV files. My code is as follows:
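# Start by assuming every fruit has a single source country
Table1$SourceUnique <- rep(1, length(Table1$Country))
for (i in seq_along(Table1$Country)) {
  for (k in seq_along(Table2$Country)) {
    # Set 0 when the fruit appears in a supplier row from a different country
    if (grepl(Table1$Fruit[i], Table2$Supplies[k]) && !identical(Table1$Country[i], Table2$Country[k])) {
      Table1$SourceUnique[i] <- 0
    }
  }
}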
I get no errors, but the SourceUnique column is not filled correctly: some of the 1s and 0s are right and others are wrong. After a lot of searching and experimenting I am stuck, so any advice or solutions would be fantastic.
Thanks.
Edit for more info:
Some fruits have many suppliers from the same country and, annoyingly, Table2$Supplies is messy and contains other words besides the fruit names.
Example data:
Table1 <- data.frame(Country = c("UK","US","NO"), Fruit = c("Apple","Banana","Pear"))
Table2 <- data.frame(Country = c("UK","US","UK"), Supplies = c("Apple,Pear","Banana,Pear","Banana and Apple"))
Edit Again:
grepl and identical both work when I run them separately with explicit index numbers, so I can't understand why they do not work inside my loops. In theory, for each fruit i my code loops through every row of "Supplies", checks the two criteria and sets a 0 when both are satisfied. It then moves on to the next fruit and repeats. Maybe the && is my problem? It seems correct as far as I know.
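A minimal sketch for checking the two conditions in isolation. It assumes, which is not confirmed above, that the CSV import turned the text columns into factors (the default for read.csv before R 4.0). The values a and b and the "Pears, Plums" string are made up for illustration.

```r
# Hypothetical factor values, as two separately imported tables might produce
a <- factor("UK", levels = c("NO", "UK", "US"))
b <- factor("UK", levels = c("UK", "US"))

# identical() compares the whole object, including the factor levels,
# so the same label with different level sets is not identical
same_object <- identical(a, b)
same_label <- as.character(a) == as.character(b)

# grepl() matches substrings, so "Pear" is also found inside "Pears"
substring_hit <- grepl("Pear", "Pears, Plums")
# Word boundaries restrict the match to the whole word
whole_word_hit <- grepl("\\bPear\\b", "Pears, Plums", perl = TRUE)

print(c(same_object, same_label, substring_hit, whole_word_hit))
```

If the columns are factors, comparing as.character() values with == instead of identical() may be worth trying.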
An Excel solution would also work for my purpose but I am not experienced enough with Excel to know where to start with that.
Supplies? In your example data this could be done by splitting at the occurrence of the regex "\\s*(and|,)\\s*". – Mikko Marttila
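The splitting idea in the comment above could be sketched roughly as follows, using the example data from the question. The names fruit_lists, pairs and n_countries are introduced here for illustration only, and the sketch assumes "only supplied by one country" means one distinct supplier country in Table2. The regex would also split fruit names that contain "and" (e.g. "Mandarin"), so real data may need a stricter pattern.

```r
# Example data from the question, kept as plain character columns
Table1 <- data.frame(Country = c("UK", "US", "NO"),
                     Fruit = c("Apple", "Banana", "Pear"),
                     stringsAsFactors = FALSE)
Table2 <- data.frame(Country = c("UK", "US", "UK"),
                     Supplies = c("Apple,Pear", "Banana,Pear", "Banana and Apple"),
                     stringsAsFactors = FALSE)

# Split each Supplies entry at commas or the word "and"
fruit_lists <- strsplit(Table2$Supplies, "\\s*(and|,)\\s*")

# One row per (supplier country, fruit) pair
pairs <- data.frame(Country = rep(Table2$Country, lengths(fruit_lists)),
                    Fruit = unlist(fruit_lists),
                    stringsAsFactors = FALSE)

# Count the distinct supplier countries for each fruit
n_countries <- tapply(pairs$Country, pairs$Fruit,
                      function(x) length(unique(x)))

# 1 if exactly one country supplies the fruit, 0 otherwise
# (a fruit missing from Table2 would come out as NA)
Table1$SourceUnique <- as.integer(n_countries[Table1$Fruit] == 1)
print(Table1)
```

With this example data, Apple gets 1 (only UK suppliers) while Banana and Pear get 0 (both have UK and US suppliers).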