I'm conducting a household survey with a random sample of 200 villages. Using QGIS, I picked a random point 5-10km from my original villages. I then obtained, from the national statistical office, the village codes for those 200 "neighbor" villages - as well as a buffer of 10 additional neighbor villages. So my total sample is:
200 original villages + 210 neighbor villages = 410 villages, total
We're going to begin fieldwork soon, and I want to give each survey team a map for 1 original village + the nearest neighbor village. Because I'm surveying in some dense urban areas as well, sometimes a neighbor village is actually quite close to more than one original village.
My problem is this: if I run Distance Matrix in QGIS, matching an old village to its nearest neighbor village, I get duplicates in the latter. To get around this, I've matched each old village to the nearest 5 neighbor villages. My main idea/goal is to pick the nearest neighbor that hasn't already been picked.
I end up with a .csv like so:

As you can see, picking the five nearest villages, I'm getting repeats - neighbor village 79 is showing up as nearby to original villages 1, 2, 3, and 4. This is fine, as long as I can assign neighbor village 79 to one (and only one) original village, and then have the rest uniquely match as well.
What I want to do, then, is to uniquely match each original village to one neighbor village. I've tried a bunch of stuff, none of which has worked: My sense is that I need to loop over original village groups, assign a variable (e.g. taken==1) to one of the neighbor villages, and then - somehow - have each instance of that taken==1 apply to all instances of, say, neighbor village 79.
Here's some sample code of what I was thinking. Note: this uniquely matches 163 of my neighbors.
gen taken = 0
so ea distance
by ea: replace taken=1 if _n==1
keep if taken==1
codebook FID ea
This also doesn't work; it just sets taken to 1 for all obs:
foreach i in 5 4 3 2 1 {
by ea: replace taken=1 if _n==`i' & taken==0
}
What I need to do, I think, is loop over both _N and _n, and maybe use an if/else. But I'm not sure how to put it all together.
(Tangentially, is there a better way to loop over decreasing values in Stata? Similar to i-- in other programming languages?)
list, clean noobsand copy that here. - Roberto Ferrer