Here's what Stata is doing under the hood (pun intended):
sysuse auto, clear
gen expensive=0
replace expensive=1 if price>=4000
logit expensive i.foreign, coefl
predict phat, pr
/* Change in Pr(Expensive) for a tiny change in foreign */
margins, dydx(foreign) continuous // this is like your second spec
gen double me_foreign = phat*(1-phat)*_b[1.foreign]
sum me_foreign
/* Discrete change in Pr(Expensive) when foreign goes from all 0 to all 1 */
margins, dydx(foreign)
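clonevar foreign_orig = foreign // keep the observed values so they can be restored
replace foreign=1
predict phat1, pr
replace foreign=0
predict phat0, pr
replace foreign = foreign_orig // put the observed values back
gen double fd_foreign = phat1 - phat0
sum fd_foreign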
When you omit the i. prefix, Stata calculates the change in the probability of being expensive as if there were a tiny change in foreign. You can mimic that by adding the continuous option to margins, dydx() instead of fitting a second model. Stata calculates the derivative of the predicted probability of being expensive with respect to foreign for every observation and then takes the average. This doesn't quite make sense, since it doesn't correspond to a sensible manipulation: foreign is binary, but the derivative gives you the change in probability for a small change in foreign, as if it were continuous. In linear models this difference does not matter, but in non-linear ones it can.
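For reference, the expression phat*(1-phat)*_b[1.foreign] in the code above is just the derivative of the logistic function written out. A short sketch of where it comes from; the symbols (Lambda for the logistic CDF, p-hat for the fitted probability, N for the number of observations) are my own notation, not Stata output:

```latex
\hat p_i = \Lambda(x_i'\hat\beta) = \frac{1}{1 + e^{-x_i'\hat\beta}},
\qquad
\frac{\partial \hat p_i}{\partial\,\text{foreign}}
  = \hat p_i \,(1 - \hat p_i)\, \hat\beta_{\text{foreign}},
\qquad
\text{average} = \frac{1}{N} \sum_{i=1}^{N} \hat p_i \,(1 - \hat p_i)\, \hat\beta_{\text{foreign}}
```

The last term is what sum me_foreign reports as its mean, and it should match the margins, dydx(foreign) continuous estimate.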
With the prefix i., Stata calculates the finite difference between the predicted probability as if every car were foreign and the predicted probability as if every car were manufactured domestically, and then takes the average. This is arguably more sensible with a binary variable. On the other hand, the difference between the two approaches here (and in many empirical applications) is not that large, and you often see people report the derivative instead of the finite difference.
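If you would rather not overwrite foreign by hand, the same finite difference can be obtained from margins directly. This is a minimal sketch, assuming the same auto data and model as above; it only uses the standard margins syntax for factor variables and the r. contrast operator:

```stata
sysuse auto, clear
* price has no missing values in auto.dta, so this matches the gen/replace above
gen byte expensive = (price >= 4000)
logit expensive i.foreign

* Average predicted probability with every car set to domestic, then to foreign
margins foreign

* Contrast of those two margins (foreign minus domestic)
margins r.foreign

* The same number, reported as a discrete-change effect
margins, dydx(foreign)
```

The contrast from margins r.foreign and the dydx() estimate should both equal the mean of fd_foreign computed manually above.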