Section 2.13.2 mentions that the arbitration ID is used to determine which processor issues the no-op cycle first. I have seen this claim in multiple sources, including the Intel manual. However, the part of the Intel manual that describes the MP initialisation sequence only addresses the Pentium 4 era, when there was a 'system bus'; before that there was originally an 'APIC bus'. I am under the impression that the arbitration ID was only needed on architectures where multiple CPUs shared the same bus. With the ring bus architecture, as I understand it, an agent senses an empty slot on the ring and places its transaction in it, and the slot then moves round at one stop per cycle, meaning that kind of arbitration is no longer required.
What's interesting is that Section 2.13.2 is part of a document that discusses Intel ME and the PCH, so it is evidently about Nehalem and later. Since it says that the APIC arbitration ID is used, perhaps it is only talking about Nehalem or Westmere.
So I ask: how is the BSP selected on ring and mesh architectures? My thought was that the cores could use cache-as-RAM and, if cache coherency still functions in no-fill mode, race for a mutex.
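
To make that last idea concrete, here is a minimal sketch in C of the kind of race I have in mind. It is purely an illustration using user-space threads and C11 atomics, not a description of how any Intel microcode or firmware actually selects the BSP. The names `bsp_id`, `core_entry` and `elect_bsp` are hypothetical, and the shared word stands in for a location in coherent cache-as-RAM, which is itself an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_CPUS 64
#define NO_BSP   (-1)

/* Hypothetical shared word visible to every core (assumed: coherent cache-as-RAM). */
static atomic_int bsp_id;
/* Counts how many simulated cores believed they won the race. */
static atomic_int winners;

/* Each simulated core tries once to claim the BSP role. */
static void *core_entry(void *arg)
{
    int my_id = *(int *)arg;
    int expected = NO_BSP;

    /* Atomic compare-and-swap: only the first arrival still sees NO_BSP. */
    if (atomic_compare_exchange_strong(&bsp_id, &expected, my_id))
        atomic_fetch_add(&winners, 1);

    /* A loser would behave as an AP here and wait to be woken by the BSP. */
    return NULL;
}

/* Runs the race with n simulated cores.
 * Returns the elected id and stores the number of winners in *num_winners. */
int elect_bsp(int n, int *num_winners)
{
    pthread_t threads[MAX_CPUS];
    int ids[MAX_CPUS];

    assert(n > 0 && n <= MAX_CPUS);
    atomic_store(&bsp_id, NO_BSP);
    atomic_store(&winners, 0);

    for (int i = 0; i < n; i++) {
        ids[i] = i;
        int rc = pthread_create(&threads[i], NULL, core_entry, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);

    *num_winners = atomic_load(&winners);
    return atomic_load(&bsp_id);
}
```

Calling `elect_bsp(8, &w)` should always leave `w` equal to 1, while the returned id can differ between runs. That is exactly the property I am asking about: whichever core gets there first wins, with no arbitration ID involved.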