According to several Intel documents, a core on Xeon Phi can issue up to two instructions per cycle: one on the U-pipe and one on the V-pipe. The following documentation states that the front-end switches among multiple contexts in a round-robin fashion. Do these two instructions come from the same context, or can they come from different contexts? I don't think they can come from different contexts, but I haven't found detailed documentation on this.
Another important thing to know about the front-end of the Intel Xeon Phi coprocessor pipeline is that it does not issue instructions from the same hardware context (hardware thread) for two clock cycles in a row, even if that hardware context is the only one executing. So, in order to achieve the maximum issue rate, at least two hardware contexts must be running. With multiple contexts running, the front-end will switch between them in a round-robin fashion.
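To check my reading of the passage above, here is a toy model of that issue rule. It is only a sketch under my own assumptions: every context always has an instruction ready, stalls and dual issue on the U/V pipes are ignored, and the names `simulate_issue` and `issue_rate` are made up for illustration, not taken from any Intel tool.

```python
def simulate_issue(num_contexts, cycles):
    """Return, per cycle, the context id that issues (None if no context may issue)."""
    schedule = []
    last = None      # context that issued on the previous cycle
    next_ctx = 0     # round-robin pointer
    for _ in range(cycles):
        chosen = None
        # Scan contexts in round-robin order, skipping the one that just issued.
        for offset in range(num_contexts):
            candidate = (next_ctx + offset) % num_contexts
            if candidate != last:
                chosen = candidate
                break
        schedule.append(chosen)
        last = chosen
        if chosen is not None:
            next_ctx = (chosen + 1) % num_contexts
    return schedule


def issue_rate(num_contexts, cycles=1000):
    """Fraction of cycles in which some context issued."""
    schedule = simulate_issue(num_contexts, cycles)
    return sum(c is not None for c in schedule) / cycles


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, simulate_issue(n, 6), issue_rate(n))
```

In this model a single context issues only every other cycle, while two or more contexts keep the front-end busy every cycle, which matches the quoted statement. It says nothing about whether the two instructions issued in one cycle can belong to different contexts, which is what I am asking.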
Also, assuming we have a vector instruction and a scalar instruction, does the front-end issue the vector one to the U-pipe and the scalar one to the V-pipe to achieve the maximum issue rate? I ask because how instructions are assigned to these two pipelines affects the issue rate, given that the V-pipe can only execute a small subset of vector instructions.
The core is a 2-wide processor meaning it can execute two instructions per cycle, one on U-pipe and the other on V-pipe. It also contains an x87 unit to perform floating point instructions when needed.
...
The vector unit communicates with the core and executes vector instructions allocated in the U or V pipeline. The core can execute two instructions per clock, one on U-pipe and another on the V-pipe. The V-pipe executes a subset of the instructions and is governed by instruction pairing rules, which is important to account for in getting optimum processor performance.
Source: https://software.intel.com/en-us/articles/intel-xeon-phi-core-micro-architecture