I am working on a project that uses a Microchip PIC24FJ256GA702. For several days I have been looking for an intermittent bug with some UART code. It uses an interrupt service routine (ISR) to transmit a buffer named char txBuff[]
and its length int txLen
.
In the main (i.e. not ISR) code I occasionally access the buffer to add bytes or to see how many bytes are left to transmit. To ensure the main code sees a clean copy of the buffer it has a critical section in the form:
Disable interrupts
Read and write 'txBuff' and its length 'txLen'
Enable interrupt
There are several ways to disable an ISR on this PIC. Including: Use the DISI
instruction, or clear the GIE
bit, or alter interrupt priority levels.
But I have chosen to clear the interrupt to enable bit of the specific interrupt bit in IECx
. In this case, it is IEC1bits.U2TXIE because I am using UART2. I do this because it is easy, it does not disable unrelated interrupts, and it has always worked well in my other projects. So in C, the critical section is:
IEC1bits.U2TXIE = 0;
copyOfTxLen = txLen;
...
IEC1bits.U2TXIE = 1;
And the disassembly listing starts:
BCLR 0x9B, #7 // disable interrupt
MOV txLen, W1 // read txLen
...
The problem: I am now fairly certain that the ISR occasionally gets called during read txLen
such that copyOfTxLen
ends up being the value of txLen
before the ISR was called, but the remainder of the critical section sees variables in the state they are after the ISR is called.
The question: Is my code faulty? If so how can the fault be avoided?
Some background:
- UART is running at a high speed (250 kBaud) with a PIC24 clock of 8 MHz, a byte every 160 instructions, so the ISR is quite busy and this I think increases the chances of hitting a race condition compared to my previous projects.
- I see dsPIC33/PIC24 Family Reference Manual / Interrupts, section 2.3.1 Note 2 says "There is one cycle delay between clearing the GIE bit and the interrupts being disabled." Now this refers to
GIE
, notU2TXIE
and I am not sure whether it would cause a problem like I am seeing because theMOV
instruction is a 2 cycle instruction. But nevertheless, it looks like a cunning way to trap unwary programmers. - If I try to debug it or add any debug code the already very intermittent problem goes away (feels like a race condition). I don't want to just add some
NOP
s without knowing exactly why I should because race conditions always bite you back unless you actually fix them.