2
votes

I am working on a project that uses a Microchip PIC24FJ256GA702. For several days I have been looking for an intermittent bug with some UART code. It uses an interrupt service routine (ISR) to transmit a buffer named char txBuff[] and its length int txLen.

In the main (i.e. not ISR) code I occasionally access the buffer to add bytes or to see how many bytes are left to transmit. To ensure the main code sees a clean copy of the buffer it has a critical section in the form:

Disable interrupts
Read and write 'txBuff' and its length 'txLen'
Enable interrupt

There are several ways to disable an ISR on this PIC. Including: Use the DISI instruction, or clear the GIE bit, or alter interrupt priority levels.

But I have chosen to clear the interrupt to enable bit of the specific interrupt bit in IECx. In this case, it is IEC1bits.U2TXIE because I am using UART2. I do this because it is easy, it does not disable unrelated interrupts, and it has always worked well in my other projects. So in C, the critical section is:

IEC1bits.U2TXIE = 0;
copyOfTxLen = txLen;
...
IEC1bits.U2TXIE = 1;

And the disassembly listing starts:

 BCLR 0x9B, #7 // disable interrupt
 MOV txLen, W1 // read txLen
...

The problem: I am now fairly certain that the ISR occasionally gets called during read txLen such that copyOfTxLen ends up being the value of txLen before the ISR was called, but the remainder of the critical section sees variables in the state they are after the ISR is called.

The question: Is my code faulty? If so how can the fault be avoided?

Some background:

  • UART is running at a high speed (250 kBaud) with a PIC24 clock of 8 MHz, a byte every 160 instructions, so the ISR is quite busy and this I think increases the chances of hitting a race condition compared to my previous projects.
  • I see dsPIC33/PIC24 Family Reference Manual / Interrupts, section 2.3.1 Note 2 says "There is one cycle delay between clearing the GIE bit and the interrupts being disabled." Now this refers to GIE, not U2TXIE and I am not sure whether it would cause a problem like I am seeing because the MOV instruction is a 2 cycle instruction. But nevertheless, it looks like a cunning way to trap unwary programmers.
  • If I try to debug it or add any debug code the already very intermittent problem goes away (feels like a race condition). I don't want to just add some NOPs without knowing exactly why I should because race conditions always bite you back unless you actually fix them.
1

1 Answers

1
votes

The code in the question is faulty. To fix it, wrap expressions intended to disable interrupts such as IEC1bits.U2TXIE = 0 in a macro like this:

__write_to_IEC(IEC1bits.U2TXIE = 0);

XC16 compiler release notes says:

__write_to_IEC(X) - a macro that wraps the expression X with an appropriate number of nop instructions to ensure that the write to the IEC register has taken effect before the program executes. For example, __write_to_IEC(IEC0bits.T1IE = 0); will not progress until the device has disabled the interrupt enable bit for T1 (Timer1).

In p24FJ256GA702.h that is provided with the XC16 compiler, the macro is defined:

#define __write_to_IEC(X) \
   ( (void)(X), \
     __builtin_nop() \
   )

So despite not being mentioned anywhere in the PIC24FJ256GA705 FAMILY datasheet or in the separate Interrupts document, it seems clear that a single NOP is required here on this PIC.

Update: I find exactly the same issue on the dsPIC33EP256MU806 (dsPIC33E family) which requires two NOP instructions.