2
votes

The Problem: I send one value into a UART and nulls emerge on the other UART.

--- Details ---

These are both PIC processors (PIC24 and PIC32)

They are both hard wired onto a printed circuit board.

They are communicating, each via one of the UART modules which reside within them.

They are (ostensibly; according to docs) both configured for 115200 bps, 8-N-1

No handshaking, no CTS enabled, no RTS enabled; I'm just putting bytes on the wire and out they go.

(These are short little 4-byte commands and responses, which fit pretty neatly)

The PIC32 is going 80 MHz.

The PIC24 has F[cy] = 14745600

i.e., it is going 14.7456 MHz

The PIC24 sends four bytes (a specific command sequence)

When I set a breakpoint at the Interrupt Service Routine for the UART, the PIC32 shows nulls. After the first four hits I keep seeing repeated hits on the (PIC32 code) breakpoint, and I continue to see nulls (which makes sense, since the PIC24 is not sending anything).

i.e., the UART appears to be repeatedly generating interrupts when there is no reason

I did not write the code on the PIC32 side, and I am learning daily how it works.

Then I let the code just run, and I inevitably wind up on a line that says

    52570 1D01_335C 9D01_335C  _general_execption_handler  sdbbp 0x0

When I get there,

  • The cause register holds 0080181C
  • The EPC register holds 9D00F228
  • The SP register holds 9F8FFFA0

This happened like clockwork, so I got suspicious of the __ISR that would not stop. MpLab showed me this...

           432:
           433:                 //*********************************************************//
           434:                 void __ISR(_UART1_VECTOR, ipl5) IntUart1Handler(void)   //MCU communication port
           435:                 {
           9D00F204  415DE800   rdpgpr      sp,sp
           9D00F208  401A7000   mfc0        k0,EPC
           9D00F20C  401B6000   mfc0        k1,Status
           9D00F210  27BDFF88   addiu       sp,sp,-120
           9D00F214  AFBA0074   sw          k0,116(sp)
           9D00F218  AFBB0070   sw          k1,112(sp)
           9D00F21C  7C1B7844   ins         k1,zero,1,15
           9D00F220  377B1400   ori         k1,k1,0x1400
           9D00F224  409B6000   mtc0        k1,Status
           9D00F228  AFBF0064   sw          ra,100(sp) ;<<<-------EPC register always points here
           9D00F22C  AFBE0060   sw          s8,96(sp)
           9D00F230  AFB9005C   sw          t9,92(sp)
           9D00F234  AFB80058   sw          t8,88(sp)
           9D00F238  AFAF0054   sw          t7,84(sp)
           9D00F23C  AFAE0050   sw          t6,80(sp)
           9D00F240  AFAD004C   sw          t5,76(sp)
           9D00F244  AFAC0048   sw          t4,72(sp)
           9D00F248  AFAB0044   sw          t3,68(sp)
           9D00F24C  AFAA0040   sw          t2,64(sp)
           9D00F250  AFA9003C   sw          t1,60(sp)
           9D00F254  AFA80038   sw          t0,56(sp)
           9D00F258  AFA70034   sw          a3,52(sp)
           9D00F25C  AFA60030   sw          a2,48(sp)
           9D00F260  AFA5002C   sw          a1,44(sp)
           9D00F264  AFA40028   sw          a0,40(sp)
           9D00F268  AFA30024   sw          v1,36(sp)
           9D00F26C  AFA20020   sw          v0,32(sp)
           9D00F270  AFA1001C   sw          at,28(sp)
           9D00F274  00001012   mflo        v0
           9D00F278  AFA2006C   sw          v0,108(sp)
           9D00F27C  00001810   mfhi        v1
           9D00F280  AFA30068   sw          v1,104(sp)
           9D00F284  03A0F021   addu        s8,sp,zero

I look a little more closely at the numbers, and I see that at that time, if we add 100 (0x64) to FFA0 (the bottom 16 bits of the SP) we get 0x10004, which I am guessing is 4 too much.

PIC Manual DS61143E, page 50, says that this nomenclature means: SW, Store Word, Mem[Rs+offset] = Rt. Other experts have told me that the EXCCODE bits in the cause register are 7, which is the code for a bus exception on a load or store.
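For what it's worth, the register values quoted above can be decoded by hand. The sketch below is plain C meant to run on a PC, not on the target. The helper names are made up for illustration, and the field positions (ExcCode in Cause bits 6:2, the requested interrupt priority level RIPL in bits 12:10) are my reading of the PIC32MX core documentation, so check them against the manual for your part.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any Microchip header.
   Assumed layout: Cause<6:2> = ExcCode, Cause<12:10> = RIPL. */
static inline unsigned cause_exccode(uint32_t cause)
{
    return (unsigned)((cause >> 2) & 0x1Fu);
}

static inline unsigned cause_ripl(uint32_t cause)
{
    return (unsigned)((cause >> 10) & 0x7u);
}

/* Effective address of "sw rt, offset(sp)": base plus the
   sign-extended 16-bit offset. */
static inline uint32_t store_address(uint32_t sp, int16_t offset)
{
    return sp + (uint32_t)(int32_t)offset;
}
```

With the values from the question, `cause_exccode(0x0080181C)` gives 7, `cause_ripl(0x0080181C)` gives 6, and `store_address(0x9F8FFFA0, 100)` gives 0x9F900004, which is the address the `sw ra,100(sp)` at the EPC tried to write.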

Or, I'm totally guessing here (would love to get some experts' knowledge on this) something is not clearing something and I'm encountering infinite recursion on an int handler.

All of this is starting to make sense.

THE QUESTION

Could someone please suggest the most common reasons for an int like this to be repeatedly hitting me?

Does anyone see any common relationship between the bogus nulls coming from the UART which could somehow be connected with this endlessly generated int? Am I even on the right track?

In your answer, please tell me how to acknowledge the int from the UART. I know how I do that on the PIC24 (I wrote that code entirely myself, in ASM), but I don't know how this is done in C on the PIC32. Assembly will be fine; I'll inline it. I'm working with code I didn't write here, and I thank you for your answers.

What is the most common reason that the UART (#1, in this case) would be repeatedly generating interrupts?

4
How does the signal look on an oscilloscope? - starblue
I have edited the question to include later evidence from debugging. I believe that the UART is re-interrupting the processor, and at this moment I don't know how to set the appropriate bit so that the int will stop. Advice is welcome. - User.1
I suspect that the problem is caused by noise on the TX/RX line, since it is a direct MCU-to-MCU connection with no level shifter (e.g. MAX232) loading it. Try stopping the TX from the PIC24 and see whether the nulls and SP overflow still occur. - mfc
The idea is not to send any bytes from the PIC24, to see whether the problem on the PIC32 still exists. If it does, that indicates the problem is on the PIC32 side, probably noise picked up on the TX/RX PCB track because both MCU pins are tri-stated (high impedance). - mfc
Do you mean that when you send 1 byte from the PIC24, this byte (8 bits) is received at the PIC32 properly, without unwanted side effects like nulls and stack overflow? If yes, you can test further by adding a time delay between the bytes of the 4-byte command to confirm that the fault is in the receiving logic of the PIC32. - mfc

4 Answers

6
votes

The most common reason an interrupt subroutine is called over and over is that the interrupt request is never acknowledged in the routine. Are you sure you clear the corresponding IRQ bit?
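Since the question asks what "acknowledging" looks like in C, here is a minimal sketch of the usual order of operations in a receive handler: drain the receive FIFO, clear any overrun error, and only then clear the interrupt request flag. It is written as a host-side model so it can be run on a PC. The struct and function names are hypothetical. On a PIC32MX I would expect the real equivalents to be `U1STAbits.URXDA`, `U1RXREG`, `U1STAbits.OERR` and `IFS0bits.U1RXIF`, but that is an assumption to verify against the device header and data sheet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the UART1 bits a receive handler touches.
   Everything here is hypothetical and only models the hardware. */
typedef struct {
    uint8_t fifo[8];  /* receive FIFO contents                    */
    size_t  count;    /* bytes waiting; "data available" if > 0   */
    bool    overrun;  /* models the overrun error bit (OERR)      */
    bool    rx_flag;  /* models the receive interrupt request flag */
} uart_model_t;

/* Models reading the receive register: pops the oldest byte. */
static uint8_t model_read_rxreg(uart_model_t *u)
{
    uint8_t b = 0;
    if (u->count > 0) {
        b = u->fifo[0];
        for (size_t i = 1; i < u->count; i++)
            u->fifo[i - 1] = u->fifo[i];
        u->count--;
    }
    return b;
}

/* Body of a receive handler. Returns the number of bytes stored. */
static size_t uart_rx_service(uart_model_t *u, uint8_t *out, size_t out_len)
{
    size_t n = 0;

    /* 1. Drain the FIFO; bytes that do not fit in out are dropped. */
    while (u->count > 0) {
        uint8_t b = model_read_rxreg(u);
        if (n < out_len)
            out[n++] = b;
    }

    /* 2. Clear a pending overrun so the receiver keeps running. */
    if (u->overrun)
        u->overrun = false;

    /* 3. Acknowledge: clear the request flag last, once the
          condition that raised it is gone. */
    u->rx_flag = false;

    return n;
}
```

The order matters on parts where the flag is re-asserted for as long as data is still waiting: clearing the flag before draining the FIFO just sets it again, and the handler re-enters as soon as it returns.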

To ease UART debugging, first connect each UART to a PC and make sure that target can communicate both ways with the PC. With two targets talking to each other, the only way to tell which one has the problem is to inspect the signals with a scope.

6
votes

On many devices an interrupt must be explicitly cleared to prevent the ISR from simply re-entering when complete.

In most cases a UART will have status bits that indicate the source of the interrupt. Knowing that source might tell you something, and without it, it is difficult for us to help you. You can inspect the UART registers directly in the debugger; however, on some devices the act of reading a bit may in fact clear it, and that is true in the debugger too, so be aware of that possibility (check the data sheet/user manual).

Some UARTs require their transmitter to be explicitly switched off to stop transmitting nulls, while others are triggered automatically when data is placed in the TX register and stop after the necessary number of bits have been shifted out. Again, check the data sheet/manual for the part. If the PIC32 code is known to be working, this possible error would be in the PIC24 code, which seems to fit. You can check this simply by putting an oscilloscope on the TX line from the PIC24: if it is transmitting, you will see at least start/stop bit transitions (framing). If there is nothing, then the problem is probably at the PIC32 end.

While you have the scope out, you can check that the bit timing is correct and that you are actually transmitting at 115200. It is easy to get the clocking wrong, and that should be your first check. If the baud rate is incorrect, the PIC32 will likely generate framing error interrupts, which if not handled may persist indefinitely.
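As a cross-check on the clocking, the divisor each side should be using can be worked out from the clock figures in the question. The sketch below runs on a PC and assumes the standard-speed formula UxBRG = clock / (16 * baud) - 1 (BRGH = 0) on both parts. It also assumes the PIC32 peripheral bus clock equals the 80 MHz system clock, which the question does not state. The function names are invented for illustration.

```c
#include <stdint.h>

/* Divisor for BRGH = 0, rounded to the nearest integer.
   clock_hz is Fcy on the PIC24 and (by assumption) PBCLK on the PIC32. */
static uint32_t uart_brg(uint32_t clock_hz, uint32_t baud)
{
    return (clock_hz + 8u * baud) / (16u * baud) - 1u;
}

/* Baud rate actually produced by a given divisor (truncated to an integer). */
static uint32_t uart_actual_baud(uint32_t clock_hz, uint32_t brg)
{
    return clock_hz / (16u * (brg + 1u));
}

/* Signed error in hundredths of a percent, truncated toward zero. */
static int32_t uart_error_x100(uint32_t actual, uint32_t wanted)
{
    return (int32_t)(((int64_t)actual - (int64_t)wanted) * 10000 / (int64_t)wanted);
}
```

For the PIC24, `uart_brg(14745600, 115200)` is 7 and the resulting rate is exactly 115200. For an 80 MHz peripheral clock, `uart_brg(80000000, 115200)` is 42, which gives 116279 bps, a little under 1% fast. That is normally within tolerance, so if the scope shows a bit time far from 8.68 µs, a wrong clock assumption in the init code is the more likely culprit.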

Another possibility is that after transmission the PIC24 leaves the line in the "break" state, and that the PIC32 UART is generating "line-break" interrupts. That is why it is important to look at the UART status registers to determine the interrupt cause.

As you can see, there are many possibilities; I think I have covered the most likely ones, but more methodical debugging effort and information gathering on your part is required. I hope I have guided you in this too.

1
votes

There were three root causes in place...

  • The interrupt priority level was set at value 6 in the initialization code for UART1
  • The first line of the interrupt service routine was coded with an interrupt priority level of 5
  • The first three bytes of UART data were disappearing from the data stream (this is still unsolved)

Here's the not-so-obvious way they were causing the problem

  • First three bytes never appeared
  • Fourth byte did appear
  • Interrupt hit (as level 6) and invoked __ISR routine
  • __ISR was acting as ipl5 agent
  • First instruction executed (possibly more, I couldn't debug that accurately)
  • As soon as the first instruction finished, the "higher" priority 6 interrupt immediately kicked in
  • This resulted in the same interrupt again
  • The process repeated itself infinitely.
  • In short order, Stack Overflow resulted
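The sequence above can be checked against the prologue in the question's disassembly. The `ins k1,zero,1,15` / `ori k1,k1,0x1400` pair rewrites the Status register so that the exception-level bits are cleared and the IPL field holds 5, and the `mtc0` that follows makes it take effect. A request at level 6 is then allowed through on the very next instruction, which is where the EPC was pointing. The sketch below models that on a PC. The function names are hypothetical, and the field positions (IE in bit 0, EXL in bit 1, ERL in bit 2, IPL in bits 12:10) are my understanding of the PIC32MX Status register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the two prologue instructions that build the new Status value. */
static uint32_t isr_prologue_status(uint32_t status)
{
    status &= ~0x0000FFFEu;  /* ins k1,zero,1,15: clear bits 15..1       */
    status |= 0x00001400u;   /* ori k1,k1,0x1400: IPL field = 5 (ipl5)   */
    return status;
}

/* Current interrupt priority level held in Status<12:10> (assumed layout). */
static unsigned status_ipl(uint32_t status)
{
    return (unsigned)((status >> 10) & 0x7u);
}

/* A request is taken when interrupts are enabled, the core is not at
   exception or error level, and the request outranks the current IPL. */
static bool interrupt_preempts(unsigned requested_ipl, uint32_t status)
{
    bool ie  = (status & 0x1u) != 0;
    bool exl = (status & 0x2u) != 0;
    bool erl = (status & 0x4u) != 0;
    return ie && !exl && !erl && requested_ipl > status_ipl(status);
}
```

With a Status value that has IE and EXL set on entry, the prologue yields IPL 5 with EXL cleared, so `interrupt_preempts(6, ...)` is true while `interrupt_preempts(5, ...)` is false. Each re-entry also runs `addiu sp,sp,-120`, so the stack shrinks by 120 bytes per pass until a store lands outside RAM.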

The Fix

Make sure these two lines of code agree with each other...

The IPL line in the init code, wrong way then the right way

     //IPC6bits.U1IP=6; //// Wrong !!! Uart 1 IPL should not be 6 !!!
     
     IPC6bits.U1IP=5;   //// Uart 1 IPL = 5  Correct way; matches __ISR

Interrupt Service Routine

     void __ISR(_UART1_VECTOR, ipl5) IntUart1Handler(void)  //// Operating as IPL 5
     :
     :
     :
     :
0
votes

Poor design decision. If both are on the same board, SPI would have been more feasible and a lot faster.