Bits Are Scrambled

Question

The Problem: I send one value into a UART and nulls emerge on the other UART.

--- Details ---

These are both PIC processors (PIC24 and PIC32)

They are both hard wired onto a printed circuit board.

They are communicating, each via one of the UART modules which reside within them.

They are (ostensibly; according to docs) both configured for 115200 bps, 8-N-1

No handshaking, no CTS enabled, no RTS enabled; I'm just putting bytes on the wire and out they go.

(These are short little 4-byte commands and responses, which fit pretty neatly)

The PIC32 is going 80 MHz.

The PIC24 has F[cy] = 14745600

i.e., it is going 14.7456 MHz
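
As a sanity check on the 115200 bps claim, the divisor each side needs can be worked out from the standard formula for the 16x mode (BRGH = 0): UxBRG = Fclk / (16 * baud) - 1. The sketch below is only illustrative. The function names are made up for this example, and it assumes the PIC32 peripheral bus clock equals the 80 MHz core clock, which the post does not state.

```c
#include <stdint.h>

/* Divisor for 16x mode (BRGH = 0): UxBRG = Fclk / (16 * baud) - 1,
   rounded to the nearest integer. Hypothetical helper, not a library call. */
uint32_t brg_for_baud(uint32_t fclk_hz, uint32_t baud)
{
    uint32_t denom = 16u * baud;
    return (fclk_hz + denom / 2u) / denom - 1u;
}

/* Baud rate actually produced by a given divisor (truncated to whole bps). */
uint32_t actual_baud(uint32_t fclk_hz, uint32_t brg)
{
    return fclk_hz / (16u * (brg + 1u));
}
```

With these inputs the PIC24 gets an exact divisor of 7 (14745600 / (16 * 8) = 115200), while the PIC32 rounds to 42 and would actually run at about 116279 bps, roughly 0.9% fast. That is usually within UART tolerance, so if the peripheral clock really is 80 MHz a rate mismatch looks unlikely. A different PBCLK divider would change the result.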

The PIC24 sends four bytes (a specific command sequence)

When I set a breakpoint at the Interrupt Service Routine for the UART, the PIC32 shows nulls. After the first four hits I keep landing on the (PIC32 code) breakpoint, and I continue to see nulls (which makes sense, since the PIC24 is not sending anything).

i.e., the UART appears to be repeatedly generating interrupts when there is no reason

I did not write the code on the PIC32 side, and I am learning daily how it works.

Then I let the code just run, and I inevitably wind up on a line that says

    52570 1D01_335C 9D01_335C  _general_execption_handler  sdbbp 0x0

When I get there,

The cause register holds 0080181C
The EPC register holds 9D00F228
The SP register holds 9F8FFFA0

This happened like clockwork, so I got suspicious of the __ISR that would not stop. MPLAB showed me this:

           432:
           433:                 //*********************************************************//
           434:                 void __ISR(_UART1_VECTOR, ipl5) IntUart1Handler(void)   //MCU communication port
           435:                 {
           9D00F204  415DE800   rdpgpr      sp,sp
           9D00F208  401A7000   mfc0        k0,EPC
           9D00F20C  401B6000   mfc0        k1,Status
           9D00F210  27BDFF88   addiu       sp,sp,-120
           9D00F214  AFBA0074   sw          k0,116(sp)
           9D00F218  AFBB0070   sw          k1,112(sp)
           9D00F21C  7C1B7844   ins         k1,zero,1,15
           9D00F220  377B1400   ori         k1,k1,0x1400
           9D00F224  409B6000   mtc0        k1,Status
           9D00F228  AFBF0064   sw          ra,100(sp) ;<<<-------EPC register always points here
           9D00F22C  AFBE0060   sw          s8,96(sp)
           9D00F230  AFB9005C   sw          t9,92(sp)
           9D00F234  AFB80058   sw          t8,88(sp)
           9D00F238  AFAF0054   sw          t7,84(sp)
           9D00F23C  AFAE0050   sw          t6,80(sp)
           9D00F240  AFAD004C   sw          t5,76(sp)
           9D00F244  AFAC0048   sw          t4,72(sp)
           9D00F248  AFAB0044   sw          t3,68(sp)
           9D00F24C  AFAA0040   sw          t2,64(sp)
           9D00F250  AFA9003C   sw          t1,60(sp)
           9D00F254  AFA80038   sw          t0,56(sp)
           9D00F258  AFA70034   sw          a3,52(sp)
           9D00F25C  AFA60030   sw          a2,48(sp)
           9D00F260  AFA5002C   sw          a1,44(sp)
           9D00F264  AFA40028   sw          a0,40(sp)
           9D00F268  AFA30024   sw          v1,36(sp)
           9D00F26C  AFA20020   sw          v0,32(sp)
           9D00F270  AFA1001C   sw          at,28(sp)
           9D00F274  00001012   mflo        v0
           9D00F278  AFA2006C   sw          v0,108(sp)
           9D00F27C  00001810   mfhi        v1
           9D00F280  AFA30068   sw          v1,104(sp)
           9D00F284  03A0F021   addu        s8,sp,zero

I look a little more closely at the numbers, and I see that at that time, if we add 100 (0x64) to FFA0 (the bottom 16 bits of the SP) we get 0x10004, which I am guessing is 4 too much.
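
The address arithmetic above can be checked mechanically. This is a minimal sketch with a made-up helper name; it only models how a MIPS `sw rt, offset(base)` forms its target address (base plus the sign-extended 16-bit offset) and says nothing about whether that address is valid on this part.

```c
#include <stdint.h>

/* Effective address of `sw rt, offset(base)`:
   base + sign-extended 16-bit offset. Hypothetical helper for illustration. */
uint32_t sw_effective_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}
```

For the faulting instruction `sw ra,100(sp)` with SP = 0x9F8FFFA0, `sw_effective_address(0x9F8FFFA0u, 100)` gives 0x9F900004, i.e. the low 16 bits wrap from 0xFFA0 to 0x0004, matching the 0x10004 observation.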

PIC Manual DS61143E, page 50, says that this nomenclature means SW (Store Word): Mem[Rs+offset] = Rt. Other experts have told me that the Cause register is telling me that the EXCCODE bits are 7, which is the code for a bus exception on load or store.
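
For reference, the EXCCODE field sits in bits 6..2 of the MIPS32 CP0 Cause register, so the value can be pulled out with a shift and a mask. The helper names below are invented for this sketch, and the name table lists only a handful of the standard MIPS32 codes.

```c
#include <stdint.h>

/* ExcCode occupies bits 6..2 of the CP0 Cause register. */
unsigned cause_exc_code(uint32_t cause)
{
    return (unsigned)((cause >> 2) & 0x1Fu);
}

/* A few of the standard MIPS32 exception codes (not the full list). */
const char *exc_code_name(unsigned code)
{
    switch (code) {
    case 0:  return "Int (interrupt)";
    case 4:  return "AdEL (address error, load or fetch)";
    case 5:  return "AdES (address error, store)";
    case 6:  return "IBE (bus error, instruction fetch)";
    case 7:  return "DBE (bus error, data load or store)";
    default: return "other";
    }
}
```

Applied to the value in the question, `cause_exc_code(0x0080181C)` is 7, the data bus error code, which agrees with what the experts said.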

Or, I'm totally guessing here (would love to get some experts' knowledge on this) something is not clearing something and I'm encountering infinite recursion on an int handler.

All of this is starting to make sense.

THE QUESTION

Could someone please suggest the most common reasons for an int like this to be repeatedly hitting me?

Does anyone see any common relationship between the bogus nulls coming from the UART which could somehow be connected with this endlessly generated int? Am I even on the right track?

In your answer, please tell me how to acknowledge the Int from the UART. I know how I do that on the PIC24 (I wrote that code entirely, in ASM) but I don't know how this is done in C on the PIC32. Assembly will be fine. I'll inline it. I'm working with code I didn't write here, and I thank you for your answers.

What is the most common reason that the UART (#1, in this case) would be repeatedly generating interrupts ?

I have edited the question to include later evidence from debugging. I believe that the UART is re-interrupting the processor, and at this moment, I don't know how to set the appropriate bit so that the int will stop. Advice is welcome — User.1
I suspect that the problem is caused by noise in the TX/RX line as it is not loaded with a level-shifter (eg:MAX232) as it is a direct connection from mcu to mcu. Try stopping the tx from the PIC24 and see whether the null and sp-overflow will still occur. — mfc
The idea is not to send any bytes from the PIC24, to see whether the problem on the PIC32 still exists. If it does, that indicates the problem is on the PIC32 side, probably picking up noise from the TX/RX PCB track due to the tri-state high-impedance of both MCU pins. — mfc
Do you mean that by sending 1 byte from the PIC24, this byte(8 bits) is received at the PIC32 properly without any un-wanted side effects like null and stack-overflow ? If yes, you can further test it by adding a time delay between the 4-bytes command to confirm the fault is in the receiving logic of the PIC32. — mfc

greydet · Accepted Answer · 2013-02-22T22:15:55

The most common reason an interrupt subroutine is called over and over is that the interrupt request is never acknowledged in the routine. Are you sure you clear the corresponding IRQ bit?

To ease UART debugging you should first connect the UART to a PC and make sure your target can communicate both ways with the PC. With two targets at the same time, you can't determine on which one the problem is apart from inspecting signals with a scope.
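
To make the acknowledgement step concrete, here is a hedged sketch of a receive service routine: drain the receive FIFO, clear any overrun, then clear the interrupt flag last. The register and mask names in the `__XC32` branch (U1STAbits.URXDA, U1RXREG, U1STACLR, IFS0CLR, _IFS0_U1RXIF_MASK) are the ones found in XC32 device headers for many PIC32MX parts, but which IFS register holds U1RXIF depends on the device, so treat them as assumptions to check against the data sheet. The host branch is a hypothetical mock that exists only so the control flow can be exercised off-target. `uart1_rx_service`, `rx_buf` and `rx_len` are names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__XC32)
  /* Target build: real special-function registers from the device header. */
  #include <xc.h>
  #define RX_HAS_DATA()      (U1STAbits.URXDA)
  #define RX_READ_BYTE()     ((uint8_t)U1RXREG)
  #define RX_OVERRUN()       (U1STAbits.OERR)
  #define RX_CLEAR_OVERRUN() (U1STACLR = _U1STA_OERR_MASK)
  #define RX_ACK_IRQ()       (IFS0CLR = _IFS0_U1RXIF_MASK)
#else
  /* Host build: hypothetical stand-ins for the hardware bits. */
  static uint8_t mock_fifo[8];
  static unsigned mock_count;
  static bool mock_oerr, mock_rxif;
  static uint8_t mock_read(void)
  {
      uint8_t b = mock_fifo[0];
      for (unsigned i = 1; i < mock_count; i++) mock_fifo[i - 1] = mock_fifo[i];
      mock_count--;
      return b;
  }
  #define RX_HAS_DATA()      (mock_count > 0)
  #define RX_READ_BYTE()     (mock_read())
  #define RX_OVERRUN()       (mock_oerr)
  #define RX_CLEAR_OVERRUN() (mock_oerr = false)
  #define RX_ACK_IRQ()       (mock_rxif = false)
#endif

static volatile uint8_t rx_buf[16];
static volatile unsigned rx_len;

/* Body of the RX part of the UART1 handler. */
void uart1_rx_service(void)
{
    /* 1. Read every pending byte; bytes beyond the buffer are dropped. */
    while (RX_HAS_DATA()) {
        uint8_t b = RX_READ_BYTE();
        if (rx_len < sizeof rx_buf) rx_buf[rx_len++] = b;
    }
    /* 2. An overrun blocks further reception until it is cleared. */
    if (RX_OVERRUN()) RX_CLEAR_OVERRUN();
    /* 3. Acknowledge the interrupt last, once its cause is gone. */
    RX_ACK_IRQ();
}
```

On the target this would be called from the existing `IntUart1Handler`. Note that on many PIC32MX parts `_UART1_VECTOR` is shared by the receive, transmit and error sources, so it is also worth checking whether an enabled transmit or error interrupt is left unacknowledged.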
