14
votes

I have a USB device that outputs data of size of one byte, and I want to pass these bytes to FPGA component that exists on AXI bridge, FPGA and CPU are on the same chip... it's SoC FPGA Altera Cyclone V. CPU is ARM Cortex-A9. Kernel version 3.7.0.

There is a software that reads from the USB device and writes to a dump file... it works just fine. I tried to use mmap() to map the FPGA address to the virtual space and write to it from the userspace. When doing so... after say a minute, the kernel seem to crash.

I wrote a driver for my FPGA component and I passed the driver path to that software as a file, so that it writes to it, and eventually to my FPGA component, but the same result... kernel crashes again after a random time.

I also wrote a simple program that reads bytes from a local file and pass it to FPGA... this works fine either ways (using mmap() or driver module), the file passes through to the FPGA with no problems at all no matter how big is the file.

So the problem is when passing from USB device to FPGA, either using mmap() or a driver module.

Here is a sample crash message:

  Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
  Modules linked in: ipv6
  CPU: 1    Not tainted  (3.7.0 #106)
  PC is at scheduler_ipi+0x8/0x4c
  LR is at handle_IPI+0x10c/0x19c
  pc : [<800521a0>]    lr : [<800140d4>]    psr: 80000193
  sp : bf87ff58  ip : 8056acc8  fp : 00000000
  r10: 00000000  r9 : 413fc090  r8 : 00000001
  r7 : 00000000  r6 : bf87e000  r5 : 80535018  r4 : 8053eec0
  r3 : 8056ac80  r2 : bf87ff58  r1 : 00000482  r0 : 00000481
  Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
  Control: 10c5387d  Table: 3f0c404a  DAC: 00000015
  Process swapper/1 (pid: 0, stack limit = 0xbf87e240)
  Stack: (0xbf87ff58 to 0xbf880000)
  ff40:                                                       00000000 800140d4
  ff60: fffec10c 8053e418 bf87ff90 fffec100 8000f6e0 8000851c 8000f708 8000f70c
  ff80: 60000013 ffffffff bf87ffc4 8000e180 00000000 00000000 00000001 00000000
  ffa0: bf87e000 80565688 803ddfb0 80541fc8 8000f6e0 413fc090 00000000 00000000
  ffc0: 8053e9b8 bf87ffd8 8000f708 8000f70c 60000013 ffffffff 00000020 8000f894
  ffe0: 3f86c06a 00000015 10c0387d 805658d8 0000406a 003d1ee8 31ca2085 5c1021c3
  Code: eaffffad 80564700 e92d4800 e1a0200d (4c4c9b50)
  ---[ end trace 9e492cde975c41f9 ]---

Other crash messages start like:

  Unable to handle kernel paging request at virtual address 2a7a4390
  Internal error: Oops - bad syscall: ebcffb [#1] SMP ARM
  pgd = bf318000
  [2a7a4390] *pgd=00000000

And:

 Internal error: Oops - undefined instruction: 0 [#2] SMP ARM
 Modules linked in: ipv6
 CPU: 1    Tainted: G      D       (3.7.0 #106)

Here is the full crash messages.

I noticed that all the crash messages I get intersect with the PC and LR locations, but actually I don't have previous experience with Linux kernel. I found similar error messages online but none of the proposed solutions worked for me.

Source Code:

This is function is called whenever a new buffer of bytes arrives from USB:

static void rtlsdr_callback(unsigned char *buf, uint32_t len, void *ctx)
{
    if (ctx) {
        if (do_exit)
            return;

        if ((bytes_to_read > 0) && (bytes_to_read < len)) {
            len = bytes_to_read;
            do_exit = 1;
            rtlsdr_cancel_async(dev);
        }

/*      if (fwrite(buf, 1, len, (FILE*)ctx) != len) {
            fprintf(stderr, "Short write, samples lost, exiting!\n");
            rtlsdr_cancel_async(dev);
        }
*/
        if (fm_receiver_addr == NULL)
        {
            virtual_base = mmap(NULL, HPS2FPGA_SPAN, PROT_WRITE, MAP_PRIVATE, fd, HPS2FPGA_BASE);
            if (virtual_base == MAP_FAILED)
            {
                perror("mmap");
                close(fd);
                exit(1);
            }

            fm_receiver_addr = (unsigned char*)(virtual_base + FM_DEMOD_OFFSET);
        }

        int i, j;
        for (i = 0; i < len; i++)
        {
            *fm_receiver_addr = buf[i];
            for (j = 0; j < 150; j++);
        }

        if (bytes_to_read > 0)
            bytes_to_read -= len;
    }
}

You see I commented fwrite() function (it's used by the original code to write to files) and replaced it with my code that writes to my FPGA component: *fm_receiver_addr = buf[i];. Before that I check the address to see if it's valid and obtain another address if it's not.

For the other way, the driver module, I wrote this code:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/device.h>
#include <linux/platform_device.h>
#include <linux/uaccess.h>
#include <linux/ioport.h>
#include <linux/io.h>

#define HPS2FPGA_BASE       0xC0000000
#define HPS2FPGA_SPAN       PAGE_SIZE

void* fm_demod_addr;
int i;

// Get a driver entry in Sysfs
static struct device_driver fm_demod_driver = 
{
    .name = "fm-demodulator",   // Name of the driver
    .bus = &platform_bus_type,  // Which bus does the device exist
};

// Function that is used when we read from the file in /sys, but we won't use it
ssize_t fm_demod_read(struct device_driver* drv, char* buf)
{ return 0; }

// Function that is called when we write to the file in /sys
ssize_t fm_demod_write_sample(struct device_driver* drv, const char* buf, size_t count)
{
    if (buf == NULL)
    {
        pr_err("Error! String must not be NULL!\n");
        return -EINVAL;
    }

    for (i = 0; i < count; i++)
    {
        iowrite8(buf[i], fm_demod_addr);
    }

    return count;
}

// Set our module's pointers and set permissions mode
static DRIVER_ATTR(fm_demod, S_IWUSR, fm_demod_read, fm_demod_write_sample);

// Set module information
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Siraj Muhammad <[email protected]>");
MODULE_DESCRIPTION("Driver for FPGA component 'FM Demodulator'");

static int __init fm_demod_init(void)
{
    int ret;
    struct resource* res;

    // Register driver in kernel
    ret = driver_register(&fm_demod_driver);
    if (ret < 0)
        return ret;

    // Create file system in /sys
    ret = driver_create_file(&fm_demod_driver, &driver_attr_fm_demod);
    if (ret < 0)
    {
        driver_unregister(&fm_demod_driver);
        return ret;
    }

    // Request exclusive access to the memory region we want to write to
    res = request_mem_region(HPS2FPGA_BASE, HPS2FPGA_SPAN, "fm-demodulator");
    if (res == NULL)
    {
        driver_remove_file(&fm_demod_driver, &driver_attr_fm_demod);
        driver_unregister(&fm_demod_driver);
        return -EBUSY;
    }

    // Map the address into virtual memory
    fm_demod_addr = ioremap(HPS2FPGA_BASE, HPS2FPGA_SPAN);
    if (fm_demod_addr == NULL)
    {
        driver_remove_file(&fm_demod_driver, &driver_attr_fm_demod);
        driver_unregister(&fm_demod_driver);
        release_mem_region(HPS2FPGA_BASE, HPS2FPGA_SPAN);
        return -EFAULT;
    }

    return 0;
}

static void __exit fm_demod_exit(void)
{
    // Remove file system from /sys
    driver_remove_file(&fm_demod_driver, &driver_attr_fm_demod);
    // Unregister the driver
    driver_unregister(&fm_demod_driver);
    // Release requested memory
    release_mem_region(HPS2FPGA_BASE, HPS2FPGA_SPAN);
    // Un-map address
    iounmap(fm_demod_addr);

}

module_init(fm_demod_init);
module_exit(fm_demod_exit);

And I revert the userspace code to its original state, and pass the driver path: /sys/bus/platform/drivers/fm-demodulator/fm_demod to the userspace app to write to it.

Any thought about it?

1
I'm not an expert in this field, but are you sure mmap is the best solution? How about request_mem_region ? (makelinux.net/ldd3/chp-9-sect-4)VAndrei
I used this function in the driver module approach, but didn't solve the problem... It prevented crashes sometimes but with no bytes transferred to FPGA. Some other times kernel crashed. I just wonder why everything goes well when reading from a file to FPGA, while it doesn't when reading from USB device to FPGA. I wish I could decode the error logs produced... the PC and LR are always at scheduler_ipi+0x8/0x4c and handle_IPI+0x10c/0x19c when kernel crashes.Siraj Muhammad
Did you try using addr2line? elinux.org/Addr2line_for_kernel_debugging To me it looks like a memory alignment issueVAndrei
If we assume that the virtual address obtained from mmap() changes if an interrupt or something happens to change that address (by the way, can this happen? I'm just wondering), shouldn't the driver module have a fixed address that never changes? Moreover, why doesn't the error show when copying from a local file?Siraj Muhammad
Can be anything: wrong RAM timings, bad compiler (btw, which compiler was used for building your kernel and for building your module?). Also it would be nice if you can share your code for user-space tool (which works on mmap(), w/o involving kernel module). Basically, looks like memory corruption to me (ARM tries to execute some instruction (in RAM) which is not a correct ARM instruction, which can be due to memory corruption). I don't think that it's related to unaligned access, because it would give you another exception.Sam Protsenko

1 Answers

2
votes
  Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
  PC is at scheduler_ipi+0x8/0x4c
  LR is at handle_IPI+0x10c/0x19c
  pc : [<800521a0>]    lr : [<800140d4>]    psr: 80000193
  [snip]
  Code: eaffffad 80564700 e92d4800 e1a0200d (4c4c9b50)
  ---[ end trace 9e492cde975c41f9 ]---

No one can probably absolutely know the answer. Note: undefined instruction!

The PC is at scheduler_ipi+0x8/0x4c, this is hardcore ARM-Linux scheduling; an inter-processor interrupt. You can disassemble the 'Code:' part to help,

   0:   eaffffad        b       0xfffffebc
   4:   80564700        subshi  r4, r6, r0, lsl #14
   8:   e92d4800        push    {fp, lr}
   c:   e1a0200d        mov     r2, sp
  10:   4c4c9b50        mcrrmi  11, 5, r9, ip, cr0

The crash is at the instruction mcrrmi and this appears to be non-sense. If you disassemble sched/core.o you will see the instruction sequence, but I bet that the '4c4c9b50' value is corrupt. Ie, this is not the code the compiler generated.

So the problem is when passing from USB device to FPGA, either using mmap() or a driver module.

I will use a zen move and think a little. The USB device use DMA? Your FPGA is probably also some how in control of the ARM/AXI bus. I would at least consider the possibility that the FPGA is corrupting a bus cycle and perhaps flipping address bits and causing a phycial write to kernel code space. This can happen when you use an innocent by-stander like a DMA peripheral. The ARM CPU will use cache and burst everything.

Things to check,

  1. The code address in (brackets) is reported as the compiler produced. If not, hardware has probably corrupted things. It is hard for Linux code to do this as the kernel code pages are typically R/O.
  2. You should also produce disassembler for any code and see what register is in effect. For instance, the (4c4c9b50) code can be found with,

    printf '\x50\x9b\x4c\x4c' > arm.bin
    objdump -marm -b binary -D arm.bin

You can just objdump vmlinux to find the assembler for the scheduler_ipi routine and then determine what a pointer might be. For instance, if this_rq() was in R9 and R9 is bogus, then you have a clue.

If the code is corrupt, you need a bus analyzer and/or some routine to monitor the location and report whenever it changes to try and locate the source of corruption.