
Can you enable interrupts in the page fault handler? Is there a known ARM kernel issue with preemptive scheduling?

I got an ARM kernel oops in the UDP receiving code with CONFIG_PREEMPT, or when interrupts are enabled in the fault handler.

The problem is similar to what another user reported here. But in my case, when I send UDP packets to the system at 110% load (the system drops about 10% of the packets), the kernel oopses within a few minutes. This happens only when some busybox shell scripts are running, not when only the UDP receiving program is running. I've tracked the data addresses and they always look good: the buffer was allocated and used before it was freed.

There are two ways to avoid it:

[1] When I change the scheduling model from preempt (CONFIG_PREEMPT) to preempt_voluntary, the problem goes away. Is this a known issue with ARM on kernel 2.6.39? With preempt scheduling I also see a problem in jffs2 after a long while, but not with preempt_voluntary.

For a moment I suspected that the Ethernet DMA was saturating the bus, blocking the CPU from loading its TLB entries and thus causing a page fault. I deduced this because the busybox scripts need to be in the picture: when a script is spawned, it creates an address space and loads many TLB entries, adding to the bus load. If preempt_voluntary is a solution, can DMA blocking the bus be ruled out?

The test I'm running uses an LTIB kernel 2.6.39.4 (lpclinux) on a phy3250-based system.

[2] Further tests showed that the page fault handler is being nested by Ethernet interrupts. When I disable interrupts in the kernel page fault handler __dabt_svc, but keep them enabled in the user page fault handler __dabt_user, the problem goes away. Otherwise, the nesting level goes up to 4 and the kernel oopses. So the question is: is enabling interrupts in the page fault handler correct?

The test code for [2] is below. Lines marked with @@@@ are added or modified. I then capture the nesting level in do_DataAbort().

file arch/arm/kernel/entry-armv.S:
__dabt_svc:
    svc_entry
... ...
    @
    @ set desired IRQ state, then call main handler
    @
    debug_entry r1
    @@@@Not_Enable_Irq_In_Dabtsvc
    ldr r2, =armv_dabtsvc_count @@@@
    ldr r3, [r2]    @@@@
    add r3, r3, #1  @@@@
    str r3, [r2]    @@@@
    msr cpsr_c, r9 @@@@ disable this
    mov r2, r2 @@@@ add this extra instruction
    mov r2, sp
    bl  do_DataAbort

    @
    @ IRQs off again before pulling preserved data off the stack
    @
    disable_irq_notrace

    ldr r2, =armv_dabtsvc_count @@@@
    ldr r3, [r2]    @@@@
    sub r3, r3, #1  @@@@
    str r3, [r2]    @@@@
    @
    @ restore SPSR and restart the instruction
    @
    ldr r2, [sp, #S_PSR]
    svc_exit r2             @ return from exception
 UNWIND(.fnend      )
ENDPROC(__dabt_svc)

And add the variable to the file too:

file arch/arm/kernel/entry-armv.S:
@@@@save nesting level:
    .data            @@@@
    .align           @@@@
armv_dabtsvc_count:  @@@@
    .long   0   @ count svc entry    @@@@

I'm trying to link all of this together. Can kernel experts tell whether these tests make sense? Is disabling interrupts in the page fault handler a valid solution?

Edit: The oops in the page fault handler is not the first failure. There was a "do_bad_area" in a preceding alignment handler, and that failed fixup of an unaligned access subsequently caused the page fault. Yes, as someone commented below, fixing up unaligned accesses is very troublesome. Those unaligned accesses come from ip_input, ip_fragment, and the UDP stack. Once I fixed all of them in the stack, the problem went away.
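A common way to fix such accesses at the source is to stop dereferencing a cast pointer into the packet buffer. The sketch below is a userspace illustration with hypothetical helper names, not the actual stack fix: memcpy() lets the compiler emit whatever access sequence is legal for the target, and the big-endian reader assembles the bytes explicitly, which is also endianness-safe. Inside the kernel, the equivalent helpers are get_unaligned() and get_unaligned_be32() from <asm/unaligned.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value at any offset, in host byte order.
 * Unlike *(uint32_t *)(buf + off), this never performs a misaligned
 * word load, so it cannot trap into the kernel alignment handler. */
static uint32_t read_u32_copy(const uint8_t *buf, size_t off)
{
    uint32_t v;
    memcpy(&v, buf + off, sizeof v);
    return v;
}

/* Read a big-endian (network order) 32-bit field at any offset,
 * one byte at a time: alignment- and endianness-safe. */
static uint32_t read_be32(const uint8_t *buf, size_t off)
{
    assert(buf != NULL);
    return ((uint32_t)buf[off] << 24) |
           ((uint32_t)buf[off + 1] << 16) |
           ((uint32_t)buf[off + 2] << 8) |
           (uint32_t)buf[off + 3];
}
```

For a buffer {0xAA, 0x01, 0x02, 0x03, 0x04, 0xBB}, read_be32(buf, 1) returns 0x01020304 regardless of the host's byte order or the buffer's alignment.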

Edit again: The problem lies in two operations in the alignment handler: it fetches the instruction, and then fetches the data the instruction refers to. The oops is reported on the data access, but the cause is that the instruction fetch failed with a first page fault. Since the instruction being fetched is in kernel space, its page is always valid, which points to a silicon bug. If I change the code to retry the fetch, it succeeds, which makes a silicon bug even more likely. Interrupts come into the picture because of the extra TLB flushing they bring in. In short: TLB loading is automatic, so fetching an instruction in kernel space should never fail. But it still failed.

You should contact the Linux kernel mailing list about this. If it crashes, then it's a bug. - Nico Erfurth
No, you should not disable interrupts in the page fault handler. The page fault can happen in a kernel thread that is doing copy_from_user(). There is no reason to block interrupts in this case; you only increase latency. Probably you are masking the problem with all of your suggestions; the kernel code is full of subtleties. - artless noise
For alignment, see this question. The question would be: why is a driver/kernel module doing unaligned accesses? - artless noise
The interrupts are masked for user mode, to prevent a mix-up in the alignment trap flags. The page fault handlers should not call copy_from_user(); they are part of its implementation. __get_user() is different: it is trying to get the code/instruction that caused the alignment trap. The switch statement manually decodes the instruction. - artless noise
It is a hail mary operation to fix up kernel accesses. It is much better to fix the drivers, etc. However, Masta's suggestion is good: the ARM Linux mailing list is a better place to ask. My guess is they will say the same thing and ask you to upgrade to the latest code. - artless noise


I guess this is the answer (incomplete, to be tested):

There is a problem when interrupts are enabled too early. do_alignment() calls __get_user() with interrupts enabled, even though it can be running in atomic context (for example, from the network stack), where __get_user() is not safe to use. If interrupt enabling is deferred until after that point, everything should be OK.

Please look into two kernel commits: the first, from Jun 25 2011, defers interrupt enabling; the second, from Feb 25 2013, changes uses of __get_user() to probe_kernel_address().

The first commit:

The 3.x kernels removed interrupt enabling from the low-level handlers __dabt_svc, __dabt_user, etc. The commit message:

git diff 8b418616..02fe2845 entry-armv.S
commit 02fe2845d6a837ab02f0738f6cf4591a02cc88d4
Author: Russell King
Date:   Sat Jun 25 11:44:06 2011 +0100

    ARM: entry: avoid enabling interrupts in prefetch/data abort handlers

    Avoid enabling interrupts if the parent context had interrupts enabled
    in the abort handler assembly code, and move this into the breakpoint/
    page/alignment fault handlers instead.

    This gets rid of some special-casing for the breakpoint fault handlers
    from the low level abort handler path.

    Acked-by: Will Deacon
    Signed-off-by: Russell King

commit 8b4186160b7894ca4583f702a562856d5d9e9118
Author: Russell King
Date:   Sat Jun 25 19:25:02 2011 +0100

And the code diff snippet:

diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index d644d02..c46bafa 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -185,20 +185,15 @@ ENDPROC(__und_invalid)
 __dabt_svc:
        svc_entry
... ...
        dabt_helper

        @
-       @ set desired IRQ state, then call main handler
+       @ call main handler
        @
-       debug_entry r1
-       msr     cpsr_c, r9
        mov     r2, sp
        bl      do_DataAbort
......

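That confirms that interrupts do not need to be enabled this early in the low-level fault handlers; per the commit message, enabling moves into the breakpoint/page/alignment fault handlers instead.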

The second commit:

commit b255188f90e2bade1bd11a986dd1ca4861869f4d
Author: Russell King
Date:   Mon Feb 25 16:10:42 2013 +0000

    ARM: fix scheduling while atomic warning in alignment handling code

    Paolo Pisati reports that IPv6 triggers this warning:

    BUG: scheduling while atomic: swapper/0/0/0x40000100
    [<c001b1c4>] (unwind_backtrace+0x0/0xf0) from [<c0503c5c>] (__schedule_bug+0x48/0x5c)
    [<c0503c5c>] (__schedule_bug+0x48/0x5c) from [<c0508608>] (__schedule+0x700/0x740)
    [<c0508608>] (__schedule+0x700/0x740) from [<c007007c>] (__cond_resched+0x24/0x34)
    [<c007007c>] (__cond_resched+0x24/0x34) from [<c05086dc>] (_cond_resched+0x3c/0x44)
    [<c05086dc>] (_cond_resched+0x3c/0x44) from [<c0021f6c>] (do_alignment+0x178/0x78c)
    [<c0021f6c>] (do_alignment+0x178/0x78c) from [<c00083e0>] (do_DataAbort+0x34/0x98)
    [<c00083e0>] (do_DataAbort+0x34/0x98) from [<c0509a60>] (__dabt_svc+0x40/0x60)
    Exception stack(0xc0763d70 to 0xc0763db8)
    [<c0509a60>] (__dabt_svc+0x40/0x60) from [<c02a8490>] (__csum_ipv6_magic+0x8/0xc8)

Fix this by using probe_kernel_address() stead of __get_user().
 arch/arm/mm/alignment.c |   11 ++++-------