Short Background

I'm currently writing a linux kernel module as a project to better understand linux kernel internals. I've written 'hello world'-type modules before, but I want to get beyond that, so I'm trying to replace some common system calls like open, read, write, and close with my own so that I can print a bit more information into the system log.

Some of the content I found while searching targets pre-2.6 kernels, which is not useful because the sys_call_table symbol stopped being exported in kernel 2.6.x. The material I found for 2.6.x or later, on the other hand, seems to have problems of its own, even though it apparently worked at the time.

One particular O'Reilly article, which I found through the "sys_call_table in linux kernel 2.6.18" post, suggests that what I'm trying to do ought to work, but it doesn't. (Specifically, see the "Intercepting sys_unlink() Using System.map" section.)

I also read through "Linux Kernel: System call hooking example" and "Kernel sys_call_table address does not match address specified in system.map" which, while somewhat informative, did not solve my problem.

Problems and Questions

Part 1 - Unexpected Address Mismatch

I'm using Linux kernel 4.2.0-16-generic on a Kubuntu 15.10 x86_64 architecture installation. Since the sys_call_table symbol is no longer exported, I grepped the address from the system map file:

# grep 'sys_call_table' < System.map-4.2.0-16-generic
ffffffff818001c0 R sys_call_table
ffffffff81801580 R ia32_sys_call_table

With this in hand, I added the following line to my kernel module:

static unsigned long *syscall_table = (unsigned long *) 0xffffffff818001c0;

Based on this, I expected a simple check to confirm that I was pointing where I thought I was, i.e. at the base address of the kernel's unexported sys_call_table. So I added a check like the one below to the module's init function:

if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
        pr_info("sys_close = 0x%p, syscall_table[__NR_close] = 0x%p\n", sys_close, syscall_table[__NR_close]);
        return -ENXIO;
}

This check failed and different addresses were printed in the log.

I did not expect the body of this if statement to execute, because I thought the address stored in syscall_table[__NR_close] would be the same as that of sys_close, yet it does.

Q1: Have I missed something so far regarding the expected address-based comparison? If so, what?

Part 2 - Partially Successful?

If I remove this check, it seems I'm partially successful, because, apparently, I can at least replace the read call successfully using the code below:

static asmlinkage ssize_t (*original_read)(unsigned int fd, char __user *buf, size_t count);
// ...
static void systrap_replace_syscalls(void)
{
    pr_debug("systrap: replacing system calls\n");

    original_read  = syscall_table[__NR_read];
    original_write = syscall_table[__NR_write];
    original_close = syscall_table[__NR_close];

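    /* clear CR0.WP (bit 16, 0x10000) so the read-only table can be written */
    write_cr0(read_cr0() & ~0x10000);

    syscall_table[__NR_read]  = systrap_read;
    syscall_table[__NR_write] = systrap_write;
    syscall_table[__NR_close] = systrap_close;

    /* set CR0.WP again to restore write protection */
    write_cr0(read_cr0() | 0x10000);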

    pr_debug("systrap: system calls replaced\n");
}

My replacement functions simply print a message and forward the call to the actual system call. For example, the read replacement function's code is below:

static asmlinkage ssize_t systrap_read(unsigned int fd, char __user *buf, size_t count)
{
        pr_debug("systrap: reading from fd = %u\n", fd);
        return original_read(fd, buf, count);
}

And the system log shows the following output when I insmod and rmmod the module:

kernel: [23226.797460] systrap: setting up module
kernel: [23226.797462] systrap: replacing system calls
kernel: [23226.797464] systrap: system calls replaced
kernel: [23226.797465] systrap: module setup complete
kernel: [23226.864198] systrap: reading from fd = 4279272912

<similar output omitted for brevity>

kernel: [23235.560663] systrap: reading from fd = 2835745072
kernel: [23235.564774] systrap: reading from fd = 861079840
kernel: [23235.564986] systrap: cleaning up module
kernel: [23235.564990] systrap: trying to restore system calls
kernel: [23235.564993] systrap: restored sys_read
kernel: [23235.564995] systrap: restored sys_write
kernel: [23235.564997] systrap: restored sys_close
kernel: [23235.565000] systrap: system call restoration attempt complete
kernel: [23235.565002] systrap: module cleanup complete

I can let it run for a long time and, oddly enough, I never observe entries for the write and close calls, only for the reads, which is why I thought I was only partially successful.

Q2: Have I missed something regarding the replaced system calls? If so, what?

Part 3 - Unexpected Error Message on rmmod Command

Even though the module seems to operate normally, I always get the following error when I rmmod the module from the kernel:

rmmod: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '(null)/modules.builtin.bin'

My module cleanup function simply calls another one (below) that tries to restore the function calls by doing the opposite of the replacement function above:

// called by the exit function
static void systrap_restore_syscalls(void)
{
    pr_debug("systrap: trying to restore system calls\n");
    write_cr0(read_cr0() & ~0x10000);

    /* make sure no other modules have made changes before restoring */
    if(syscall_table[__NR_read] == systrap_read)
    {
            syscall_table[__NR_read] = original_read;
            pr_debug("systrap: restored sys_read\n");
    }
    else
    {
            pr_warn("systrap: sys_read not restored; address mismatch\n");
    }
    // ... omitted: same stuff for other sys calls

    write_cr0(read_cr0() | 0x10000);
    pr_debug("systrap: system call restoration attempt complete\n");
}

Q3: I don't know what causes the error message; any ideas here?

Part 4 - sys_open Marked for Deprecation?

In another unexpected turn of events, I find that the __NR_open macro is no longer defined by default. To see the definition, I have to #define __ARCH_WANT_SYSCALL_NO_AT before including the header files:

/*
 * Force __NR_open definition. It seems sys_open has been replaced by sys_openat(?)
 * See include/uapi/asm-generic/unistd.h:724-725
 */
#define __ARCH_WANT_SYSCALL_NO_AT

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
// ...

Going through the kernel source code (mentioned in comment above), you find the following comments:

/*
* All syscalls below here should go away really,
* these are provided for both review and as a porting
* help for the C library version.
*
* Last chance: are any of these important enough to
* enable by default?
*/
#ifdef __ARCH_WANT_SYSCALL_NO_AT
#define __NR_open 1024
__SYSCALL(__NR_open, sys_open)
// ...

Can anyone clarify:

Q4: ...the comments above on why __NR_open is not available by default?,

Q5: ...whether it's a good idea to do what I'm doing with the #define?, and

Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open?

Epilogue - Crashing My System

I tried using __NR_openat, replacing that call as I had done with the previous ones:

static asmlinkage long systrap_openat(int dfd, const char __user *filename, int flags, umode_t mode)
{
    pr_debug("systrap: opening file dfd = %d, name = % s\n", filename);
    return original_openat(dfd, filename, flags, mode);
}

But this simply helped me unceremoniously crash my own system by causing other processes to segfault when they tried to open a file, with gems such as:

kernel: [135489.202693] systrap: opening file dfd = 0, name = P^Q
kernel: [135489.202913] zsh[11806]: segfault at 410 ip 00007f3a380abe60 sp 00007ffd04c5b550 error 4 in libc-2.21.so[7f3a37fe1000+1c0000]

Trying to print argument data also showed odd/garbage info.

Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?

I've spent several days trying to work through this and I just hope I've not missed something utterly stupid...

Please, let me know if something's not entirely clear to you in the comments and I'll attempt to clarify.

It'd be most helpful if you could provide some code snippets that actually work and/or point me in a precise-enough direction that would allow me to understand what I'm doing wrong and how to quickly get this fixed.

Q1: You have read the question "Kernel sys_call_table address does not match address specified in system.map" and nevertheless use System.map for this purpose. Nice! Q4: openat provides the functionality of open when called with AT_FDCWD as the first argument, so open is not needed. Q5: Defining __ARCH_WANT_SYSCALL_NO_AT in module code is definitely a bad idea. This macro describes the whole architecture, not a module. Also, as a rule, almost every macro checked by the kernel's includes can be defined only within the kernel, not by a module. - Tsyvarev
@Tsyvarev: "and nevertheless use System.map for this purpose. Nice!" Well, other people I've talked to are doing something similar and it seems to work fine for them. This approach is also suggested by the linked O'Reilly article, specifically on intercepting sys_unlink using System.map. Did you not read it? - code_dredd
The O'Reilly article is older than the 3.13 kernel used in the question about address inconsistency. Your kernel (4.2) is even newer. Also, the address you get from System.map is similar to the one in that question. Why not check the validity of the table's address by some other method? - Tsyvarev
@Tsyvarev: My understanding is that, as long as it's a post 2.6 kernel, it's supposed to work in a similar manner to what I'm trying above. If this is not true (anymore?), please elaborate. Also, I'm not sure what the kernel memory range for 64-bit systems is (what I found seemed inconclusive), but I also wanted to avoid scanning memory ranges like some articles using 32-bit kernels do during module initialization. So, what 'other method' would you suggest? - code_dredd
"My understanding is that, as long as it's a post 2.6 kernel, it's supposed to work in a similar manner to what I'm trying above." No, the Linux kernel continues to change actively even after 2.6. I know of several things related to symbol visibility that changed between 2.6.32 and 3.10. I suggest scanning as the other method; it was used (on x86_64!) by the asker in the question about address inconsistency. It doesn't need to be your final solution, just a check. - Tsyvarev

1 Answer

I've managed to complete this and I'm now taking the time to document my findings.

Q1: Have I missed something so far regarding the expected address-based comparison?

The problem with this comparison is that, after checking /proc/kallsyms, I saw that sys_close and other related symbols are also no longer exported. I already knew this for some symbols, but I was still under the (mistaken) impression that some others were still available. So the check I was using (below) evaluates to true and causes the module to fail the 'safety' check.

if(syscall_table[__NR_close] != (unsigned long *)sys_close)
{
        /* ... */
}

In short, you simply need to trust the assumption about the system call table address retrieved from the System.map-$(uname -r) file. The 'safety' check is unnecessary and will also not work as expected.
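For reference, here is a minimal user-space sketch of pulling the table address out of a System.map line instead of copying it by hand. It assumes the "<hex address> <type> <symbol>" layout shown in the grep output in the question; find_symbol_line is a hypothetical helper name, not a kernel or libc API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one System.map line of the form "<hex address> <type> <symbol>".
 * Returns 1 and stores the address in *addr when the symbol name equals
 * `wanted` exactly, 0 otherwise.
 * find_symbol_line is a hypothetical helper used for illustration only.
 */
int find_symbol_line(const char *line, const char *wanted,
                     unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[128];

    /* The space before %c skips the blank between address and type. */
    if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
        return 0;

    /* Exact match, so ia32_sys_call_table is not mistaken for sys_call_table. */
    if (strcmp(name, wanted) != 0)
        return 0;

    *addr = value;
    return 1;
}
```

Feeding it each line of System.map-$(uname -r) and stopping at the first match yields the value that was hard-coded in the module. The exact string comparison matters because a substring search would also hit ia32_sys_call_table.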

Q2: Have I missed something regarding the replaced system calls?

This problem was eventually traced to one or both of the following header files I had included (I didn't bother to figure out which one):

#include <uapi/asm-generic/unistd.h>
#include <uapi/asm-generic/errno-base.h>

These were causing the __NR_* macros to get redefined, and therefore expanded, to incorrect values, at least for the x86_64 architecture. For example, the indices for sys_read and sys_write in the system call table are supposed to be 0 and 1 respectively, but they were getting other values and ended up indexing completely unexpected locations in the table.

Just removing the header files above fixed the issue without additional code changes.
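A quick way to catch this kind of mismatch is to compare the numbers the compiler actually sees against the expected x86_64 layout (read = 0, write = 1, close = 3). The sketch below is user-space C and assumes a Linux toolchain that provides <sys/syscall.h>; both function names are hypothetical. As far as I can tell, the asm-generic table numbers the same calls 63, 64 and 57, which would explain why the hooks landed in the wrong slots.

```c
#include <sys/syscall.h>   /* SYS_read, SYS_write, SYS_close for the build target */

/*
 * Returns 1 when the given numbers match the x86_64 table layout
 * (read = 0, write = 1, close = 3), 0 otherwise.
 * matches_x86_64_layout is a hypothetical helper for illustration.
 */
int matches_x86_64_layout(long nr_read, long nr_write, long nr_close)
{
    return nr_read == 0 && nr_write == 1 && nr_close == 3;
}

/* Checks the numbers visible on whatever target this file is built for. */
int build_target_matches_x86_64(void)
{
    return matches_x86_64_layout(SYS_read, SYS_write, SYS_close);
}
```

Inside a module, a compile-time guard such as BUILD_BUG_ON(__NR_read != 0) would presumably serve the same purpose on x86_64, failing the build as soon as a stray header changes the macro values.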

Q3: I don't know what causes the error message; any ideas here?

The error message was a side-effect of the previous issue. Obviously, the fact that the system call table was being indexed incorrectly (see Q2) caused other locations in memory to get modified.

Q4: ...the comments above on why __NR_open is not available by default?

This was a misreport by the IDE, which I have since stopped using. The __NR_open macro was already defined; the fix in Q2 made it even more obvious.

Q5: ...whether it's a good idea to do what I'm doing with the #define?

Short answer: No, not a good idea and definitely not needed. See Q2 above.

Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open?

Based on the answers to the previous questions, this is not a problem. Using __NR_open is just fine and expected. This part had gotten messed up by the header files in Q2.
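As noted in the comments on the question, openat provides the functionality of open when called with AT_FDCWD as its first argument. The user-space sketch below illustrates that equivalence under the assumption of a POSIX.1-2008 C library; both_open_styles_work is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>    /* open, openat, AT_FDCWD, O_RDONLY */
#include <unistd.h>   /* close */

/*
 * Opens `path` read-only twice: once with open() and once with
 * openat(AT_FDCWD, ...). Returns 1 if both calls succeed, 0 otherwise.
 * both_open_styles_work is a hypothetical helper for illustration.
 */
int both_open_styles_work(const char *path)
{
    int fd_open = open(path, O_RDONLY);
    int fd_openat = openat(AT_FDCWD, path, O_RDONLY);
    int ok = (fd_open >= 0 && fd_openat >= 0);

    /* Close whichever descriptors were actually obtained. */
    if (fd_open >= 0)
        close(fd_open);
    if (fd_openat >= 0)
        close(fd_openat);
    return ok;
}
```

One practical consequence, stated with some caution: newer C libraries may implement open() on top of the openat system call, so a hook on __NR_open alone might not see every file open on such systems.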

Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?

The crashes when using __NR_openat were likely caused by the macro being expanded to an incorrect value (see Q2 again). However, I can say that I had no real need to use it. I was supposed to be using __NR_open as specified above, but was trying out __NR_openat as a workaround for the issue fixed in Q2.
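Separately, the pr_debug call in the question's systrap_openat has two conversions in its format string but passes only one argument, which by itself can produce garbage output regardless of the macro values. Below is a user-space sketch of the corrected format, with one argument per conversion; format_open_msg is a hypothetical helper, and snprintf stands in for pr_debug. In a real module the file name is a __user pointer, so it would presumably need to be copied into a kernel buffer first (for example with strncpy_from_user()) before being printed with %s.

```c
#include <stdio.h>

/*
 * Formats the log line with one argument per conversion: %d for dfd and
 * %s for the file name. Returns the number of characters snprintf would
 * have written, excluding the terminating NUL.
 * format_open_msg is a hypothetical helper; in a module the same format
 * string and argument list would be passed to pr_debug().
 */
int format_open_msg(char *buf, size_t len, int dfd, const char *name)
{
    return snprintf(buf, len,
                    "systrap: opening file dfd = %d, name = %s\n",
                    dfd, name);
}
```

Calling format_open_msg(buf, sizeof buf, 3, "/etc/passwd") fills buf with a line in the same style as the log excerpts in the question, this time with both values in their proper places.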

In short, the answer to Q2 helped fix several issues in a cascading effect.