Short Background
I'm currently writing a Linux kernel module as a project to better understand Linux kernel internals. I've written 'hello world'-type modules before, but I want to get beyond that, so I'm trying to replace some common system calls like open, read, write, and close with my own so that I can print a bit more information into the system log.
Some content I found while searching was for pre-2.6 kernels, which is not useful because the sys_call_table symbol stopped being exported starting with kernel 2.6.x. On the other hand, the material I found for 2.6.x or later seems to have problems of its own, even though it apparently worked at the time.
One particular O'Reilly article, which I found via the "sys_call_table in linux kernel 2.6.18" post, suggests that what I'm trying to do ought to work, but it doesn't. (Specifically, see the "Intercepting sys_unlink() Using System.map" section.)
I also read through "Linux Kernel: System call hooking example" and "Kernel sys_call_table address does not match address specified in system.map" which, while somewhat informative, were not useful for me.
Problems and Questions
Part 1 - Unexpected Address Mismatch
I'm using Linux kernel 4.2.0-16-generic on a Kubuntu 15.10 x86_64 installation. Since the sys_call_table symbol is no longer exported, I grepped the address from the system map file:
# grep 'sys_call_table' < System.map-4.2.0-16-generic
ffffffff818001c0 R sys_call_table
ffffffff81801580 R ia32_sys_call_table
With this in hand, I added the following line to my kernel module:
static unsigned long *syscall_table = (unsigned long *) 0xffffffff818001c0;
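For reference, each System.map line shown above has the form "address type name". The following is a minimal user-space sketch of pulling the address for one exact symbol name out of such a line, rather than hard-coding it by hand; parse_map_line is a hypothetical helper written for illustration, not part of any kernel or libc API:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses one "address type name" line from System.map.
 * Returns 1 and stores the address if the symbol name matches exactly, else 0. */
static int parse_map_line(const char *line, const char *symbol,
                          unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[128];

    /* hex address, one-character symbol type, then the symbol name */
    if (sscanf(line, "%llx %c %127s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}
```

Unlike the grep above, which also matches ia32_sys_call_table, the exact string comparison accepts only the requested symbol.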
Based on this, I expected that a simple check would confirm that I was pointing to the location I thought I was, i.e. the base address of the kernel's unexported sys_call_table. So, I wrote a check like the one below into the module's init function to verify:
if (syscall_table[__NR_close] != (unsigned long)sys_close)
{
    /* %p expects a pointer, so cast the table entry back to void * */
    pr_info("sys_close = 0x%p, syscall_table[__NR_close] = 0x%p\n", sys_close, (void *)syscall_table[__NR_close]);
    return -ENXIO;
}
This check failed and different addresses were printed in the log.
I was not expecting the body of this if statement to be executed, because I thought the value stored in syscall_table[__NR_close] would be the same as the address of sys_close, but the branch is taken.
Q1: Have I missed something so far regarding the expected address-based comparison? If so, what?
Part 2 - Partially Successful?
If I remove this check, it seems I'm partially successful because, apparently, I can at least replace the read call successfully using the code below:
static asmlinkage ssize_t (*original_read)(unsigned int fd, char __user *buf, size_t count);
// ...
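static void systrap_replace_syscalls(void)
{
    pr_debug("systrap: replacing system calls\n");
    /* the table holds unsigned long entries, so cast when saving the originals */
    original_read = (void *)syscall_table[__NR_read];
    original_write = (void *)syscall_table[__NR_write];
    original_close = (void *)syscall_table[__NR_close];
    /* clear the write-protect bit in CR0 so the read-only table can be written */
    write_cr0(read_cr0() & ~0x10000);
    syscall_table[__NR_read] = (unsigned long)systrap_read;
    syscall_table[__NR_write] = (unsigned long)systrap_write;
    syscall_table[__NR_close] = (unsigned long)systrap_close;
    /* set the write-protect bit again */
    write_cr0(read_cr0() | 0x10000);
    pr_debug("systrap: system calls replaced\n");
}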
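The 0x10000 mask in the code above is bit 16 of CR0, the x86 write-protect (WP) flag. As a minimal user-space sketch of just the bit arithmetic, assuming the register value is held in an ordinary variable (the names CR0_WP_MASK, cr0_clear_wp, and cr0_set_wp are hypothetical and not kernel APIs):

```c
#include <stdint.h>

/* Bit 16 of CR0 is the write-protect (WP) flag: 1 << 16 == 0x10000. */
#define CR0_WP_MASK (UINT64_C(1) << 16)

/* Hypothetical helper: the value to write so that WP is cleared. */
static uint64_t cr0_clear_wp(uint64_t cr0)
{
    return cr0 & ~CR0_WP_MASK;
}

/* Hypothetical helper: the value to write so that WP is set again. */
static uint64_t cr0_set_wp(uint64_t cr0)
{
    return cr0 | CR0_WP_MASK;
}
```

Only the WP bit changes in either direction; every other bit of the value is passed through untouched.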
My replacement functions simply print a message and forward the call to the actual system call. For example, the read replacement function's code is below:
static asmlinkage ssize_t systrap_read(unsigned int fd, char __user *buf, size_t count)
{
    pr_debug("systrap: reading from fd = %u\n", fd);
    return original_read(fd, buf, count);
}
And the system log shows the following output when I insmod and rmmod the module:
kernel: [23226.797460] systrap: setting up module
kernel: [23226.797462] systrap: replacing system calls
kernel: [23226.797464] systrap: system calls replaced
kernel: [23226.797465] systrap: module setup complete
kernel: [23226.864198] systrap: reading from fd = 4279272912
<similar output omitted for brevity>
kernel: [23235.560663] systrap: reading from fd = 2835745072
kernel: [23235.564774] systrap: reading from fd = 861079840
kernel: [23235.564986] systrap: cleaning up module
kernel: [23235.564990] systrap: trying to restore system calls
kernel: [23235.564993] systrap: restored sys_read
kernel: [23235.564995] systrap: restored sys_write
kernel: [23235.564997] systrap: restored sys_close
kernel: [23235.565000] systrap: system call restoration attempt complete
kernel: [23235.565002] systrap: module cleanup complete
I can let it run for a long time and, oddly enough, I never observe entries for the write and close calls, only for the reads, which is why I thought I was only partially successful.
Q2: Have I missed something regarding the replaced system calls? If so, what?
Part 3 - Unexpected Error Message on rmmod Command
Even though the module seems to operate normally, I always get the following error when I rmmod the module from the kernel:
rmmod: ERROR: ../libkmod/libkmod.c:506 lookup_builtin_file() could not open builtin file '(null)/modules.builtin.bin'
My module cleanup function simply calls another one (below) that tries to restore the function calls by doing the opposite of the replacement function above:
// called by the exit function
static void systrap_restore_syscalls(void)
{
    pr_debug("systrap: trying to restore system calls\n");
    write_cr0(read_cr0() & ~0x10000);
    /* make sure no other modules have made changes before restoring */
    if (syscall_table[__NR_read] == (unsigned long)systrap_read)
    {
        syscall_table[__NR_read] = (unsigned long)original_read;
        pr_debug("systrap: restored sys_read\n");
    }
    else
    {
        pr_warn("systrap: sys_read not restored; address mismatch\n");
    }
    // ... omitted: same stuff for other sys calls
    write_cr0(read_cr0() | 0x10000);
    pr_debug("systrap: system call restoration attempt complete\n");
}
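The save, replace, forward, and guarded-restore steps used by the replacement and restoration functions above can be illustrated outside the kernel. This is a minimal user-space sketch, assuming an ordinary array of function pointers as a stand-in for the system call table; every name in it (table, real_read, hooked_read, install_hook, restore_hook) is hypothetical, and it does not touch CR0 or any kernel structure:

```c
typedef long (*handler_fn)(long arg);

enum { SLOT_READ = 0, SLOT_COUNT = 1 };

/* Stand-in for the real handler; returns arg + 1 so calls are observable. */
static long real_read(long arg) { return arg + 1; }

/* Stand-in for the system call table: an array of function pointers. */
static handler_fn table[SLOT_COUNT] = { real_read };

static handler_fn original_read_fn; /* saved entry, used for forwarding */
static int hook_calls;              /* counts calls that went through the hook */

/* Replacement: record the call, then forward to the saved original. */
static long hooked_read(long arg)
{
    hook_calls++;
    return original_read_fn(arg);
}

/* Save the current entry first, then install the replacement. */
static void install_hook(void)
{
    original_read_fn = table[SLOT_READ];
    table[SLOT_READ] = hooked_read;
}

/* Restore only if the slot still holds our hook; returns 1 on success. */
static int restore_hook(void)
{
    if (table[SLOT_READ] != hooked_read)
        return 0; /* someone else changed the slot; leave it alone */
    table[SLOT_READ] = original_read_fn;
    return 1;
}
```

Calling restore_hook a second time returns 0, which corresponds to the "address mismatch" branch in the module code.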
Q3: I don't know what causes the error message; any ideas here?
Part 4 - sys_open Marked for Deprecation?
In another unexpected turn of events, I find that the __NR_open macro is no longer defined by default. In order for me to see the definition, I have to #define __ARCH_WANT_SYSCALL_NO_AT before #including the header files:
/*
* Force __NR_open definition. It seems sys_open has been replaced by sys_openat(?)
* See include/uapi/asm-generic/unistd.h:724-725
*/
#define __ARCH_WANT_SYSCALL_NO_AT
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
// ...
Going through the kernel source code (mentioned in the comment above), you find the following comments:
/*
* All syscalls below here should go away really,
* these are provided for both review and as a porting
* help for the C library version.
*
* Last chance: are any of these important enough to
* enable by default?
*/
#ifdef __ARCH_WANT_SYSCALL_NO_AT
#define __NR_open 1024
__SYSCALL(__NR_open, sys_open)
// ...
Can anyone clarify:
Q4: ...the comments above on why __NR_open is not available by default?,
Q5: ...whether it's a good idea to do what I'm doing with the #define?, and
Q6: ...what I should be using instead if I really shouldn't be trying to use __NR_open?
Epilogue - Crashing My System
I tried using __NR_openat, replacing that call as I had done with the previous ones:
static asmlinkage long systrap_openat(int dfd, const char __user *filename, int flags, umode_t mode)
{
    pr_debug("systrap: opening file dfd = %d, name = % s\n", filename);
    return original_openat(dfd, filename, flags, mode);
}
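One thing worth checking in the snippet above: the format string contains two conversions (%d and % s) but only one argument (filename) is passed, so dfd is never supplied and what gets printed is undefined. The following is a minimal user-space sketch of a call with one argument per conversion, using snprintf as a stand-in for pr_debug; format_open_msg is a hypothetical helper. In kernel code filename is a __user pointer, so it would presumably have to be copied into kernel memory first (for example with strncpy_from_user) before being printed with %s; that step is not shown here.

```c
#include <stdio.h>

/* Hypothetical helper: formats the log line with one argument per conversion. */
static int format_open_msg(char *out, size_t len, int dfd, const char *filename)
{
    /* %d consumes dfd, %s consumes filename: two conversions, two arguments */
    return snprintf(out, len, "systrap: opening file dfd = %d, name = %s\n",
                    dfd, filename);
}
```

This only addresses the garbled log line; it may or may not be related to the segfaults in other processes.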
But this simply helped me unceremoniously crash my own system by causing other processes to segfault when they tried to open a file, with gems such as:
kernel: [135489.202693] systrap: opening file dfd = 0, name = P^Q
kernel: [135489.202913] zsh[11806]: segfault at 410 ip 00007f3a380abe60 sp 00007ffd04c5b550 error 4 in libc-2.21.so[7f3a37fe1000+1c0000]
Trying to print argument data also showed odd/garbage info.
Q7: Any additional suggestions on why it would suddenly crash and why the arguments seem to be garbage-like?
I've spent several days trying to work through this and I just hope I've not missed something utterly stupid...
Please, let me know if something's not entirely clear to you in the comments and I'll attempt to clarify.
It'd be most helpful if you could provide some code snippets that actually work and/or point me in a precise-enough direction that would allow me to understand what I'm doing wrong and how to quickly get this fixed.
System.map for this purpose. Nice! Q4: openat provides the functionality of open when called with AT_FDCWD as the first argument, so open is not needed. Q5: Defining __ARCH_WANT_SYSCALL_NO_AT in module code is definitely a bad idea. This macro describes the whole architecture, not a module. Also, as a rule, almost every macro checked by the kernel's includes can be defined only within the kernel, not by a module. – Tsyvarev
System.map is similar to the one in the question. Why not check the validity of the table's address using another method? – Tsyvarev
My understanding is that, as long as it's a post 2.6 kernel, it's supposed to work in a similar manner to what I'm trying above.
- No, the Linux kernel continues to actively change even after 2.6. I know several things related to symbol visibility that changed between 2.6.32 and 3.10. I suggest scanning as the other method; this method has been used (on x86_64!) by the asker in the question about address inconsistency. The method doesn't need to be your final decision, just a check. – Tsyvarev
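The comment on Q4 above notes that openat covers what open does when AT_FDCWD is passed as the first argument. A minimal user-space sketch of that equivalence, assuming a POSIX.1-2008 system; open_via_openat is a hypothetical wrapper name used only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Opens path through openat(). With AT_FDCWD as the directory fd, relative
 * paths are resolved against the current working directory, as open() does. */
static int open_via_openat(const char *path, int flags)
{
    return openat(AT_FDCWD, path, flags);
}
```

For example, open_via_openat("/dev/null", O_RDONLY) and open("/dev/null", O_RDONLY) should both return a valid descriptor for the same file.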