
One of the Linux kernel drivers I am developing is using network communication in the kernel (sock_create(), sock->ops->bind(), and so on).

The problem is that there will be multiple sockets to receive data from, so I need something that simulates select() or poll() in kernel space. Since these functions work on file descriptors, I cannot use them unless I also create the sockets through the system calls, and that seems unnecessary since I am already working in the kernel.

So I was thinking of wrapping the default sock->sk_data_ready handler in my own handler (custom_sk_data_ready()), which would unlock a semaphore. Then I can write my own kernel_select() function that tries to lock the semaphore and blocks until it becomes available. That way the kernel function sleeps until custom_sk_data_ready() unlocks the semaphore. Once kernel_select() gets the lock, it unlocks it and calls custom_sk_data_ready() to relock it. So the only additional initialization is to run custom_sk_data_ready() before binding a socket, so that the first call to kernel_select() does not falsely trigger.

I see one possible problem. If multiple receives occur, then multiple calls to custom_sk_data_ready() will try to unlock the semaphore. So, to avoid losing those extra calls and to track which socket is being used, there will have to be a table or list of pointers to the sockets in use, and custom_sk_data_ready() will have to flag in that table/list which socket it was passed.
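The flagged-table idea can be sketched in plain user-space C. This is only a model, not kernel code: C11 `<stdatomic.h>` atomics stand in for kernel bit operations such as `set_bit()`/`test_and_clear_bit()`, and `struct tracked_sock`, `track_sock()`, `mark_ready()` and `collect_ready()` are hypothetical names. What it shows is that repeated data-ready calls for the same socket collapse into one flag instead of being lost, and that the callback side never blocks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 8

/* Hypothetical table entry: one per socket the driver listens on. */
struct tracked_sock {
    const void *sk;      /* stands in for a struct sock * */
    atomic_bool ready;   /* set by the data-ready callback, cleared by the waiter */
};

struct sock_table {
    struct tracked_sock entries[MAX_TRACKED];
    size_t count;
};

/* Register a socket before its callback is installed; false if the table is full. */
static bool track_sock(struct sock_table *t, const void *sk)
{
    if (t->count >= MAX_TRACKED)
        return false;
    t->entries[t->count].sk = sk;
    atomic_init(&t->entries[t->count].ready, false);
    t->count++;
    return true;
}

/* Callback side: must not sleep, so it only looks up the socket and sets a flag.
 * Repeated calls for the same socket simply leave the flag set. */
static void mark_ready(struct sock_table *t, const void *sk)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].sk == sk) {
            atomic_store(&t->entries[i].ready, true);
            return;
        }
    }
}

/* Waiter side: atomically take and clear ready flags, writing up to `max`
 * ready sockets into `out`. Flags not collected stay set for the next call. */
static size_t collect_ready(struct sock_table *t, const void **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < t->count && n < max; i++) {
        if (atomic_exchange(&t->entries[i].ready, false))
            out[n++] = t->entries[i].sk;
    }
    return n;
}
```

The sleeping half is left out on purpose. In the kernel, the callback would typically follow the flag update with wake_up_interruptible() on a wait_queue_head_t, and kernel_select() would sleep in wait_event_interruptible() until at least one flag is set, then collect. Registration is assumed to finish before any callback runs; adding sockets while callbacks fire would need extra locking.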

Is this method sound? Or should I just struggle with the user/kernel space issue when using the standard system calls?

Initial Finding:

All callback functions in the sock structure are called in an interrupt context. This means they cannot sleep. To allow the main kernel thread to sleep on a list of ready sockets, mutexes are used, but the custom_sk_data_ready() must act like a spinlock on the mutexes (calling mutex_trylock() repeatedly). This also means that any dynamic allocation must use the GFP_ATOMIC flag.


Additional possibility:

For every open socket, replace each socket's sk_data_ready() with a custom one (custom_sk_data_ready()) and create a worker (struct work_struct) and work queue (struct workqueue_struct). A common process_msg() function will be used for each worker. Create a kernel module-level global list where each list element has a pointer to the socket and contains the worker structure. When data is ready on a socket, custom_sk_data_ready() will execute, find the list element with the matching socket, and then call queue_work() with that element's work queue and worker. Then process_msg() will be called, and it can either find the matching list element through the contents of its struct work_struct * parameter (an address), or use the container_of() macro to get the address of the list element that holds the worker structure.
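The container_of() step can be illustrated in user-space C. Assumptions: `struct work_item` is a hypothetical stand-in for `struct work_struct`, `struct sock_entry` and `dispatch()` are hypothetical, and the handler is called directly where a kernel module would call queue_work(). The macro is a simplified version of the kernel's container_of(), which additionally type-checks its argument.

```c
#include <stddef.h>

/* Simplified user-space container_of(): recover the enclosing structure
 * from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct work_item {
    void (*func)(struct work_item *work);
};

/* Hypothetical list element: socket pointer plus embedded work item. */
struct sock_entry {
    const void *sk;           /* stands in for a struct sock * */
    struct work_item work;    /* embedded, so container_of() can find the entry */
    struct sock_entry *next;
    int processed;            /* counts how often process_msg() ran for this entry */
};

/* Shared handler, like process_msg() above: it only receives the work pointer. */
static void process_msg(struct work_item *work)
{
    struct sock_entry *entry = container_of(work, struct sock_entry, work);
    entry->processed++;       /* a real driver would read from entry->sk here */
}

/* What custom_sk_data_ready() would do: find the entry for sk and run its work.
 * In the kernel this would be queue_work(wq, &entry->work), not a direct call. */
static struct sock_entry *dispatch(struct sock_entry *head, const void *sk)
{
    for (struct sock_entry *e = head; e != NULL; e = e->next) {
        if (e->sk == sk) {
            e->work.func(&e->work);
            return e;
        }
    }
    return NULL;
}
```

Because the work item is embedded in the list element, the handler needs no lookup of its own. In real kernel code the list walk inside the callback would need locking that is safe in atomic context (a spinlock or RCU); one common alternative, which net/ceph/messenger.c appears to use, is to store the element pointer in sk->sk_user_data and skip the search. A single shared workqueue is usually enough, since many work items can be queued on one workqueue.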

Which technique is the most sound?

Can't you have a user-space helper program doing the poll? Multiplexing input with poll or select is tied to the scheduler (the waiting process is idle, so other processes can run), so I wouldn't do that inside the kernel! – Basile Starynkevitch
@BasileStarynkevitch: That is why I am only trying to simulate the sleep blocking of poll() and select(). Using those two system calls from the kernel is a last resort. I do suspect that there are problems with running poll() and select() in a user-space helper. The helper must have a file descriptor (which sock_create() does not provide) and possibly permission to access a socket residing in kernel space. So socket creation would have to occur in the user-space helper, and the module would have to find the socket from the user-space file descriptor. That becomes more complicated. – Joshua
You should not be doing any of that in the kernel. – mpe
@mpe, you are definitely wrong about in-kernel sockets; they are used in the kernel. Replacing the sk_data_ready handler is also common practice, particularly in modules that use Netlink. The blocking/sleeping part is another issue; I did not see any examples of such behavior. – Joshua
You can do it in the kernel, but that doesn't mean you should. Code should only be in the kernel if there's a really good reason it can't be in userspace. – mpe

1 Answer


Your second idea sounds more likely to work.

The CEPH code looks like it does something similar, see net/ceph/messenger.c.