
Assume we have a single core cpu running

int filedesc = open("foo.txt", O_RDONLY);

filedesc is a variable in the user process. When open starts executing, the CPU context-switches into the kernel. How is the return value of open passed back to filedesc?

additionally, compared to

FILE *file = fopen("foo.txt", "r");

read/write with fopen is much faster due to buffering, but under the hood it calls open. In this case, does the library still retrieve one byte at a time? If so, there would be a context-switch overhead for each byte, since the fopen buffer lives in the user process and each system call has to pass its return value back and forth, as in my first question. How does it end up faster? Thanks in advance!

I suggest you have a look at the sources for the C library you are using. The source for glibc can be found at the following address: sourceware.org/git/?p=glibc.git;a=blob;f=libio/… – kungjohan
"how come it runs faster" -- how did you measure this? – th33lf
Hi @th33lf, I watched this tutorial about open/fopen by Prof. Sorber: youtube.com/… – mzoz
@mzoz In that example, he is not timing open by itself. He calls open, read and write, of which open probably forms a tiny, insignificant part of the overhead: it is called only once, while the other calls are made in a loop! Most of the time would be spent in read and write. – th33lf
"How is the system call return value passed back to the user process?" There is an application binary interface (ABI) specification between the kernel and user space that defines (among other things) how parameters are passed to system calls and how values are returned. – Ian Abbott

2 Answers


"fopen is much faster [then fopen] due to buffering, but under the hood it calls open..."
In general, by definition if function1() implementation includes calling function2(), then calling function2() directly, and if using the same option set as when called by function1(), will always have a shorter execution time. If you are seeing the opposite with fopen() and open(), then it suggests the option set used when you are calling open() directly would have to be different than when it is called within fopen(). But the implementation of the internal do_sys_open() has the same number of arguments open(), so speed differential for that reason is not possible. You should question your bench-marking technique.

Regarding how the return value is passed back to the user...
Linux system calls are defined using variations of the SYSCALL_DEFINEn macro. The following implementation of open() illustrates this. Note the const char __user * annotation on the filename argument, in both the macro and do_sys_open(): __user marks a pointer into user-space memory, so that static checkers can verify the kernel only dereferences it through the proper copy helpers. The integer value returned by the SYSCALL_DEFINE3 body is placed in a register (RAX on x86-64, per the ABI) when control returns to user space, where the libc wrapper hands it back as open()'s return value and your code assigns it to filedesc:

long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
    struct open_flags op;
    int fd = build_open_flags(flags, mode, &op);
    struct filename *tmp;

    if (fd)
        return fd;

    tmp = getname(filename);
    if (IS_ERR(tmp))
        return PTR_ERR(tmp);

    fd = get_unused_fd_flags(flags);
    if (fd >= 0) {
        struct file *f = do_filp_open(dfd, tmp, &op);
        if (IS_ERR(f)) {
            put_unused_fd(fd);
            fd = PTR_ERR(f);
        } else {
            fsnotify_open(f);
            fd_install(fd, f);
        }
    }
    putname(tmp);
    return fd;
}

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
    if (force_o_largefile())
        flags |= O_LARGEFILE;

    return do_sys_open(AT_FDCWD, filename, flags, mode);
}

I guess you are a bit confused here. When you say fopen() is faster, what you actually mean is that fread() and fwrite() are faster than read() and write(). This can be true in many implementations, because the C standard library buffers in user space, while POSIX read/write do not buffer in user space. They may, however, buffer in kernel space.

Let's say you are copying a 1 KiB file. If you do this one byte at a time, using read() to get one byte from the file and write() to copy it into the other, you end up calling each system call 1024 times, and each call involves a context switch from user space to kernel space. On the other hand, if you use a C library implementation with, say, a 512-byte internal buffer, those 1024 one-byte fread() and fwrite() calls translate into only two system calls each. Hence, it appears significantly faster than using read()/write() directly.

But then, instead of copying one byte at a time, you could also use a large enough buffer in your own application, calling read()/write() as few times as possible, and get equal or better performance than the standard library. In other words, it is not that the standard library API is faster than the system call (that is not possible, since the library invokes the system calls internally); it is simply more efficient to invoke read()/write() with larger buffers because of the context-switch overhead, and the standard library has been written with this in mind.