33
votes

I want to write a piece of code that changes itself continuously, even if the change is insignificant.

For example maybe something like

for i in 1 to  100, do 
begin
   x := 200
   for j in 200 downto 1, do
    begin
       do something
    end
end

Suppose I want that my code should after first iteration change the line x := 200 to some other line x := 199 and then after next iteration change it to x := 198 and so on.

Is writing such a code possible ? Would I need to use inline assembly for that ?

EDIT : Here is why I want to do it in C:

This program will be run on an experimental operating system and I can't / don't know how to use programs compiled from other languages. The real reason I need such a code is because this code is being run on a guest operating system on a virtual machine. The hypervisor is a binary translator that is translating chunks of code. The translator does some optimizations. It only translates the chunks of code once. The next time the same chunk is used in the guest, the translator will use the previously translated result. Now, if the code gets modified on the fly, then the translator notices that, and marks its previous translation as stale. Thus forcing a re-translation of the same code. This is what I want to achieve, to force the translator to do many translations. Typically these chunks are instructions between to branch instructions (such as jump instructions). I just think that self modifying code would be fantastic way to achieve this.

11
just use another variable... x = value--;Gregory Pakosz
@Greg The purpose is not to achieve this effect. It is to write a self modifying code.AnkurVj
@AnkurVj, that is a fascinating project. stacker's link above will be useful to you.Jonathan M
Is there a reason you can't use x86 assembly directly? It seems like assembly would still be able to trigger the "force many translations" requirement.TheHans255

11 Answers

15
votes

You might want to consider writing a virtual machine in C, where you can build your own self-modifying code.

If you wish to write self-modifying executables, much depends on the operating system you are targeting. You might approach your desired solution by modifying the in-memory program image. To do so, you would obtain the in-memory address of your program's code bytes. Then, you might manipulate the operating system protection on this memory range, allowing you to modify the bytes without encountering an Access Violation or '''SIG_SEGV'''. Finally, you would use pointers (perhaps '''unsigned char *''' pointers, possibly '''unsigned long *''' as on RISC machines) to modify the opcodes of the compiled program.

A key point is that you will be modifying machine code of the target architecture. There is no canonical format for C code while it is running -- C is a specification of a textual input file to a compiler.

9
votes

It is possible, but it's most probably not portably possible and you may have to contend with read-only memory segments for the running code and other obstacles put in place by your OS.

9
votes

Sorry, I am answering a bit late, but I think I found exactly what you are looking for : https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/

In this article, they change the value of a constant by injecting assembly in the stack. Then they execute a shellcode by modifying the memory of a function on the stack.

Below is the first code :

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

void foo(void);
int change_page_permissions_of_address(void *addr);

int main(void) {
    void *foo_addr = (void*)foo;

    // Change the permissions of the page that contains foo() to read, write, and execute
    // This assumes that foo() is fully contained by a single page
    if(change_page_permissions_of_address(foo_addr) == -1) {
        fprintf(stderr, "Error while changing page permissions of foo(): %s\n", strerror(errno));
        return 1;
    }

    // Call the unmodified foo()
    puts("Calling foo...");
    foo();

    // Change the immediate value in the addl instruction in foo() to 42
    unsigned char *instruction = (unsigned char*)foo_addr + 18;
    *instruction = 0x2A;

    // Call the modified foo()
    puts("Calling foo...");
    foo();

    return 0;
}

void foo(void) {
    int i=0;
    i++;
    printf("i: %d\n", i);
}

int change_page_permissions_of_address(void *addr) {
    // Move the pointer to the page boundary
    int page_size = getpagesize();
    addr -= (unsigned long)addr % page_size;

    if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
        return -1;
    }

    return 0;
}
5
votes
5
votes

Depending on how much freedom you need, you may be able to accomplish what you want by using function pointers. Using your pseudocode as a jumping-off point, consider the case where we want to modify that variable x in different ways as the loop index i changes. We could do something like this:

#include <stdio.h>

void multiply_x (int * x, int multiplier)
{
    *x *= multiplier;
}

void add_to_x (int * x, int increment)
{
    *x += increment;
}

int main (void)
{
    int x = 0;
    int i;

    void (*fp)(int *, int);

    for (i = 1; i < 6; ++i) {
            fp = (i % 2) ? add_to_x : multiply_x;

            fp(&x, i);

            printf("%d\n", x);
    }

    return 0;
}

The output, when we compile and run the program, is:

1
2
5
20
25

Obviously, this will only work if you have finite number of things you want to do with x on each run through. In order to make the changes persistent (which is part of what you want from "self-modification"), you would want to make the function-pointer variable either global or static. I'm not sure I really can recommend this approach, because there are often simpler and clearer ways of accomplishing this sort of thing.

4
votes

A self-interpreting language (not hard-compiled and linked like C) might be better for that. Perl, javascript, PHP have the evil eval() function that might be suited to your purpose. By it, you could have a string of code that you constantly modify and then execute via eval().

3
votes

The suggestion about implementing LISP in C and then using that is solid, due to portability concerns. But if you really wanted to, this could also be implemented in the other direction on many systems, by loading your program's bytecode into memory and then returning to it.

There's a couple of ways you could attempt to do that. One way is via a buffer overflow exploit. Another would be to use mprotect() to make the code section writable, and then modify compiler-created functions.

Techniques like this are fun for programming challenges and obfuscated competitions, but given how unreadable your code would be combined with the fact you're exploiting what C considers undefined behavior, they're best avoided in production environments.

2
votes

In standard C11 (read n1570), you cannot write self modifying code (at least without undefined behavior). Conceptually at least, the code segment is read-only.

You might consider extending the code of your program with plugins using your dynamic linker. This require operating system specific functions. On POSIX, use dlopen (and probably dlsym to get newly loaded function pointers). You could then overwrite function pointers with the address of new ones.

Perhaps you could use some JIT-compiling library (like libgccjit or asmjit) to achieve your goals. You'll get fresh function addresses and put them in your function pointers.

Remember that a C compiler can generate code of various size for a given function call or jump, so even overwriting that in a machine specific way is brittle.

1
votes

My friend and I encountered this problem while working on a game that self-modifies its code. We allow the user to rewrite code snippets in x86 assembly.

This just requires leveraging two libraries -- an assembler, and a disassembler:

FASM assembler: https://github.com/ZenLulz/Fasm.NET

Udis86 disassembler: https://github.com/vmt/udis86

We read instructions using the disassembler, let the user edit them, convert the new instructions to bytes with the assembler, and write them back to memory. The write-back requires using VirtualProtect on windows to change page permissions to allow editing the code. On Unix you have to use mprotect instead.

I posted an article on how we did it, as well as the sample code.

These examples are on Windows using C++, but it should be very easy to make cross-platform and C only.

0
votes

This is how to do it on windows with c++. You'll have to VirtualAlloc a byte array with read/write protections, copy your code there, and VirtualProtect it with read/execute protections. Here's how you dynamically create a function that does nothing and returns.

#include <cstdio>
#include <Memoryapi.h>
#include <windows.h>
using namespace std;
typedef unsigned char byte;

int main(int argc, char** argv){
    byte bytes [] = { 0x48, 0x31, 0xC0, 0x48, 0x83, 0xC0, 0x0F, 0xC3 }; //put code here
    //xor %rax, %rax
    //add %rax, 15
    //ret
    int size = sizeof(bytes);
    DWORD protect = PAGE_READWRITE;
    void* meth = VirtualAlloc(NULL, size, MEM_COMMIT, protect);
    byte* write = (byte*) meth;
    for(int i = 0; i < size; i++){
        write[i] = bytes[i];
    }
    if(VirtualProtect(meth, size, PAGE_EXECUTE_READ, &protect)){
        typedef int (*fptr)();
        fptr my_fptr = reinterpret_cast<fptr>(reinterpret_cast<long>(meth));
        int number = my_fptr();
        for(int i = 0; i < number; i++){
            printf("I will say this 15 times!\n");
        }
        return 0;
    } else{
        printf("Unable to VirtualProtect code with execute protection!\n");
        return 1;
    }
}

You assemble the code using this tool.

0
votes

While "true" self modifying code in C is impossible (the assembly way feels like slight cheat, because at this point, we're writing self modifying code in assembly and not in C, which was the original question), there might be a pure C way to make the similar effect of statements paradoxically not doing what you think are supposed do to. I say paradoxically, because both the ASM self modifying code and the following C snippet might not superficially/intuitively make sense, but are logical if you put intuition aside and do a logical analysis, which is the discrepancy which makes paradox a paradox.

#include <stdio.h>
#include <string.h>

int main()
{
    struct Foo
    {
        char a;
        char b[4];
    } foo;

    foo.a = 42;
    strncpy(foo.b, "foo", 3);
    printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);

    *(int*)&foo.a = 1918984746;
    printf("foo.a=%i, foo.b=\"%s\"\n", foo.a, foo.b);

    return 0;
}
$ gcc -o foo foo.c && ./foo
foo.a=42, foo.b="foo"
foo.a=42, foo.b="bar"

First, we change the value of foo.a and foo.b and print the struct. Then we change only the value of foo.a, but observe the output.