Difference between API and ABI

224

votes

I am new to Linux system programming and I came across API and ABI while reading Linux System Programming.

Definition of API:

An API defines the interfaces by which one piece of software communicates with another at the source level.

Definition of ABI:

Whereas an API defines a source interface, an ABI defines the low-level binary interface between two or more pieces of software on a particular architecture. It defines how an application interacts with itself, how an application interacts with the kernel, and how an application interacts with libraries.

How can a program communicate at a source level? What is a source level? Is it related to source code in any way? Or the source of the library gets included in the main program?

The only difference I know is API is mostly used by programmers and ABI is mostly used by a compiler.

apiabi

by source level they mean something like include file to expose function definitions - Anycorn

Also see stackoverflow.com/q/2171177/632951 - Pacerier

59

votes

The API is what humans use. We write source code. When we write a program and want to use some library function we write code like:

 long howManyDecibels = 123L;
 int ok = livenMyHills( howManyDecibels);

and we needed to know that there is a method livenMyHills(), which takes a long integer parameter. So as a Programming Interface it's all expressed in source code. The compiler turns this into executable instructions which conform to the implementation of this language on this particular operating system. And in this case result in some low level operations on an Audio unit. So particular bits and bytes are squirted at some hardware. So at runtime there's lots of Binary level action going on which we don't usually see.

343

votes

API: Application Program Interface

This is the set of public types/variables/functions that you expose from your application/library.

In C/C++ this is what you expose in the header files that you ship with the application.

ABI: Application Binary Interface

This is how the compiler builds an application.
It defines things (but is not limited to):

How parameters are passed to functions (registers/stack).
Who cleans parameters from the stack (caller/callee).
Where the return value is placed for return.
How exceptions propagate.

51

votes

I mostly come across these terms in the sense of an API-incompatible change, or an ABI-incompatible change.

An API change is essentially where code that would have compiled with the previous version won't work anymore. This can happen because you added an argument to a function, or changed the name of something accessible outside of your local code. Any time you change a header, and it forces you to change something in a .c/.cpp file, you've made an API-change.

An ABI change is where code that has already been compiled against version 1 will no longer work with version 2 of a codebase (usually a library). This is generally trickier to keep track of than API-incompatible change since something as simple as adding a virtual method to a class can be ABI incompatible.

I've found two extremely useful resources for figuring out what ABI compatibility is and how to preserve it:

The list of Do's and Dont's with C++ for the KDE project
Ulrich Drepper's How to Write Shared Libraries.pdf (primary author of glibc)

23

votes

This is my layman explanations:

API - think of include files. They provide programming interfaces.
ABI - think of kernel module. When you run it on some kernel, it has to agree on how to communicate without include files, i.e. as low-level binary interface.

22

votes

Linux shared library minimal runnable API vs ABI example

This answer has been extracted from my other answer: What is an application binary interface (ABI)? but I felt that it directly answers this one as well, and that the questions are not duplicates.

In the context of shared libraries, the most important implication of "having a stable ABI" is that you don't need to recompile your programs after the library changes.

As we will see in the example below, it is possible to modify the ABI, breaking programs, even though the API is unchanged.

main.c

#include <assert.h>
#include <stdlib.h>

#include "mylib.h"

int main(void) {
    mylib_mystruct *myobject = mylib_init(1);
    assert(myobject->old_field == 1);
    free(myobject);
    return EXIT_SUCCESS;
}

mylib.c

#include <stdlib.h>

#include "mylib.h"

mylib_mystruct* mylib_init(int old_field) {
    mylib_mystruct *myobject;
    myobject = malloc(sizeof(mylib_mystruct));
    myobject->old_field = old_field;
    return myobject;
}

mylib.h

#ifndef MYLIB_H
#define MYLIB_H

typedef struct {
    int old_field;
} mylib_mystruct;

mylib_mystruct* mylib_init(int old_field);

#endif

Compiles and runs fine with:

cc='gcc -pedantic-errors -std=c89 -Wall -Wextra'
$cc -fPIC -c -o mylib.o mylib.c
$cc -L . -shared -o libmylib.so mylib.o
$cc -L . -o main.out main.c -lmylib
LD_LIBRARY_PATH=. ./main.out

Now, suppose that for v2 of the library, we want to add a new field to mylib_mystruct called new_field.

If we added the field before old_field as in:

typedef struct {
    int new_field;
    int old_field;
} mylib_mystruct;

and rebuilt the library but not main.out, then the assert fails!

This is because the line:

myobject->old_field == 1

had generated assembly that is trying to access the very first int of the struct, which is now new_field instead of the expected old_field.

Therefore this change broke the ABI.

If, however, we add new_field after old_field:

typedef struct {
    int old_field;
    int new_field;
} mylib_mystruct;

then the old generated assembly still accesses the first int of the struct, and the program still works, because we kept the ABI stable.

Here is a fully automated version of this example on GitHub.

Another way to keep this ABI stable would have been to treat mylib_mystruct as an opaque struct, and only access its fields through method helpers. This makes it easier to keep the ABI stable, but would incur a performance overhead as we'd do more function calls.

API vs ABI

In the previous example, it is interesting to note that adding the new_field before old_field, only broke the ABI, but not the API.

What this means, is that if we had recompiled our main.c program against the library, it would have worked regardless.

We would also have broken the API however if we had changed for example the function signature:

mylib_mystruct* mylib_init(int old_field, int new_field);

since in that case, main.c would stop compiling altogether.

Semantic API vs programming API vs ABI

We can also classify API changes in a third type: semantic changes.

For example, if we had modified

myobject->old_field = old_field;

to:

myobject->old_field = old_field + 1;

then this would have broken neither API nor ABI, but main.c would still break!

This is because we changed the "human description" of what the function is supposed to do rather than a programmatically noticeable aspect.

I just had the philosophical insight that formal verification of software in a sense moves more of the "semantic API" into a more "programmatically verifiable API".

Semantic API vs Programming API

We can also classify API changes in a third type: semantic changes.

The semantic API, is usually a natural language description of what the API is supposed to do, usually included in the API documentation.

It is therefore possible to break the semantic API without breaking the program build itself.

For example, if we had modified

myobject->old_field = old_field;

to:

myobject->old_field = old_field + 1;

then this would have broken neither programming API, nor ABI, but main.c the semantic API would break.

There are two ways to programmatically check the contract API:

test a bunch of corner cases. Easy to do, but you might always miss one.
formal verification. Harder to do, but produces mathematical proof of correctness, essentially unifying documentation and tests into a "human" / machine verifiable manner! As long as there isn't a bug in your formal description of course ;-)

Tested in Ubuntu 18.10, GCC 8.2.0.

9

votes

(Application Binary Interface) A specification for a specific hardware platform combined with the operating system. It is one step beyond the API (Application Program Interface), which defines the calls from the application to the operating system. The ABI defines the API plus the machine language for a particular CPU family. An API does not ensure runtime compatibility, but an ABI does, because it defines the machine language, or runtime, format.

Courtesy

9

votes

Let me give a specific example how ABI and API differ in Java.

An ABI incompatible change is if I change a method A#m() from taking a String as an argument to String... argument. This is not ABI compatible because you have to recompile code that is calling that, but it is API compatible as you can resolve it by recompiling without any code changes in the caller.

Here is the example spelled out. I have my Java library with class A

// Version 1.0.0
public class A {
    public void m(String string) {
        System.out.println(string);
    }
}

And I have a class that uses this library

public class Main {
    public static void main(String[] args) {
        (new A()).m("string");
    }
}

Now, the library author compiled their class A, I compiled my class Main and it is all working nicely. Imagine a new version of A comes

// Version 2.0.0
public class A {
    public void m(String... string) {
        System.out.println(string[0]);
    }
}

If I just take the new compiled class A and drop it together with the previously compiled class Main, I get an exception on attempt to invoke the method

Exception in thread "main" java.lang.NoSuchMethodError: A.m(Ljava/lang/String;)V
        at Main.main(Main.java:5)

If I recompile Main, this is fixed and all is working again.

8

votes

Your program (source code) can be compiled with modules who provide proper API.

Your program (binary) can run on platforms who provide proper ABI.

API restricts type definitions, function definitions, macros, sometimes global variables a library should expose.

ABI restricts what a "platform" should provide for you program to run on. I like to consider it in 3 levels:

processor level - the instruction set, the calling convention
kernel level - the system call convention, the special file path convention (e.g. the /proc and /sys files in Linux), etc.
OS level - the object format, the runtime libraries, etc.

Consider a cross-compiler named arm-linux-gnueabi-gcc. "arm" indicates the processor architecture, "linux" indicates the kernel, "gnu" indicates its target programs use GNU's libc as runtime library, different from arm-linux-androideabi-gcc which use Android's libc implementation.

2

votes

API - Application Programming Interface is a compile time interface which can is used by developer to use non-project functionality like library, OS, core calls in source code

ABI^[About] - Application Binary Interface is a runtime interface which is used by a program during executing for communication between components in machine code

0

votes

The ABI refers to the layout of an object file / library and final binary from the perspective of successfully linking, loading and executing certain binaries without link errors or logic errors occuring due to binary incompatibility.

The binary format specification (PE, COFF, ELF, .obj, .o, .a, .lib (import library, static library), .NET assembly, .pyc, COM .dll): the headers, the header format, defining where the sections are and where the import / export / exception tables are and the format of those
The instruction set used to encode the bytes in the code section, as well as the specific machine instructions
The actual signature of the functions and data as defined in the API (as well as how they are represented in the binary (the next 2 points))
The calling convention of the functions in the code section, which may be called by other binaries (particularly relevant to ABI compatibility being the functions that are actually exported)
The way data is represented and aligned in the data section with respect to its type (particularly relevant to ABI compatibility being the data that is actually exported)
The system call numbers or interrupt vectors hooked in the code
The name decoration of exported functions and data
Linker directives in object files
Preprocessor / compiler / assembler / linker flags and directives used by the API programmer and how they are interpreted to omit, optimise, inline or change the linkage of certain symbols or code in the library or final binary (be that binary a .dll or the executable in the event of static linking)

The bytecode format of .NET C# is an ABI (general), which includes the .NET assembly .dll format. The virtual machine that interprets the bytecode has a specific ABI that is C++ based, where types need to be marshalled between native C++ types that the native code's specific ABI uses and the boxed types of the virtual machine's ABI when calling bytecode from native code and native code from bytecode. Here I am calling an ABI of a specific program a specific ABI, whereas an ABI in general, such as 'MS ABI' or 'C ABI' simply refers to the calling convention and the way structures are organised, but not a specific embodiment of the ABI by a specific binary that introduces a new level of ABI compatibility concerns.

An API refers to the set of type definitions exported by a particular library imported and used in a particular translation unit, from the perspective of the compiler of a translation unit, to successfully resolve and check type references to be able to compile a binary, and that binary will adhere to the standard of the target ABI, such that if the library that actually implements the API is also compiled to a compatible ABI, it will link and work as intended. If the API is updated the application may still compile, but there will now be a binary incompatibility and therefore a new binary needs to be used.

An API involves:

Functions, variables, classes, objects, constants, their names, types and definitions presented in the language in which they are coded in a syntactically and semantically correct manner
What those functions actually do and how to use them in the source language
The source code files that need to be included / binaries that need to be linked to in order to make use of them, and the ABI compatibility thereof

Difference between API and ABI

10 Answers

API: Application Program Interface

ABI: Application Binary Interface