25
votes

I just came across an idea in The Structure And Interpretation of Computer Programs:

Data is just dumb code, and code is just smart data

I fail to understand what it means. Can some one help me to understand it better?

5
How can you say it's "elegantly phrased" if you don't know what it means?sykora
First I thought it is a chiasmus but now I think it is an antimetabole.tuinstoel
This was one of the most important insights I had as a programmer, and it's practically the paradigm exemplar of a white-board programming question. There's even a famous video series where' it's explained on a whiteboard. For the love of the machine, how can it be considered off topic here?Racheet

5 Answers

40
votes

This is one of the fundamental lessons of SICP and one of the most powerful ideas of computer science. It works like this:

What we think of as "code" doesn't actually have the power to do anything by itself. Code defines a program only within a context of interpretation -- outside of that context, it is just a stream of characters. (Really a stream of bits, which is really a stream of electrical impulses. But let's keep it simple.) The meaning of code is defined by the system within which you run it -- and this system just treats your code as data that tells it what you wanted to do. C source code is interpreted by a C compiler as data describing an object file you want it to create. An object file is treated by the loader as data describing some machine instructions you want to queue up for execution. Machine instructions are interpreted by the CPU as data defining the sequence of state transitions it should undergo.

Interpreted languages often contain mechanisms for treating data as code, which means you can pass code into a function in some form and then execute it -- or even generate code at run time:

#!/usr/bin/perl
# Note that the above line explicitly defines the interpretive context for the
# rest of this file.  Without the context of a Perl interpreter, this script
# doesn't do anything.
sub foo {
    my ($expression) = @_;
    # $expression is just a string that happens to be valid Perl

    print "$expression = " . eval("$expression") . "\n";
}

foo("1 + 1 + 2 + 3 + 5 + 8");              # sum of first six Fibonacci numbers
foo(join(' + ', map { $_ * $_ } (1..10))); # sum of first ten squares

Some languages like scheme have a concept of "first-class functions", which means that you can treat a function as data and pass it around without evaluating it until you really want to.

The upshot is that the division between "code" and "data" is pretty much arbitrary, a function of perspective only. The lower the level of abstraction, the "smarter" the code has to be: it has to contain more information about how it should be executed. On the other hand, the more information the interpreter supplies, the more dumb the code can be, until it starts to look like data with no smarts at all.

One of the most powerful ways to write code is as a simple description of what you need: Data which will be turned into code describing how to get you what you need by the interpretive context. We call this "declarative programming".

For a concrete example, consider HTML. HTML does not describe a Turing-complete programming language. It is merely structured data. Its structure contains some smarts that let it control the behavior of its interpretive context -- but not a lot of smarts. On the other hand, it contains more smarts than the paragraphs of text that appear on an average web page: Those are pretty dumb data.

7
votes

In the context of security: Due to buffer overflows, what you thought of as data and thus harmless (such as an image) can become executed as code and p0wn your machine.

In the context of software development: Many developers are very afraid of "hardcoding" things and very keen on extracting parameters that might have to change into configuration files. This is often based on the idea that config files are just "data" and thus can be changed easily (perhapy by customers) without raising the issues (compilation, deployment, testing) that changing anything in the code would.

What these developers don't realize is that since this "data" influences the behaviour of the program, it really is code; it could break the program and the only reason not to require complete testing after such a change is that, if done correctly, the configurable values have a very specific, well-documented effect and any invalid value or a broken file structure will be caught by the program.

However, what all too often happens is that the config file structure becomes a programming language in its own right, complete with control flow and everything - one that's badly documented, has a quirky syntax and parser and which only the most experienced developers in the team can touch without breaking the application completely.

2
votes

So, in a language like Scheme, even code is treated as first class data. You can treat functions and lambda expressions much like you treat other code, say passing them into other functions and lambda expressions. I recommend continuing with the text as this will all become quite clear.

2
votes

This is something you should come to understand from writing in a compiler.

One common step in compilers is to transform the program into an abstract syntax tree. Representation will often be like trees such as [+, 2, 3] where + is the root, and 2, 3 are the children.

Lisp languages simply treats this as its data. So there is no separation between data and code which are both lists that look like AST trees.

0
votes

Code is definitely data, but data is definitely not always code. Let's take a basic example - customer name. It's nothing to do with code, it's a functional (essential), as opposed to a technical (accidental) aspect of an application.

You could probably say that any technical/accidental data is code and that functional/essential data is not.