2
votes

I'm learning bison/yacc (and reviewing some C as well) and trying to build a JSON parser as a simple test project.

Using the terminology found on http://www.json.org/, I have a struct pair that represents a string/value pair and a struct object that represents an object; its members field is basically a pointer to a linked list of pairs.

I have a simple C function (create_pair) that returns a new pair. I noticed a weird behaviour that I'm not able to explain:

  • If I call this function from "main" and print the memory addresses of the returned structs, the addresses are always different.
  • If I call the very same function inside a bison "action", I see that my function returns a pointer that always happens to reside at the very same memory address.

Does this make any sense?

Details/Code follow:

Here's the code (the link contains a list of 4 pastebin links pointing to the four different files included in the "project"):

You can compile and run it with:

lex t.l
yacc -d t.y
cc y.tab.c lex.yy.c t.c
./a.out

If you launch the code and run it with the following input:

{ "firstName": "A", "lastName": "B" }

you'll see that:

1) The code executed in "main" (see file t.y) creates four different pair objects. I then print their memory addresses, and the output is something like this (notice the different addresses):

p 0x7fff52476be8 //(<-memory address for pair p)
print pair: P, Hellov
q 0x7fff52476bc8 //(<-memory address for pair q)
print pair: Q, Hellox

2) As soon as I paste the JSON sample above, we hit the "pair" rule twice: the first time for "firstName": "A", the second time for "lastName": "B". I create a new pair in both cases and print its memory address, and the two addresses are the same:

Creating pair 0x7fff52475c88
print pair: firstName, A
Creating pair 0x7fff52475c88
print pair: lastName, B

Why does this happen?

2 Answers

2
votes

You should not care what the address of a pair is. It is irrelevant to the work being performed with them, and the addresses you are seeing are incidental and without consequence.

Your function create_pair does not return a pointer. It is declared with pair create_pair(…), so it returns a pair, by value.
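
To make the by-value point concrete, here is a minimal sketch. The struct layout is an assumption (the question's real definitions are not shown here), and new_pair is a hypothetical pointer-returning variant added only for contrast; it is not part of the original code.

```c
#include <stdlib.h>

/* Assumed layout; the question's real struct pair is not shown. */
typedef struct pair {
    const char *name;
    const char *value;
} pair;

/* Returns a pair BY VALUE, matching the declaration pair create_pair(...).
   The caller receives a copy; no address is returned. */
pair create_pair(const char *name, const char *value)
{
    pair result = { name, value };
    return result;
}

/* Hypothetical variant that really does return a pointer.
   Each successful call yields its own heap object, which lives
   until the caller passes it to free(). Returns NULL on failure. */
pair *new_pair(const char *name, const char *value)
{
    pair *result = malloc(sizeof *result);
    if (result != NULL) {
        result->name = name;
        result->value = value;
    }
    return result;
}
```

With create_pair, the only address available to print is that of whatever variable the result is copied into. With something like new_pair, two pairs that are both still allocated cannot share an address.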

In main, you define pair p = create_pair(l, v);. This creates an automatic object p, usually by setting aside space for it on the stack. Then it calls create_pair. The value returned by create_pair is copied into p. Later, when you print &p, you are printing the address of p, not an address that create_pair returned.

Similarly, when you define pair q = create_pair(l, x);, you create another object, q. Because the lifetime of this object overlaps the lifetime of p, they must be in different places, so they have different addresses. When you print &q, you see this different address.
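
The situation in main can be sketched as follows. The struct layout and the helper name overlapping_pairs_differ are assumptions for illustration; the point is only that p and q are alive at the same time.

```c
#include <stdio.h>

/* Assumed layout; the question's real struct pair is not shown. */
typedef struct pair {
    const char *name;
    const char *value;
} pair;

/* Returns a pair by value, like the question's create_pair. */
pair create_pair(const char *name, const char *value)
{
    pair result = { name, value };
    return result;
}

/* Mirrors the question's main: p and q are alive at the same time,
   so they must occupy different storage. Returns 1 if &p != &q. */
int overlapping_pairs_differ(void)
{
    pair p = create_pair("P", "Hellov");
    pair q = create_pair("Q", "Hellox");

    /* These are the addresses of the local variables p and q,
       not values handed back by create_pair. */
    printf("p %p\n", (void *)&p);
    printf("q %p\n", (void *)&q);

    return &p != &q;
}
```

Because both objects exist until the function returns, the comparison is true on any conforming implementation; the exact addresses printed will vary.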

Next, consider the code you placed in the Bison rule, pair p = create_pair($<u_string>1, $<u_value>3);. Bison executes this code when it is processing a rule. It creates an automatic object, and you print its address. Then execution leaves the scope of this code, and Bison goes on to do other things. The lifetime of the automatic object ends, and the stack space it occupied is released. Later, Bison comes back to this rule. At that point, simply because computers are mechanical in operation, the stack pointer has the same value it had before. So, when a new p is created, it happens to be in the same place as the old p. Unlike p and q, which had to be in different places because they existed at the same time, this old p and new p exist only at different times, so they may be in the same place.
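
The same effect can be reproduced without Bison. In this sketch, run_action is a hypothetical stand-in for the body of the rule's action, and the struct layout is assumed. Nothing in C requires the two addresses to match, so the code reports the outcome rather than relying on it.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout; the question's real struct pair is not shown. */
typedef struct pair {
    const char *name;
    const char *value;
} pair;

pair create_pair(const char *name, const char *value)
{
    pair result = { name, value };
    return result;
}

/* Hypothetical stand-in for the body of the Bison action.
   p is created on entry and its lifetime ends on return; the
   function reports where p lived, as an integer. */
uintptr_t run_action(const char *name, const char *value)
{
    pair p = create_pair(name, value);
    uintptr_t where = (uintptr_t)(void *)&p;

    printf("Creating pair %p\n", (void *)&p);
    printf("print pair: %s, %s\n", p.name, p.value);
    return where;
}

/* Runs the action twice from the same place, the way the parser
   runs the same rule twice. Returns 1 if both p's had the same
   address. That is the usual outcome, but it is not guaranteed. */
int action_reuses_address(void)
{
    uintptr_t first = run_action("firstName", "A");
    uintptr_t second = run_action("lastName", "B");
    return first == second;
}
```

On typical implementations the two "Creating pair" lines show the same address, matching what the question observed, because the first p no longer exists when the second one is created.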

This will not necessarily always happen. If your grammar is more complicated, Bison might have other things in the stack at one time and not another (or perhaps not; the parsing machine Bison produces might not act that way; I do not know offhand). Or if you have the same code in another rule, the stack might be different when that rule is processed.

0
votes

So, what you are seeing is the address of a stack variable changing, which is perfectly normal. If they all had the same address, one value would overwrite the other, which wouldn't be very useful.

Edit: When you call a function repeatedly from the same caller (e.g. main), the address of a stack variable inside it is typically the same each time, since the stack is in the same state when each call is made [of course, compilers sometimes do funny things with the stack, so it's not 100% guaranteed].

Edit2: To clarify, if the calls are the same CHAIN of calls, e.g. calling function A from function B from function C, then the stack is OFTEN the same in A, no matter where in B or C the call is made. Of course, if we call function A from function D from function C, then all bets are off as to the address of local variables in A [well, it's most likely fairly similar, but if function D has some huge local variables, it could be very different]. And the caveat that this is TYPICAL still applies. Compilers may well leave the cleanup of the stack "until it's got enough to bother", rather than cleaning up every single call, which means three calls to function A may accumulate some "garbage" on the stack, which doesn't get cleaned up until later.
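
A rough sketch of the call-chain point, using hypothetical functions named after the A, B, C and D above. The 4096-byte buffer is an arbitrary choice, and an optimizing compiler may inline or rearrange these calls, so the result is only what typically happens in an unoptimized build.

```c
#include <stdint.h>
#include <stdio.h>

/* Function A: reports the address of one of its own locals. */
uintptr_t function_a(void)
{
    int local = 0;
    uintptr_t where = (uintptr_t)(void *)&local;

    printf("local in A at %p\n", (void *)&local);
    return where;
}

/* Function B: small stack frame, then calls A. */
uintptr_t function_b(void)
{
    return function_a();
}

/* Function D: large local buffer, then calls A. The buffer is
   volatile and is written before and after the call so that the
   compiler keeps it on the stack while A runs. */
uintptr_t function_d(void)
{
    volatile char big[4096];
    uintptr_t where;

    big[0] = 1;
    where = function_a();
    big[4095] = 2;
    return where;
}

/* Function C: reaches A through B, then through D.
   Returns 1 if A's local ended up at different addresses. */
int function_c(void)
{
    uintptr_t via_b = function_b();
    uintptr_t via_d = function_d();
    return via_b != via_d;
}
```

Calling function_b twice in a row usually prints the same address both times, while going through function_d usually shifts it by roughly the size of the buffer.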

I'm a little confused as to why you think it should be different from this?