Today I am in beautiful downtown Stockholm, having left sunny San Jose (highs hitting 70 in mid-Feb) for a balmy 28F (yes that’s the HIGH) for JFokus. If I don’t freeze to death (or die from the cold I’m suffering or lose my voice) I might actually give some talks. Wish me luck.

To make up for last month’s non-techy blog, this one will be over-the-top in techy theory AND be a multi-part’er. I’ve been wrestling with some tricky bugs and I thought I’d share the pain. For fun, I’m going to toss in evidence that All Your Existing Optimizing Compilers Are Broken – at least they are if they attempt a data-flow style Constant Propagation on a language with a class hierarchy and a distinguished NULL pointer (e.g. Java, C++). This includes the HotSpot server compiler (although I doubt anyone’s hit the problem in practice), and I suspect many, many other compilers. However, this blog is going to be at least a 2-parter, so you’ll have to wait until you’ve waded through 2 techy blogs before we get to the juicy (theoretical) bug. To start down this discussion we first need to talk about Constant Propagation and the notion of a “lattice”.

## CONSTANT PROPAGATION

First up: Constant Propagation. Basic stuff only – we’re skipping over the practical concerns of how to do this fast & accurately (which is what is in HotSpot), and what kind of realistic performance you get from the code if you do (or do not) do it (trust me, it’s worth doing if you’re looking for the right kind of “constants”). Instead, let’s just look at what CP is trying to accomplish, and we’re going to stick with ‘int’ types for a while to get the basic ideas across.

```
a = 1;
b = a+2;
System.out.println(b);
```

Rather obviously we can propagate the constant **1** for **a** into the expression for **b**:

```
b = 1+2;
```

and fold the constant:

```
b = 3;
```

and propagate this constant forward:

```
System.out.println(3);
```
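These propagate-and-fold steps can be sketched as a toy pass over straight-line code. This is a sketch only, with invented names (`ToyCP`, `assignConst`, `assignAddConst`) – nothing resembling HotSpot’s real IR:

```java
import java.util.HashMap;
import java.util.Map;

// Toy straight-line constant propagation: walk assignments in order,
// substituting known constants and folding arithmetic as we go.
public class ToyCP {
  static final Map<String,Integer> env = new HashMap<>();

  // "x = c": record a known constant for x
  static void assignConst(String x, int c) { env.put(x, c); }

  // "x = y + c": if y is a known constant, fold y+c into a new constant
  static void assignAddConst(String x, String y, int c) {
    Integer v = env.get(y);
    if (v != null) env.put(x, v + c);  // propagate, then fold
    else           env.remove(x);      // unknown input: x is unknown too
  }

  public static void main(String[] args) {
    assignConst("a", 1);           // a = 1;
    assignAddConst("b", "a", 2);   // b = a+2;  folds to b = 3
    System.out.println(env.get("b"));
  }
}
```

After the pass, the final `println` can be rewritten to print the constant directly, and **a** and **b** dropped entirely.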

Easy, right? We now don’t need **a** or **b**, so hopefully this program runs faster. Let’s try a (slightly) harder example:

```
if( P ) {
  b = 3;
} else {
  a = 1;
  b = a+2;
}
System.out.println(b);
```

Here **P** is some unknown predicate; we assume it’s beyond the ability of the compiler to deduce in advance (at compile-time) what values **P** will take on. Let us suppose **P** is something like **read()** where you can type a 0 or 1 at a prompt. Back to CP: since we don’t know what **P** might be, we have to assume it could be either **true** or **false** – and both sides of the **if()** expression might be taken. So we need to evaluate both; this whole style of evaluation is sometimes called Partial Evaluation. First arm of the **if()** is easy:

```
if( P ) {
  b = 3;     // the obvious constant 3
} else ...
```

Now we can do the other arm – and we NEED to do it before the following println or else we miss a possible constant:

```
} else {
  a = 1;
  b = 3;
}
```

And at this point we can see **b** is **3** no matter what value **P** takes on (and for those “in the know” please notice I’m not going to explain SSA theory and Phi functions in this blog; it’s got too much in it already):

```
if( P ) {
  b = 3;
} else {
  ...
  b = 3;
}
System.out.println(3);
```

What happens if **b** takes on different values on both arms of the **if()**? Well that depends on how clever a ‘constant’ we are willing to work with.

```
if( P ) {
  b = 3;
} else {
  b = 4;
}
X;
```

What can we say at statement **X**? **b** is not known to be either 3 or 4, but we can say that **b** is definitely positive (and that it is either **3** or **4**). This is useful if X is e.g. part of an array range check like so: **if( b >= 0 ) …** because now we can fold the control flow. OK, let’s try something more tricky yet: a loop. First a loop for which “we got nuthin”: no constants.

```
a = 1;  b = 1;
while( P ) {
  tmp = a+b;
  a = b;
  b = tmp;
}
```

Here we’re computing Fibonacci while predicate **P** holds true, at least until Java integers start overflowing. Pretty quickly **a** & **b** become big unknown integers (and then overflow). Here’s a loop where we CAN find an obvious constant:

```
b = 3;
while( P ) {
  ...
  b = 3;
  ...
}
```

Even though **b** is set in the loop, it’s set to the same constant we get on loop entry so **b** must be **3**. Here’s a loop where **b** is not a constant (but IS known positive):

```
b = 1;
while( P ) {
  S1;        // b is NOT the constant 1 here!!!
  b = 2;
  S2;
}
```

Knowing **b** is positive might be useful if **b** is used as an array index in the loop. OK, now let’s try our first really tricky case:

```
b = 1;
while( P ) {
  S1;        // b IS constant 1 here!!!
  b = 2-b;
  S2;
}
System.out.println(b);
```

Is **b** a constant on loop exit? And if so, what value? (yes & 1)
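You can convince yourself by just running the loop concretely – a throwaway check, with invented names, standing in for the symbolic `P` and `S1`/`S2`:

```java
// Concretely run "b = 1; while( P ) { b = 2-b; }" for n trips of the loop.
// 2-1 == 1, so b is 1 at S1 on every trip and 1 on exit - for ANY trip count.
public class ToyFlipFlop {
  static int runLoop(int trips) {
    int b = 1;
    for (int t = 0; t < trips; t++)
      b = 2 - b;   // b is 1 entering the statement, and 2-1 == 1 leaving it
    return b;
  }

  public static void main(String[] args) {
    System.out.println(runLoop(0));    // loop never taken: b == 1
    System.out.println(runLoop(17));   // any trip count: still 1
  }
}
```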

Why is it that in this example I can say **b** is a constant **1** on statement **S1** and yet it is **not** a constant in the prior example? BOTH code snippets start out with:

```
b = 1;
while( P ) {
  S1;        // b is ????
```

The cases are different obviously because the rest of the loop matters! But let’s be explicit about it: at the loop head – **while( P )** – we can see that **b** is **1** on entry… but we just don’t know what **b** will be around the loop backedge. It might be the desired constant **1** – which meshes with the entry value for **b** – or it might be some unrelated constant, or not a constant at all. We have to inspect the *whole loop* to discover if **b** is a constant.

Furthermore, **how** we inspect the loop will let us find more (or less) constants – for the same loop! Suppose we inspect code instructions one at a time, in order – “propagating” the constants like I did with **a** and **b** above. What do we do with loops? We can’t do the loop instructions “in order” because we haven’t done them yet when we want to know what constant comes around the loop backedge. The simple (*pessimistic*) approach is to say “*we don’t know*” at the loop entry about what **b** will be around the backedge. In this case we say “**b** *is not a constant*” as we enter the loop. Let’s name “**b** is not a constant” by saying **b** has value *bottom* (which means “*we don’t know what values* **b** *might take on*”):

```
b = 1;         // b is the constant '1' here!
while( P ) {   // unknown backedge value, so b is bottom here!
  S1;          // b is bottom
  b = 2-b;     // 2 minus bottom is also bottom, so b is bottom here also
}              // we bring a bottom value around for b here
```

Our end result of *bottom* for **b** matches what we assumed it would be around the loop backedge, so we pretty quickly decide that we don’t know anything about **b**.

Let’s try this loop again with an *optimistic* approach. When we hit the loop and we don’t know (yet) what **b** will be, we optimistically assume **b** will be exactly what we want it to be – and then we’ll confirm that. We use the special value *top* to say “**b** *is any constant I want it to be* – **b** *is ALL CONSTANTS AT THE SAME TIME*” – an obviously unrealistic situation that we’ll have to rectify later:

```
b = 1;         // b is the constant '1' here.
while( P ) {   // b is top around the backedge - which matches the constant '1'
  S1;          // b is a '1' here
  b = 2-1;     // b is STILL a '1' here!
}              // we bring a '1' value around for b here
```

HEY – we found a constant! (And we brought **b**’s value around the backedge first as *top* when we started, then as a **1** as we finished.) This is the trick about optimistic-vs-pessimistic approaches: optimistic approaches can find strictly **more** constants than pessimistic approaches (if any are to be found), BUT the analysis is in this unrealistic state for awhile – we have to confirm all our optimistic assumptions… and sometimes they don’t work out – and then we’ll have to re-compute over the code where we assumed wrongly. Pessimistic approaches always have a conservative but correct solution. Here’s the optimistic approach when the assumptions don’t work out:

```
b = 1;         // b is the constant '1' here.
while( P ) {   // b is top around the backedge - which matches the const '1'
  S1;          // b is a '1' here
  b = 2;       // b is now a '2'
}              // we bring a '2' value around for b here
```

Oops – **2** around the backedge meets **1** on entry – **b** is NOT a constant. So we need to continue the analysis with our new more realistic result:

```
...
while( P ) {   // b is 2 around the backedge and 1 on entry, so we set b to bottom
  S1;          // b is bottom here
  b = 2;       // b is now a '2'
}              // we bring a '2' value around for b here
```

This 2nd solution is the correct one.

In general, when running an optimistic solution you start with all values at *top* until computations force them “down” towards *bottom* (or constants, we hope)… but every time a value “falls down” you have to recompute all dependent calculations to see if they also “fall”. The number of times you might need to recompute depends on the number of times you can “fall”. There are efficient techniques for doing this analysis in time linear in the size of the program… times the number of times you might “fall”.
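The whole optimistic fixed-point iteration can be sketched for the earlier `b = 2-b` loop. This is a toy with an invented encoding – `null` for *top*, a sentinel for *bottom* – and nothing like HotSpot’s real lattice classes:

```java
// Optimistic constant propagation over a 3-level lattice for:
//   b = entry;  while( P ) { b = 2-b; }
// Encoding: null = top, an Integer = that constant, BOT = bottom.
public class ToyOptimistic {
  static final Integer BOT = Integer.MIN_VALUE;  // sentinel for "bottom"

  // Lattice meet: top meets x -> x; equal constants stay; otherwise bottom.
  static Integer meet(Integer a, Integer b) {
    if (a == null) return b;
    if (b == null) return a;
    return a.equals(b) ? a : BOT;
  }

  // Abstractly evaluate the loop body "b = 2-b" over the lattice.
  static Integer eval(Integer b) {
    if (b == null || b.equals(BOT)) return b;  // top/bottom pass through
    return 2 - b;
  }

  // Iterate to a fixed point, starting with top on the backedge.
  static Integer analyzeLoop(int entry) {
    Integer backedge = null;                   // optimistic: assume top
    while (true) {
      Integer head = meet(entry, backedge);    // loop-head value of b
      Integer next = eval(head);               // value after the loop body
      if (next == null ? backedge == null : next.equals(backedge))
        return head;                           // assumptions confirmed
      backedge = next;                         // "fall" and re-analyze
    }
  }

  // Pessimistic version: assume bottom on the backedge up front.
  static Integer analyzePessimistic(int entry) {
    return meet(entry, BOT);                   // instantly bottom: constant missed
  }

  public static void main(String[] args) {
    System.out.println(analyzeLoop(1));        // finds the constant 1
    System.out.println(analyzePessimistic(1).equals(BOT));
  }
}
```

The optimistic pass confirms the backedge value matches the entry value and reports the constant; the pessimistic one falls to *bottom* immediately and misses it.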

## LATTICES

If we look at our choices of values for **b** graphically we get this:

```
              top
 ... -2 -1 0 1 2 3 4 .....
             bottom
```

This structure is called a lattice. In this lattice we can see the distinguished *top* and *bottom* values along with all possible constants in the middle. We’ll start our optimistic Constant Propagation by setting all values to *top*. Statements like **b=3** will cause **b** to “fall” from *top* to the constant **3** in the middle row. If we have to merge two different constants for **b** then we fall again to *bottom*. Mixing *bottom* with anything else will always “fall” to *bottom*. We can only fall twice, so a good algorithm for CP will run in 2*size_of(program).

Let’s look at another choice of lattice that sounds good but fails in practice. Suppose we’d like a lattice that allows **b** to take on any range of values, such as 1-10 or 7-23 or -99 to +99, or MININT to MAXINT – suppose we’re trying to get clever about finding better array index values (perhaps to remove more range checks). We can merge different values in the lattice by widening the ranges. Here’s a simple example:

```
if( P1 ) { b = P2 ? 0 : 4; }  // b has range 0-4
else     { b = P3 ? 1 : 5; }  // b has range 1-5
// here b has range 0-5 - the widened version of 0-4 and 1-5
```
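The widening merge here is just “take the smallest range enclosing both inputs” – a sketch with an invented name (`ToyRangeMeet`):

```java
// Widening meet over ranges: the smallest range containing both inputs.
public class ToyRangeMeet {
  final int lo, hi;
  ToyRangeMeet(int lo, int hi) { this.lo = lo; this.hi = hi; }

  static ToyRangeMeet meet(ToyRangeMeet a, ToyRangeMeet b) {
    return new ToyRangeMeet(Math.min(a.lo, b.lo), Math.max(a.hi, b.hi));
  }

  public static void main(String[] args) {
    ToyRangeMeet r = meet(new ToyRangeMeet(0, 4),   // b = P2 ? 0 : 4
                          new ToyRangeMeet(1, 5));  // b = P3 ? 1 : 5
    System.out.println(r.lo + "-" + r.hi);          // widened to 0-5
  }
}
```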

Looks good, right? Here’s the poster-child case:

```
for( int i=0; i<A.length; i++ ) {
  A[i]...
}
```

Let’s break that down a little more, removing syntactic sugar and exposing the tender bare insides of this little problem:

```
i = 0;                    // i is the constant '0', duh!
while( i < A.length ) {   // Assume 'i' is top on the backedge! Matches our 0 on entry
  ... if( i<A.length ) ...//  ... happy happy, i is 0
  i = i + 1;              // i is now: 0+1 == 1
}                         // bring '1' around the backedge.
```

Ok, on the 1st CP pass we assumed **i** was *top* on the backedge on entry and found a **1** instead – so let us “fall” **i** to the range **0-1** (instead of *bottom*) and try again:

```
i = 0;                    // i is the constant '0', duh!
while( i < A.length ) {   // Assume 'i' is 0 or 1 on the backedge!
  ... if( i<A.length ) ...//  ... still happy, i is 0-1
  i = i + 1;              // i is now: "0-1"+1 which gives us the range 1-2
}                         // bring '1-2' around the backedge.
```

Ok, 2nd CP pass and we assumed **i** was **0-1** on the backedge on entry and found a **1-2** instead – so let’s “fall” to the range **0-2** and try again:

```
i = 0;                    // i is the constant '0', duh!
while( i < A.length ) {   // Assume 'i' is the range 0-2 on the backedge!
  ... if( i<A.length ) ...//  ... still happy?: i is 0-2
  i = i + 1;              // i is now 1-3
}                         // bring '1-3' around the backedge.
```

Hey… there’s a pattern here! And we’re never gonna converge… what’s going on? What IS the lattice for ranges?

```
                       TOP
     ... -1  0  1  2  3 ...           // ranges with 1 value (i.e. constants)
  ... -1-0  0-1  1-2  2-3  3-4 ...    // ranges with 2 values
  ... -1-1  0-2  1-3  2-4  3-5 ...    // ranges with 3 values
  ....
  ... -1-99  0-100  1-101 ...         // ranges with 100 values
  ....                                // ...many lattice layers later...
                                      // ranges with 1000 values, etc...
  ....                                // ...many MANY lattice layers later...
  MININT - MAXINT (bottom)            // ranges with 4 billion values
```

Blah! That’s a heckuva huge lattice! You might be “falling” for a long, long time – roughly 4 billion times. Worst-case compile-time scenario: **4 billion times sizeof(program)**. OK – so THIS lattice doesn’t work (but actually there’s another one nearby that does, which HotSpot DOES use – I’ll leave that one for serious compiler junkies to dig out of the OpenJDK code).
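To see the compile-time blow-up concretely, here’s a toy that counts how many analysis passes the naive range lattice needs on the counted loop above – invented names (`passesToConverge`), not real compiler code:

```java
// Analyze "i = 0; while( i < limit ) i = i + 1;" over the naive range
// lattice [lo,hi], counting passes until the loop-head range stops changing.
// Each pass widens i's range by one, so convergence takes about 'limit' passes.
public class ToyRanges {
  static int passesToConverge(int limit) {
    int lo = 0, hi = 0;   // i enters the loop as the constant 0: range [0,0]
    int passes = 0;
    while (true) {
      passes++;
      // Body: i = i+1, knowing i < limit held at the loop test.
      int bodyLo = lo + 1;
      int bodyHi = Math.min(hi, limit - 1) + 1;
      // Meet the entry value (the constant 0) with the backedge range.
      int newLo = Math.min(0, bodyLo);
      int newHi = Math.max(0, bodyHi);
      if (newLo == lo && newHi == hi) return passes;  // fixed point reached
      lo = newLo;  hi = newHi;                        // "fall" one layer, retry
    }
  }

  public static void main(String[] args) {
    // A 100-trip loop needs about 100 passes; now imagine MAXINT trips...
    System.out.println(passesToConverge(100));
  }
}
```

With the simple constant lattice we fall at most twice per value; here the number of falls scales with the loop bound itself.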

Brow sweating, but still with me? Congratulations! I first saw this stuff as a PhD grad student. It would be time to move on to the next, more *interesting* lattice – but I’m out of space. Next blog we will attempt to cover the fearsome ‘**null**’ pointer! Sounds easy? Piece-o-cake? Trust me, it only ramps up from here! 🙂

Till next time,

Cliff
