tail-call optimization in Java

Doug Lea wrote:

Hi Cliff,  


Could you remind us why Java doesn’t allow tail-recursion optimization?  




Heh – *all* things allow “as if” optimizations, ’cause how you gonna know?     😉
Java likes to hand out decent stack traces, but these are specifically not language designed or required – but in practice decent stack traces ARE required as a primary debugging tool. 


You might be able to do tail-RECURSION optimization “as if” with counters somehow.
You might have more trouble doing tail-CALL optimization with counters – and that might mess with the usual Java security model – all things below a security call can get priviledges – which in practice means a stack-crawl the state of the stack around the security call being “known” to the VM (and to the compiler for optimizations).


So: do-able I believe, with some work to keep the security stuff functional & fast.  Stack-traces for recursion are probably OK truncated (since primary usage is debugging, and no language spec requirement).  Stack-traces for tail-calls get chopped is more problematic.  I’m open to suggestions here (maintain a fake 2nd lower-cost stack ?   append frame “tokens” into a buffer ?   )

Any Java language experts Out There want to chime in here?



Quick Post-JavaOne Update

I’m still de-compressing from JavaOne but have almost caught up.


I had a long talk with the CEO of SureLogic; I like their technology a lot, and they have the Bill Pugh stamp of approval.  We (Azul & SureLogic) are exploring some kind of connection; they find concurrency bugs, we have customers with concurrency bugs.


I also yakked with Fortify, another Java static analysis tool company.  Hakistan makes the best chocolate!    🙂


I gave the NonBlocking Hash Table talk; at last count I saw a report of 440 people attending.  Very few people walked out during the talk – unbelievable!  I had no idea so many people would be interested.


BileBlog mentioned our Testing Concurrent Software talk (no link handy from the JavaOne website???).  He said almost nothing, which for BileBlog is very good indeed!  Report claims ~750 people attended!  The Debugging Data Races talk was also well attended, given that it was at 10pm at night.


We (Azul) had a long chat with the JPC folks.  These guys are nuts!  They have an X86 / DOS emulator written entirely in pure Java.  They can boot & run old DOS games our OUR hardware!  I’ve played Prince of Persia and Mario on a 400-way Azul box!   I still don’t “get” their reason for existing (why is the Oxford physics dept paying for this?  I heard the reasoning but it still doesn’t resonate with me) – but the abandonware guys are going nuts with this stuff.  We (Azul) are interested because we could possibly run small bits of X86 JNI in Java instead of doing an RMI call.  In any case, they dropped by our office for a visit; we had a fun morning of hacking.  I gave them some blue-sky advice for speeding things up. They’re currently running maybe 10% of native speed – but maybe could reach 50% or so with some more work.  I can hardly wait!


Why does JavaOne offset the worst coffee on the planet?  At one point I got in the IBM espresso machine line, and the BEA rep behind me laughed.  He pointed out it was a 1/2hr line; I countered that I could go to Starbucks & back in that much time.  He basically dared me to do so & offered to watch the line (he was *bored*). Sure enough I was back in the pavilion with Starbucks in hand (triple mocha, all decaf) in 20 minutes and the person in front of me was still 5 people from the head of the line.      🙂



Hardware Visibility vs Finite State Machine

I’ve been having an interesting conversation about the NonBlocking or Lock-Free Hash Table, running down the tail end of this post; hopefully Kevin won’t mind if I repost some of his comments here.  Typepad does not allow comments to have HTML and our replies & counter-replies are getting hard to read!  So Kevin writes:

I would note there’s no way for you to “read the current K/V state”.   You have to agree with that. There’s no such thing or way.


i.e. if you do a load 1000 times, you might still get 1000 old values of K or V.


If you agree that the “actual” K/V state is just a concept that you can’t know then there’s no reason to believe it exists or even discuss it.  i.e. it provides no value if you can’t know it. 

There are 2 times where you DO get the “current K/V state” – (1) the clock cycle upon reading a K-that-is-an-X, a K-that-is-a-NULL, or a V (given you already read a not-null not-X K) – and CAS counts as a read. And (2) after a “long time” past the last update to that slot.


You can complete a CAS for a V and still have an old view of a K, for instance.

Yup; the FSM guarantees that at the time you attempted the CAS the “old value for K” is also the current value of K and also the future value of K for all time.  You don’t CAS on a V until K has hit it’s final value.


So the only thing that’s worth discussing is what views each thread has, and what the total legal system state is for all threads.


The proof requires a full enumeration of the legal states for all other threads, along with the description of a single threads FSM, and that no two threads have bad cooperation that causes an erroneous FSM transition.

Yup again; fortunately the FSMs for each K/V slot are independent so the proof of this is nothing more than the trivially replicated proof of each separate K/V FSM. 


Now I need to show that many threads running the same FSM are legal.  Again, this is trivial to reduce – by definition of an FSM a FSM does not care which thread makes transitions. So the proof devolves down to “is the FSM correct”.


This point bears repeating: I don’t care which thread makes any transition.  I rely on the hardware’s correct implementation of atomic CAS in order to make transitions.  I don’t care who makes the transitions or in what order they happen – as long as they follow the FSM’s “rules”.


Also, the hw doesn’t provide the abstraction that there is a single K/V state. It provides the behaviors the memory model describes, which is not a set of rules around single values …it’s a set of visibility rules, which inherently allow multiple values for a single address to  exist in a system for a long time.

Yes again – and these views are not contradictory.  i.e., all these statements are true all at the same time:

  • “hardware provides … a set of visibility rules”
  • “hardware allows multiple values for a single address to exist for a long time”
  • “hardware provides the abstraction of a single K/V state”


For the last 2 statements I’ll note that the abstraction of a single K/V state exists it’s just hard to witness.  But not impossible. I witness it via a successful CAS (FSM transition) or if I wait “long enough”.  (I’ll add that I’ve seen “long enough” run into the several minute range on real hardware!!!).


Perhaps it would help if you would provide a counter-example?  Some set of thread schedules & successful CAS’s that move outside the FSM?  You’re allowed to assume any amount of delay on K/V visibility but a successful CAS has to remain atomic.  Such a counter-example would of course blow my claim of correctness so I don’t think you can come up with such a thing.  But the attempt will be constructive for us both.  It’s always good for clarity when somebody else plays “The Counter-Example” game.



Travel Time

I’ve been doing Azul’s Java6 port intermixed with working on JavaOne slides and OOPSLA PC reviews.
Today I’m off to Hawthorne, NY to visit IBM for the OOPSLA PC meeting; when I come back I’m off to SFO for JavaOne.  I’ve got 3 talks at JavaOne this year; the workload is getting ridiculous!  Next time I post I’ll probably have a JavaOne writeup, and maybe post my “Debugging Data Races” BOF slides.  But first I gotta write those slides.  Fortunately I’ve got some downtime on a plane looming in my future…


If there’s any lessons for me about this whole Non-Blocking Hash Table Extravaganza, it’s this:
The topic is accessible enough, tough enough (interesting enough) and timely enough that a whole lot of very smart people have banged their heads upon this problem and hit upon solutions very near to mine.  I keep getting emails from people essentially saying “I did it this way”…. and they are 90% of the way there.




Stupidly, my main debugging-data-races technique amounts to use print statements(!)???

I’m looking over my notes for the talk and much of I have amounts to glorified forms of print statements, or tricks to get the same info as a print statement without getting the time-shift that hides the data-race.  I’ve got some notes on why we write dataraces (my armchair psychology) in the first place, and some notes  on spotting some of the symptoms that you are about to start creating (and debugging) dataraces    But for debugging them, I’m gonna end up recommending some very un-glamorous approaches.  I’ll try to get the slides posted before JavaOne, but I suspect they’ll be very fluid up till when the BOF starts.