Final Fields, Part 2

I’ve been having waaaaaay too much fun this month, dealing with “final” fields – final is in quotes because I’ve been finding waaaaay too many Generic Popular Frameworks (TM) that in fact write to final fields long long after the constructor has flowed under the bridge.  Optimizing final fields is in theory possible, but in practice it’s busting a Lot of Popular Code.

From Doug Lea:

It might be worse than that. No one ever tried to reconcile JSR133 JMM JLS specs with the JVM specs. So I think that all the JVM spec says is:
http://java.sun.com/docs/books/jvms/second_edition/html/Concepts.doc.html#29882
Once a final field has been initialized, it always contains the same value.

Which is obviously false (System.in etc).

De-serialization plays nasty with final fields any time it has to re-create a serialized object with final fields.  It does so via Reflection (for a small number of objects), and eventually via generated bytecodes for popular de-serializations.  The verifier was tweaked to allow de-serialization's generated bytecodes to write to final fields… so de-serialization has been playing nasty with final fields and getting away with it.  What’s different about de-serialization vs these other Generic Popular Frameworks?  I think it’s this:

De-serialization does an initial Write to the final field, after <init> but before ANY Read of the field.

These other frameworks are doing a Read (and, if it is null, a Write), then further Reads.  It’s that initial Read returning NULL that trips them up: once the code is JIT’d, that first value is reused for some of the later Reads.
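For contrast, here is a minimal sketch of the de-serialization side, using the standard java.io object streams. `Point` and `RoundTrip` are hypothetical names for illustration. The point is only the ordering: for a Serializable class, the serialization machinery does not run `Point(int)` at all, and it stores the final field `x` before any code can read it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with a final field, for illustration only.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    Point(int x) { this.x = x; }
}

class RoundTrip {
    // Serialize to bytes and read back a fresh copy.
    static Point copy(Point p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                // Point(int) is never called here: the final field x is
                // written by the de-serializer, after allocation but before
                // any Read of x is possible.
                return (Point) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```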

Why bother?  What’s the potential upside to using final fields?

  • Expressing user intent – but final fields can be set via Reflection, JNI calls, & generated bytecodes (besides the “normal” constructor route), hence they are not *really* final.  It’s more like C’s “const”, just taking a little more syntax to “cast away const” and update the thing.
  • Static final field optimizations (à la Java asserts).  Java asserts crucially rely on the JVM & JIT to load these values at JIT-time and constant-fold away the turned-off assert logic.
  • Non-static final field optimizations.  This is basically limited to Common Subexpression Elimination (CSE) of repeated load, and then the chance to CSE any following chained expressions.

I claim the gain from this last one is almost nil in normal Java code. Why are non-static final field optimizations almost nil?  Because different instances of the same class generally hold different values in the same field, so there is no compile-time constant and no constant-folding.  Hence the field has to be loaded at least once.  Having loaded a field once, the cost to load it a 2nd time is really really low, because it surely hits in cache.  Your upside is mostly limited to removing a 1-cycle cache-hitting load.  For the non-static final field to represent a significant gain you’d need these properties:

  • Hot code. By definition, if the code is cold, there’s no gain in optimizing it.
  • Repeated loads of the field.  The only real gain for final-fields is CSE of repeated loads.
  • The first load must hit in cache.  The 2nd & later loads will surely hit in cache.   If the first load (which is unavoidable) misses in cache, then the cache miss will cost 100x the cost of the 2nd and later loads… limiting any gain in removing the 2nd load to 1% or so.
  • An intervening opaque operation between the loads, like a lock or a call.  Other operations, such as an inlined call, can be “seen through” by the compiler and normal non-final CSE will remove repeated loads without any special final semantics.
  • The call has to be really cheap, or else it dominates the gain of removing the 2nd load.
  • Cheap-but-not-inlined calls are hard to come by, requiring something like a mega-morphic v-call returning a trivial constant which will still cost maybe “only” 30 cycles… limiting the gain of removing a cache-hitting 1-cycle repeated final-field load to under 5%.
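A sketch of the rare shape that list describes, assuming HotSpot-style behavior where a virtual call site that has seen three or more receiver classes is usually not inlined. `Shape`, `Tri`, `Sq`, `Pent`, and `Scaler` are hypothetical names. Whether `factor` is actually reloaded depends on the JIT; the code only marks where a trusted final field would let the compiler reuse the first load.

```java
// Three receiver classes at one call site: typically megamorphic.
interface Shape { int sides(); }
final class Tri  implements Shape { public int sides() { return 3; } }
final class Sq   implements Shape { public int sides() { return 4; } }
final class Pent implements Shape { public int sides() { return 5; } }

final class Scaler {
    private final int factor;   // set once in the constructor, never again

    Scaler(int factor) { this.factor = factor; }

    int sum(Shape[] shapes) {
        int total = 0;
        for (Shape s : shapes) {
            // First load of this.factor, then an opaque (not inlined) v-call.
            total += factor * s.sides();
            // Second load: it can reuse the first only if the JIT trusts
            // 'factor' to stay final across the call. Even then the saving
            // is one cache-hitting load next to a ~30-cycle call.
            total += factor;
        }
        return total;
    }
}
```

For example, `new Scaler(2).sum(new Shape[] { new Tri(), new Sq(), new Pent() })` returns 30.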

So I’ve been claiming the gains for final fields in normal Java code are limited to expressing user intent.  This we can do with something as weak as a C++ “const”.  I floated this notion past Doug Lea and got this back:

Doug Lea:

{example of repeated final loads spanning a lock}

… And I say to them: I once (2005?) measured the performance of adding these locals and, in the aggregate, it was too big of a hit to ignore, so I just always do it. (I suppose enough other things could have changed for this not to hold, but I’m not curious enough to waste hours finding out.)

My offhand guess is that the cases where it matters are those in which the 2nd null check on reload causes more branch complexity that hurts further optimizations.
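Doug's original example is elided here, but the idiom he describes shows up throughout java.util.concurrent; `ArrayBlockingQueue`, for instance, copies `this.lock` into a local before locking. A minimal sketch of that idiom follows; `SlotCounter` and its fields are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class illustrating the "copy finals into locals" idiom.
final class SlotCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private final int[] counts = new int[16];

    int incrementAndGet(int slot) {
        // Copy the final fields into locals up front. Each local is then a
        // single load, no matter how the JIT treats final fields around the
        // opaque lock()/unlock() calls.
        final ReentrantLock lock = this.lock;
        final int[] counts = this.counts;
        lock.lock();
        try {
            return ++counts[slot];
        } finally {
            lock.unlock();
        }
    }
}
```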

Charles Nutter added:

I’ll describe the case in JRuby…
In order to maintain per-thread Ruby state without constantly hitting thread locals, we pass a ThreadContext object along the stack for almost all calls.  ThreadContext has final references to the JRuby runtime object it is associated with, as well as commonly used literal values like “nil”, “true”, and “false”.  The JRuby runtime object itself in turn has final references to other common literal values, JRuby subsystems, and so on.
Now, let’s assume I’m not a very good compiler writer, and as a result JRuby has a very naive compiler that’s doing repeated loads of those fields on ThreadContext to support other operations, and potentially repeatedly loading the JRuby runtime in order to load its finals too.  Because Hotspot does not treat those repeated final accesses as *provably* constant (ignoring post-construction final modification), they count against inlining budget calculations.  As you know, many of those budgets are pretty small… so essentially useless repeat accesses of final fields can end up killing optimizations that would fire if they weren’t eating up the budget.
If we’re in a situation where everything inlines no matter what, I’m sure you’re right… the difference between eliding and not eliding is probably negligible, even with a couple layers of dereferencing.  But we constantly butt up against inlining budgets, so anything I can possibly do to reduce code complexity can pay big dividends.  I’d just like Hotspot in this case to be smarter about those repeat accesses and not penalize me for what could essentially be folded away.

To summarize: JRuby makes lots of final fields that really ARE final, and they span not-inlined calls (so require the final moniker to be CSE’d), AND such things are heavily chained together so there’s lots of follow-on CSE to be had.  Charles adds:

JRuby is littered with this pattern more than I’d like to admit.  It definitely has an impact, especially in larger methods that might load that field many times.  Do the right thing for me, JVM!
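A sketch of the pattern Charles describes, using heavily simplified stand-ins: `MiniRuntime`, `MiniContext`, `GeneratedCode`, and `opaque` are hypothetical and are not JRuby's actual ThreadContext or runtime classes. The naive version walks the final-field chain on every use, which is extra bytecode that, per Charles, counts against inlining budgets; the hoisted version does the CSE by hand.

```java
// Hypothetical, heavily simplified stand-ins for JRuby's runtime objects.
final class MiniRuntime {
    final Object nil = new Object();   // shared "nil" literal
}

final class MiniContext {
    final MiniRuntime runtime;         // passed along on (almost) every call
    MiniContext(MiniRuntime runtime) { this.runtime = runtime; }
}

final class GeneratedCode {
    // Stand-in for a call the JIT decides not to inline.
    static int opaque(Object o) { return o == null ? 0 : 1; }

    // What a naive code generator emits: each ctx.runtime.nil is two
    // getfields, repeated after every opaque call.
    static int naive(MiniContext ctx) {
        int n = opaque(ctx.runtime.nil);
        n += opaque(ctx.runtime.nil);
        n += opaque(ctx.runtime.nil);
        return n;
    }

    // Same logic with the final-field chain loaded once into a local.
    static int hoisted(MiniContext ctx) {
        Object nil = ctx.runtime.nil;
        int n = opaque(nil);
        n += opaque(nil);
        n += opaque(nil);
        return n;
    }
}
```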

So JRuby, at least, precisely hits the case where final field optimizations can pay off nicely, and Doug Lea’s locks are right behind it.

Yuch.  Now I really AM stuck with doing something tricky.  If I turn off final field optimizations to save the Generic Popular Frameworks, I burn JRuby & probably other non-Java languages that emit non-traditional (but legal) bytecodes.  If I don’t turn them off, these frameworks take weird NULL exceptions under load (as the JIT kicks in).  So I need to implement some middle ground… of optimizing final fields for people who “play by the rules”, but Doing The Expected Thing for those that don’t.

Cliff

Writing to Final Fields After Construction

[ed. Updated Oct 19, see below, heck it’s almost like THREE posts in the same month…]

Surprise!  TWO blog posts in the same month!!!

Surprise #2!  A popular open-source framework writes to final fields after object construction via generated bytecodes!  Final fields are supposed to be… well, final. I was quite shocked to realize the default Verification settings let such things go by (to support some common use-case in Reflection; de-serialization I think is the use-case).  Final fields are allowed to be written to exactly once in the constructor.  If you write to them either outside the constructor, or more than once in the constructor, or publish the “this” pointer of a partially constructed object with a final field, or write to the final via Reflection then…. All Bets Are Off.  See, for example, section 9.1.1 of this paper by Bill Pugh.  To quote:

“Another problem is that the semantics is designed to allow aggressive optimization of final fields.  Within a thread, it is permissible to reorder reads of a final field with calls to methods that may change final fields via reflection.”

In particular, the following code “behaves differently” (but legally) after the JIT kicks in:

  final Object _crunk;
  void someFunction() {
    if( _crunk == null ) // a First Read of _crunk
      init_crunk();      // Does this write to a final field?
    ..._crunk...         // Perhaps a Second Read of _crunk, or perhaps not.
  }

In all cases there is a first-read of _crunk.  Before the JIT kicks in, there is a 2nd read of _crunk after the initialization call… because the interpreter executes each ‘getfield’ bytecode in isolation.  The JIT, however, notices that there are 2 reads of _crunk outside the constructor, and therefore it is profitable to do common-subexpression-elimination of the loads of _crunk – and legal to do so across the not-inlined call “init_crunk()” because _crunk is a final field.  The setting of _crunk inside the “init_crunk()” call is ignored until the CPU leaves the JIT’d version of someFunction().
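javac rejects a plain assignment to a final field outside a constructor, so a runnable version of the snippet needs one of the back doors mentioned earlier. This sketch uses Reflection: `Field.set` may write a non-static final field of an ordinary class once `setAccessible(true)` succeeds, and its Javadoc warns that doing so outside de-serialization can leave other code still using the original value. `LazyFinal` and `initCrunk` are hypothetical names.

```java
import java.lang.reflect.Field;

// Hypothetical class reproducing the lazily-initialized "final" pattern.
class LazyFinal {
    // Left null by the constructor; filled in "lazily" later.
    private final Object crunk;

    LazyFinal() {
        this.crunk = null;
    }

    // Hypothetical lazy initializer. javac forbids 'crunk = ...' here,
    // so the write goes through Reflection instead.
    private void initCrunk() {
        try {
            Field f = LazyFinal.class.getDeclaredField("crunk");
            f.setAccessible(true);              // required to write a final field
            f.set(this, "de-serialized value"); // out-of-constructor final write
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    Object someFunction() {
        if (crunk == null) {   // a First Read of crunk
            initCrunk();       // a call the JIT may not inline; writes crunk
        }
        return crunk;          // a Second Read, which the JMM lets a JIT replace
                               // with the value of the First Read (null)
    }

    public static void main(String[] args) {
        System.out.println(new LazyFinal().someFunction());
    }
}
```

Run once, before any JIT gets involved, the second read sees the Reflection write; the trouble only appears after someFunction() is compiled.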

This optimization of the final field is *the* main optimization performed on final fields.  It is a crucially important optimization for Java asserts – with Java asserts, all the asserts are guarded by a test of a static final field.  Since that field is set once and known statically, the JITs can optimize on its current value.  In particular, the guarding field typically records that asserts are turned off, and the JITs read the value early (at JIT time) and constant-fold the guard test away.  I.e., there’s no performance penalty for having turned-off asserts…. because the final field is read very very early, and then optimized based on its value.
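javac implements `assert` with exactly this kind of guard: a synthetic `static final boolean $assertionsDisabled` field initialized from `Class.desiredAssertionStatus()`, which is `true` when asserts are off. The sketch below hand-writes the same shape so the guard is visible; `AssertGuard`, `ASSERTIONS_DISABLED`, and `half` are hypothetical names.

```java
// Hand-desugared version of what javac emits for a class using 'assert'.
class AssertGuard {
    // Not a compile-time constant (it calls a method), so it is a real
    // static final field, read once at class-initialization time.
    static final boolean ASSERTIONS_DISABLED =
            !AssertGuard.class.desiredAssertionStatus();

    static int half(int x) {
        // Equivalent of: assert x % 2 == 0 : x;
        // With asserts off, a JIT can load ASSERTIONS_DISABLED at compile
        // time, see 'true', and fold this whole test away.
        if (!ASSERTIONS_DISABLED && x % 2 != 0) {
            throw new AssertionError(x);
        }
        return x / 2;
    }

    public static void main(String[] args) {
        System.out.println("assertions disabled: " + ASSERTIONS_DISABLED);
        System.out.println(half(10));
    }
}
```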

In the case of our open-source framework, the framework is lazily de-serializing an object with final fields.  They test for a null value in a final field, which is an indication of a lazily-not-yet-de-serialized object.  If the final field is null, they call out to a complex de-serialization routine (which is sometimes not inlined into the local compilation unit by the JIT) which sets the final field.  After the test & de-serialization call, they then begin using the final field assuming it is not null.  Those later uses are optimized by the JIT to use the original value loaded from the final field… i.e. null.  The observation of the initializing write is delayed until after the CPU exits the compilation unit.  The NEXT call to the same routine then loads the final field again and now observes the not-null value.

So What Do We Do?

So what do we do?  Do we ship a JVM which does a legal (and typically very useful) optimization on final fields… that happens to kill this one open source framework occasionally (after sufficient JITing under heavy load, and the right random pot-luck of inlining to trigger the behavior)?  (btw, I have been in communication with the people in the project, and they are now aware of the issue, and are working on it)    Do I make a magic flag “-XX:-OptimizeFinalFields” that you need to use with this framework  (it will be needed at least to run our JVM on existing jar files)?  How long do we need this flag?  I *hate* user-visible flags… they stick around *forever*, in ancient scripts of antiquity never to be changed again…

Right now I am leaning towards a solution which simply observes if a field is ever set outside a constructor, and disables final-field optimizations on a field-by-field basis if it ever sees such an update.  No flag needed and it will “Do The Right Thing” in this case, although the JIT’d code will be somewhat pessimistic (in that no final-field optimizations will happen for all instances of that field when a single out-of-bounds update is observed).  How clever do I need to get, to keep the performance there for Everybody Else, while allowing old jar files for this framework to keep on doing what it is doing?
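A minimal model of that per-field policy, written as ordinary Java for illustration only. `FinalFieldWatch`, `mayOptimize`, and `onFinalWrite` are hypothetical; a real JVM would hook this into its putfield path, Reflection, JNI, and the JIT's dependency tracking. One step the paragraph leaves implicit is that code already JIT'd under the old assumption must also be thrown away (deoptimized) when the first out-of-constructor write is seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for "optimize finals only for code that plays by
// the rules". Fields are named by plain strings such as "Foo._crunk".
final class FinalFieldWatch {
    // Fields seen written outside a constructor: never optimized again.
    private final Set<String> poisoned = new HashSet<>();
    // Per field, callbacks that discard JIT'd code which trusted it.
    private final Map<String, List<Runnable>> dependents = new HashMap<>();

    // Asked by the (hypothetical) JIT before treating a final field as final.
    synchronized boolean mayOptimize(String field, Runnable deoptimize) {
        if (poisoned.contains(field)) {
            return false;
        }
        dependents.computeIfAbsent(field, k -> new ArrayList<>()).add(deoptimize);
        return true;
    }

    // Called on every write to a final field: putfield, Reflection, JNI.
    synchronized void onFinalWrite(String field, boolean insideConstructor) {
        if (insideConstructor) {
            return;                                  // the normal, legal case
        }
        if (poisoned.add(field)) {                   // first out-of-bounds write
            List<Runnable> deps = dependents.remove(field);
            if (deps != null) {
                deps.forEach(Runnable::run);         // invalidate stale code
            }
        }
    }
}
```

Poisoning here is one-way and per field, matching the "somewhat pessimistic" trade-off above. As written it would also poison fields restored by de-serialization; the write-before-any-Read distinction drawn in Part 2 is one possible refinement.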

Cliff

Update Oct 19-

Doug Lea writes:

I just saw your blog post …

Every time I see things like this I feel sad/guilty that because we don’t give people a good way to do what they want (mainly, lazily-initialize or deserialize finals), they do [unfortunate] things instead.  If you have any good thoughts on offering better ways to do this, please let me know.  In the absence of any, maybe I ought to try another push for a Fences API.

I was struck while reading this that the main problem in your intro example is that Java makes it too easy for programmers not to notice that fields and local reads of those fields are different  things.  In my concurrency course, I’ve tried to force students to notice by telling them that for one assignment, they are required to always explicitly declare/use locals. I wish I could say that this experience generalizes, but I still see too many cases of, for some volatile field:

if (field != null) field.f();

and similar brokennesses. When a language makes it hard to think clearly about something, it’s not surprising that people don’t.
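A minimal sketch of Doug's point, with hypothetical names `VolatileReads` and `Task`. The broken form reads the volatile field twice, so another thread can null it out between the check and the call; the fix is the explicit local he makes his students use.

```java
// Hypothetical class contrasting two volatile reads with one.
class VolatileReads {
    interface Task { void f(); }

    // Another thread may set this to null at any moment.
    volatile Task field;

    // BROKEN: two separate volatile reads. The field can become null
    // between the test and the call, so this can throw NullPointerException.
    void broken() {
        if (field != null) field.f();
    }

    // FIXED: exactly one volatile read into a local; only the local is used.
    void fixed() {
        Task t = field;
        if (t != null) t.f();
    }
}
```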

Cliff writes:

So disallow mentioning a volatile variable “bare”.   It needs to be decorated, to make it more clear you are performing an action on access.  Require that for all volatile fields, you are limited to:

    "...field.vol_read() ..." and "field.vol_set(x)"?

Too bulky syntax… how about:

    "...field.get_acq() ..." and "field.set_rel(x)"?

No better.  So let’s switch styles; how about Itanic load-acquire and store-release style:

    "... x.ldacq ..."  and "x.strel = ..."

Or using language more suitable to Java programmers:

    "...x.get ..." and "x.put = ..."

So your example becomes:

    "if( field.get != null ) field.get.f();"

Obviously 2 “get” actions in a row.
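Not new syntax, but for what it's worth: Java 9 later added `java.lang.invoke.VarHandle`, whose access-mode methods (`getAcquire`, `setRelease`, and friends) look a lot like the decorated reads and writes floated above. Routing every access to a plain (non-volatile) field through them makes each read explicit, so the double read stands out. `ExplicitAccess` and `Task` are hypothetical names.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class: every access to 'field' names its memory mode.
class ExplicitAccess {
    interface Task { void f(); }

    // A plain field; all accesses below go through FIELD.
    private Task field;

    private static final VarHandle FIELD;
    static {
        try {
            FIELD = MethodHandles.lookup()
                    .findVarHandle(ExplicitAccess.class, "field", Task.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Roughly "x.put = t": a release store.
    void publish(Task t) {
        FIELD.setRelease(this, t);
    }

    // Roughly "if( field.get != null ) field.get.f();": two acquire loads,
    // and now the second one is impossible to overlook.
    void twoGets() {
        if ((Task) FIELD.getAcquire(this) != null) {
            ((Task) FIELD.getAcquire(this)).f();
        }
    }

    // One acquire load into a local; only the local is tested and used.
    void oneGet() {
        Task t = (Task) FIELD.getAcquire(this);
        if (t != null) {
            t.f();
        }
    }
}
```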

But proposing new syntax for Java is water-under-the-bridge.  I’m with you on the “sigh”.  This can only be fixed in a new language.

The Greatest Trip Report

Divorce.

So ends the Greatest Trip of my life.  22 years ago I fell in love and married the woman of my dreams.  I was happy and content in love, full of energy and hope for the future.  I had a needed talent; I sallied forth to Save The World or at least make it a better place – while building a safe, secure and rich life for us.  We had a child, then another, and still more until we had 4 – each child just as precious as the last.  They are great kids.  I found my talents for programming, for complex computer language implementation skills,  for singing and for public speaking.  Life was rich, busy and fulfilling – to me.

To the other person in my life, however, things were different.  20 years later I discover she felt that her life was stifling and constraining – her dreams too long had gone unfulfilled, nay, unnoticed.  Her pain, held inside too long, turned to anger and resentment.  Too late we tried marriage counseling, honeymoon style vacations, long heart to heart talks- but too much water had flowed under that bridge.  About two years ago we gave up the hope of reconciliation and started the process of dividing two lives that had lived together for 20 years.

That process is finally ending.  We have a final legal resolution now and all that remains is the raising of our 4 kids independently.  The situation for the last two years has taken most of my time and all of my emotional energy.  It has been a place of personal growth and introspection, of deep thinking, of tears and sadness.  It has also been (eventually, after a long time of sadness) a journey of joy and discovery; of doing things I have long denied myself (just returned from my 2nd year at Burning Man!!!); and of new celebrations of life.

I’m enjoying my half-time with the kids and becoming more of a Dad and more a part of their lives.  I’m getting my energies back and am feeling more ready to slay more dragons than I have in years.

Look Out World, Here I Come!
Cliff

No JavaOne This Year…

UPDATE Oct 4 – I had a blast talking with the Disruptor dude, Martin Thompson, so I’m back in SF on Oct 5.  I’ll be hanging out in the Starbucks at 262 O’Farrell from 11 on.  Stop by if you want to chat.  (Blog material that came from the barroom discussion on Oct 2: spinning on Thread.yield probably is in fact “fixed” with some kind of exponential back-off strategy; tight spin loops need an X86 “PAUSE” instruction; locked-add and locked-CAS on Intel’s new SandyBridge do NOT scale – I should use locked-xadd for a fast X86 fence op; talk about the state of CPU affinity on Linux…).  Stop by Starbucks if you want to partake of more of these conversations!

No JavaOne this year, at least for me. I am boycotting it because of the recent shabby treatment of both the conference AND Java.  Please let me know how you feel the conference is going, because right now I am strongly thinking of putting my weight behind some other conferences and not doing JavaOne ever again.  Got any suggestions?  Must be a Bay Area conference – how about a JavaTwo?  JavaPlusPlus, anybody?    (specifically excluding the totally fabulous JVM Language Summit which is small-by-design.  If Brian could figure out how to scale that sucker up, but keep the great interactions & talks going…..)

Despite no JavaOne, I am still in the Bay Area and would enjoy visiting with a lot of people I normally only see when they visit JavaOne… so I will be in the Hotel W bar starting around 4pm TODAY (Oct 2, 2011), until dinner.  Drop by & share a drink and a war story!  (too short a notice? I am thinking of doing another geek’s drink/dinner on Weds… where’s Ron when you need him?  In any case, email or blog-comment and if the head count exceeds some modest threshold I’ll throw one together).  Here’s a war-story to get started: contrary to rumors, Azul Systems is doing quite well… and our next-gen X86 JVM is happily running 100Gig+ heaps with max-pause times in the handful of milliseconds range.  Not that I would be guilty of gloating over some Big 6-Letter Company’s difficulties in delivering a large-heap low-pause GC or anything… OK, so I am.  Come buy me a drink and I’ll tell you another story, or better yet you can tell me yours!

Cliff

PS: To David Dice: I think you’re being too modest with that little toy you sent me slides on… and in any case it makes for great blog material even IF it’s half-baked.  That’s the whole point of blogs! I think you should open up on it.  The Disruptor guys are not the first people to go there, nor will they be the last…   🙂

Me: I got my own take on where the Disruptor guys should go next for NUMA-tolerant performance hacking.