JVM Language Summit

I spent last week at the JVM Language Summit, a small conference focused on non-Java JVM-based languages.  It was populated almost solely by either JVM implementors or bytecodes-are-my-new-assembler types.  It was a very fun & geeky conference for me; some of the best technical discussions I’ve had in a long time.  My favorite philosophical talk was definitely Erik Meijer talking about Fundamentalist Functional Programming.  In the end, he’s all for non-pure side-effecting programming BUT he wants the type system to “stop lying” about side-effects.  We both agreed that we’d love to see a functional programming language which had types for side-effects (e.g. Haskell’s IO “sin bin”), but with more elaborate side-effects.  I.e., if I’m writing my program with few-to-none side effects I might want to track them exactly (e.g. this function has type ‘side-effects field _code’) but if I call into the Swing or AWT libs I might want a type like ‘side-effects all Java variables’.


My favorite techy/JVM talk was Bernd’s Maxine VM.   I think he’s got a really good nearly pure-Java layer figured out for generating ASM code.  You want something that looks like running plain Java code (and does field reads & writes) that guaranteed must inline down to where the JIT turns it into atomic machine code loads & stores.  He’s missing lots & lots of major features (e.g. good JIT, good GC) but the framework looks nice.  If we ever are to get a true JVM-in-JVM (and skip bootstrapping your JVM via a C++ compiler which is what HotSpot does), I think Maxine has the best chance.


Most desired JVM feature: invokedynamic; this would cut huge chunks of evil code out of most JVM-based languages (but not Clojure & JPC, both of which target 1.5 JVMs more closely).
Most needed JVM feature without changing the spec: a trivial Escape Analysis suitable for dealing with endless Fixnum/BigNum math.  Most of these non-Java JVM languages do fixnum/bignum math (automatically overflowing fixed-int math into bignums) – which means every trivial bit of integer math creates a new Fixnum object.


I showed off Azul’s RTPM and got lots of oooo’s and aahhhh’s.  I convinced lots of people to sign up for academic accounts mostly for access to RTPM.  I was able to quickly diagnose a big problem w/JRuby (failure to inline a key trampoline, masked by +PrintInlining lying), and also got 20% speedup on the ASM library – the core bytecode parsing library in widespread use.


My own talk was well received:

Brian Goetz wrote:

your talk was both the most popular and highest rated, according to the comment cards. 

The slides are here although the postscript conversion mostly screwed up my transparent ovals.  Mostly I ran this trivial program on Azul’s JVM using these languages: Scala, Clojure, Jython, JRuby, JavaScript/Rhino, & JPC. I profiled them using RTPM and report on the profiling.  See the slides for the in-depth discussion.  3 things popped out:

  1. Clojure, JRuby, Jython, Javascript/Rhino all using fixnum/bignums for every trivial piece of math – which ends up allocating new FixNums for each simple integer operation. A major fix for this would be to implement the simplest possible Escape-Analysis. You’ll still end up with the overflow checks but the major cost here is the Fixnum allocation (and NOT the GC costs; the fixnums trivally die in the young-gen; direct GC costs are quite modest).
  2. JRuby was thinking that a key trampoline function was inlining – and it wasn’t.  See the slides for details, but hopefully we’ll see a JRuby with a substantial performance boost soon.
  3. I ran Clojure’s STM Traveling Salesman Problem using 600 worker ‘ants’ to optimize the path.  It ran straight-up without a hitch, no tweaks – and with great scaling (it did allocate about 20Gig/sec on a 200Gig heap but this is easy for an Azul box to cope with).  There may be hope for STM’s after all – IF the set of STM variables is limited to a handful of named shared variables, and not for dusty-desks that must assume all-is-shared.


A Plea For Programs

[Update 9/21/2008: I’ve got a simple sample program for people to port, plus I’ve got at least some code for – Clojure, JavaScript/Rhino, JPC & JRuby; missing Scala at least – thanks].


I would like some non-Java Java-bytecode programs to do performance testing, for a talk I’m giving this coming Friday (my bad for starting this late) and I’m hoping my gentle readers can supply some.  I’d like programs in different languages, but ones that are easy to setup and run.  I’m going to do internal JVM profiling, so I’m not all that concerned with the output or “Foo-per-second” results.  Ideally, my programs would be:

  • Non-Java.  Clojure, Scala, JPython, JRuby all come to mind.  The more variation, the merrier!
  • Easy setup.  I’m not an expert in any of these, so the resulting program has to be easy to setup and run.  Perferably a simple “java -cp Weirdo.jar FunnyProgram” command line.
  • Plain JVM.  Note that the ‘java‘ command has to be there; I intend to use Azul Systems’ JVM for profiling and we have our own.  Any kind of odd-ball jar or class files should be fine.
  • Long enough.  The program has to run for several minutes at least, without “babysitting”.  Long enough for the JIT to settle down (if it’s going to), and long enough for decent profiling.
  • Little I/O.  Besides DBs being a pain to setup, I’m really looking for CPU-bound programs.  Plain file I/O is fine, if the files are provided and can be scripted easily (e.g. “java -cp Weirdo.jar FunnyProgram < BigInput.dat > /dev/null”).
  • Be multi-threaded.  Not a requirement, but a definite nice-to-have.  Several of these languages support alternative threading & coherency models and I’d like to test these features.
  • Be Open Source, so I can post the collection for others to compare against.  This is NOT a hard requirement; I’m all fine with keeping private anything you request be kept private.  Performance profiling data *will* be released, as that is what the talk is about!  (I’m also fine with signing NDA’s but that’s probably not going to be an issue with this crowd).
  • An example: A multi-threaded Mandelbrot program would be fine, computing a 1000×1000 grid of points centered around (1.0,1.0) with a spread of (1.0,1.0) – so fill in the grid (0.5,0.5) to (1.5,1.5), using your choice of thread controls.
  • Please include any names, so I can give credit where credit is due.


I hope to discover things like:

  • How close does “plain code” match the JVM/JIT’s expectations?  How well does the JIT turn “plain code” into machine instructions?  I hope to present the JIT’d code for sample language constructs and detailed profiling data.
  • How well does the function-call logic match the JVM/JIT’s expectations?  Can trivial functions be inlined?  What’s the cost of a not-inlined function-call?
  • Other interesting costs?  (e.g., endless new-Class churning, endless new-bytecode churning causing endless JIT’ing; endless new weak-ref or finalizer creation causing GC grief, etc)
  • How well does the alternative threading & coherency scale?  Can Mandelbrot run on a thousand CPUs?  (I expect: trivially yes). How about programs with more interesting coherency requirements? 


I put a sample Java program here, if you’d like to port something really simple.  The inner loop of this program looks like: “for( i=0; i<1000000; i++ ) { sum += ((int)(sum^i)/i); }“.  The JIT’d assembly code from HotSpot’s server compiler looks like this, unrolled a few times:

2.83% 243 0x12d93878 add4      r5, r4, 1 // tmp=i+1; unrolled 8 times, this is #1
0.06% 5 0x12d9387c xor       r3, r5, r1 // sum in r1, tmp in r5                  
0.06% 5 0x12d93880 beq       r5, 0, 0x012d93b40 // zero check before divide             
0.35% 30 0x12d93884 div4      r0, r3, r5 // divide, notice cycles on next op      
2.64% 227 0x12d93888 add4      r1, r0, r1
// sum += (sum ^ tmp)/tmp               


As expected, there’s a pretty direct mapping from the source code to the machine code.  I’d like to see how other JVM-based languages stack up here. Email me directly with small programs, or post links here.



Chrome is at least Slightly Evil

Chrome did a number of Evil Things to me recently, including killing my FirePass VPN and AgeOfConan – even after a full reboot and never launching Chrome again…. but I get ahead of myself.  Here’s what happened:


I saw the Chrome announcement on SlashDot, amongst other places, read the comic and pretty much agreed with what the developers were saying.  I installed Chrome to give it a test drive.  Things went pretty well at first: the new interface was clearly going to take getting used to but also had a lot of strong points going for it.  Then I hit one of my not-so-techy websites.  Gack!  Screaming Flashing Flash ADS IN YOUR FACE, Enlarge Your X Life, Your Small Ego Needs A Bigger SUV, etc… Quick, turn on AdBlock!  But no, not available yet.


Without AdBlock, clearly Chrome isn’t suitable for general surfing yet.  So I went hunting for Chrome-friendly AdBlock replacements; I only found a gizmo called Privoxy that provides a filtering proxy service.  It was waaay less easy to use than AdBlock and didn’t provide that “you never even knew the ad was missing” experience that AdBlock achieves (i.e., I was left with lots of big holes saying “Prixovy chopped this ad out”).


So I decided to give Chrome a rest and wait for post-Beta (and presumably give the AdBlock guys some time to work their magic).  Meanwhile, Microsoft installed SP3 on my AMD/HP XP machine (Red Herring #1).  FireFox upgrades to 3.0.  Red Herring #3: I fiddled with my router’s port-forwarding rules in an effort to get 5 people behind the same router to play Diablo 2 on Battle.net (me, my 4 kids, plus also my brother Uncle Eric in Atlanta and our lifelong friend in Texas so no LAN party suggestions please)   – BTW D2 works fine for 4 players behind our router and the 5th can play in an unrelated D2 game, but when the 5th player attempts to join the same game Blizzard kicks out the last prior player to join – suggestions welcome on how to fix this!).


A day later and many reboots of both Man and Machine (and Router) I’ve given up on 5-player D2, and decide to play a little Age of Conan.  But nope, some weirdo message pops up about not connecting to the patch server.  I figure the AoC servers are down (no mention on the forums though) and will try again later.  So I try to log into work via FirePass & VPN – it’s also a no-go, with some weirdo message about having IE4.0 or later installed (gack).  I file an IT trouble ticket and go to bed.


Next morning I spend 1/2 the morning failing to get FirePass to work.  I spend hours surfing the web for people with similar problems; there’s a fair number of not-quite similar situations.  IT has a bunch of bland unhelpful suggestions, but FirePass is working for some folks with FF 3.0 and not others.  In retrospect it HAD been working for me, for about a day (the FF upgrade came in before the Chrome try).  The Web turns me on to the Microsoft XP SP3 + AMD debacle; I chase that one down all the next day – but that bug causes your machine to boot-to-blue-screen and thankfully I don’t have that problem. 


Web surfing with FF is fine (and ad-free!); telnet is fine; email is fine; nslookup & tracert is fine.  I get a putty/SSH channel going so once again I can (text-mode only) log in to work.  Good thing I’m an old-school Emacs-ian; text-mode Emacs is not quite a windowing O/S in-and-of itself; I manage to get some work done that day.  Still AoC is down and in my evening play hours I decide to try and figure out what’s going on.  I’m not alone; plenty of people are complaining about this no-connect-to-patch-server problem, but the AoC tech folks keep claiming the patch servers are up and the fault is at the client end.  And somebody mentions a “patcher.log” file dumped out when AoC startup does it’s patch attempt. 


This “patch.log” file provides the little bit of evidence I need to break the case.  Buried in there are successfully connect attempts from days of yore, plus my more recent failures.  The recent failures all mention – *TaDa!* – a proxy attempt using port 8118.  ?Huh?  Where did this funny port number come from?  I remember it from somewhere… aha – Privoxy’s default port.  Now I un-installed privoxy days ago; who can be remembering it’s port number (and feeding it to the AoC Patcher?)… only Chrome.


I have 4 browsers on my machine (IE, FF, Safari, Chrome) but only Chrome had a port 8118 typed into it and nary another mention of that port number has slipped my fingers anywhere.  Somehow Chrome is feeding AoC port 8118 – and nothing’s there on that port so AoC times out!  I go check out my default browser settings.  They say “Use my current Web browser”.  IE is disabled.  Chrome, FireFox & Safari are enabled.  Just to be sure, I click on FF as the default browser…. and Microsoft pops up and says that Chrome has to be de-installed to be made no longer available!  Safari can politely be made not-the-default, but not Chrome?


So I de-install Chrome.  During the de-install, Chrome asks for feedback using a web-browser… using the disabled IE instead of my (I thought!) default of FireFox.  I exit out of IE as fast as possible.  Now with Chrome removed, AoC starts fine.  So I check out FirePass – and Yup, it’s fine as well.


Evil #1 – Chrome became the default browser without permission.  If they asked to become the default, they hid it well.  I never allow new browsers to become the default without a few days prior surfing.
Evil #2 – Chrome silently prevented me from running FirePass VPN, hence telecommuting.  I assume they did it by handing FirePass the same crapolla proxy port somehow.  This happens even after a full power-cycle/reboot and not launching Chrome ever again.
Evil #3 – Chrome silently prevented AoC from launching, again by passing bogus proxy port numbers out.
Evil #4 – Chrome cannot be disabled by unchecking “Enable access to the program”, although Safari & IE can.
Evil #5 – The Uninstall launches IE, even though it’s flagged as “disabled access” and it wasn’t the prior default browser.  Typically I launch IE less than once/year and always because some other program ignores the settings and launches it anyways.
Evil #6 – No AdBlock. Not really a big Evil, just a big Annoyance.


Postlude – I attempt to file a bug with Chrome using FireFox.  But the Google site only allows non-crasher bugs to be filed via Chrome itself – and Chrome never crashed for me, it just broke other programs left and right.  No way I’m going to install it again so instead I attempt to file a “Chrome crashed” bug via the website.  Options are either “you got blue-screened” or “chrome died but you survived” and neither really fits the bill.  I fill in a somewhat abbreviated version of the above blog, then hit submit.  No go!  Google insists I download Yet Another Program and run it and report the results before I can file.  No Way, Google!  I just got big-time burned from the last program I downloaded from Google. 


Evil #7 – I can’t even file a bug report with Google (without downloading & running at least some code from Google!).  I know Googly’s read my blog.  Consider this your Bug Report.



Death by Decompilation

I recently got involved in trying to improve performance of a large customer application.  Azul’s JVMs come with an always-on profiling tool – RTPM – so it was easy to spot where the time was going.  The name has been changed to protect the guilty but the name might as well have been “addToSet”.  A quick check of the JIT’d assembly showed that all time was spent in spinning down a long array (an ArrayList actually), checking for duplicates.  In the end, no dup gets found and the new element gets appended.  Adding N elements gives us a classic O(N^2) algorithm.  The array was already >32000 elements in length at the time we profiled it, so time is proportional to (32K^2) ~ 1billion.  Ugly stuff.


Naturally (grumble) this code is in a large 3rd party not-open-source application, where the owning corporation has been recently purchased by an even larger Faceless Corporation.  The “proper approach” would be to file a performance bug and wait and wait and …


We voted to try and work around the problem by doing a little decompilation, hacking the ArrayList into a HashSet, and inserting a new jar file in the bootclasspath.  So we pulled this class file out of the pile-o-jar’s and hit it with the standard decompiler jad (www.kpdus.com/jad.html).  Presto!  About 2000 lines of fairly legible Java code.  Plus a pile-o-warnings about un-decompilable stuff (Ut-oh!) mostly around nested try/catch/finally clauses and inner classes.  I surf the Java code looking for other uses of the offending ArrayList – there are definitely some – and start to see places where jad bailed out.  When I turn javac loose on the new source file, I get lots of error messages.


No problem thinks I, I’ll just get another decompiler.  I immediately google-up cavaj (cavaj-java-decompiler.en.softonic.com).  A few moments later I get another Java file – and whoops!  It’s header mentions jad by name down to the exact same revision.  Sure enough this decompiler is a GUI front-end for jad complete with identically broken Java code.  Turns out there are a number of such GUI wrappers for jad.  So I search some more, discarding mocha (www.brouhaha.com/~eric/software/mocha) for being abandoned, ClassCracker (mayon.actewagl.net.au/index.html) because the demo version didn’t produce any Java code at all, and IceBreaker (www.breakertech.com) because their web site was down.  I also tried Java Decompiler (java.decompiler.free.fr) but it also produced loads of syntax errors although in different places than jad.  DJ (members.fortunecity.com/neshkov/dj.html) also choked on my class file but just barely.  Very long forward jumps spanning nested try/finally blocks would confuse it – but the inner classes came out looking nice.  A colleague also tried JDec (sourceforge.net/projects/jdec) with limited success.  I also stumbled across this nice list of decompilers from google.


In the end, I hand-integrated the code from JDec & DJ to get something that would compile.  I then hand-refactored the code to insert (probably re-insert) generics and Java6 style iterators – this step was entirely mechanical on my part but no decompiler I tried did it.  The resulting code was much smaller and more readable… and made a bug in the code more obvious (there are “before” & “after” Sets that periodically get unioned and during the union the dup-elimination code is broken and allows dups).  The actual replacement of ArrayList with a HashSet took only a few minutes. 


In summary: decompiling a single class file to reasonable-looking java code took about 6 hrs (and 7 decompilers looked at and 5 actually turned loose on the class file), while the refactoring to correct the egregious performance bug took maybe 10-minutes (yanked all the search loops and just used ‘contains’ and ‘addAll’) and included fixing the dups-allowed bug as a pleasant side-effect. 

Is it time for another Java decompiler bake-off?  Has somebody got a recent comparison of decompilers floating about?