Andrew Myers

Security, programming languages, and computer systems


Java vs. OCaml vs. Scala

I just inadvertently ran a poorly controlled experiment on the relative virtues of these three programming languages, at least for the job of writing compilers. Despite the nonexistent experimental protocol, I thought the results were pretty interesting—even if they may irritate some of my PL colleagues.

In my compilers course at Cornell (CS 4120), I let the student project groups choose whatever language they wanted to implement a complete optimizing compiler that translated an object-oriented language into x86-64 assembly code.

There were 26 project groups. Most of them used Java as the implementation language, following our suggestion. But there were a few bold and enthusiastic groups that chose other languages. Three groups used OCaml and two groups used Scala. The support for Java was slightly better, since the Java groups were able to use our latest parser generator technology, but I don’t think this was a huge leg up.

Interestingly, the choice of language did not seem to be a big factor. The four groups whose compilers did best on our fairly extensive suite of tests all used Java. I also measured the size of the compiler code in non-comment tokens, since that seems like a better measure of code size than characters or lines. Across all project groups, the average length of the compiler code was 93k tokens, and the four top groups wrote compiler code comprising an average of 97k tokens. (Some of them implemented extra features such as optional optimizations, so it’s not surprising they were a bit longer on average.) On the other hand, the OCaml groups wrote 90k tokens on average; the Scala groups were a bit shorter at 74k tokens. Not the big difference some people might expect.
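For readers curious what counting non-comment tokens might look like in practice, here is a minimal sketch in Python. It is not the tool used for the numbers above. It assumes Java-style comments and string literals, and the names `TOKEN_RE` and `count_tokens` are made up for illustration. It also counts every punctuation character as its own token, so `==` counts as two, which a real lexer would not do.

```python
import re

# One alternative per lexical class. Comments come first so that they are
# matched (and later skipped) before '/' can be read as an operator.
# Assumption: Java-style comments and literals; OCaml's (* ... *) comments
# would need a different pattern.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*')
  | (?P<number>\d[\w.]*)
  | (?P<word>[A-Za-z_$][\w$]*)
  | (?P<punct>[^\sA-Za-z0-9_$])
    """,
    re.VERBOSE | re.DOTALL,
)


def count_tokens(source: str) -> int:
    """Count tokens in `source`, ignoring comments and whitespace."""
    return sum(
        1 for match in TOKEN_RE.finditer(source)
        if match.lastgroup != "comment"
    )


if __name__ == "__main__":
    sample = (
        "int x = 42; // answer\n"
        "/* block\n"
        "   comment */\n"
        'String s = "a // not a comment";\n'
    )
    print(count_tokens(sample))
```

Unlike a line count, a measure of this kind is unaffected by formatting and commenting style.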

Obviously, there are a lot of confounding factors here: maybe students have more experience with Java, though almost everyone in the class had taken an OCaml programming course. Perhaps the better IDE support for Java makes a big difference. On the other hand, we might expect that students bold enough to use a different language are stronger programmers on average.

But for all the flak that Java takes, it seems to have served the students in the course well.

Limits of Heroism

There has been much nice work lately on proving that complex software is written correctly, including components like operating systems and compilers. But it’s hard to see how to scale these heroic efforts to the vast amount of software that is being written every day. It’s simply too hard to build software that is correct and secure. Much of the problem arises because of the low level of abstraction at which software is written. We’re fighting a war on which (not to be dramatic!) the future of civilization may depend, and right now we’re only winning scattered battles. The problem is that we’re fighting this war on the wrong terrain, such as the terrain of C and C++ code, where even the most basic security and correctness guarantees are absent. We need new language design research that moves the field of engagement to more favorable terrain, because your strategy for winning a war can’t rely on the existence of heroes.