On Safety and Speed

In Masterminds of Programming, James Gosling says:

  • One of the magics of modern compilers is that they’re able to "theorem-prove away" potentially all [array] subscript checks... You might do a little bit of checking on the outside of the loop, but inside the loop, it just screams.
  • [The VM] had a crew of really bright people working on it for a decade, a lot of PhD compiler jockeys.

To underline this, the following experiment can be done (in Z423)

Start with this typical piece of imperative code

for (int i = 0; i<n; i++) {
    for (int k = 0; k<n; k++) {
        int s = 0;
        for (int j = 0; j<n; j++) {
            s += a[i][j] * b[j][k];
        }
        res [i][k] = s;
    }
}

This is a naive implementation of matrix multiplication (full source code). It does not matter what the code actually does, the point is that the statement in the innermost loop

s += a[i][j] * b[j][k];

contains one multiplication, one addition, and 4 array indexing operations. Now Java array access is safe: the language requires that for each access, the index value must be checked to be in range.

We view the Java byte code with javap -c Run.java. The above loop gets compiled to

  static int[][] amul(int, int[][], int[][]);
          ...
      31: iload 7 ; iload_0 ; if_icmpge 63
          iload 6
          aload_1 ; iload 4;  42: aaload; iload 7;  45: iaload 
          aload_2 ; iload 7;  49: aaload; iload 5;  52: iaload 
      53: imul ; iadd ; istore 6 ; iinc 7, 1
          goto          31
      63: ...

This has 4 array access operations (aaload at 42 and 49, iaload at 45 and 52) and each is required to do two checks (compare to lower bound, compare to upper bound). A straightforward compilation would produce 8 comparisons.

The "magic of modern compilers" is in the Java virtual machine: it will hot-spot compile this code. We can observe actual assembly code with

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Run

(notes on installing the disassembler) and the relevant part of the output (note the bytecode labels coincide with the above) is

  0x00007f0745112ce3: mov    0x10(%rdx,%r10,4),%r13d  ;*aaload
                                                ; - Run::amul@49 (line 9)
  0x00007f0745112ce8: mov    0xc(%r13),%eax     ; implicit exception: dispatches to 0x00007f0745112e2c
  0x00007f0745112cec: mov    0x10(%rsi,%r10,4),%r8d  ;*iaload
                                                ; - Run::amul@45 (line 9)
  0x00007f0745112cf1: cmp    %eax,%edi
  0x00007f0745112cf3: jae    0x00007f0745112d9d  ;*iaload
                                                ; - Run::amul@52 (line 9)
  0x00007f0745112cf9: imul   0x10(%r13,%rdi,4),%r8d
  0x00007f0745112cff: add    %r8d,%ebx          ;*iadd
                                                ; - Run::amul@54 (line 9)
  0x00007f0745112d02: inc    %r10d              ;*iinc
                                                ; - Run::amul@57 (line 8)
  0x00007f0745112d05: cmp    $0x1,%r10d
  0x00007f0745112d09: jl     0x00007f0745112ce3  ;*if_icmpge

This is our inner loop, and there are three computations (imul and add, as well as inc for increasing the inner loop variable) and just two tests (cmp). One (the last one) is needed for exiting the loop, so our observation is:

  • of the 8 required access checks, just one remains, and indeed the others had been theorem-proved away.