In Masterminds of Programming, James Gosling says:
To underline this, the following experiment can be done (in Z423)
Start with this typical piece of imperative code
for (int i = 0; i<n; i++) {
for (int k = 0; k<n; k++) {
int s = 0;
for (int j = 0; j<n; j++) {
s += a[i][j] * b[j][k];
}
res [i][k] = s;
}
}
This is a naive implementation of matrix multiplication (full source code). It does not matter what the code actually does, the point is that the statement in the innermost loop
s += a[i][j] * b[j][k];
contains one multiplication, one addition, and 4 array indexing operations. Now Java array access is safe: the language requires that for each access, the index value must be checked to be in range.
We view the Java byte code with javap -c Run.java. The above loop gets compiled to
static int[][] amul(int, int[][], int[][]);
...
31: iload 7 ; iload_0 ; if_icmpge 63
iload 6
aload_1 ; iload 4; 42: aaload; iload 7; 45: iaload
aload_2 ; iload 7; 49: aaload; iload 5; 52: iaload
53: imul ; iadd ; istore 6 ; iinc 7, 1
goto 31
63: ...
This has 4 array access operations (aaload at 42 and 49, iaload at 45 and 52) and each is required to do two checks (compare to lower bound, compare to upper bound). A straightforward compilation would produce 8 comparisons.
The “magic of modern compilers” is in the Java virtual machine: it will hot-spot compile this code. We can observe actual assembly code with
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Run
(notes on installing the disassembler) and the relevant part of the output (note the bytecode labels coincide with the above) is
0x00007f0745112ce3: mov 0x10(%rdx,%r10,4),%r13d ;*aaload
; - Run::amul@49 (line 9)
0x00007f0745112ce8: mov 0xc(%r13),%eax ; implicit exception: dispatches to 0x00007f0745112e2c
0x00007f0745112cec: mov 0x10(%rsi,%r10,4),%r8d ;*iaload
; - Run::amul@45 (line 9)
0x00007f0745112cf1: cmp %eax,%edi
0x00007f0745112cf3: jae 0x00007f0745112d9d ;*iaload
; - Run::amul@52 (line 9)
0x00007f0745112cf9: imul 0x10(%r13,%rdi,4),%r8d
0x00007f0745112cff: add %r8d,%ebx ;*iadd
; - Run::amul@54 (line 9)
0x00007f0745112d02: inc %r10d ;*iinc
; - Run::amul@57 (line 8)
0x00007f0745112d05: cmp $0x1,%r10d
0x00007f0745112d09: jl 0x00007f0745112ce3 ;*if_icmpge
This is our inner loop, and there are three computations (imul and add, as well as inc for increasing the inner loop variable) and just two tests (cmp). One (the last one) is needed for exiting the loop, so our observation is: