In Masterminds of Programming, James Gosling says:
To underline this, the following experiment can be done (in Z423)
Start with this typical piece of imperative code
for (int i = 0; i<n; i++) {
for (int k = 0; k<n; k++) {
int s = 0;
for (int j = 0; j<n; j++) {
s += a[i][j] * b[j][k];
}
res [i][k] = s;
}
}
This is a naive implementation of matrix multiplication (full source code). It does not matter what the code actually does, the point is that the statement in the innermost loop
s += a[i][j] * b[j][k];
contains one multiplication, one addition, and 4 array indexing operations. Now Java array access is safe: the language requires that for each access, the index value must be checked to be in range.
We view the Java byte code with javap -c Run.java
. The above loop gets compiled to
static int[][] amul(int, int[][], int[][]);
...
31: iload 7 ; iload_0 ; if_icmpge 63
iload 6
aload_1 ; iload 4; 42: aaload; iload 7; 45: iaload
aload_2 ; iload 7; 49: aaload; iload 5; 52: iaload
53: imul ; iadd ; istore 6 ; iinc 7, 1
goto 31
63: ...
This has 4 array access operations (aaload at 42 and 49, iaload at 45 and 52) and each is required to do two checks (compare to lower bound, compare to upper bound). A straightforward compilation would produce 8 comparisons.
The “magic of modern compilers” is in the Java virtual machine: it will hot-spot compile this code. We can observe actual assembly code with
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Run
(notes on installing the disassembler) and the relevant part of the output (note the bytecode labels coincide with the above) is
0x00007f0745112ce3: mov 0x10(%rdx,%r10,4),%r13d ;*aaload
; - Run::amul@49 (line 9)
0x00007f0745112ce8: mov 0xc(%r13),%eax ; implicit exception: dispatches to 0x00007f0745112e2c
0x00007f0745112cec: mov 0x10(%rsi,%r10,4),%r8d ;*iaload
; - Run::amul@45 (line 9)
0x00007f0745112cf1: cmp %eax,%edi
0x00007f0745112cf3: jae 0x00007f0745112d9d ;*iaload
; - Run::amul@52 (line 9)
0x00007f0745112cf9: imul 0x10(%r13,%rdi,4),%r8d
0x00007f0745112cff: add %r8d,%ebx ;*iadd
; - Run::amul@54 (line 9)
0x00007f0745112d02: inc %r10d ;*iinc
; - Run::amul@57 (line 8)
0x00007f0745112d05: cmp $0x1,%r10d
0x00007f0745112d09: jl 0x00007f0745112ce3 ;*if_icmpge
This is our inner loop, and there are three computations (imul
and add
, as well as inc
for increasing the inner loop variable) and just two tests (cmp
). One (the last one) is needed for exiting the loop, so our observation is: