Let us count the 1-bits in a natural number. Following example code can be typed in a C# CLI. I am using Mono.
csharp> Func<int,int> bc = (x) => { int c=0; while (x>0) { c+=x%2; x/=2; } return c; }
Add all these counts over the range 0 (inclusive) to 2^25 (exclusive):
csharp> Enumerable.Range(0,1<<25).Select(bc).Sum() 419430400
How much time did we use?
csharp> Time (() => Enumerable.Range(0,1<<25).Select(bc).Sum()) 00:00:02.5347170
But now we’re seeing the time but not the result. Better do this:
csharp> Time (() => Console.WriteLine(Enumerable.Range(0,1<<25).Select(bc).Sum())) 419430400 00:00:02.6447220
Now - the punchline: by adding just one word, .AsParallel()
, in the right place, we can execute this computation on a multi core machine, and be faster (and get the same result, of course):
csharp> Time (() => Console.WriteLine(Enumerable.Range(0,1<<25).AsParallel().Select(bc).Sum())) 419430400 00:00:00.9643620
Measurements were done with mono-5.4.0.201
on a machine with i7 920 at 2.67 GHz. Of course, absolute run times do not concern us - what’s interesting is the speed-up of roughly 3 in this case. The machine has 4 actual cores, so the program made nearly optimal use of the hardware, and it was trivial to write.
This is a nice showcase of (deterministic) parallel programming by annotation. We can view the .AsParallel()
as a hint to the compiler.
Actually, the annotation does change the type, as it turns an IEnumerable<int>
into a ParallelQuery<int>
, which we can think of as partition into several sub-enumerations, and which has a clever implementation of Sum()
.
There are better ways to count bits in a number, and the value of the sum can be determined without actually counting, but that’s beside the point. The point is that bc
burns some CPU cycles that no compiler knows how to optimize away, and it’s an ideal test case since we can easily compute the result (by one multiplication).
The version of Haskell implemented by the GHC compiler provides library and runtime support for actual strategy annotations (that don’t change the type). If x
is some expression that denotes a lazy data structure of type a
(in the simplest case, a list), then withStrategy s x
also has type a
, and the same value as x
. Here, s
is of type Strategy a
and describes what parts of the structure should be evaluated how (and some parts could be evaluated in parallel).
The following program is equivalent to the above:
bc x = if x > 0 then mod x 2 + bc (div x 2) else 0 main = print $ sum $ map bc $ take (2^25) [0 :: Int .. ]