Software Engineering by Example

I develop and use the Leipzig Autotool system for automatically grading student homework in (theoretical) computer science.

The success of a project of that scope and size is based on several things. Here, I will focus on software engineering and tools (and ignore design, programming, and people).

Language, Compiler, IDE

Haskell, GHC. Without going into detail about language features, I just want to stress that static type checking is tremendously helpful when building large software systems.

For interactive development, the "read-eval-print loop" of GHC (ghci), which actually is a "read-typecheck-eval-print" loop, can be used to infer types. A nice (and recent) feature is holes, where a hole is a placeholder for an unspecified subexpression (or type).

Here is an expression with a typed hole: _ > 8. When we enter this in ghci, we get Found hole: _ :: Integer. The hole stands for an expression, whose type is inferred and printed.

Here is an expression with a type hole (note the difference): [(True,8)] :: [ (_, Int) ]. When we enter this in ghci, it says Found type wildcard ‘_’ standing for ‘Bool’. The hole stands for a type, which is inferred and printed.

Of course, the previous examples are trivial; the feature shines when type expressions get larger.
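As a slightly larger illustration, here is a sketch with a made-up helper function (applyAll is hypothetical and not part of Autotool). The comment shows the intermediate version with a hole and the kind of message GHC gives for it; the code below the comment is the completed definition.

```haskell
module Main where

-- Hypothetical helper (not from Autotool): apply a list of
-- transformations to a start value, rightmost first.
--
-- While developing, we might first write the body with a hole:
--
--   applyAll fs x = foldr _ x fs
--
-- GHC then rejects the module with a message along the lines of
--
--   Found hole: _ :: (a -> a) -> a -> a
--
-- Function application ($) has a type that fits, so we fill it in.
applyAll :: [a -> a] -> a -> a
applyAll fs x = foldr ($) x fs

main :: IO ()
main = print (applyAll [(+ 1), (* 2)] (5 :: Int))  -- prints 11
```

The point is that we never had to work out the argument type of foldr by hand; the type checker did it for us.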

For writing code, Intero for Emacs makes type information available while editing.

Documentation

From Autotool source code, we extract API documentation (showing data types and functions) with haddock (output). API documentation includes links to source code.

Packages and Build Tools

The language standard defines what a module is (a set of declarations that are located in one file). To keep large projects manageable, they are split into packages, where a package is a set of modules. A few basic packages come with the compiler, but far more are needed for building practical software. For Haskell, thousands of packages are available from Hackage.

Each package also specifies the set of packages (with version ranges) that it needs. The autotool application is itself a set of 20 packages that transitively depend on more than 100 third-party packages.
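To give an idea of what such a specification looks like, here is a sketch of a package description file. The package name, module name and version bounds are invented for illustration and are not taken from the autotool sources.

```cabal
-- Hypothetical example-grader.cabal (names and bounds are assumptions)
cabal-version:      >= 1.10
name:               example-grader
version:            0.1.0.0
build-type:         Simple

library
  exposed-modules:  Example.Grader
  -- each dependency is given with a range of acceptable versions
  build-depends:    base        >= 4.9 && < 5
                  , containers  >= 0.5 && < 0.7
  default-language: Haskell2010
```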

The cabal build tool computes the dependency graph, solves the version constraints (that is, determines which version from a range of acceptable versions will actually be chosen), downloads source code archives and then calls ghc to compile packages in topological order.
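The "topological order" part can be sketched in a few lines. This is only an illustration of the idea, not how cabal is implemented: it ignores versions entirely, the package names are made up, and buildOrder is a hypothetical function. It uses stronglyConnComp from Data.Graph (in the containers package that ships with GHC), which returns components with dependencies first.

```haskell
module Main where

import qualified Data.Graph as G

-- A package name together with the names of the packages it depends on.
type Package = (String, [String])

-- Compute a build order in which every package comes after its
-- dependencies. A dependency cycle is reported as Left.
-- Dependencies that are not in the input list are silently ignored.
buildOrder :: [Package] -> Either [String] [String]
buildOrder pkgs = traverse component (G.stronglyConnComp nodes)
  where
    -- stronglyConnComp wants (node, key, keys of successors)
    nodes = [(name, name, deps) | (name, deps) <- pkgs]
    component (G.AcyclicSCC name) = Right name
    component (G.CyclicSCC names) = Left names

main :: IO ()
main =
  -- prints: Right ["base","lib","app"]
  print (buildOrder [("app", ["lib", "base"]), ("lib", ["base"]), ("base", [])])
```

A real build tool has to do the hard part first (choosing one version per package so that all ranges are satisfied); sorting the resulting graph is the easy part.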

For autotool, we do not use cabal directly, but stack. It serves the same purpose of building a set of packages. Cabal solves dependencies based on the current state of Hackage, which is, in general, unpredictable (package authors can upload versions that contain breaking changes). Stack achieves deterministic (repeatable) builds by referring to a resolver. A resolver is a curated set of packages: a subset of Hackage with fixed versions that are known to work well together.
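A stack project is described by a stack.yaml file that names the resolver and the local packages. The following is a sketch only: the resolver version and the package directories are placeholders, not the actual autotool configuration.

```yaml
# Hypothetical stack.yaml (resolver and directory names are assumptions)
resolver: lts-9.21      # fixes the version of every package in the snapshot

packages:               # local packages that make up the application
  - example-core
  - example-grader

extra-deps: []          # pinned packages that are not in the snapshot
```

Since the resolver pins every third-party version, building the same commit a year later picks the same dependencies.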

Source Code Management

Autotool sources are stored in git format for distributed version control. Ignoring a lot of technical detail, I just mention that one useful feature of git is that branches are cheap and easy. E.g., to repair a bug, we make a branch, change source files on that branch, and merge back to the main branch when we're convinced the bug is fixed. That way, we don't block others' work (by pushing unfinished work to the main branch).

Issue Tracking

Autotool has an Issue Tracker. The general idea is that we work on the source code only after we have submitted an issue that describes the problem that the change in the source is meant to solve.

We use a local instance of the GitLab platform to manage sources and issues (and CI, see below), and it provides nice integrated services. E.g., from the issue tracker, you can make a topic branch for an issue; when you reference an issue in a commit message, the link to the commit will show up in the issue tracker; and you can also auto-close issues if the message says "fix #399".

Earlier, Autotool used CVS (since 2002) for source control, and Bugzilla (since 2005) for tracking. When changing to git (in 2010) and GitLab (in 2015), we took care to preserve history. This worked fully for source control (initial commit) and mostly for issues (except that historic issues had their dates changed).

Automated Tests

Of course, the goal is to formulate all properties of software as types, so they can be checked statically. But Haskell's type system does not fully allow that (there are currently no dependent types), so we need to resort to dynamic checks. Autotool uses:

  • hspec for organising tests
  • smallcheck for property-based testing with automated type-directed generation of test cases (example)
  • fuzz-testing for generating interesting test cases by modifications of existing ones (more detail and examples)
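To show what "type-directed generation of test cases" means, here is a hand-rolled sketch that needs nothing beyond base. It is not the smallcheck API: the class Enumerable, the function counterexample and the two properties are made up for this illustration, and the notion of size is simpler than smallcheck's depth. The idea is the same, though: the type of the property's argument decides which values get enumerated.

```haskell
module Main where

-- Types whose values can be listed exhaustively up to a size bound.
class Enumerable a where
  upTo :: Int -> [a]

instance Enumerable Bool where
  upTo _ = [False, True]

instance Enumerable Int where
  upTo d = [negate d .. d]

-- Lists with at most d elements, each element drawn from upTo d.
instance Enumerable a => Enumerable [a] where
  upTo d = concatMap ofLength [0 .. d]
    where
      ofLength 0 = [[]]
      ofLength n = [x : xs | x <- upTo d, xs <- ofLength (n - 1)]

-- First value within the bound that falsifies the property, if any.
counterexample :: Enumerable a => Int -> (a -> Bool) -> Maybe a
counterexample d prop = case filter (not . prop) (upTo d) of
  []      -> Nothing
  (x : _) -> Just x

-- Holds for all lists: reversing twice gives back the original.
propReverseTwice :: [Int] -> Bool
propReverseTwice xs = reverse (reverse xs) == xs

-- Deliberately wrong: reversing is not the identity.
propReverseId :: [Int] -> Bool
propReverseId xs = reverse xs == xs

main :: IO ()
main = do
  print (counterexample 3 propReverseTwice)  -- prints Nothing
  print (counterexample 3 propReverseId)     -- prints Just [-3,-2]
```

Because small values are enumerated first, a failing property tends to be reported with a small counterexample, which is convenient when debugging.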

We also compute test coverage with hpc, via the hpc integration in stack. Here is an example coverage report.

Continuous Integration (CI)

Autotool uses GitLab's CI mechanism: upon each commit, the project is built and tested, starting in a standard docker container (configuration). A full run of this takes more than one hour.
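For readers who have not seen such a setup, a minimal CI configuration for a stack project might look roughly like the following. This is a sketch under assumptions, not the autotool configuration: the docker image, the job name and the cache path are placeholders.

```yaml
# Hypothetical .gitlab-ci.yml (image, job name and cache path are assumptions)
image: haskell:8

build-and-test:
  script:
    - stack --no-terminal setup   # install the compiler required by the resolver
    - stack --no-terminal test    # build all packages and run their test suites
  cache:
    paths:
      - .stack-work/              # keep build products between pipeline runs
```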