Lilypond for Programmers

In the words of the authors, “LilyPond is a music engraving program, devoted to producing the highest-quality sheet music possible.”

Lilypond has a wealth of documentation. Most of it consists of examples that show syntax and semantics of individual features. I find it difficult to infer from that the underlying principles of the semantics of the language.

It seems that the documentation targets (semi) professional musicians that are hobby programmers (at best). I am a professional computer scientist and a hobby musician (at best). I find that lilypond docs are lacking in structure and exactness when compared to definitions of general or domain specific programming languages that I usually work with. (Of course there are enough under-documented programming languages and libraries out there, but I avoid them.)

There is a subset of lilypond documentation targeted at programmers that want to extend lilypond. We can infer some of the underlying semantics from that, but the description is mixed with implementation detail that a plain “user” does not need. Still, interesting reading.

So, the following is my attempt to explain lilypond in a “semantics first” way.

And I do think this kind of documentation would serve both groups of users well. Certainly a musician is capable of structured thinking? Music is structure that you can hear. But I am digressing.

I should add that my application area is writing scores in the style of “real books” (for jazz and rock), to be used in a hobbyist band. That is, a song absolutely must fit on one page, and you want to see global structure, chords, voice; plus perhaps some extra snippets for a typical instrumental or vocal accompaniment, or break. For reasons of copyright, I don’t publish these scores. But I show my default empty score. So, my typical score will exercise most of the basic features of lilypond, but probably just very few of the advanced ones.

While writing this text here, I am trying to remember how I learned lilypond, and what was the information that I was missing the most. In several cases, I later found that the information had been in the docs all along, but I did not know where to look. Sometimes, I asked on the mailing list and I do appreciate the help I got.

Also, I invite comments (by email) on this text. I do not necessarily want to make this text longer than it already is, but I do want to repair errors, and perhaps add specific links to original documentation.

Lilypond as a Compiler

lilypond is a command line program that reads a textual description of music and produces a graphical description and MIDI.

A very short lilypond program is

\score{  \new Staff { c d e f } \layout{} \midi{} }

Put this in foo.ly and call lilypond foo.ly; it will produce the files foo.pdf and foo.midi.

Notes:

  • You can add extra parameters that control the output, in a lot of places (e.g., before \score, inside \layout, inside \midi).
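  • You can leave out \layout{} only if you also leave out \midi{} (it will still produce pdf). A score with \midi{} but without \layout{} produces MIDI only.
  • If you remove \midi{}, it will not produce MIDI.
  • There can be more \score{} blocks in one lilypond program. The printed scores end up one after another in the same PDF, while each \score that contains \midi{} produces its own MIDI file.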
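As a slightly fuller sketch of the notes above, here is a file with parameters before \score, inside \layout, and inside \midi. The \version number, the title, and the concrete values are assumptions chosen only for illustration.

```lilypond
\version "2.24.0"          % assumed version; adjust to your installation

\header { title = "foo" }  % parameter placed before \score

\score {
  \new Staff { c'4 d' e' f' }
  \layout { indent = 0\mm }   % no indentation of the first system
  \midi { \tempo 4 = 120 }    % playback speed: 120 quarter notes per minute
}
```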

The PDF file can be rendered on screen, or be sent to a printer. The MIDI file can be rendered (to sound that you can hear) by an external (hardware) synthesizer device, or by a software synthesizer running on your computer (e.g., timidity). Details of rendering are outside the scope of this document. There are WYSIWYG environments (IDEs) that integrate a text editor, and hide the actual command line call. Of course I never use these.

Structured Music

Music is one of

  • an atom:
    • a single note (c4) that has pitch (c) and duration (4)
    • a visible rest (r4) that has a duration (4)
    • an invisible rest (silence) (s4) that has a duration (4)
  • a sequential composition of music expressions (the parts occur one after another; the composition operator is { .. })
  • a parallel composition of music expressions (the parts start at the same time; the composition operator is << .. >>; the semantics is “start all components at the same time, and finish when the last component finishes (?)”)
  • a transformation of a music expression, e.g., \transpose c d { .. }. There is also \shiftDurations to scale the duration of notes. There probably are others, e.g., see \repeat unfold below.

This is the “pure data” aspect of music.
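A minimal sketch that uses each of the four cases once (the \version number and the choice of notes are assumptions for illustration only):

```lilypond
\version "2.24.0"   % assumed version

\score {
  <<                                   % parallel: both staffs start together
    \new Staff {                       % sequential: one item after another
      c'4 r4 s4 d'4                    % atoms: note, rest, invisible rest, note
      \transpose c d { c'4 d' e' f' }  % transformation: sounds d' e' fis' g'
    }
    \new Staff { c'1 c'1 }
  >>
  \layout { }
}
```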

For reference, here is the prototypical example of an algebraic (tree-structured) data type (ADT) that represents music: see the paper by Paul Hudak et al., 1996, and a usage example. I guess that internally, lilypond does much the same, but since it’s LISP (not Haskell), the internal representation is only weakly (dynamically) typed (not statically).

In fact here is an ADT that represents lilypond’s concept of music. (by Rohan Drape)
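One way to look at lilypond's own dynamically typed tree is \displayMusic, which prints the internal Scheme expression for a piece of music to the console. This is only a sketch (the \version number is an assumption). I would expect nested make-music calls with node names such as SequentialMusic, SimultaneousMusic and NoteEvent, but the exact output depends on the version.

```lilypond
\version "2.24.0"   % assumed version

% Prints the internal (Scheme) representation of the expression to the console.
\displayMusic { c'4 << e'4 g'4 >> }
```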

Rendering of Structured Music

There is also a “processing” aspect for scores.

Music, as defined above, is ultimately put inside the \score{ } environment for rendering.

For production of the PDF, musical atoms will be assigned to specific parts of the page: notes appear on staffs, chord symbols appear above staffs, lyrics appear below staffs. Also, for the production of MIDI, atoms will be assigned to specific parts of the output: notes appear on certain MIDI channels.

These assignments are handled by contexts. A Context is described by (or associated with?) a block declaration like \new Staff { ... } or \new Voice { ... }.

There seems to be an implied, or intended, nesting of contexts:

  • a score consists of staffs (e.g., for different voices of a choir),
  • a staff consists of voices (they share the staff, but they could be distinguished by having stems up for one, down for the other),
  • and there can be parallelism inside one voice (e.g., sounding two strings of a guitar)
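The intended nesting can be written out explicitly. The following is a sketch under assumptions (version number and notes chosen for illustration), with one context per level:

```lilypond
\version "2.24.0"   % assumed version

\score {
  <<                                   % score level: two staffs in parallel
    \new Staff <<                      % staff level: two voices share it
      \new Voice { \voiceOne c''4 d'' e'' f'' }   % stems up
      \new Voice { \voiceTwo c'2 <c' e'>2 }       % stems down; <c' e'> is a chord
    >>
    \new Staff { \clef bass c2 g,2 }
  >>
  \layout { }
}
```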

This interacts (sometimes in inexplicable ways) when nested arbitrarily - which is allowed by the language, and sometimes required to get a specific result. E.g., this works (\new Staff inside a Voice):

c d e f << { g a b c } \new Staff { g g g c } >>

(it produces an extra staff, extending over one bar, running in parallel) but this does not:

c d e f << { g a b c } \new Staff { g g g c } >>
c d e f << { g a b c } \new Staff { g g g c } >>

(it produces another extra staff but it’s on a different level, so overall it wastes space). Work-around suggested here

There is some magic: in some places, specification of context can be omitted, and then some default context is re-used, or created. E.g., a Staff already has an implied Voice, so if you \new Voice inside it, then you actually get the second voice (?)

For me it is difficult (or, “magic”) to refer to contexts, but in some cases it works. E.g., it is possible to add lyrics to a voice, where the syllables of the text are aligned with the notes in the voice.
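A sketch of the lyrics case: the voice gets a name when it is created, and \lyricsto refers to that name. The name "melody", the syllables, and the \version number are assumptions for illustration.

```lilypond
\version "2.24.0"   % assumed version

\score {
  <<
    \new Staff {
      \new Voice = "melody" { c'4 d' e' f' }   % named context
    }
    % one syllable per note of the voice called "melody"
    \new Lyrics \lyricsto "melody" { do re mi fa }
  >>
  \layout { }
}
```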

In some places, elements inside one context do appear to influence the processing of other contexts. E.g., if we have a parallel composition of two staffs, each with a voice, and one voice contains a \repeat volta, then this is actually displayed also for the parallel staff. (But I think it does not effect a repeat when the MIDI for the other staff is produced.)

Abstraction Mechanisms inside Lilypond

In addition to writing scores literally (each note/rest that you write in the .ly file, creates one visible note/rest in the .pdf file) we can use mechanisms that save typing by re-using code.

The most important one (and its usage is highly recommended in the docs) is: you can define names that then denote other things (typically, music):

foo = { c d }
bar = { e f \foo }
\score{ \new Staff { \foo \bar } }

I habitually use this feature for

bpm = 120
...  \new Staff { \tempo 4 = \bpm ... }
...  \midi { \tempo 4 = \bpm }

This sets the tempo for the MIDI file, and also makes it visible on the score sheet.

I think the following properties are true for names:

  • names must be global (you cannot define a name inside a context?)
  • names cannot have arguments (they denote constants, not functions?)
  • there is no branching (no conditional) (but you can use it in embedded LISP code)

These are heavy obstacles when trying to use lilypond in a “programming language” way.

Note: \repeat unfold is also a notation for re-using code, see the next section.

A Note on Repeats

\repeat unfold N { .. }, where N is a number, composes its argument sequentially with itself N times.

Note: composition is sequential. These two are not equivalent (the first one does not give two parallel staffs):

\score{ << \repeat unfold 2 \new Staff { c } >> }
\score{ << \new Staff { c } \new Staff { c } >> }

Lilypond has several ways to denote repeats. If you want a compressed representation on the page, but all notes in the MIDI, then you must do

music = << ... >> % <-- bulk of the text goes here
\score{\music \layout{...}}
\score{\unfoldRepeats \music \midi{...}}

This is documented (but I did not know it for a long time.)

The abstract syntax for repeats with alternative endings

\repeat volta 2 { c4 d e f }
\alternative {{ g a b c} {f e d c }}

is broken: the semantics (for both graphics and MIDI rendering) of the first part (first line) depends on the second, while they appear syntactically separate. But MIDI rendering is correct (with \unfoldRepeats as described above).
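Putting the two pieces together, a complete sketch with a volta repeat, alternative endings, and separate scores for page and MIDI could look like this. The \version number is an assumption; the repeat syntax is the one used in the text above.

```lilypond
\version "2.24.0"   % assumed version; the syntax follows the text above

music = \new Staff {
  \repeat volta 2 { c'4 d' e' f' }
  \alternative { { g'4 a' b' c'' } { f'4 e' d' c' } }
}

\score { \music \layout { } }                % page: repeat signs and volta brackets
\score { \unfoldRepeats \music \midi { } }   % MIDI: everything played out
```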

Abstraction Mechanisms outside Lilypond

There is a way to write more powerful programs: lilypond programs can contain LISP code, since the lilypond compiler is using this anyway under the hood (actually: the GUILE Scheme dialect of LISP).

Analogy:

  • a C program can contain assembly code - you can embed LISP code that describes music
  • also: a C program can use the CPP preprocessor - you can embed LISP code that transforms music given in lilypond notation.

Inside “lilypond mode” you can embed a range in “LISP mode” by #( .. ). Inside LISP mode, you can embed lilypond mode by #{ .. #}.

Combined example:

#(define (foo x y) (if x y) )
\score {
  #(foo #t #{c d e#} )
  #(foo #f #{c d e#} )
}

Inside LISP mode, you can use functions that you defined, and also functions from libraries.
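This is also where names with arguments and branching become available. The following is a minimal sketch using define-music-function; the name repeatIf, its behaviour, and the \version number are hypothetical choices for illustration, not something prescribed by lilypond.

```lilypond
\version "2.24.0"   % assumed version; older releases need extra parser/location arguments

% repeatIf is a hypothetical name: play the music twice if flag is true
repeatIf =
#(define-music-function (flag m) (boolean? ly:music?)
   (if flag
       #{ \repeat unfold 2 #m #}
       m))

\score {
  \new Staff {
    \repeatIf ##t { c'4 d' }        % c' d' c' d'
    \repeatIf ##f { e'4 f' g' a' }  % e' f' g' a'
  }
  \layout { }
}
```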

There is also $( .. ) to embed LISP but it has somewhat different semantics: code embedded that way is executed already while parsing (?) so you’ll be surprised by visibility problems. Some declarations may not have been elaborated when you think they should be.

Built-in Support for Writing Notes

Quite often, the basic building blocks of scores are sequences of notes. Lilypond has invented notation for writing these while saving keystrokes.

This is generally well-documented already, as it is a purely local thing, and does not refer to any global structural concept.

Relative pitch: pitch is denoted by letters c,d,e,.. Inside a \relative block, the octave is left implicit. (Outside such a block, each letter denotes a fixed octave, and ' and , shift it.)

If notes x y are adjacent (subsequent) in the text, then the octave of y is chosen in such a way that the resulting pitch is nearest to the pitch of x (more precisely: the interval between the two note names is at most a fourth; accidentals are ignored for this choice). This means that g a b c works (it is an ascending sequence, and the final c is from the next octave, one half tone above b). If you don’t want this, then you can jump an octave higher with y', or lower with y,. This works nicely except that I think these annotations are in the wrong place - it should be 'y and ,y because you are first changing the octave, then setting the note. A.G. comments that in fact it should be ' y (and , y), i.e., two separate items: the first is an invisible, inaudible, zero-time atom that just moves the “current position” by an octave, the second one then is relative to this.

Another problem is that the “x before y” relation refers to the order in the source code. So if one voice ends with x, the next note y in the source may belong to a different staff or a different voice, at a totally different time - but inside the same \relative block it is still pitch-relative to x. The recommended work-around is to put the notes of each voice inside their own block \relative c' { ...} - this relates the block contents to the given absolute pitch.
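A sketch of the work-around, with one \relative block per staff (the \version number and the notes are assumptions); the comments give the absolute pitches that I expect to result:

```lilypond
\version "2.24.0"   % assumed version

\score {
  <<
    \new Staff \relative c' {
      c4 d e f | g a b c |     % c' d' e' f' | g' a' b' c''
      c, c' g c, |             % c' c'' g' c'
    }
    \new Staff \relative c' {  % starts afresh from c', whatever came before
      e1 | e1 | e1 |
    }
  >>
  \layout { }
}
```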

Default duration: duration is denoted by numbers: c4 denotes a quarter note, c4. denotes a dotted quarter note (as long as c4 ~ c8). If you omit the denotation of the duration, then the duration of the previous note is used. So you can write c4 d e f as a shorter equivalent of c4 d4 e4 f4. Again, this has the same problem as relative pitch: if you omit the declaration, then lilypond uses whatever it has read previously, even if that’s unrelated musically. The recommended work-around is to state the duration for the first note in a bar, or any group of related notes.

It is also possible to scale the duration with *, as in s1*10 (this gives 10 bars of silence in 4/4 time), but this does not work well for visible notes, as it does not change their appearance (so you don’t know that the duration is multiplied).
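A small sketch of these duration rules in default 4/4 time (the \version number and the notes are assumptions for illustration):

```lilypond
\version "2.24.0"   % assumed version

\score {
  \new Staff {
    c'4 d' e' f' |       % same as c'4 d'4 e'4 f'4
    c'4. d'8 e'2 |       % dotted quarter + eighth + half = one 4/4 bar
    s1*2 |               % two whole bars of invisible rest
    c'1 |
  }
  \layout { }
}
```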

Bar checks: Useful feature! Each staff has a time signature, and you can set it as in \time 6/4. If you don’t set it, it is \time 4/4. This defines the length of a bar. When you write a sequence of notes, you can add | (vertical bar). The semantics of a “bar check” is: if we are indeed at the start of a bar, then remain silent, else complain. So, if you use bar checks, you don’t have to mark each bar, but when you put a mark, it will be checked. As I said, very useful.
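A sketch with one deliberately wrong bar (the \version number and the notes are assumptions); I expect lilypond to print a bar check warning for the third bar and to stay silent about the others:

```lilypond
\version "2.24.0"   % assumed version

\score {
  \new Staff {
    \time 3/4
    c'4 d' e' |     % check passes: exactly three quarters
    f'2 g'4 |       % check passes
    a'2 |           % check fails: only two quarters so far in this bar
    b'2.
  }
  \layout { }
}
```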

A.G. reminds me that this also works with ties across bars: we can write

{ \time 4/4 c4 d e f8 g8 ~ | g1 }

I must have assumed that ~ and | are both operators, and concluded that you could syntactically use only one of them.

Now, what I would want from lilypond is checks for

  • a music expression does have a specified total length, something like { c4 d .. } :: 2 (meaning 2 whole notes)
  • music expressions inside << .. >> have equal length.
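Something in this direction can be approximated with embedded LISP code. The following is only a sketch under assumptions: check-length is a hypothetical helper (not part of lilypond), the \version number is assumed, and the check runs on named music expressions rather than as a syntax of its own.

```lilypond
\version "2.24.0"   % assumed version

% check-length is a hypothetical helper, not part of LilyPond:
% warn if music m does not have the expected length (a moment)
#(define (check-length name m expected)
   (let ((len (ly:music-length m)))
     (if (or (ly:moment<? len expected) (ly:moment<? expected len))
         (ly:warning "~a has length ~a, expected ~a" name len expected))))

melody = { c'4 d' e' f' g'1 }
bass   = { c1 }

#(check-length "melody" melody (ly:make-moment 2))    % 2 whole notes: silent
#(check-length "bass" bass (ly:music-length melody))  % 1 whole note: warns
```

The second call compares two expressions that are meant to run in parallel, which covers the << .. >> case by hand.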

Discussion

The big challenge in the design of a textual music (representation) language is the following: You have to go from source text (what you type when you create the score) to score sheet (what you look at when you play) to actual music (what you hear). You cannot put each note in the source by itself, and also not in the score sheet, so you use abstractions. There are two kinds:

  • abstractions that appear on the sheet (e.g., volta signs, chord names, staffs) and are interpreted by the musician,
  • abstractions that appear in the source (e.g., \repeat unfold, named music expressions) and are interpreted by lilypond.

Why the difference?

  • “visible abstractions” (that is, notational conventions for sheet music) are well-established over hundreds of years, so every-one knows them, but no-one would agree on extensions,
  • and every extension would run into the limitation that the human brain has bounded processing power (while playing an instrument, you cannot usually run an interpreter for full lambda calculus in your brain - so lilypond does that for you)

My main critique of the lilypond language design is

  • it does not provide flexible enough abstractions (as I said above, we can have names, but only globally, and without arguments). But I can always write LISP directly.
  • Also, there is a “magic” mix of pure data (music as a tree) and processing (music events processed by contexts). I do not know what I can do (programmatically) with the music after it has “gone through” a context. Is this even a sensible question to ask?

I’d like to end this with a positive statement: Above all, lilypond does work as advertised. I can write “real book”-like lead sheets “from scratch” (that is, from the audio) with a reasonable amount of work (say, 1 hour per song, for a first playable approximation), then lilypond renders them nicely, and we use them for practice in the band. I can also produce MIDI and play along for self-study.

The “pain” of using lilypond was never strong enough for me to say “yikes, I’d rather write Haskore and have that compiled to lilypond”. Of course, it cannot hurt to try alternatives, say, hly and others, also mentioned there.