Index of /~waldmann/talk/22/isr

Name	Last modified	Size

Parent Directory		-
ISR22CPR.hs	2022-09-22 18:00	190
README.html	2022-09-30 13:17	50K
blast-fixed-z001.hs	2022-09-22 18:00	2.9K
blast-naive-z001.hs	2022-09-22 18:00	1.4K
blast-sym-z001.hs	2022-09-22 18:00	3.2K
constraint.pdf	2022-02-15 14:01	81K
ex1.hs	2022-09-22 18:00	369
ex3.hs	2022-09-22 18:00	557
ex6a.hs	2022-09-22 18:00	1.8K
g03.hs	2022-09-22 18:00	1.0K
g03sol.hs	2022-09-22 18:00	1.1K
g10.hs	2022-09-22 18:00	1.2K
qfbv-z086-fixed.hs	2022-09-22 18:00	4.2K
qfbv-z086-wrong.hs	2022-09-22 18:00	2.2K
qfnia-z086.hs	2022-09-22 18:00	2.5K

README

Constraint Programming for Analysis of Rewriting

Examples and Exercises for course “Constraint Programming for Analysis of Rewriting” at Intl. School on Rewriting (ISR 22), Tbilisi, Georgia.

Get the current version of this document, plus all examples, from

git clone https://gitlab.imn.htwk-leipzig.de/waldmann/isr22-cpr.git

Rendered version of this document is at https://www.imn.htwk-leipzig.de/~waldmann/talk/22/isr/

Introduction

Constraint Programming

A Constraint Programm (CP) is a logical formula ∃x : F(x) where x is a vector of variables (unknowns) and F is a quantifier-free formula (and x is the set of free variables of F)

A Constraint Solver gets a CP and determines its truth.

CP is true: F is satisfiable: solver outputs a satisfying assignment for x
CP is false: F is not satisfiable: solver outputs a proof.

A solver contains advanced algorithms and optimized implementations for domain-specific searching. Thus, CP frees the application programmer from programming application-specific search.

For analysis of rewriting, we focus on satisfiability. I don’t know any application (in rewriting) where we can conclude some interesting property P of a rewrite system from a proof of unsatisfiability of some CP that we write. The reason is that usually, we have CP ⟹ P but not the other way around.

Example: P is termination of some rewriting system R, and CP is existence of a specific proof (certificate) for termination of R.

Boolean Satisfiability (SAT)

A SAT program is a CP where the domain of each unknown is Boolean, and F is a formula in propositional logic.

Example: ∃p, q ∈ B : (p∨¬q) ∧ (¬p∨q)

The DIMACS format defines concrete syntax for F in conjunctive normal form (CNF): Example

p cnf 2 2
 1 -2 0
-1  2 0

Kissat solves this:

$ kissat -q ex1.cnf 
s SATISFIABLE
v 1 2 0

Discuss: why CNF, and not DNF?

Answer: CNF is a low-level language (good for solvers) that still is expressive enough:

there is a linear time algorithm (Tseitin Transform) that converts an arbitrary F to some G in CNF (with extra variables) such that ∀x : F(x) ⇔ ∃y : G(x,y).

For our course, we use the Ersatz library https://hackage.haskell.org/package/ersatz that does this transform (we never need to think about it).

DNF is equally low-level, but it has no such algorithm (unless P = NP). Discuss: what about the standard algorithm that computes an equivalent DNF?

In this course, we will see these applications of SAT for analysis of rewriting

(finite) abstract rewriting systems (unknowns denote edges of directed graph)
derivations in string rewriting (unknowns denote letters)
bit-blasting for matrix interpretations (unknowns denote binary digits)

Satisfiability modulo Theories (SMT)

A Theory on a domain D (example: the theory of linear inequalities on R) is given by a signature (symbols for functions and relations on R) and their intepretation.

A SMT program (for theory T on domain D) is a CP where the domain of each unknown is D, F is a Boolean combination of atoms, where
an atom is a relation symbol applied to terms, where
a term is a variable (an unknown) or a function symbol applied to a list (possibly empty) of terms.

Example: ∃x, y ∈ R : 0 ≤ x ≤ y ≤ 1 ∧ 2x + y ≥ 1

SMTLIB (Clark Barrett, Pascal Fontaine, and Cesare Tinelli, 2016–) https://smtlib.cs.uiowa.edu/ defines a concrete syntax, Example

(set-logic QF_LRA)
(declare-fun x () Real)
(declare-fun y () Real)
(assert (and (<= 0 x) (<= x y) (<= y 1)))
(assert (>= (+ (* 2 x) y) 1))

and Z3 (Nikolaj Björner, Lev Nachmanson, Christoph Wintersteiger, Leonardo de Moura, 2008–) https://www.microsoft.com/en-us/research/project/z3-3/
solves it

$ z3 ex2.smt2 
sat
((x (/ 1.0 3.0))
 (y (/ 1.0 3.0)))

In practice, one always allows Boolean unknowns as well (because they could be simulated by theory atoms), example

∃p ∈ B, x ∈ R : (p⟹x≥1) ∧ (¬p⟹x≤0),

and sometimes, allows other combinations of theories.

In this course, we will use SMT to specify coefficients of

linear interpretations
matrix interpretations

for termination certificates for string rewriting.

Why do SAT/SMT solvers exist (who benefits, who pays)?

Industrial applications (therefore, sponsorship) in

hardware design/analysis (Boolean circuits)
software design/analysis (bounded model checking, properties of (abstracted) programs)

(see affiliations/co-operations of (current/former) authors of solvers)

Second item (“software”) includes rewriting, as it is a model of (possibly non-deterministic, concurrent) computation. Our course shall illustrate the thesis: some properties of rewrite systems can be analyzed by methods from constraint programming. Example: most of current termination provers do this, https://termcomp.herokuapp.com/Y2022/.

Methods of rewriting are being used in SAT/SMT solvers (Manipulation of syntax trees that represent formula or proof).

SAT

How to use Ersatz for SAT encoding, and run examples

ex1.hs realizes the previous example programmatically (i.e., the formula is written by a program).

How to build and run:

$ cabal update
$ cabal build        -- build dependencies
$ emacs ex1.hs &     -- start editor, as background process
$ cabal repl         -- start interpreter
ghci> :l ex1.hs      -- load source file
ghci> :i exists    -- information on name (value, type)
ghci> :doc (&&)  -- documenation of name (if available)
ghci> :t reverse "foo" -- type of expression
ghci> reverse "foo"  -- value of expression
ghci> :main          -- run main function
-- change file in editor, then
ghci> :r             -- reload

alternatively,

$ stack haddock   -- build dependencies and documentation locally
$ firefox .stack-work/install/x86_64-linux/*/9.2.4/doc/all/index.html
$ stack repl      -- as above

Concrete Rewriting

Motivating problem: is the string rewriting system Gebhardt/10 terminating?

(RULES  0 0 0 0 -> 0 1 1 1 , 1 0 0 1 -> 0 0 1 0)

is length-preserving, non-terminating iff cycle
alphabet has two elements, unknown letter is unknown Boolean

implementation g03.hs includes easier test case (shorter loop) Gebhardt/03 as well.

In the code, you will see underscores, called “holes”. You task is to replace these with actual expressions. With -fdefer-typed-holes, GHC will warn about holes (it shows their type, and proposes some expressions of that type) but will accept the code statically. Dynamic semantics of a hole is to raise an exception. You can evaluate expressions that don’t involve holes. This allows for step-wise solution of exercises. Cf. https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/typed_holes.html

For this code:

look at the types of any, all, and the type class Boolean that is used there. Explain import Prelude hiding (any, ...).
look up the type of (===). Why does it work for lists (where is the implementation)?

Cf. Harald Zankl et al., Finding and Certifying Loops, SOFSEM 2010, https://dblp.org/rec/conf/sofsem/ZanklSHM10 http://cl-informatik.uibk.ac.at/users/ami/research/publications/proceedings/10sofsem Section 2 shows a SAT encoding for loops in typical “assembly language” style (each variable and clause is given explicitly). Our description is a functional program that is evaluated symbolically (it produces a formula). Discuss the differences:

Zankl et al. treat the general case, we don’t.
We cannot easily give the size of the formula.
Spot the in-efficiency in our code. Some comparisons are duplicated, the Tseitin transformer in Ersatz does not detect this.

Some Background. Previous example can be seen as an illustration of a proof of the Cook-Levin-Theorem (1971) (SAT is NP-complete), where the computation of a non-deterministic polynomial-time Turing Machine M on a given input gets encoded in a polyomial-size formula F (computation is successful (that is, input is accepted by M) iff F is satisfiable). The unknowns of F encode tape contents, head position, state.

Back to Rewriting. It was conjectured by Y. Metevier (proved by A. Bertrand, TCS 1994, https://doi.org/10.1016/0304-3975(94)90066-3) that each terminating one-rule length-preserving SRS has at most quadratic derivation lengths.

Exercise (homework): Modify previous program so that the length-preserving one-rule SRS, and the derivation, are unknown. Then find SRS with longest derivations, for parameters: width of rule, width of start string. Compare with Bertrand’s conjecture.

Abstract Rewriting

An abstract rewriting system (ARS) is a collection of relations (directed graphs). For this course, we consider finite ARS only. That is a severe restriction but we still get some interesting examples.

Exercise ex3.hs: find relations R, S such that R is transitive and S is transitive but R ∪ S is not transitive.

Discuss: what about the same question but for R ∩ S.

Hint: use data type, operations, and properties, of Ersatz.Relation https://hackage.haskell.org/package/ersatz-0.4.12/docs/Ersatz-Relation.html

Discuss: A relation is represented as a matrix of (unknown) Booleans.

implementation of transitive r = implies (product r r) r
implementation of implies r s (answer: point-wise implication)
implementation of product r s (answer: Boolean matrix multiplication)

Discuss: transitive closure of a relation R.

Not like this (what’s wrong with it?)

transitive_closure r = do
  s <- relation (bounds r)
  assert $ transitive s
  assert $ implies r s
  return s

but like this: if the domain has n elements, then R⁺ = ⋃_{1 ≤ k ≤ n}R^k.

More abstract rewriting properties: following examples taken from: Hans Zantema: Finding small counter examples for abstract rewriting properties (MSCS 2018 https://dblp.org/rec/journals/mscs/Zantema18) https://www.win.tue.nl/~hzantema/carpa.html

Discuss implementation of

diamond property
local confluence
confluence

and check with actual library (or fill in missing detail)

Exercise ex6a.hs find R that is locally confluent but not confluent.

Exercise: implement: R is terminating (has no infinite paths) Hint: domain of R is finite. Then R is terminating iff R has no cycle (closed path).

Exercise: are there R, S such that R ∘ S is terminating, but S ∘ R is not? - No. This is a test for how good the SAT solver is at detecting unsatisfiability.

Discuss: how to compute unions of powers R^k efficiently?

What does “efficient” mean, anyway?

solver runs fast (what we ultimately want)
formula is small (what we can measure in isolation)

Exercise: implement R⁺ with n³log n formula size (it uses log n matrix multiplications).

Exercise: implement R⁺ with size n³ only. Hint: you learnt this in formal language theory (transform NFA to Regexp: L_p, x, q is the language of all paths from p to q with intermediate states ≤ x) We only need a very simple form of that idea here.

Exercise (homework): do some of the examples in Zantema’s paper that require larger universes, e.g.,

S confluent, R confluent, R⁻ ⋅ S ⊆ (S∪R^∗) ⋅ (R^∗)⁻, (R∪S) not confluent
S complete, R complete, rest as above.

Use the (sometimes naive) encodings from our class, ignore tricks (e.g., encoding with extra relations) from the paper. Email me your code (tonight) so we can discuss tomorrow.

Remark by Aart Middeldorp: instead of confluence, use semi-confluence. This is equivalent, but often makes better (for solving) formulas.

How does a SAT solver work

SAT is NP-complete, so we don’t know how to solve it efficiently in all cases (else, we’d have proved NP = P). The obvious complete enumeration of all assignments takes exponential-time. It can be enhanced with clever data structures, and heuristics, resulting in solvers that are surprisingly powerful (for a surprisingly large number of formulas). As a rough orientation, formula with 10³, 10⁴, 10⁵ variables can usually be solved (decided) in one second, minute, hour. Of course, there are exceptions (large easy formulas, small hard formulas).

Regular SAT competition evaluation http://satcompetition.org/ shows state of the art, and progress. We can just pick current year’s winner, or the solver that’s easiest to use programmatically, see below. Historically, that’s been minisat (Een and Sorensen 2003) http://minisat.se/ now it seems to be cadical/kissat (Biere 2020) http://fmv.jku.at/kissat/. I use kissat in my matchbox termination prover https://gitlab.imn.htwk-leipzig.de/waldmann/pure-matchbox.

Implementation of SAT solves is not the topic of this course, so we mention just three basic ideas:

DPLL (Davis, Putnam, Logemann, Loveland, 1961)

complete (backtracking) search over partial assignments
with early propagation (using unit clauses for extending assignments)
and early conflict detection (conflict is empty clause, created by propagation)
clever data structures and algorithms for propagation
heuristics: what variable to assign next. E.g., the one that occurs most often in recent conflicts.

CDCL (conflict driven clause learning, 1996)

when DPLL detects a conflict (and backtracks), compute a clause that will avoid that conflict for the future, and add to CNF (it does not change semantics, since it is implied)
heuristics: there will be many learnt clauses, sometimes we need to forget some of them

Preprocessing (pioneered by Minisat/SatElite , Een and Sorensen, 2003 ..)

remove variable by complete resulution: ⋀_i(F_i∨p) ∧ ⋀_j(¬p∨G_j) resolves to ⋀_i, j(F_i∨G_j)
heuristics: pick p such that number of clauses does not increase too much

SMT

SMT for Termination

Examples will be for string rewriting systems (SRS), methods can be generalized to term rewriting systems (TRS) but we don’t do this here.

We have the general theorem: R is terminating iff there is a monotone interpretation into a well-founded domain that is compatible with R. (Cf. ISR22 lecture by Carsten)

We specialize to certain domains.

First, weights: R = {bb → aaa, aa → b}. We interpret into the additive monoid of non-negative reals. We want

(monotonicity, well-foundedness) a, b ≥ 0 such that
(compatibility) b + b > a + a + a ∧ a + a > b.

Solution: ex4.smt2

$ z3 ex4.smt2 
sat
((a (/ 3.0 2.0))
 (b (/ 5.0 2.0)))

Discuss: is this really well-founded? Yes, w.r.t. the order given by x > y + 1/2.

Next, linear functions over N (monoid w.r.t. composition). Example: for termination of R = {ab → ba}.

We set i(a) = λx.a₁x + a₀, i(b) = λx.b₁x + b₀.

Then i(ab) = λx.a₁b₁x + a₁b₀ + a₀, i(ba) = λx.b₁a₁x + b₁a₀ + b₀.

Constraints are

(monotonicity, well-foundedness) a₁ ≥ 1, a₀ ≥ 0, b₁ ≥ 1, b₀ ≥ 0
(compatibility) a₁b₀ + a₀ > b₁a₀ + b₀

Exercise (fill in the hole): ex5.smt2

$ z3 ex5.smt2 
sat
((a1 16)
 (a0 10)
 (b1 2)
 (b0 2))

Note: previous method (weights) can be represented a special case of linear functions, where the linear coefficient is fixed to be 1, so the only unknown is the absolute part.

Next, basic form of matrix interpretations: each letter is interpreted by a square matrix over N. Conditions are:

(monotonicity) top-left and bottom-right entry ≥ 1
(compatibility) interpretation of lhs and rhs must be point-wise ≥, with a strict decrease in the upper-right corner.

Each such matrix describes a linear function on N^d that is monotone w.r.t. the (well-founded, not total) order on N^d given by x > y ⇔ x₁ > y₁ ∧ ⋀_i > 1x_i ≥ y₁.

Linear functions (previous example) can be seen as a special case, where matrices have shape [[a₁,a₀],[0,1]] (and we only store the upper row).

Challenges (to motivate next step in this lecture):

Find a matrix interpretation (of minimal dimension) for “Zantema’s Problem” (z001) a²b² → b³a³.

This is a classical test case for (manual, and later, automated) termination analysis. Cf. Alfons Geser: A Solution to Zantema’s Problem, techreport Uni Passau MIP-9314, 1993.
Find a matrix interpretation (of minimal dimension) for “Zantema’s Other Problem” (z086) {a² → bc, b² → ac, c² → ab}.

This is the showcase for matrix interpretations, https://www.win.tue.nl/rtaloop/problems/104.html 2005.

Names z0** are benchmark identifiers in TPDB (Termination Problems Data Base) https://termination-portal.org/wiki/TPDB.

and three very simple test cases: a → b, ab → ba, aa → aba. These are useful for checking that the encoding is correct, without wasting time waiting for the solver.

Writing SMT constraints programmatically

use simple-smt library https://hackage.haskell.org/package/simple-smt

Exercise qfnia-z086.hs

discuss: this leaves much to be desired:

expressiveness: the library is called “simple” for a reason
performance: QF_NIA is hard

simple-smt is useful, as a basic layer on which to build (type)safer and more expressive abstractions

One (stand-alone) higher-level library is https://hackage.haskell.org/package/smtlib2 But look at the types! E.g.,

(.+.) :: (Embed m e, HasMonad a, HasMonad b
  , MatchMonad a m, MatchMonad b m
  , MonadResult a ~ e tp, MonadResult b ~ e tp
  , IsSMTNumber tp) => a -> b -> m (e tp)

Perhaps there is a middle ground.

How does an SMT solver work

Again, that’s not the topic of our course, and we rely on SMT competition https://smt-comp.github.io/2022/ for picking a solver.

SMT is “SAT modulo T”, we use search algorithm “DPLL modulo T”:

the “current (partial) assignment” is an assignment from theory atom to Boolean (e.g., for QF_LRA, a conjuction of linear inequalities)
we do Boolean unit propagation, conflict detection, learning,

additionally,

a “theory solver” determines if current assignment is satisfiable (if not, it is a conflict)
a “theory propagator” produces new atoms when possible

Solving QF_NIA by bit-blasting

Bit-Blasting

represent unknown (number) by vector of unknown bits,
represent formula in theory, by formula in propositional logic

Example: numerical and relational Operations for type Bits from the Ersatz library.

Ansatz: blast-naive-z001.hs

Called “naive” because it increases bit width with each addition and multiplication (there can be no overflow). This creates large formulas where the solver takes too long.

That’s the motivation for fixing bit width - and detecting overflow.

Exercise: blast-fixed-z001.hs

Things to note in that code:

type-level numbers for bit width
usage of type classes (Num, Variable, Equatable/Orderable - why not Eq/Ord)

Exercises (based on that code):

what is wrong with overflow propagation for multiplication? (in semantics, ignoring efficiency)
improve efficiency (formula size) of overflow detection for multiplication
When we construct an unknown number, their contents is unknown but we set their overflow to False. Could it be unknown as well?
use type-level numbers for matrix dimension, so we can ultimately write Matrix (dim :: Natural) (Number (bits :: Natural))
reifyNat for setting type-level arguments at run-time

Eager and Lazy Bit-Blasting

bit-blasting: translate finite-domain (numerical) constraint to boolean constraint
eager: what we just did (first translate completely, then solve).
lazy: SMT with QF_BV (bit vectors): theory solver can use arithmetical properties to detect unsatisfiability (e.g., zero + zero cannot be nonzero) without blasting (creating) all bits

Then why don’t we use QF_BV, as it seems uniformly better (than QF_NIA and SAT)? Current answer: QF_BV arithmetics is modular (i.e., silently overflowing). Have to compute overflow by extra formulas. That’s perfectly possible, but this is Boolean programming, so the advantage of SMT seems mostly gone.

qfbv-z086-wrong.hs wrong - ignores overflow
Exercise (fill the hole) qfbv-z086-fixed.hs detects overflow in a crude way.

How do we interface the solver?

that is, write constraints, read satisfying assignment.

A review of what we already did, in above examples.

Languages (DIMACS, SMTLIB2) are for the solvers, what is machine/assembly language for the CPU: we need it, but we don’t want to write it - we use a compiler instead, from a more expressive higher-level language. Ideally, from the language of our application, in my case, Haskell:

we want an embedded DSL (domain specific language) for constraint programming. Then we can use all the useful features of the host language (concrete syntax, type system, abstraction by names/subprograms/type classes etc., libraries) - else we had to re-invent them (poorly).

The embedding needs to provide a transparent connection between names (and types, for SMT) (for unknowns) in the host language and names (and types) in the constraint programming language (needed in both directions, to and from solver)

We also need observable sharing: consider host language expression let { x = a + b } in x * x where all names denote formulas (that involve unknowns). x is shared (in the host language) but the sharing is not observable when we output the formula in the target language - Unless a + b has a side effect of creating a new name, but we cannot have this in a purely functional language.

We could make the sharing explicit do { x <- reify (a + b); return $ x*x } (where reify produces a fresh name) (my library satchmo https://hackage.haskell.org/package/satchmo does this) but that’s awkward to write (cannot easily nest sub-expressions) and we cannot use Num (since methods for addition, multiplication are pure)

The ersatz https://hackage.haskell.org/package/ersatz library gets observable sharing with, ahem, judicious use of unsafePerformIO. Exercise: find this in the Ersatz source code. For help with reading the code: https://www.imn.htwk-leipzig.de/~waldmann/etc/untutorial/ersatz/

How do we access/control the solver?

application starts solver process via the OS, interface via stdin/stdout
- pro: human-readable, fixed format, can easily switch solver.
- cons: slow disk (?), slow printing/parsing (?), requires co-ordination between OS processes and program threads
application is linked with solver’s object file, calls solver API methods
- pro: direct in-memory operations to build formula and read assignment, no extra OS processes
- cons: depends on solver API which is not standardised (hard to switch solvers).

But there is hope (for SAT, at least)- cf. IPASIR Standard Interface for Incremental Satisfiability Solving https://github.com/biotomas/ipasir. Implemented in kissat. Available as Haskell library in https://github.com/jwaldmann/ersatz-kissatapi.

Symmetries

A Symmetry of a constraint program ∃x : F(x) is a bijection (a permutation) π of the models (the satisfying assignments)

Example. A symmetry of ∃x, y : x + y < 1 is π that swaps x with y.

Example. Each permutation of the universe leads to a symmetry of “R is locally confluent but not confluent”.

Each set S of symmetries partitions the set of models into equvialence classes w.r.t. S-reachability.

Symmetry breaking

A “symmetry breaking constraint” for a set of symmetries S of F is a constraint B such that ∀x : F(x) ⇒ ∃y ∈ S^*(x) : F(y) ∧ B(y).

We call the symmetry “fully broken” by B iff in the previous formula, for each x, there is exactly one such y.

Example. A fully symmetry breaking constraint for “swap x with y” is x ≤ y w.r.t. some total order.

Example. A symmetry breaking constraint that breaks the set of permutation of the universe U = {1, …, n}, for any constraint of one relation R ⊆ U², is “element 1 has minimal in-degree” (also: has minimal out-degree. But not: has minimal in-degree and minimal out-degree).

In the same setting, a fully breaking constraint is “the sequence of in-degrees is (weakly) monotonically increasing”.

Adding Symmetry breaking constraints does not change satisfiability, but (and that’s the goal) it might allow the solver to traverse the set of partial assignments (candidate models) faster. This helps to find a model, or prove that there is none.

This is a well-established method in constraint programming. It seems less powerful for SAT encodings - since the (encoding of the) symmetry breaker increases the formula size, and adds extra variables.

Symmetry Guessing

To find a model of ∃x : F(x), we can add any extra constraint B: any model of ∃x : F(x) ∧ B(x) is a model of ∃x : F(x).

But if F(x) ∧ B(x) is unsatisfiable, then we know nothing about the status of F. (We only know this if B is indeed a symmetry breaker.)

One way of obtaining candidates for B is that they describe invariants.

Example. For x + y ≤ 1 a candidate is x = y because it describes invariance w.r.t. permutation of unknowns. This also shows the intented usage: instead of (x+y) ≤ 1 ∧ (x=y) we should write (or have the solver infer) x + x ≤ 1, having reduces the number of unknowns.

Comparison, Examples, Exercises

breaking a symmetry: e.g., x ≤ π(x), keeps satisfiability
guessing a symmetry: e.g., x = π(x), might lose satisfiability

Exercise: symmetry breaking, and symmetry guessing, for the larger ARS examples from Zantema’s paper.

Exercise: symmetry breaking, and symmetry guessing, for z001 a²b² → b³a³. We note that swapping a with b, and reversal of lhs and rhs, keeps z001 invariant. So we guess that there is a matrix interpretation that is invariant w.r.t. that operation. Write the constraints that encodes this. What reduction in the number of unknowns is obtained?

Solution blast-sym-z001.hs.

Unknowns do not need to be eliminated by hand - the SAT solver should be able to detect and remove equivalent unknowns. Check this! (look at the trace of kissat)

Exercise: symmetry breaking and symmetry guessing for z086 {aa → bc, bb → ac, cc → ab}.

Exercise (Homework) detect and use symmetries in SRS termination automatically. Win termination competition.

Outlook

Extensions of topics that we treated

linear interpretations with representation λx.2^ax + b (and bit-blasting) (I use this for KBO-like ordering constraints in matchbox. Not formalized.)
QF_NIA to QF_LIA by linearisation (bit-blasting for non-linear unknowns) (Christina Borralleras et al., CADE 2009) https://dblp.org/rec/conf/cade/BorrallerasLNRR09

References for SAT and SMT in Termination

recent (2022) informal survey of SAT and SMT in current termination provers: https://lists.rwth-aachen.de/hyperkitty/list/termtools@lists.rwth-aachen.de/thread/FNDNU5Y7TGXYXX34YWKFO2ICSRT6M3ME/
PhD thesis (2012) and papers by Carsten Fuhs https://www.bbk.ac.uk/our-staff/profile/8731976/carsten-fuhs#publications
first appearance of SAT encoding in termination: Masahito Kurihara, Hisashi Kondo: BDD encoding for partial order constraints, SMC 2000 https://dblp.org/rec/conf/smc/KuriharaK00

Other approaches to Constraint Programming (not in this course)

historically, from linear programming

linear “programming” (LP) = conjunction of linear inequalities (possibly with a goal function), satisfiability is in P (ellipsoid method, Leonid Kachiyan 1979)
(mixed) integer “programming (MIP) = conjunction of linear inequalities where some unknowns have integer domain. Adaption of LP solver algorithms (e.g., Gomory cuts). Simulation of other Boolean combinations by extra {0, 1} unknowns. Thus, includes SAT, is NP-hard. Discuss: is it in NP?

historically, from logic programming

constraint logic programming = Prolog “modulo theory” (Prolog programs with theory atoms), semantics: SLP resolution combined with theory solver
answer set programming (ASP) provides different semantics for finite domain constraints - not just any satisfying assignment (model) but only “stable” models. Has higher complexity (NP with NP oracle). Is often mistaken for “better interface to SAT” (ignoring semantic difference) since surface syntax provides some abstractions over DIMACS.

finite domain constraint programming

each unknown has finite domain. The constraint program is a mixture of a formula (as in SMT), and search strategies annotations for unknowns (SMT leaves choice of strategy to the solver completely). Example solver: Gecode (Christian Schulze, Guido Tack, 200? –)
unified interface language for FD constraints: minizinc (Peter Stuckey et al., 201? –) https://www.minizinc.org/ My impression: they are slowly approaching SMT.

Summary (from the standpoint of analysis of rewriting)

(general) SAT/SMT did drastically improve the capabilities of automated analysis of termination and complexity of rewriting systems, and is used by the majority of current analysers (see Termination Competitions)
(specific, from my experience with Matchbox termination prover) SMT is mostly a convenience, solving non-linear arithmetical constraints via bit-blasting is still faster. Why?
often, the constraints have a large Boolean part (e.g., operations with arctic numbers { − ∞} ∪ Z have “if finite then .. else ..”)
even for numerical constraints: the general (unbounded) case is hopeless (Hilbert 10 - undecidable; Tarski, QEPCAD - hopeless complexity). For bounded domain of unknowns, a natural operation is to (recursively) reduce to lower/upper half of interval, and that is a Boolean encoding.
but actually .. for matchbox I found that unary bit-blasting is best, with order encoding: e.g., 3 is 11100000. (Cf. Michael Codish et al., Exotic Semi-Ring Constraints, SMT 2012, https://doi.org/10.29007/qqvt) Why is that? My guess: unary representation allows for quick propagation of ranges, e.g., a number 3 ≤ x ≤ 5 is 111**000. Cf. binary encoding: this same range would have three unspecified bits, since 3 = 011, 4 = 100, 5 = 101.

Open problems and Suggestions

find good Kissat settings (there are hundreds) https://github.com/arminbiere/kissat/issues/25
use theory of string concatenation for analysis of string rewriting (finding derivations of certain types, e.g., compressed loops)
convince SMT community to add overflow detection to QF_BV arithmetics (bitwuzla already has this in their API, cf. BITWUZLA_KIND_BV_UMUL_OVERFLOW at https://github.com/bitwuzla/bitwuzla/blob/main/src/api/c/bitwuzla.h#L1706)
find a good encoding of arctic constraints in QF_LRA (reduce number of Booleans)
(alternatively) write a theory solver/propagator for arctic arithmetic