FOUNDATIONS OF PROBABILISTIC PROGRAMMING
What does a probabilistic program actually compute? How can one formally reason about
such probabilistic programs? This valuable guide covers such elementary questions and
more. It provides a state-of-the-art overview of the theoretical underpinnings of modern
probabilistic programming and their applications in machine learning, security, and other
domains, at a level suitable for graduate students and non-experts in the field. In addition,
the book treats the connection between probabilistic programs and mathematical logic,
security (what is the probability that software leaks confidential information?), and presents
three programming languages for different applications: Excel tables, program testing, and
approximate computing. This title is also available as Open Access on Cambridge Core.
gilles barthe is Scientific Director at the Max Planck Institute for Security and
Privacy and Research Professor at the IMDEA Software Institute, Madrid. His recent
research develops programming language techniques and verification methods for
probabilistic languages, with a focus on cryptographic and differentially private
computations.
joost-pieter katoen is Professor at RWTH Aachen University and University of
Twente. His research interests include formal verification, formal semantics, concurrency
theory, and probabilistic computation. He co-authored the book Principles of Model
Checking (2008). He received an honorary doctorate from Aalborg University, is member
of the Academia Europaea, and is an ERC Advanced Grant holder.
alexandra silva is Professor of Algebra, Semantics, and Computation at University
College London. A theoretical computer scientist with contributions in the areas of
semantics of programming languages, concurrency theory, and probabilistic network
verification, her work has been recognized by multiple awards, including the Needham
Award 2018, the Presburger Award 2017, the Leverhulme Prize 2016, and an ERC Starting
Grant in 2015.
Published online by Cambridge University Press
Published online by Cambridge University Press
FOUNDATIONS OF PROBABILISTIC
PROGRAMMING
Edited by
GILLES BARTHE
Max-Planck-Institut f¨ur Cybersicherheit und Schutz der Privatsph¨are, Bochum, Germany
JOOST–PIETER KATOEN
Rheinisch–Westf¨alische Technische Hochschule, Aachen, Germany
ALEXANDRA SILVA
University College London
Published online by Cambridge University Press
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi 110025, India
79 Anson Road, #06–04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/9781108488518
DOI: 10.1017/9781108770750
© Gilles Barthe, Joost-Pieter Katoen and Alexandra Silva 2021
This work is in copyright. It is subject to statutory exceptions and to the provisions of
relevant licensing agreements; with the exception of the Creative Commons version the
link for which is provided below, no reproduction of any part of this work may
take place without the written permission of Cambridge University Press.
An online version of this work is published at doi.org/10.1017/9781108770750 under a
Creative Commons Open Access license CC-BY which permits re-use, distribution and
reproduction in any medium for any purpose providing appropriate credit to the original
work is given, any changes made are indicated. To view a copy of this license, visit
https://creativecommons.org/licenses/by/4.0
All versions of this work may contain content reproduced under license from third parties.
Permission to reproduce this third-party content must be obtained from these third-parties directly.
When citing this work, please include a reference to the DOI 10.1017/9781108770750
First published 2021
Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-48851-8 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Published online by Cambridge University Press
Contents
Contributors page vii
Preface xi
1 Semantics of Probabilistic Programming: A Gentle Introduction
Fredrik Dahlqvist, Alexandra Silva and Dexter Kozen 1
2 Probabilistic Programs as Measures
Sam Staton 43
3 Application of Computable Distributions to the Semantics of Prob-
abilistic Programs
Daniel Huang, Greg Morrisett and Bas Spitters 75
4 On Probabilistic λ-Calculi
Ugo Dal Lago 121
5 Probabilistic Couplings from Program Logics
Gilles Barthe and Justin Hsu 145
6 Expected Runtime Analyis by Program Verification
Benjamin Lucien Kaminski, Joost-Pieter Katoen and Christoph Math-
eja 185
7 Termination Analysis of Probabilistic Programs with Martingales
Krishnendu Chatterjee, Hongfei Fu and Petr Novotný 221
8 Quantitative Analysis of Programs with Probabilities and Concen-
tration of Measure Inequalities
Sriram Sankaranarayanan 259
9 The Logical Essentials of Bayesian Reasoning
Bart Jacobs and Fabio Zanasi 295
v
Published online by Cambridge University Press
vi Contents
10 Quantitative Equational Reasoning
Giorgio Bacci, Radu Mardare, Prakash Panangaden and Gordon
Plotkin 333
11 Probabilistic Abstract Interpretation: Sound Inference and Appli-
cation to Privacy
José Manuel Calderón Trilla, Michael Hicks, Stephen Magill,
Piotr Mardziel and Ian Sweet 361
12 Quantitative Information Flow with Monads in Haskell
Jeremy Gibbons, Annabelle McIver, Carroll Morgan and
Tom Schrijvers 391
13 Luck: A Probabilistic Language for Testing
Lampropoulos Leonidas, Benjamin C. Pierce, Li-yao Xia, Diane
Gallois-Wong, Cătălin Hriţcu, John Hughes 449
14 Tabular: Probabilistic Inference from the Spreadsheet
Andrew D. Gordon, Claudio Russo, Marcin Szymczak, Johannes
Borgström, Nicolas Rolland, Thore Graepel and Daniel Tarlow 489
15 Programming Unreliable Hardware
Michael Carbin and Sasa Misailovic 533
Published online by Cambridge University Press
Preface
Probabilistic programs
Probabilistic programs describe recipes for inferring statistical conclusions from a
complex mixture of uncertain data and real-world observations. They can represent
probabilistic graphical models far beyond the capabilities of Bayesian networks
and are expected to have a major impact on machine intelligence. Probabilistic
programs are ubiquitous. They steer autonomous robots and self-driving cars, are
key to describe security mechanisms, naturally code up randomised algorithms
for solving NP-hard or even unsolvable problems, and are rapidly encroaching on
AI. Probabilistic programming aims to make probabilistic modelling and machine
learning accessible to the programmer.
What is this book all about?
Probabilistic programs, though typically relatively small in size, are hard to grasp,
let alone check automatically. Elementary questions are notoriously hard even the
most elementary question “does a program halt with probability one? is “more un-
decidable than the halting problem. This book is about the theoretical foundations
of probabilistic programming. It is primarily concerned with fundamental questions
such as the following: What is Bayesian probability theory? What is the precise
mathematical meaning of probabilistic programs? How show almost-sure termi-
nation? How determine the (possibly infinite) expected runtime of probabilistic
programs? How can two similar programs be compared? It covers several analy-
sis techniques on probabilistic programs such as abstract interpretation, algebraic
reasoning and determining concentration measures.
These chapters are complemented with chapters on the formal definition of
concrete probabilistic programming languages and some possible applications of
the use of probabilistic programs.
xi
https://doi.org/10.1017/9781108770750.001 Published online by Cambridge University Press
xii Preface
How to read this volume?
The volume consists of five parts: semantics, verification, logic, security, and pro-
gramming languages.
Semantics.
The first part on semantics consists of four chapters on different aspects of the
formal semantics of probabilistic programming languages. Dahlqvist et al. start off
in Chapter 1 by presenting an operational and denotational semantics of an imper-
ative language with discrete and continuous distributions. The chapter by Staton
presents a compositional measure-theoretic semantics for a first-order probabilis-
tic language with arbitrary soft conditioning. Chapter 3 by Huang et al. studies
semantics with a focus on computability. Motivated by the tension between the
discrete and the continuous in probabilistic modelling, type-2 computable distribu-
tions are introduced as the elementary mathematical objects to give semantics to
probabilistic languages. Dal Lago’s Chapter 4 completes the semantics part by treat-
ing a probabilistic version of the λ-calculus, the backbone language for functional
programming languages.
Verification.
The second part on formal verification starts with a chapter by Barthe and Hsu
on the use of couplings, a well-known concept in probability theory for verifying
probabilistic programs. Kaminski et al. present a weakest pre-condition calculus
in the style of Dijkstra for determining the expected run-time of a probabilistic
program. This can be used to determine whether a program needs infinitely many
steps to terminate with probability one. Chapter 7 by Chatterjee et al. presents var-
ious techniques based on supermartingales to decide the almost-sure termination
of a probabilistic program, i.e., does a program terminate with probability one on
all possible inputs? The verification part ends with a chapter by Sankaranarayanan
about the quantitative analysis of probabilistic programs by means of concentration
of measure inequalities. This includes Chernoff–Hoeffding and Bernstein inequal-
ities.
Logic.
The third part focuses on logic and consists of two chapters. In Chapter 9, Ja-
cobs and Zanasi present an insightful new perspective on fundamental aspects of
probability theory which are relevant for probabilistic programming. Their chapter
introduces a category-theoretic formalization of Bayesian reasoning, and in particu-
lar a string-diagram-friendly one. This contribution is complemented by the chapter
by Bacci et al. which surveys some recent contributions on extending the classical
https://doi.org/10.1017/9781108770750.001 Published online by Cambridge University Press
Preface xiii
Birkhoff/Lawvere theory of equational reasoning to the quantitative setting. In that
setting, compared entities are not necessarily equal but rather treated by a notion of
distance.
Security.
Part four is concerned with security, an important application field in which prob-
abilities are pivotal. Chapter 11 by Calderon et al. collects together results on
probabilistic abstract interpretation and applies them to probabilistic programming
in the context of security. Chapter 12 by Gibbons et al. presents an embedded
domain-specific language in Haskell to compute hyper-distributions induced by
programs. This is used to compute the amount of leakage of a program by measur-
ing variations on post-distributions that include Shannon entropy and Bayes risk
(that is, the maximum information an attacker can learn in a single run).
Programming languages.
The final part of this volume is concerned with three concrete probabilistic pro-
gramming languages: Luck, Tabular, and Rely. Chapter 13 describes Luck, pro-
posed by Lampropoulos et al., a language for test generation and a framework for
property-based testing of functional programs. Luck combines local instantiation of
unknown variables and global constraint solving to make test generation more effi-
cient than existing approaches. Gordon et al. introduce Tabular, a domain-specific
programming language designed to express probabilistic models and to perform
probabilistic inference over relational data. Tabular can be used from Microsoft
Excel or as stand-alone software. Chapter 14 presents the syntax, semantics and
type system of Tabular and shows how it can be used to design probabilistic models
and to perform probabilistic inference. Chapter 15, the last chapter of this volume,
by Carbin and Misailovic, presents Rely, a programming language that enables
reasoning about the probability that a program produces the correct result when
executed on unreliable hardware.
How this volume emerged
This book consists of 15 contributed chapters and a preface. The idea for this
volume emerged at the first summer school on Foundations of Programming and
Software Systems held in Braga, Portugal, May–June 2017. This biennial school
series is supported by EATCS (European Association for Theoretical Computer
Science), ETAPS (European Conference on Theory and Practice of Software),
ACM SIGPLAN (Special Interest Group on Programming Languages) and ACM
SIGLOG (Special Interest Group on Logic and Computation). It was felt that there is
no comprehensive book on the theoretical foundations of probabilistic programming
https://doi.org/10.1017/9781108770750.001 Published online by Cambridge University Press
xiv Preface
languages. We sincerely hope that this volume contributes to filling this gap. Enjoy
reading!
Acknowledgements
Many people have helped us to make this volume possible. First and foremost, we
like to thank all authors for their fine contributions and for their patience in this
book process. All chapters have been subject to peer reviews. We thank the review-
ers Alejandro Aguirre, Krishnendu Chatterjee, Ugo Dal Lago, Thomas Ehrhard,
Claudia Faggian, Marco Gaboardi, Francesco Gavazzo, Jeremy Gibbons, Andrew
D. Gordon, Ichiro Hasuo, Jan Hoffmann, Justin Hsu, Bart Jacobs, Benjamin Lu-
cien Kaminski, Pasquale Malacaria, Radu Mardare, Christoph Matheja, Annabelle
McIver, Sasa Misailovic, Petr Novotný, Prakash Panangaden, Corina Pasareanu,
Alejandro Russo, Sriram Sankaranarayanan, Steffen Smolka, and Marcin Szym-
czak for their thorough reviewing work.
We thank Luis Barbosa, Catarina Fernandes and Renato Neves for their invaluable
efforts in the local organisation of the FoPPS 2017 summer school. We thank Joshua
Moerman for his efforts in the editing process. David Tranah and Anna Scriven
from Cambridge University Press are thanked for their support during this process.
We thank Hanna Schraffenberger for designing the cover of the book. Finally, we
are grateful for the financial support from the ERC (AdG FRAPPANT and StG
ProFoundNet) and the MPI on Security and Privacy which enabled to publish this
volume under gold open access.
https://doi.org/10.1017/9781108770750.001 Published online by Cambridge University Press
1
Semantics of Probabilistic Programming:
A Gentle Introduction
Fredrik Dahlqvist and Alexandra Silva
University College London
Dexter Kozen
Cornell University
Abstract: Reasoning about probabilistic programs is hard because it compounds
the difficulty of classic program analysis with sometimes subtle questions of prob-
ability theory. Having precise mathematical models, or semantics, describing their
behaviour is therefore particularly important. In this chapter, we review two prob-
abilistic semantics. First an operational semantics which models the local, step-
by-step, behaviour of programs, then a denotational semantics describing global
behaviour as an operator transforming probability distributions over memory states.
1.1 Introduction
A probabilistic program is any program whose execution is probabilistic. This
usually means that there is a source of randomness that allows weighted choices
to be made during execution. Given an initial machine-state, in the event that the
program halts, there will be a distribution describing the probability of output
events. Any deterministic program is trivially a probabilistic program that does
not make any random choices. The source of randomness is typically a random
number generator, which is assumed to provide independent samples from a known
distribution. In practice, these are often pseudo-random number generators, which
do not provide true randomness, but only an approximation; however, it is possible
to construct hardware random number generators that provide true randomness, for
example by measuring a noisy electromagnetic process.
Reasoning about deterministicprograms usually involves answering binary yes/no
questions: Is the postcondition always satisfied? Does this program halt on all in-
puts? Does it always halt in polynomial time? On the other hand, reasoning about
probabilistic programming usually involves more quantitative questions: What is
the probability that the postcondition is satisfied? What is the probability that this
a
From Foundations of Probabilistic Programming, edited by Gilles Barthe, Joost-Pieter Katoen and Alexandra
Silva published 2020 by Cambridge University Press.
1
https://doi.org/10.1017/9781108770750.002 Published online by Cambridge University Press
2 Dahlqvist, Kozen and Silva: Semantics of Probabilistic Programming
program halts? Is its expected halting time polynomial? In order to answer questions
like these, the first step should be to develop a formal mathematical semantics for
probabilistic programs, which will allow us to formalise such questions precisely.
This is the main purpose of this chapter.
Reasoning about probabilistic programs is in general difficult because it com-
pounds the difficulty of deterministic program analysis with questions of probability
theory, which can sometimes be counterintuitive. We will use examples to illustrate
all the main ideas presented in this chapter. We introduce these examples here and
will return to them as we develop the semantics of probabilistic programs. We
start with two examples involving discrete probabilities for which naive probability
theory provides a sufficient framework for reasoning. We will then present two
programs that involve continuous distributions for which a more general theory
known as measure theory is needed. The requisite background for understanding
these concepts is presented in Section 1.2.
x :=0;
while x ==0 do
x:=coin()
start
[x → ?] [x → 0] [x → 1]
x:=0
1
/2 : x:=1
1
/2 : x:=0
Figure 1.1 A simple coin-toss program
We start with the simple program of Fig. 1.1 displayed next to the small proba-
bilistic automaton it implements. Here the construct coin() is our random number
generator; each successive call returns 0 or 1, each with probability 1/2, and succes-
sive calls are independent, which means that n successive calls will yield one of the
2
n
possible sequences of n binary digits, each with probability 2
n
. A distribution
on {0, 1} that takes value 1 with probability p and 0 with probability 1 p is called
a Bernoulli distribution with (success) parameter p. Thus coin() is a Bernoulli
distribution with success parameter 1/2.
It is intuitively clear that this program eventually halts with probability 1. Looking
at the automaton of Fig. 1.1, one can see that the probability of the program going
https://doi.org/10.1017/9781108770750.002 Published online by Cambridge University Press
1.1 Introduction 3
through n iterations of the body of the loop is 2
n
. Moreover, the expected number
of iterations of the body of the loop is given by
n=1
n2
n
= 2.
This type of simple probabilistic process involving repeated independent trials until
some fixed “success” event occurs is called a Bernoulli process. If the probability
of success in each trial is p, then the expected time until success is 1/p. In this
example, p = 1/2. We will show in Section 1.3 how the mathematical interpretation
of this program (its semantics) can be constructed compositionally, that is to say
line-by-line, and how it agrees with these simple observations.
Our second example is also discrete, but intuitively less obvious. The program
of Fig. 1.2 implements a random walk on the two-dimensional grid Z × Z. In each
iteration of the body of the loop, the function step updates the current coordinates
by moving left, right, down, or up, each with equal probability 1/4.
main {
u :=0;
v :=0;
step (u,v );
while u !=0 || v!=0 do
step (u,v)
}
step (u,v ){
x:=coin ();
y:=coin ();
u:=u+(x-y);
v := v+( x+y -1)
}
Figure 1.2 A random walk on a two-dimensional grid
The loop continues until the random walk returns to the origin. The first call to
step outside the loop ensures that the program takes at least one step, so it does not
halt immediately. The question of the halting probability is now much less obvious.
The state space is infinite, and there is no constraint on how far the random walk can
travel from the origin. Indeed, for any distance, there is a nonzero probability that
it goes at least that far. However, it turns out that the probability that the program
halts is 1. In the terminology of probability theory, we would say that the two-
dimensional random walk is recurrent at every point. This example illustrates how
the analysis of probabilistic programs can rely on results from probability theory
https://doi.org/10.1017/9781108770750.002 Published online by Cambridge University Press
4 Dahlqvist, Kozen and Silva: Semantics of Probabilistic Programming
that are far from obvious. Indeed, the three-dimensional version is not recurrent;
the probability that a random walk on Z
3
eventually returns to the origin is strictly
less than 1.
We now consider two programs that require continuous distributions. The se-
mantics of such programs cannot be defined without the full power of measure
theory, the mathematical foundation of probabilities and integration. The program
of Fig. 1.3 approximates the constant π using Monte Carlo integration, a probabilis-
tic integration method. The program works by taking a large number of independent,
uniformly distributed random samples from the square [0,1]×[0, 1] and counting
the number that fall inside the unit circle. As the area of the square is 1 and the area
of the part of the unit circle inside that square is π/4, by the law of large numbers
we expect to see a π/4 fraction of sample points lying inside the circle.
i :=0;
n :=0;
while i<1e9 do
x:=rand ();
y:=rand ();
if (x*x+y*y) < 1 then n:=n+1;
i:=i+1
i :=4*n /1 e9;
Figure 1.3 Probabilistic computation of π.
In this example, the random number generator rand() samples from the uniform
distribution on the interval [0, 1]. This distribution is often called Lebesgue mea-
sure. Here the state space [0, 1] is uncountable and the probability of drawing any
particular x ∈[0, 1] is zero. Such probability distributions are called continuous.
The natural question to ask about this program is not whether it terminates (it clearly
does) but whether it returns a good approximation of π with high probability. We
will answer this question in Section 1.3.
Finally, the program in Fig. 1.4 generates a real number between [0,1] whose
expansion in base 3 does not contain any 1’s. This program is not like the others
in that it does not halt (nor is it meant to). The program generates a sample from
a curious and in many respects counterintuitive distribution called the Cantor
distribution. It cannot be described using discrete probability distributions (i.e.
finite or countable weighted sums of point masses), although the program only
uses a discrete fair coin as a source. The Cantor distribution is also an example of
continuous probability distribution, which assigns probability zero to every element
of the state space. It is also an example of a so-called singular distribution, since
it can be shown that the set of all its possible outcomes—that is to say the set of
https://doi.org/10.1017/9781108770750.002 Published online by Cambridge University Press
1.2 Measure theory: What you need to know 5
all real numbers whose base-3 expansion contains no 1’s—has measure 0 in the
Lebesgue measure on [0, 1].
x :=0;
d :=1;
while true do
d := d/3;
x := x+2* coin ()*d
Figure 1.4 Cantor distribution program.
1.2 Measure theory: What you need to know
Measures are a generalization of the concepts of length, area, or volume of Euclidean
geometry to other spaces. They form the basis of probability and integration theory.
In this section, we explain what it means for a space to be a measurable space,we
define measures on these spaces, and we examine the rich structure of spaces of
measures, which will be essential to the semantics of probabilistic programs defined
in Section 1.3.5. When not specified otherwise we use the word measure to refer to
finite measures.
1.2.1 Some intuition
The concepts of length, area, and volume on Euclidean spaces are examples of
(positive) measures. These are sufficient to illustrate most of the desired properties
of measures and some pitfalls to avoid. For the sake of simplicity, let us examine
the concept of length. Given an interval [a, b]⊆R, its length is of course (
[
a, b
]
) =
ba. But the length function makes sense for other subsets of R besides intervals.
So we will begin with two related questions:
(a) Which subsets of R can meaningfully be assigned a “length” consistent with
the length of intervals? I.e., what should the domain of be?
(b) Which properties should the length function satisfy?
The answer to question (a) will give rise to the notion of measurable space, and the
answer to question (b) will give rise to the notion of measure, both defined formally
in Section 1.2.2.
Note that larger intervals have larger lengths: if [a, b]⊆[c, d], then we have that
([a, b]) = b a d c = ([c, d]). This intuitively obvious property is a general
feature of all positive measures: they associate nonnegative real numbers to subsets
monotonically with respect to set inclusion. Let us now take two disjoint intervals
https://doi.org/10.1017/9781108770750.002 Published online by Cambridge University Press
6 Dahlqvist, Kozen and Silva: Semantics of Probabilistic Programming
[a
1
, b
1
]and [a
2
, b
2
]with b
1
< a
2
. It is natural to define the length of [a
1
, b
1
]∪[a
2
, b
2
]
as the sum of the length of the respective intervals, i.e.
([a
1
, b
1
]∪[a
2
, b
2
]) = ([a
1
, b
1
]) + ([a
2
, b
2
]) = (b
1
a
1
) + (b
2
a
2
).
We can draw two conclusions from this natural definition. First, if A, B are two
disjoint subsets of R in the domain of , then their union should also belong to the
domain of , and the measure of the union should be the sum of the measures. More
generally, if A
i
,1 i n, is any finite collection of pairwise disjoint sets in the
domain of , then
n
i=1
A
i
should also be in the domain of , and the measure of
the union should be the sum of the measures of the A
i
; that is,
n
i=1
A
i
=
n
i=1
(A
i
). (1.1)
A real-valued function on subsets satisfying (1.1) is called (finitely) additive. All
measures will be finitely additive, and in fact more. Consider the countable col-
lection of pairwise disjoint intervals
[
n, n + 2
n
)
, n N. Generalising (1.1), it is
natural to define on the union of these intervals as
n=0
[
n, n + 2
n
)
=
n=0
2
n
= 2.
Again, we can draw two conclusions from this natural definition. First, if A
i
for
i N is a countable collection of pairwise disjoint sets in the domain of , then
i N
A
i
should be in the domain of ; second, that (1.1) should be extended to such
countable collections, so that
i=0
A
i
=
i=0
(A
i
). (1.2)
A function satisfying (1.2) is called countably additive or σ-additive.Every
measure will be countably additive. The reader will now legitimately ask: what
happens if the sum in (1.2) diverges? To deal with this behaviour, one simply allows
as a possible length, that is to say the codomain of can be the extended real line
R
+
{∞}. In particular, this allows us to define the length of R via (1.2) as:
(R) =
nZ
[n, n + 1)
= .
However, for the purpose of semantics of probabilistic programs, we will not need
measures taking the value . A measure is called finite if it only assigns finite
values in R to any set in its domain. For the remainder of this chapter, the term
“measure”, otherwise unqualified, will refer to finite measures.
https://doi.org/10.1017/9781108770750.002 Published online by Cambridge University Press