This list of research articles is under construction and very incomplete. Please help us collect relevant articles by submitting new bibtex entries or correcting mistakes. This list is maintained by Daniel Roy.
Keywords: probabilistic programming, inference, mcmc, metropolis-hastings
Keywords: probabilistic programming, inference, smc, particle filter, coarse-to-fine
We propose a framework to prove almost sure termination for probabilistic programs with real valued variables. It is based on ranking supermartingales, a notion analogous to ranking functions on non-probabilistic programs. The framework is proven sound and complete for a meaningful class of programs involving randomization and bounded nondeterminism. We complement this foundational insight by a practical proof methodology, based on sound conditions that enable compositional reasoning and are amenable to a direct implementation using modern theorem provers. This is integrated in a small dependent type system, to overcome the problem that lexicographic ranking functions fail when combined with randomization. Among others, this compositional methodology enables the verification of probabilistic programs outside the complete class that admits ranking supermartingales.
Keywords: probabilistic programs, program verification, supermartingales, termination
We present a new algorithm for Bayesian inference over probabilistic programs, based on data flow analysis techniques from the program analysis community. Unlike existing techniques for Bayesian inference on probabilistic programs, our data flow analysis algorithm is able to perform inference directly on probabilistic programs with loops. Even for loop-free programs, we show that data flow analysis offers better precision and better performance benefits over existing techniques. We also describe heuristics that are crucial for our inference to scale, and present an empirical evaluation of our algorithm over a range of benchmarks.
A Bayesian model is based on a pair of probability distributions, known as the prior and sampling distributions. A wide range of fundamental machine learning tasks, including regression, classification, clustering, and many others, can all be seen as Bayesian models. We propose a new probabilistic programming abstraction, a typed Bayesian model, based on a pair of probabilistic expressions for the prior and sampling distributions. A sampler for a model is an algorithm to compute synthetic data from its sampling distribution, while a learner for a model is an algorithm for probabilistic inference on the model. Models, samplers, and learners form a generic programming pattern for model-based inference. They support the uniform expression of common tasks including model testing, and generic compositions such as mixture models, evidence-based model averaging, and mixtures of experts. A formal semantics supports reasoning about model equivalence and implementation correctness. By developing a series of examples and three learner implementations based on exact inference, factor graphs, and Markov chain Monte Carlo, we demonstrate the broad applicability of this new programming pattern.
The probability density function of a probability distribution is a fundamental concept in probability theory and a key ingredient in various widely used machine learning methods. However, the necessary framework for compiling probabilistic functional programs to density functions has only recently been developed. In this work, we present a density compiler for a probabilistic language with discrete and continuous distributions, and discrete observations, and provide a proof of its soundness. The compiler greatly reduces the development effort of domain experts, which we demonstrate by solving inference problems from various scientific applications, such as modelling the global carbon cycle, using a standard Markov chain Monte Carlo framework.
The combination of probabilistic and nondeterministic choice in program calculi is a notoriously tricky problem, and one with a long history. We present a simple functional programming approach to this challenge, based on algebraic theories of computational effects. We make use of the powerful abstraction facilities of modern functional languages, to introduce the choice operations as a little embedded domain-specific language rather than having to define a language extension; we rely on referential transparency, to justify straightforward equational reasoning about program behaviour.
Abstract interpretation has been widely used for verifying properties of computer systems. Here, we present a way to extend this framework to the case of probabilistic systems. The probabilistic abstraction framework that we propose allows us to systematically lift any classical analysis or verification method to the probabilistic setting by separating in the program semantics the probabilistic behavior from the (non-)deterministic behavior. This separation provides new insights for designing novel probabilistic static analyses and verification methods. We define the concrete probabilistic semantics and propose different ways to abstract them. We provide examples illustrating the expressiveness and effectiveness of our approach.
Keywords: Computer Communication Networks, Logics and Meanings of Programs, Mathematical Logic and Formal Languages, Programming Languages, Compilers, Interpreters, Programming Techniques, Software Engineering
The problem of replicating the flexibility of human common-sense reasoning has captured the imagination of computer scientists since the early days of Alan Turing's foundational work on computation and the philosophy of artificial intelligence. In the intervening years, the idea of cognition as computation has emerged as a fundamental tenet of Artificial Intelligence (AI) and cognitive science. But what kind of computation is cognition?
We describe a computational formalism centered around a probabilistic Turing machine called QUERY, which captures the operation of probabilistic conditioning via conditional simulation. Through several examples and analyses, we demonstrate how the QUERY abstraction can be used to cast common-sense reasoning as probabilistic inference in a statistical model of our observations and the uncertain structure of the world that generated that experience. This formulation is a recent synthesis of several research programs in AI and cognitive science, but it also represents a surprising convergence of several of Turing's pioneering insights in AI, the foundations of computation, and statistics.
Keywords: probabilistic programming, Alan Turing, artificial intelligence
We describe a dynamic programming algorithm for computing the marginal distribution of discrete probabilistic programs. This algorithm takes a functional interpreter for an arbitrary probabilistic programming language and turns it into an efficient marginalizer. Because direct caching of sub-distributions is impossible in the presence of recursion, we build a graph of dependencies between sub-distributions. This factored sum-product network makes (potentially cyclic) dependencies between subproblems explicit, and corresponds to a system of equations for the marginal distribution. We solve these equations by fixed-point iteration in topological order. We illustrate this algorithm on examples used in teaching probabilistic models, computational cognitive science research, and game theory.
Keywords: probabilistic programming, inference, dynamic programming
We study the computability of conditional probability, a fundamental notion in probability theory and Bayesian statistics. In the elementary discrete setting, a ratio of probabilities defines conditional probability. In more general settings, conditional probability is defined axiomatically, and the search for more constructive definitions is the subject of a rich literature in probability theory and statistics. However, we show that in general one cannot compute conditional probabilities. Specifically, we construct a pair of computable random variables (X, Y) in the unit interval whose conditional distribution P[Y|X] encodes the halting problem.
Nevertheless, probabilistic inference has proven remarkably successful in practice, even in infinite-dimensional continuous settings. We prove several results giving general conditions under which conditional distributions are computable. In the discrete or dominated setting, under suitable computability hypotheses, conditional distributions are computable. Likewise, conditioning is a computable operation in the presence of certain additional structure, such as independent absolutely continuous noise.
We investigate the class of computable probability distributions and explore the fundamental limitations of using this class to describe and compute conditional distributions. In addition to proving the existence of noncomputable conditional distributions, and thus ruling out the possibility of generic probabilistic inference algorithms (even inefficient ones), we highlight some positive results showing that posterior inference is possible in the presence of additional structure like exchangeability and noise, both of which are common in Bayesian hierarchical modeling.
This theoretical work bears on the development of probabilistic programming languages (which enable the specification of complex probabilistic models) and their implementations (which can be used to perform Bayesian reasoning). The probabilistic programming approach is particularly well suited for defining infinite-dimensional, recursively-defined stochastic processes of the sort used in nonparametric Bayesian statistics. We present a new construction of the Mondrian process as a partition-valued Markov process in continuous time, which can be viewed as placing a distribution on an infinite kd-tree data structure.
This thesis presents a formal model of productivity and reuse which treats the problem as a structure-by-structure inference in a Bayesian framework. The model—Fragment Grammars, a generalization of Adaptor Grammars (Johnson et al., 2007)—is built around two proposals. The first is that anything that can be computed can be stored. The specific computational mechanism by which this is accomplished, stochastic memoization, is inherited from Adaptor Grammars (Goodman et al., 2008; Johnson et al., 2007). The second proposal is that any stored item can include subparts which must be computed productively. This is made possible by the computational mechanism of stochastically lazy evaluation, introduced in the thesis.
As inductive inference and machine learning methods in computer science see continued success, researchers are aiming to describe even more complex probabilistic models and inference algorithms. What are the limits of mechanizing probabilistic inference? We investigate the computability of conditional probability, a fundamental notion in probability theory and a cornerstone of Bayesian statistics, and show that there are computable joint distributions with noncomputable conditional distributions, ruling out the prospect of general inference algorithms, even inefficient ones. Specifically, we construct a pair of computable random variables in the unit interval such that the conditional distribution of the first variable given the second encodes the halting problem. Nevertheless, probabilistic inference is possible in many common modeling settings, and we prove several results giving broadly applicable conditions under which conditional distributions are computable. In particular, conditional distributions become computable when measurements are corrupted by independent computable noise with a sufficiently smooth density.
Keywords: computable probability theory, conditional probability
As we devise more complicated prior distributions, will inference algorithms keep up? We highlight a negative result in computable probability theory by Ackerman, Freer, and Roy (2010) that shows that there exist computable priors with noncomputable posteriors. In addition to providing a brief survey of computable probability theory geared towards the A.I. and statistics community, we give a new result characterizing when conditioning is computable in the setting of exchangeable sequences, and provide a computational perspective on work by Orbanz (2010) on conjugate nonparametric models. In particular, using a computable extension of de Finetti’s theorem (Freer and Roy 2009), we describe how to transform a posterior predictive rule for generating an exchangeable sequence into an algorithm for computing the posterior distribution of the directing random measure.
We prove a computable version of de Finetti's theorem on exchangeable sequences of real random variables. As a consequence, exchangeable stochastic processes expressed in probabilistic functional programming languages can be automatically rewritten as procedures that do not modify non-local state. Along the way, we prove that a distribution on the unit interval is computable if and only if its moments are uniformly computable.
Keywords: FACTORIE, applications
Language relies on a division of labor between stored units and structure building operations which combine the stored units into larger structures. This division of labor leads to a tradeoﬀ: more structure-building means less need to store while more storage means less need to compute structure. We develop a hierarchical Bayesian model called fragment grammar to explore the optimum balance between structure-building and reuse. The model is developed in the context of stochastic functional programming (SFP), and in particular, using a probabilistic variant of Lisp known as the Church programming language. We show how to formalize several probabilistic models of language structure using Church, and how fragment grammar generalizes one of them—adaptor grammars. We conclude with experimental data with adults and preliminary evaluations of the model on natural language corpus data.
I introduce a new set of natively probabilistic computing abstractions, including probabilistic generalizations of Boolean circuits, backtracking search and pure Lisp. I show how these tools let one compactly specify probabilistic generative models, generalize and parallelize widely used sampling algorithms like rejection sampling and Markov chain Monte Carlo, and solve difficult Bayesian inference problems.
I first introduce Church, a probabilistic programming language for describing probabilistic generative processes that induce distributions, which generalizes Lisp, a language for describing deterministic procedures that induce functions. I highlight the ways randomness meshes with the reflectiveness of Lisp to support the representation of structured, uncertain knowledge, including nonparametric Bayesian models from the current literature, programs for decision making under uncertainty, and programs that learn very simple programs from data. I then introduce systematic stochastic search, a recursive algorithm for exact and approximate sampling that generalizes a popular form of backtracking search to the broader setting of stochastic simulation and recovers widely used particle filters as a special case. I use it to solve probabilistic reasoning problems from statistical physics, causal reasoning and stereo vision. Finally, I introduce stochastic digital circuits that model the probability algebra just as traditional Boolean circuits model the Boolean algebra. I show how these circuits can be used to build massively parallel, fault-tolerant machines for sampling and allow one to efﬁciently run Markov chain Monte Carlo methods on models with hundreds of thousands of variables in real time.
I emphasize the ways in which these ideas fit together into a coherent software and hardware stack for natively probabilistic computing, organized around distributions and samplers rather than deterministic functions. I argue that by building uncertainty and randomness into the foundations of our programming languages and computing machines, we may arrive at ones that are more powerful, flexible and efficient than deterministic designs, and are in better alignment with the needs of computational science, statistics and artiﬁcial intelligence.
This thesis attempts to implement Bayesian induction of programs, a la Solomonoff. We develop equational theories and a theorem prover for lambda-calculus with a probability monad, and use this to enumerate the first few thousand equivalence classes of programs, ordered by Kolmogorov complexity. The resulting software can search for simple programs satisfying user-defined constraints.
Keywords: automated deduction, lambda-calculus, Bayesian inference
Formal languages for probabilistic modeling enable re-use, modularity, and descriptive clarity, and can foster generic inference techniques. We introduce Church, a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset. The semantics of Church is defined in terms of evaluation histories and conditional distributions on such histories. Church also includes a novel language construct, the stochastic memoizer, which enables simple description of many complex non-parametric models. We illustrate language features through several examples, including: a generalized Bayes net in which parameters cluster over trials, infinite PCFGs, planning by inference, and various non-parametric clustering models. Finally, we show how to implement query on any Church program, exactly and approximately, using Monte Carlo techniques.
Practical Bayesian data analysis involves manipulating and summarizing simulations from the posterior distribution of the unknown parameters. By manipulation we mean computing posterior distributions of functions of the unknowns, and generating posterior predictive distributions. The results need to be summarized both numerically and graphically.
We introduce, and implement in R, an object-oriented programming paradigm based on a random variable object type that is implicitly represented by simulations. This makes it possible to define vector and array objects that may contain both random and deterministic quantities, and syntax rules that allow to treat these objects like any numeric vectors or arrays, providing a solution to various problems encountered in Bayesian computing involving posterior simulations.
We illustrate the use of this new programming environment with examples of Bayesian computing, demonstrating missing-value imputation, nonlinear summary of regression predictions, and posterior predictive checking.
Humans and other intelligent agents must make inferences about the real-world objects that underlie their observations: for instance, the objects visible in an image, or the people mentioned in a set of text documents. The agent may not know in advance how many objects exist, how they are related to each other, or which observations correspond to which underlying objects. Existing declarative representations for probabilistic models do not capture the structure of such scenarios.
This thesis introduces Bayesian logic (BLOG), a ﬁrst-order probabilistic modeling language that specifies probability distributions over possible worlds with varying sets of objects. A BLOG model contains statements that define conditional probability distributions for a certain set of random variables; the model also specifies certain context-speciﬁc independence properties. We provide criteria under which such a model is guaranteed to fully deﬁne a probability distribution. These criteria go beyond existing results in that they can be satisﬁed even when the Bayesian network deﬁned by the model is cyclic, or contains nodes with inﬁnitely many ancestors.
We describe several approximate inference algorithms that exploit the contextspeciﬁc dependence structure revealed by a BLOG model. First, we present rejection sampling and likelihood weighting algorithms that are guaranteed to converge to the correct probability for any query on a structurally well-defined BLOG model. Because these algorithms instantiate only those variables that are context-specifically relevant, they can generate samples in finite time even when the model defines infinitely many variables. We then deﬁne a general framework for inference on BLOG models using Markov chain Monte Carlo (MCMC) algorithms. This framework allows a programmer to plug in a domain-specfiic proposal distribution, which helps the Markov chain move to high-probability worlds. Furthermore, the chain can operate on partial world descriptions that specify values only for context-specifically relevant variables. We give conditions under which MCMC over such partial world descriptions is guaranteed to converge to correct probabilities. We also show that this framework performs efficiently on a real-world task: reconstructing the set of distinct publications referred to by a set of bibliographic citations.
Keywords: domain theory, probability
Keywords: domain theory, probability
Keywords: domain theory, probability
Keywords: domain theory, probability
Keywords: domain theory, randomized algorithms
Keywords: inductive inference
Keywords: inductive inference