Research articles on probabilistic programming

PROBABILISTIC-PROGRAMMING.org

This list of research articles is under construction and very incomplete. Please help us collect relevant articles by submitting new BibTeX entries or correcting mistakes. This list is maintained by Daniel Roy.


[Stu15]
Andreas Stuhlmüller. Modeling Cognition with Probabilistic Programs: Representations and Algorithms. PhD thesis, Massachusetts Institute of Technology, June 2015. [ bib ]
[RSG15]
Daniel Ritchie, Andreas Stuhlmüller, and Noah D. Goodman. C3: Lightweight incrementalized MCMC for probabilistic programs using continuations and callsite caching, 2015. [ bib | arXiv | http ]
Keywords: probabilistic programming, inference, mcmc, metropolis-hastings
[SHSG15]
Andreas Stuhlmüller, Robert X. D. Hawkins, N. Siddharth, and Noah D. Goodman. Coarse-to-fine sequential Monte Carlo for probabilistic programs, 2015. [ bib | arXiv | http ]
Keywords: probabilistic programming, inference, smc, particle filter, coarse-to-fine
[FH15]
Luis María Ferrer Fioriti and Holger Hermanns. Probabilistic termination: Soundness, completeness, and compositionality. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '15, pages 489–501, New York, NY, USA, 2015. ACM. [ bib | DOI | http ]
We propose a framework to prove almost sure termination for probabilistic programs with real valued variables. It is based on ranking supermartingales, a notion analogous to ranking functions on non-probabilistic programs. The framework is proven sound and complete for a meaningful class of programs involving randomization and bounded nondeterminism. We complement this foundational insight by a practical proof methodology, based on sound conditions that enable compositional reasoning and are amenable to a direct implementation using modern theorem provers. This is integrated in a small dependent type system, to overcome the problem that lexicographic ranking functions fail when combined with randomization. Among others, this compositional methodology enables the verification of probabilistic programs outside the complete class that admits ranking supermartingales.

Keywords: probabilistic programs, program verification, supermartingales, termination
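
For readers new to the paper's central notion: a ranking supermartingale is, roughly, a nonnegative function V of the program state whose expected value strictly decreases at each step. An informal rendering of the condition (ours, not quoted from the paper) is

```latex
\mathbb{E}\left[\, V(s_{t+1}) \mid s_t \,\right] \;\le\; V(s_t) - \epsilon
\qquad \text{for some fixed } \epsilon > 0 \text{ whenever } s_t \text{ is non-terminal},
```

which, together with V ≥ 0, rules out the program running forever with positive probability.
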
[WvdMM14]
F. Wood, J. W. van de Meent, and V. Mansinghka. A new approach to probabilistic programming inference. In Artificial Intelligence and Statistics, 2014. [ bib ]
[PW14]
Brooks Paige and Frank Wood. A compilation target for probabilistic programming languages. arXiv preprint arXiv:1403.0504, 2014. [ bib ]
[Cas14]
John Cassel. Probabilistic programming with stochastic memoization: Implementing non-parametric Bayesian inference. Mathematica Journal, 16, 2014. [ bib | DOI ]
[DPW14]
Alessandra Di Pierro and Herbert Wiklicky. Probabilistic analysis of programs: A weak limit approach. In Ugo Dal Lago, editor, 3rd International Workshop on Foundational and Practical Aspects of Resource Analysis (FOPARA 2013), volume 8552 of Lecture Notes in Computer Science. Springer, 2014. [ bib ]
[GHNR14]
Andrew D. Gordon, Thomas A. Henzinger, Aditya V. Nori, and Sriram K. Rajamani. Probabilistic programming. In International Conference on Software Engineering (ICSE, FOSE track), 2014. [ bib | .pdf ]
[HNRS14]
Chung-Kil Hur, Aditya V. Nori, Sriram K. Rajamani, and Selva Samuel. Slicing probabilistic programs. In Programming Language Design and Implementation (PLDI), 2014. [ bib | http ]
[CRN+13]
Guillaume Claret, Sriram K. Rajamani, Aditya V. Nori, Andrew D. Gordon, and Johannes Borgström. Bayesian inference using data flow analysis. Technical Report MSR-TR-2013-27, Microsoft Research, March 2013. [ bib | http ]
We present a new algorithm for Bayesian inference over probabilistic programs, based on data flow analysis techniques from the program analysis community. Unlike existing techniques for Bayesian inference on probabilistic programs, our data flow analysis algorithm is able to perform inference directly on probabilistic programs with loops. Even for loop-free programs, we show that data flow analysis offers better precision and better performance benefits over existing techniques. We also describe heuristics that are crucial for our inference to scale, and present an empirical evaluation of our algorithm over a range of benchmarks.

[DPW13]
Alessandra Di Pierro and Herbert Wiklicky. Semantics of probabilistic programs: A weak limit approach. In Chung-chieh Shan, editor, Programming Languages and Systems – Proceedings of the 11th Asian Symposium, APLAS 2013, volume 8301 of Lecture Notes in Computer Science, pages 241–256. Springer International Publishing, 2013. [ bib | DOI ]
[BMAM13]
Pierre Bessière, Emmanuel Mazer, Juan-Manuel Ahuactzin, and Kamel Mekhnacha. Bayesian Programming. Chapman and Hall/CRC, 2013. [ bib ]
[CNR13]
Arun Chaganty, Aditya V. Nori, and Sriram K. Rajamani. Efficiently sampling probabilistic programs via program analysis. In Artificial Intelligence and Statistics (AISTATS), pages 153–160, 2013. [ bib | .pdf ]
[CLNR13]
Arun T. Chaganty, Akash Lal, Aditya V. Nori, and Sriram K. Rajamani. Combining relational learning with SMT solvers using CEGAR. In Computer Aided Verification (CAV), 2013. [ bib | .pdf ]
[GAB+13]
Andrew D. Gordon, Mihhail Aizatulin, Johannes Borgström, Guillaume Claret, Thore Graepel, Aditya V. Nori, Sriram K. Rajamani, and Claudio Russo. A model-learner pattern for Bayesian reasoning. Technical Report MSR-TR-2013-1, Microsoft Research, January 2013. [ bib | http ]
A Bayesian model is based on a pair of probability distributions, known as the prior and sampling distributions. A wide range of fundamental machine learning tasks, including regression, classification, clustering, and many others, can all be seen as Bayesian models. We propose a new probabilistic programming abstraction, a typed Bayesian model, based on a pair of probabilistic expressions for the prior and sampling distributions. A sampler for a model is an algorithm to compute synthetic data from its sampling distribution, while a learner for a model is an algorithm for probabilistic inference on the model. Models, samplers, and learners form a generic programming pattern for model-based inference. They support the uniform expression of common tasks including model testing, and generic compositions such as mixture models, evidence-based model averaging, and mixtures of experts. A formal semantics supports reasoning about model equivalence and implementation correctness. By developing a series of examples and three learner implementations based on exact inference, factor graphs, and Markov chain Monte Carlo, we demonstrate the broad applicability of this new programming pattern.
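
A minimal Python sketch of the model-learner pattern as summarized above: a model is a pair of probabilistic programs (prior, sampling distribution), a sampler draws synthetic data, and a learner performs inference. The names and the rejection-based learner below are illustrative; the paper's learner implementations use exact inference, factor graphs, and MCMC.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Model:
    prior: Callable[[], Any]        # () -> parameters w
    sampling: Callable[[Any], Any]  # w  -> one observed datum y

def sampler(model: Model):
    """Draw (parameters, synthetic datum) from the model's joint distribution."""
    w = model.prior()
    return w, model.sampling(w)

def learner(model: Model, observed, n=100_000):
    """Crude rejection-sampling learner for discrete data: keep prior draws
    whose synthetic datum matches the observation exactly."""
    return [w for w, y in (sampler(model) for _ in range(n)) if y == observed]

# Example: a coin whose bias is one of three values; infer it from one flip.
coin = Model(prior=lambda: random.choice([0.1, 0.5, 0.9]),
             sampling=lambda w: random.random() < w)
posterior_draws = learner(coin, observed=True)
```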

[BBGR13]
Sooraj Bhat, Johannes Borgström, Andrew D. Gordon, and Claudio Russo. Deriving probability density functions from probabilistic functional programs. In 19th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Springer, 2013. EAPLS Best Paper Award for ETAPS 2013. [ bib | http ]
The probability density function of a probability distribution is a fundamental concept in probability theory and a key ingredient in various widely used machine learning methods. However, the necessary framework for compiling probabilistic functional programs to density functions has only recently been developed. In this work, we present a density compiler for a probabilistic language with discrete and continuous distributions, and discrete observations, and provide a proof of its soundness. The compiler greatly reduces the development effort of domain experts, which we demonstrate by solving inference problems from various scientific applications, such as modelling the global carbon cycle, using a standard Markov chain Monte Carlo framework.
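
A toy example of the kind of derivation such a compiler automates (our example, not one from the paper): for the program let x = gaussian(0,1) in exp(x), the density of the returned value follows from the change-of-variables rule,

```latex
p_Y(y) \;=\; p_X(\log y)\,\left|\tfrac{d}{dy}\log y\right|
       \;=\; \frac{1}{y\sqrt{2\pi}}\,\exp\!\left(-\tfrac{(\log y)^2}{2}\right),
\qquad y > 0.
```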

[Gib13]
Jeremy Gibbons. Unifying theories of programming with monads. In Unifying Theories of Programming, volume 7681 of Lecture Notes in Computer Science, pages 23–67. Springer, 2013. [ bib | DOI ]
The combination of probabilistic and nondeterministic choice in program calculi is a notoriously tricky problem, and one with a long history. We present a simple functional programming approach to this challenge, based on algebraic theories of computational effects. We make use of the powerful abstraction facilities of modern functional languages, to introduce the choice operations as a little embedded domain-specific language rather than having to define a language extension; we rely on referential transparency, to justify straightforward equational reasoning about program behaviour.
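
The paper's development is in Haskell; the following Python sketch (our translation, with our names) conveys the basic idea of a discrete probability monad with an embedded choice operator.

```python
class Dist:
    """A finitely supported distribution: a list of (outcome, probability) pairs."""
    def __init__(self, pairs):
        self.pairs = pairs

    @staticmethod
    def unit(x):                      # monadic return
        return Dist([(x, 1.0)])

    def bind(self, k):                # monadic bind: sequence a dependent choice
        out = {}
        for x, p in self.pairs:
            for y, q in k(x).pairs:
                out[y] = out.get(y, 0.0) + p * q
        return Dist(list(out.items()))

def choice(p, a, b):                  # the embedded probabilistic-choice operator
    return Dist([(a, p), (b, 1.0 - p)])

# Two fair coin flips; the distribution of the number of heads.
two_flips = choice(0.5, 1, 0).bind(lambda x:
            choice(0.5, 1, 0).bind(lambda y:
            Dist.unit(x + y)))
print(two_flips.pairs)                # 0, 1, 2 heads with probs 0.25, 0.5, 0.25
```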

[CM12]
Patrick Cousot and Michael Monerau. Probabilistic abstract interpretation. In Helmut Seidl, editor, Programming Languages and Systems, number 7211 in Lecture Notes in Computer Science, pages 169–193. Springer Berlin Heidelberg, January 2012. [ bib | http ]
Abstract interpretation has been widely used for verifying properties of computer systems. Here, we present a way to extend this framework to the case of probabilistic systems. The probabilistic abstraction framework that we propose allows us to systematically lift any classical analysis or verification method to the probabilistic setting by separating in the program semantics the probabilistic behavior from the (non-)deterministic behavior. This separation provides new insights for designing novel probabilistic static analyses and verification methods. We define the concrete probabilistic semantics and propose different ways to abstract them. We provide examples illustrating the expressiveness and effectiveness of our approach.

Keywords: Computer Communication Networks, Logics and Meanings of Programs, Mathematical Logic and Formal Languages, Programming Languages, Compilers, Interpreters, Programming Techniques, Software Engineering
[FRT12]
Cameron E. Freer, Daniel M. Roy, and Joshua B. Tenenbaum. Towards common-sense reasoning via conditional simulation: legacies of Turing in Artificial Intelligence. In Rod Downey, editor, Turing's Legacy, ASL Lecture Notes in Logic. Cambridge University Press, 2012. [ bib | arXiv | .pdf ]
The problem of replicating the flexibility of human common-sense reasoning has captured the imagination of computer scientists since the early days of Alan Turing's foundational work on computation and the philosophy of artificial intelligence. In the intervening years, the idea of cognition as computation has emerged as a fundamental tenet of Artificial Intelligence (AI) and cognitive science. But what kind of computation is cognition?

We describe a computational formalism centered around a probabilistic Turing machine called QUERY, which captures the operation of probabilistic conditioning via conditional simulation. Through several examples and analyses, we demonstrate how the QUERY abstraction can be used to cast common-sense reasoning as probabilistic inference in a statistical model of our observations and the uncertain structure of the world that generated that experience. This formulation is a recent synthesis of several research programs in AI and cognitive science, but it also represents a surprising convergence of several of Turing's pioneering insights in AI, the foundations of computation, and statistics.

Keywords: probabilistic programming, Alan Turing, artificial intelligence
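
A minimal sketch of the QUERY idea, rendered as conditional simulation by rejection (our illustrative code, not the chapter's formal probabilistic-Turing-machine definition):

```python
import random

def query(guess, predicate):
    """Sample hypotheses from `guess` until one satisfies the observed `predicate`."""
    while True:
        hypothesis = guess()
        if predicate(hypothesis):
            return hypothesis

# Example: the posterior over a die roll given that it is observed to be even.
sample = query(guess=lambda: random.randint(1, 6),
               predicate=lambda d: d % 2 == 0)
```
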
[SG12]
Andreas Stuhlmüller and Noah D. Goodman. A dynamic programming algorithm for inference in recursive probabilistic programs. In Second Statistical Relational AI workshop at UAI 2012 (StaRAI-12), 2012. [ bib | arXiv ]
We describe a dynamic programming algorithm for computing the marginal distribution of discrete probabilistic programs. This algorithm takes a functional interpreter for an arbitrary probabilistic programming language and turns it into an efficient marginalizer. Because direct caching of sub-distributions is impossible in the presence of recursion, we build a graph of dependencies between sub-distributions. This factored sum-product network makes (potentially cyclic) dependencies between subproblems explicit, and corresponds to a system of equations for the marginal distribution. We solve these equations by fixed-point iteration in topological order. We illustrate this algorithm on examples used in teaching probabilistic models, computational cognitive science research, and game theory.

Keywords: probabilistic programming, inference, dynamic programming
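
A toy illustration of the fixed-point idea (our example, not the paper's factored sum-product networks). For the recursive program def f(): return True if flip(0.3) else (not f()), the marginal p = P(f() = True) satisfies the self-referential equation p = 0.3 + 0.7·(1 − p); direct caching cannot evaluate it, but fixed-point iteration can.

```python
def solve_marginal(update, p0=0.0, tol=1e-12, max_iter=10_000):
    """Iterate a marginal's defining equation to a fixed point."""
    p = p0
    for _ in range(max_iter):
        new_p = update(p)
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p

print(solve_marginal(lambda p: 0.3 + 0.7 * (1 - p)))  # ~0.5882 = 1/1.7
```
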
[DHW11]
Alessandra Di Pierro, Chris Hankin, and Herbert Wiklicky. Probabilistic timing covert channels: To close or not to close? International Journal of Information Security, 10(2):83–106, 2011. [ bib ]
[AFR11]
Nathanael L. Ackerman, Cameron E. Freer, and Daniel M. Roy. Noncomputable conditional distributions. In Proc. of the 26th Ann. Symp. on Logic in Comp. Sci. IEEE Press, 2011. [ bib | .pdf ]
We study the computability of conditional probability, a fundamental notion in probability theory and Bayesian statistics. In the elementary discrete setting, a ratio of probabilities defines conditional probability. In more general settings, conditional probability is defined axiomatically, and the search for more constructive definitions is the subject of a rich literature in probability theory and statistics. However, we show that in general one cannot compute conditional probabilities. Specifically, we construct a pair of computable random variables (X, Y) in the unit interval whose conditional distribution P[Y|X] encodes the halting problem.

Nevertheless, probabilistic inference has proven remarkably successful in practice, even in infinite-dimensional continuous settings. We prove several results giving general conditions under which conditional distributions are computable. In the discrete or dominated setting, under suitable computability hypotheses, conditional distributions are computable. Likewise, conditioning is a computable operation in the presence of certain additional structure, such as independent absolutely continuous noise.
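
The elementary discrete definition referred to above is the ratio

```latex
\Pr[Y = y \mid X = x] \;=\; \frac{\Pr[X = x,\, Y = y]}{\Pr[X = x]},
\qquad \Pr[X = x] > 0;
```

the paper's negative result shows that no comparably general recipe remains computable once the conditioning variable is continuous.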

[Roy11]
Daniel M. Roy. Computability, inference and modeling in probabilistic programming. PhD thesis, Massachusetts Institute of Technology, 2011. MIT/EECS George M. Sprowls Doctoral Dissertation Award. [ bib | .pdf ]
We investigate the class of computable probability distributions and explore the fundamental limitations of using this class to describe and compute conditional distributions. In addition to proving the existence of noncomputable conditional distributions, and thus ruling out the possibility of generic probabilistic inference algorithms (even inefficient ones), we highlight some positive results showing that posterior inference is possible in the presence of additional structure like exchangeability and noise, both of which are common in Bayesian hierarchical modeling.

This theoretical work bears on the development of probabilistic programming languages (which enable the specification of complex probabilistic models) and their implementations (which can be used to perform Bayesian reasoning). The probabilistic programming approach is particularly well suited for defining infinite-dimensional, recursively-defined stochastic processes of the sort used in nonparametric Bayesian statistics. We present a new construction of the Mondrian process as a partition-valued Markov process in continuous time, which can be viewed as placing a distribution on an infinite kd-tree data structure.
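
A minimal sketch of the Mondrian process generative procedure mentioned above (recursive axis-aligned cuts of a box, with an exponential budget); the code and its names are ours, for illustration only.

```python
import random

def mondrian(budget, box):
    """box: list of (low, high) intervals, one per dimension."""
    lengths = [hi - lo for lo, hi in box]
    cost = random.expovariate(sum(lengths))
    if cost > budget:
        return ("leaf", box)
    # pick a dimension with probability proportional to its length,
    # then a cut position uniformly within that side
    d = random.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[d]
    cut = random.uniform(lo, hi)
    return ("cut", d, cut,
            mondrian(budget - cost, box[:d] + [(lo, cut)] + box[d + 1:]),
            mondrian(budget - cost, box[:d] + [(cut, hi)] + box[d + 1:]))

tree = mondrian(budget=2.0, box=[(0.0, 1.0), (0.0, 1.0)])
```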

[WSG11]
David Wingate, Andreas Stuhlmüller, and Noah D. Goodman. Lightweight implementations of probabilistic programming languages via transformational compilation. In Proc. of the 14th Artificial Intelligence and Statistics, 2011. [ bib | .pdf ]
[WGSS11]
David Wingate, Noah D. Goodman, Andreas Stuhlmüller, and Jeffrey M. Siskind. Nonstandard interpretations of probabilistic programs for efficient inference. In Adv. in Neural Inform. Processing Syst., volume 24, 2011. [ bib | .pdf ]
[HSG11]
Irvin Hwang, Andreas Stuhlmüller, and Noah D. Goodman. Inducing probabilistic programs by Bayesian program merging, 2011. [ bib | arXiv ]
[O'D11]
Timothy J. O'Donnell. Productivity and Reuse in Language. PhD thesis, Harvard University, 2011. [ bib ]
This thesis presents a formal model of productivity and reuse which treats the problem as a structure-by-structure inference in a Bayesian framework. The model—Fragment Grammars, a generalization of Adaptor Grammars (Johnson et al., 2007)—is built around two proposals. The first is that anything that can be computed can be stored. The specific computational mechanism by which this is accomplished, stochastic memoization, is inherited from Adaptor Grammars (Goodman et al., 2008; Johnson et al., 2007). The second proposal is that any stored item can include subparts which must be computed productively. This is made possible by the computational mechanism of stochastically lazy evaluation, introduced in the thesis.

[DHW10]
Alessandra Di Pierro, Chris Hankin, and Herbert Wiklicky. Program analysis probably counts. The Computer Journal, 53(6):871–880, 2010. [ bib ]
[Poo10]
David Poole. Probabilistic programming languages: Independent choices and deterministic systems. In R. Dechter, H. Geffner, and J.Y. Halpern, editors, Heuristics, Probability and Causality: A Tribute to Judea Pearl, pages 253–269. College Publications, 2010. [ bib | .pdf ]
[AFR10]
Nathanael L. Ackerman, Cameron E. Freer, and Daniel M. Roy. On the computability of conditional probability, 2010. [ bib | arXiv | .pdf ]
As inductive inference and machine learning methods in computer science see continued success, researchers are aiming to describe even more complex probabilistic models and inference algorithms. What are the limits of mechanizing probabilistic inference? We investigate the computability of conditional probability, a fundamental notion in probability theory and a cornerstone of Bayesian statistics, and show that there are computable joint distributions with noncomputable conditional distributions, ruling out the prospect of general inference algorithms, even inefficient ones. Specifically, we construct a pair of computable random variables in the unit interval such that the conditional distribution of the first variable given the second encodes the halting problem. Nevertheless, probabilistic inference is possible in many common modeling settings, and we prove several results giving broadly applicable conditions under which conditional distributions are computable. In particular, conditional distributions become computable when measurements are corrupted by independent computable noise with a sufficiently smooth density.

Keywords: computable probability theory, conditional probability
[FR10]
Cameron E. Freer and Daniel M. Roy. Posterior distributions are computable from predictive distributions. In Y. W. Teh and M. Titterington, editors, Proc. of the 13th Artificial Intelligence and Statistics, pages 233–240, 2010. [ bib | .pdf ]
As we devise more complicated prior distributions, will inference algorithms keep up? We highlight a negative result in computable probability theory by Ackerman, Freer, and Roy (2010) that shows that there exist computable priors with noncomputable posteriors. In addition to providing a brief survey of computable probability theory geared towards the A.I. and statistics community, we give a new result characterizing when conditioning is computable in the setting of exchangeable sequences, and provide a computational perspective on work by Orbanz (2010) on conjugate nonparametric models. In particular, using a computable extension of de Finetti’s theorem (Freer and Roy 2009), we describe how to transform a posterior predictive rule for generating an exchangeable sequence into an algorithm for computing the posterior distribution of the directing random measure.

[KS09b]
Oleg Kiselyov and Chung-chieh Shan. Monolingual probabilistic programming using generalized coroutines. In Proc. of Uncertainty in Artificial Intelligence, pages 285–292, 2009. [ bib ]
[FR09]
Cameron E. Freer and Daniel M. Roy. Computable de Finetti measures, 2009. [ bib | DOI | arXiv | .pdf ]
We prove a computable version of de Finetti's theorem on exchangeable sequences of real random variables. As a consequence, exchangeable stochastic processes expressed in probabilistic functional programming languages can be automatically rewritten as procedures that do not modify non-local state. Along the way, we prove that a distribution on the unit interval is computable if and only if its moments are uniformly computable.

[KS09a]
Oleg Kiselyov and Chung-chieh Shan. Embedded probabilistic programming. In Domain-Specific Languages, pages 360–384, 2009. [ bib | DOI ]
[MSS09]
Andrew McCallum, Karl Schultz, and Sameer Singh. Factorie: Probabilistic programming via imperatively defined factor graphs. In Adv. in Neural Inform. Processing Syst., volume 22, 2009. [ bib | .pdf ]
Keywords: FACTORIE
[SSM09]
Sameer Singh, Karl Schultz, and Andrew McCallum. Bi-directional joint inference for entity resolution and segmentation using imperatively-defined factor graphs. In Machine Learning and Knowledge Discovery in Databases, 2009. [ bib | .pdf ]
Keywords: FACTORIE, applications
[OGT09]
Timothy J. O'Donnell, Noah D. Goodman, and Joshua B. Tenenbaum. Fragment grammars: Exploring computation and reuse in language. Technical Report MIT-CSAIL-TR-2009-013, Massachusetts Institute of Technology, 2009. [ bib | http ]
Language relies on a division of labor between stored units and structure building operations which combine the stored units into larger structures. This division of labor leads to a tradeoff: more structure-building means less need to store while more storage means less need to compute structure. We develop a hierarchical Bayesian model called fragment grammar to explore the optimum balance between structure-building and reuse. The model is developed in the context of stochastic functional programming (SFP), and in particular, using a probabilistic variant of Lisp known as the Church programming language. We show how to formalize several probabilistic models of language structure using Church, and how fragment grammar generalizes one of them—adaptor grammars. We conclude with experimental data with adults and preliminary evaluations of the model on natural language corpus data.

[Man09]
Vikash Mansinghka. Natively Probabilistic Computation. PhD thesis, Massachusetts Institute of Technology, 2009. MIT/EECS George M. Sprowls Doctoral Dissertation Award. [ bib | .pdf ]
I introduce a new set of natively probabilistic computing abstractions, including probabilistic generalizations of Boolean circuits, backtracking search and pure Lisp. I show how these tools let one compactly specify probabilistic generative models, generalize and parallelize widely used sampling algorithms like rejection sampling and Markov chain Monte Carlo, and solve difficult Bayesian inference problems.

I first introduce Church, a probabilistic programming language for describing probabilistic generative processes that induce distributions, which generalizes Lisp, a language for describing deterministic procedures that induce functions. I highlight the ways randomness meshes with the reflectiveness of Lisp to support the representation of structured, uncertain knowledge, including nonparametric Bayesian models from the current literature, programs for decision making under uncertainty, and programs that learn very simple programs from data. I then introduce systematic stochastic search, a recursive algorithm for exact and approximate sampling that generalizes a popular form of backtracking search to the broader setting of stochastic simulation and recovers widely used particle filters as a special case. I use it to solve probabilistic reasoning problems from statistical physics, causal reasoning and stereo vision. Finally, I introduce stochastic digital circuits that model the probability algebra just as traditional Boolean circuits model the Boolean algebra. I show how these circuits can be used to build massively parallel, fault-tolerant machines for sampling and allow one to efficiently run Markov chain Monte Carlo methods on models with hundreds of thousands of variables in real time.

I emphasize the ways in which these ideas fit together into a coherent software and hardware stack for natively probabilistic computing, organized around distributions and samplers rather than deterministic functions. I argue that by building uncertainty and randomness into the foundations of our programming languages and computing machines, we may arrive at ones that are more powerful, flexible and efficient than deterministic designs, and are in better alignment with the needs of computational science, statistics and artificial intelligence.

[Obe09]
Fritz Obermeyer. Automated equational reasoning in nondeterministic lambda-calculi modulo theories H*. PhD thesis, Carnegie Mellon University, 2009. [ bib | http | .pdf ]
This thesis attempts to implement Bayesian induction of programs, a la Solomonoff. We develop equational theories and a theorem prover for lambda-calculus with a probability monad, and use this to enumerate the first few thousand equivalence classes of programs, ordered by Kolmogorov complexity. The resulting software can search for simple programs satisfying user-defined constraints.

Keywords: automated deduction, lambda-calculus, Bayesian inference
[BLS08]
P. Bessière, C. Laugier, and R. Siegwart. Probabilistic Reasoning and Decision Making in Sensory-Motor Systems. Springer, 2008. [ bib ]
[RMGT08]
Daniel M. Roy, Vikash K. Mansinghka, Noah D. Goodman, and Joshua B. Tenenbaum. A stochastic programming perspective on nonparametric Bayes. Nonparametric Bayesian Workshop, Int. Conf. on Machine Learning, 2008. [ bib ]
[PPT08]
Sungwoo Park, Frank Pfenning, and Sebastian Thrun. A probabilistic language based on sampling functions. ACM Trans. Program. Lang. Syst., 31(1):1–46, 2008. [ bib ]
[GMR+08]
Noah D. Goodman, Vikash K. Mansinghka, Daniel M. Roy, Keith Bonawitz, and Joshua B. Tenenbaum. Church: a language for generative models. In Proc. of Uncertainty in Artificial Intelligence, 2008. [ bib | .pdf ]
Formal languages for probabilistic modeling enable re-use, modularity, and descriptive clarity, and can foster generic inference techniques. We introduce Church, a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset. The semantics of Church is defined in terms of evaluation histories and conditional distributions on such histories. Church also includes a novel language construct, the stochastic memoizer, which enables simple description of many complex non-parametric models. We illustrate language features through several examples, including: a generalized Bayes net in which parameters cluster over trials, infinite PCFGs, planning by inference, and various non-parametric clustering models. Finally, we show how to implement query on any Church program, exactly and approximately, using Monte Carlo techniques.
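
A small Python sketch of the stochastic memoizer idea (a Chinese-restaurant-process memoizer in the spirit of Church's DPmem; Church itself is Scheme-like, and the names below are ours):

```python
import random
from collections import defaultdict

def dp_mem(f, alpha=1.0):
    """Wrap f so that each call either reuses an earlier return value for the
    same arguments (probability proportional to its past use count) or calls
    f afresh (probability proportional to alpha)."""
    tables = defaultdict(list)          # args -> list of [value, reuse_count]
    def memoized(*args):
        ts = tables[args]
        r = random.uniform(0, alpha + sum(count for _, count in ts))
        for entry in ts:
            r -= entry[1]
            if r < 0:                   # sit at an existing "table": reuse
                entry[1] += 1
                return entry[0]
        value = f(*args)                # new "table": compute a fresh value
        ts.append([value, 1])
        return value
    return memoized

# Repeated calls with the same argument tend to reuse earlier answers,
# yielding the clustering behaviour underlying many non-parametric models.
label = dp_mem(lambda obj: random.choice(["cat", "dog", "bird"]))
print([label("thing-1") for _ in range(10)])
```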

[MW08]
Tom Minka and John Winn. Gates: a graphical notation for mixture models. In Adv. in Neural Inform. Processing Syst., volume 21, 2008. [ bib ]
[MRW+08]
Andrew McCallum, Khashayar Rohanimanesh, Michael Wick, Karl Schultz, and Sameer Singh. Factorie: Efficient probabilistic programming for relational factor graphs via imperative declarations of structure, inference and learning. In NIPS Workshop on Probabilistic Programming, 2008. [ bib | .pdf ]
Keywords: FACTORIE
[MZK+08]
Brian Milch, Luke S. Zettlemoyer, Kristian Kersting, Michael Haimes, and Leslie Pack Kaelbling. Lifted probabilistic inference with counting formulas. In Proc. 23rd AAAI Conference on Artificial Intelligence, pages 1062–1068, 2008. [ bib | .pdf ]
[DHW07]
Alessandra Di Pierro, Chris Hankin, and Herbert Wiklicky. A systematic approach to probabilistic pointer analysis. In Z. Shao, editor, Proceedings of APLAS'07 – Fifth Asian Symposium on Programming Languages and Systems, volume 4807 of Lecture Notes in Computer Science, pages 335–350. Springer Verlag, 2007. [ bib ]
[MMR+07]
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic models with unknown objects. In Lise Getoor and Ben Taskar, editors, Statistical Relational Learning. MIT Press, 2007. [ bib | .pdf ]
[KG07]
Jouni Kerman and Andrew Gelman. Manipulating and summarizing posterior simulations using random variable objects. Stat. Comput., 17:235–244, 2007. [ bib | DOI | .pdf ]
Practical Bayesian data analysis involves manipulating and summarizing simulations from the posterior distribution of the unknown parameters. By manipulation we mean computing posterior distributions of functions of the unknowns, and generating posterior predictive distributions. The results need to be summarized both numerically and graphically.

We introduce, and implement in R, an object-oriented programming paradigm based on a random variable object type that is implicitly represented by simulations. This makes it possible to define vector and array objects that may contain both random and deterministic quantities, and syntax rules that allow to treat these objects like any numeric vectors or arrays, providing a solution to various problems encountered in Bayesian computing involving posterior simulations.

We illustrate the use of this new programming environment with examples of Bayesian computing, demonstrating missing-value imputation, nonlinear summary of regression predictions, and posterior predictive checking.
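
A Python analogue of the simulation-backed random-variable objects described above (the paper's implementation is in R; the class and method names below are ours):

```python
import numpy as np

class RV:
    """A random variable implicitly represented by an array of posterior draws."""
    def __init__(self, sims):
        self.sims = np.asarray(sims)
    def __add__(self, other):
        return RV(self.sims + (other.sims if isinstance(other, RV) else other))
    def __mul__(self, other):
        return RV(self.sims * (other.sims if isinstance(other, RV) else other))
    def summary(self):
        return {"mean": self.sims.mean(),
                "2.5%": np.percentile(self.sims, 2.5),
                "97.5%": np.percentile(self.sims, 97.5)}

# Posterior draws of two parameters -> posterior of a derived quantity.
theta = RV(np.random.normal(1.0, 0.2, 10_000))
beta = RV(np.random.normal(0.5, 0.1, 10_000))
print((theta * beta + 1.0).summary())
```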

[Mil06]
Brian Milch. Probabilistic Models with Unknown Objects. PhD thesis, University of California, Berkeley, 2006. [ bib | .pdf ]
Humans and other intelligent agents must make inferences about the real-world objects that underlie their observations: for instance, the objects visible in an image, or the people mentioned in a set of text documents. The agent may not know in advance how many objects exist, how they are related to each other, or which observations correspond to which underlying objects. Existing declarative representations for probabilistic models do not capture the structure of such scenarios.

This thesis introduces Bayesian logic (BLOG), a first-order probabilistic modeling language that specifies probability distributions over possible worlds with varying sets of objects. A BLOG model contains statements that define conditional probability distributions for a certain set of random variables; the model also specifies certain context-specific independence properties. We provide criteria under which such a model is guaranteed to fully define a probability distribution. These criteria go beyond existing results in that they can be satisfied even when the Bayesian network defined by the model is cyclic, or contains nodes with infinitely many ancestors.

We describe several approximate inference algorithms that exploit the context-specific dependence structure revealed by a BLOG model. First, we present rejection sampling and likelihood weighting algorithms that are guaranteed to converge to the correct probability for any query on a structurally well-defined BLOG model. Because these algorithms instantiate only those variables that are context-specifically relevant, they can generate samples in finite time even when the model defines infinitely many variables. We then define a general framework for inference on BLOG models using Markov chain Monte Carlo (MCMC) algorithms. This framework allows a programmer to plug in a domain-specific proposal distribution, which helps the Markov chain move to high-probability worlds. Furthermore, the chain can operate on partial world descriptions that specify values only for context-specifically relevant variables. We give conditions under which MCMC over such partial world descriptions is guaranteed to converge to correct probabilities. We also show that this framework performs efficiently on a real-world task: reconstructing the set of distinct publications referred to by a set of bibliographic citations.

[MR06]
Brian Milch and Stuart Russell. General-purpose MCMC inference over relational structures. In Proc. 22nd Conference on Uncertainty in Artificial Intelligence, pages 349–358, 2006. [ bib | .pdf ]
[DHW05]
A. Di Pierro, C. Hankin, and H. Wiklicky. Probabilistic lambda-calculus and quantitative program analysis. Journal of Logic and Computation, 15(2):159–179, 2005. [ bib ]
[MMR+05]
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, and Andrey Kolobov. BLOG: Probabilistic models with unknown objects. In Proc. 19th International Joint Conference on Artificial Intelligence, pages 1352–1359, 2005. [ bib | .pdf ]
[MMS+05]
Brian Milch, Bhaskara Marthi, David Sontag, Stuart Russell, Daniel L. Ong, and Andrey Kolobov. Approximate inference for infinite contingent Bayesian networks. In 10th International Workshop on Artificial Intelligence and Statistics, 2005. [ bib | .pdf ]
[WS04]
Jon Whittle and Johann Schumann. Automating the implementation of Kalman filter algorithms. ACM Trans. Math. Softw., 30(4):434–453, December 2004. [ bib | DOI | .pdf ]
[LBDM04]
O. Lebeltel, P. Bessière, J. Diard, and E. Mazer. Bayesian robot programming. Advanced Robotics, 16(1):49–79, 2004. [ bib ]
[MMR04]
Brian Milch, Bhaskara Marthi, and Stuart Russell. BLOG: Relational modeling with unknown objects. ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields, 2004. [ bib | .pdf ]
[MMR03]
Bhaskara Marthi, Brian Milch, and Stuart Russell. First-order probabilistic models for information extraction. IJCAI Workshop on Learning Statistical Models from Relational Data, 2003. [ bib | .pdf ]
[FS03]
Bernd Fischer and Johann Schumann. AutoBayes: a system for generating data analysis programs from statistical models. Journal of Functional Programming, 13:483–508, 2003. [ bib | DOI | .pdf ]
[GFSB03]
Alexander G. Gray, Bernd Fischer, Johann Schumann, and Wray L. Buntine. Automatic derivation of statistical algorithms: The EM family and beyond. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 673–680. MIT Press, 2003. [ bib | .pdf ]
[RP02]
Norman Ramsey and Avi Pfeffer. Stochastic lambda calculus and monads of probability distributions. Proc. of the 29th ACM SIGPLAN-SIGACT Symp. on Principles of Program. Lang., pages 154–165, 2002. [ bib ]
[Pfe01]
Avi Pfeffer. IBAL: A probabilistic rational programming language. In Proc. of the 17th Int. Joint Conf. on Artificial Intelligence, pages 733–740. Morgan Kaufmann Publ., 2001. [ bib ]
[MM99]
Carroll Morgan and Annabelle McIver. pGCL: formal reasoning for random algorithms. South African Comput. J., 22:14–27, 1999. [ bib ]
[JT98]
Achim Jung and Regina Tix. The troublesome probabilistic powerdomain. In Third Workshop on Computation and Approximation, volume 13 of Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, 1998. [ bib | .html ]
Keywords: domain theory, probability
[Bun98]
Wray L. Buntine. Will domain-specific code synthesis become a silver bullet? IEEE Intelligent Systems, 1998. With discussion by Peter Norvig, Jeffrey Van Baalen, David Spiegelhalter and Andrew Thomas. Guest editor for Trends and Controversies. [ bib | DOI ]
[KMP97]
Daphne Koller, David McAllester, and Avi Pfeffer. Effective Bayesian inference for stochastic programs. In Proceedings of the 14th National Conference on Artificial Intelligence (AAAI), pages 740–747, 1997. [ bib | http | .pdf ]
[MMS96]
Carroll Morgan, Annabelle McIver, and Karen Seidel. Probabilistic predicate transformers. ACM Trans. Program. Lang. Syst., 18(3):325–353, 1996. [ bib ]
[Hec94]
Reinhold Heckmann. Probabilistic power domains, information systems, and locales. In Mathematical Foundations of Programming Semantics, Lecture notes in computer science. Springer, 1994. [ bib | http ]
Keywords: domain theory, probability
[JP89]
Claire Jones and Gordon Plotkin. A probabilistic powerdomain of evaluations. In Proc. of the Fourth Ann. Symp. on Logic in Comp. Sci., pages 186–195. IEEE Press, 1989. [ bib | .pdf ]
[Jon89]
Claire Jones. Probabilistic non-determinism. PhD thesis, University of Edinburgh, Edinburgh, Scotland, UK, 1989. [ bib | http ]
Keywords: domain theory, probability
[Plo82]
Gordon Plotkin. Probabilistic powerdomains. In Proceedings CAAP, 1982. [ bib ]
Keywords: domain theory, probability
[Koz81]
Dexter Kozen. Semantics of probabilistic programs. Journal of Computer and System Sciences, 22:328–350, 1981. [ bib ]
Keywords: domain theory, randomized algorithms
[SD78]
Nasser Saheb-Djahromi. Probabilistic LCF. In Mathematical Foundations of Comput. Sci., 1978 (Proc. Seventh Sympos., Zakopane, 1978), volume 64 of Lecture Notes in Comput. Sci., pages 442–451. Springer, Berlin, 1978. [ bib ]
[Sol78]
Ray Solomonoff. Complexity-based induction systems: Comparisons and convergence theorems. IEEE Transactions on Information Theory, IT-24:422–432, 1978. [ bib ]
Keywords: inductive inference
[Sol64]
Ray Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964. [ bib ]
Keywords: inductive inference
[dMSS56]
Karel de Leeuw, Edward F. Moore, Claude E. Shannon, and Norman Shapiro. Computability by probabilistic machines. In Automata studies, Annals of Math. Studies, no. 34, pages 183–212. Princeton Univ. Press, Princeton, N. J., 1956. [ bib ]