# NIPS*2012 Workshop/Schedule

## DAY 1 - FRIDAY DECEMBER 7

 7:30am Opening Address by Vikash Mansinghka 7:40am Invited talk: Modeling human common sense with probabilistic programsJosh Tenenbaum (MIT) 8:30am Contributed talk: Automated variational inference in probabilistic programmingDavid Wingate and Theo Weber (Lyric Labs) 9:00am Coffee 9:30am A tour through the theoretical foundations of probabilistic programmingDan Roy (Cambridge) 10:00am Poster Spotlight Presentations Probabilistic Theorem Proving: A Unifying Approach for Inference in Probabilistic Programming. Vibhav Gogate and Pedro Domingos. Just-in-time Compilation of MCMC for Probabilistic Programs. Lingfeng Yang, Yi-Ting Yeh, Noah Goodman, and Pat Hanrahan. Lifted Inference for Probabilistic Programming. Wannes Meert, Guy Van den Broeck, Nima Taghipour, Daan Fierens, Hendrik Blockeel, Jesse Davis, and Luc De Raedt. Probabilistic Programming in Practice: Two Case Studies. Avi Pfeffer. Passage: A Parallel Sampler Generator for Hierarchical Bayesian Modeling. Chad Scherrer, Iavor Diatchki, Levent Erkok, Matthew Sottile, and Rob Zinkov. Equational reasoning for conditioning as disintegration. Chung-chieh Shan and Dylan Thurston. Constraints for Probabilistic Logic Programming. Joris Renkens, Daan Fierens, Guy Van den Broeck, Maurice Bruynooghe, and Luc De Raedt. Analysis by inverse stochastic synthesis: Bayesian interpretation of simple 2D scenes. Max Siegel, Tejas Kulkarni, Josh Tenenbaum, and Vikash Mansinghka. Learning Stochastic Inverses for Adaptive Inference in Probabilistic Programs. Andreas Stuhlmueller and Noah Goodman. 10:30am Non-skiers: Informal Session for Discussion, Posters, Demos 12:00pm Group Lunch 3:30pm BREAKOUT DISCUSSIONS and POSTERS 5:00pm Coffee 5.30pm Invited talk: Probabilistic Programming at Microsoft Research Cambridge: Infer.NET for Real-world ApplicationsChris Bishop (Microsoft Research) 6.20pm Announcements and details for Group Dinner

## DAY 2 - SATURDAY DECEMBER 8

 7:30am Invited talk: Open-universe probability models: idea, theory, and applicationsStuart Russell (UC Berkeley) 8:20am Contributed talk: Stan, scalable software for Bayesian modelingMatt Hoffman (Adobe), Bob Carpenter, and Andrew Gelman (Columbia) 8:40am Contributed talk: A short introduction to probabilistic soft logicAngelika Kimmig (KU Leuven), Stephen Bach, Matthias Broecheler, Bert Huang, Lise Getoor (U. Maryland) 9:00am Coffee 9:30am What we gain by representing models using code, illustrated in a new, practical Church system Vikash Mansinghka (MIT) 10:00am Poster Spotlight Presentations Exploiting Conditional Independence for Efficient, Automatic Multicore Inference for Church. Yura Perov and Vikash Mansinghka. Probabilistic Decision Programming with Figaro. Brian Ruttenberg and Avi Pfeffer. Memory efficient Church inference with exchangeable foreign probabilistic procedures and JITing via reduced traces. Jeff Wu and Vikash Mansinghka. Efficiently Sampling Probabilistic Programs via Program Analysis. Arun Chaganty, Aditya Nori, and Sriram Rajamani. ProbLog2: From Probabilistic Programming to Statistical Relational Learning. Joris Renkens, Dimitar Shterionov, Guy Van den Broeck, Jonas Vlasselaer, Daan Fierens, Wannes Meert, Gerda Janssens, Luc De Raedt Compositional Stochastic Modeling and Probabilistic Programming. Eric Mjolsness. Compiling Relational Database Schemata into Probabilistic Graphical Models. Sameer Singh and Thore Graepel. Accelerating Inference: Towards a Full Language, Compiler and Hardware Stack. Jeff Bernstein, Bill Bradley, Shawn Hershey, Ben Vigoda, Theo Weber, Andrew Schweitzer, and Noah Stein. 10:30am Non-skiers: Informal Session for Discussion, Posters, Demos 12:00pm Group Lunch 3:30pm DEMO SESSION and POSTERS Systems being demoed include... Infer.NET (John Bronskill) Venture engine for the Church probabilistic programming language (Vikash Mansinghka) ProbLog (Joris Renkens) Contact us if you would like us to add your system/name to the list. 5:00pm Coffee 5.30pm The flexible mind: probabilistic programming as a substrate for cognitive modelingNoah Goodman (Stanford) 6.00pm Contributed talk: A model-learner pattern for Bayesian reasoningAndy Gordon (Microsoft Research), Mihhail Aizatulin (Open U.), Johannes Borgstroem (Uppsala U.), Guillaume Claret, Thore Graepel, Aditya Nori, Sriram Rajamani, Claudio Russo (Microsoft Research) 6.20pm Contributed talk: Probabilistic computation for information security Piotr Mardziel (U. Maryland) and Kasturi Raghavan (UCLA) 6.40pm Closing Remarks by Noah Goodman

## Abstracts for Talks and Posters

### Modeling human common sense with probabilistic programs

Josh Tenenbaum

Bayesian networks and other families of graphical models have revolutionized not only machine learning and AI, but many other fields of science and engineering, with their combination of general-purpose languages for representing probabilistic structure in the world and general-purpose inference algorithms. Together these two features give a powerful toolkit for reasoning, learning and decision under uncertainty. But it is not enough, simply because in many domains, a graph is too limited a description of the world's causal structure. Probabilistic programs provide a far more expressive language for representing causal structure. I will show how two core domains of human common-sense thought, intuitive physics and intuitive psychology, can be described as inference over probabilistic programs that approximate the causal processes of mechanics and rational planning, respectively. Probabilistic programs can be used to model human judgments with quantitative precision and rigor that until now has only been accessible to psychologists working in much lower-level domains, such as perception and psychophysics, and can also be used to endow robots and AI systems with more human-like competence. There are significant inference challenges that must be solved before these languages can become as widely usable as graphical models have become in the last decade, but showing some of the promise of probabilistic programs for AI and cognitive modeling, I hope to motivate the NIPS community to take on these challenges.

### Automated variational inference in probabilistic programming

David Wingate and Theo Weber

We present a new algorithm for approximate inference in probabilistic programs, based on a stochastic gradient for variational programs. This method is efficient without restrictions on the probabilistic program; it is particularly practical for distributions which are not analytically tractable, including highly structured distributions that arise in probabilistic programs. We show how to automatically derive mean-field probabilistic programs and optimize them, and demonstrate our perspective improves inference efficiency over other algorithms.

### A tour through the theoretical foundations of probabilistic programming

Daniel M. Roy

Over the last several years, our understanding of probabilistic programming has grown substantially. A range of probabilistic programming languages have been developed, and more efficient inference systems have been engineered, finding applications throughout artificial intelligence, cognitive science, nonparametric Bayesian statistics, and elsewhere. These developments raise many fundamental theoretical questions about the limitations and possibilities of such systems. In this abstract we touch on a number of such questions, through our own research contributions and those of other researchers. These foundational issues involve the relationships between sampling and computing probabilities and between efficiency of sampling, conditional independence, and noise. We also discuss how cryptography can be productively viewed as a source of intractable statistical problems, and how certain seemingly intractable problems in the foundations of statistics can be clarified via the computational lens. Finally, we explore how the study of the computability of classic theorems in probability theory has revealed limitations on the ways that distributions on complex structured data, such as sequences, arrays, graphs can be represented.

### Probabilistic Theorem Proving: A Unifying Approach for Inference in Probabilistic Programming

Vibhav Gogate and Pedro Domingos

Inference is the key bottleneck in probabilistic programming. Often, the main advantages of probabilistic programming -- simplicity, modularity, ease-of-use, etc. -- are dwarfed by the complexity and intractability of inference. In this paper, we consider probabilistic theorem proving, a unifying approach for inference in probabilistic logic and extend it for inference in probabilistic programming. Specifically, we propose a new canonical representation, recursive probabilistic knowledge bases (RPKBs), and show that the inference problem in probabilistic programming can be reduced to the lifted weighted model counting problem over RPKBs. We then propose a new algorithm for solving the latter, which is exact when feasible and approximate when exact inference is not possible. Our new algorithm can effectively handle context-specific independence, determinism and complex logical structure, all of which are quite common in real-world probabilistic programs.

### Just-in-time Compilation of MCMC for Probabilistic Programs

Lingfeng Yang, Yi-Ting Yeh, Noah Goodman, and Pat Hanrahan

We present a dynamic compilation technique for increasing the efficiency of Markov Chain Monte Carlo (MCMC) algorithms for inference in models specified via a probabilistic program. Our technique is based on the fact that only some random choices affect the control flow of program execution. We refer to these as \emph{structural} choices, and the other choices as \emph{non-structural}. Partial evaluation of the acceptance probability computation can remove these choices, producing a trace where only a fixed set of non-structural choices are evaluated. We dynamically create these traces as each set of structural choices is encountered at run-time. Because the traces are straight-line code with constants folded, we can apply further optimizations. In particular, we apply an optimization called \emph{incrementalization}: for each random choice made in the trace, we extract the subset of commands in the trace that must be recomputed when the choice changes. In total, this provides just-in-time compilation of the score computation, needed for MCMC updates, into optimized code specialized for each individual proposal. We evaluate this technique on a simple open-universe model and two standard BUGS test models. We achieve orders of magnitude speedup and become competitive with more specialized and less expressive statistical languages such as JAGS.

### Lifted Inference for Probabilistic Programming

Wannes Meert, Guy Van den Broeck, Nima Taghipour, Daan Fierens, Hendrik Blockeel, Jesse Davis, and Luc De Raedt

A probabilistic program often gives rise to a complicated underlying probabilistic model. Performing inference in such a model is challenging. One solution to this problem is lifted inference which improves tractability by exploiting symmetries in the underlying model. Our group is pursuing a lifted approach to inference for probabilistic logic programs.

### Probabilistic Programming in Practice: Two Case Studies

Avi Pfeffer

Probabilistic programming is an exciting technology that promises to make representing and reasoning with rich probabilistic models easier. We present two case studies of application of our probabilistic programming language Figaro to practical problems: supporting intelligence analysts and analyzing malware. Our experiences show that at the current point in time, probabilistic programming is a powerful tool for rapid prototyping of probabilistic systems.

### Passage: A Parallel Sampler Generator for Hierarchical Bayesian Modeling

Chad Scherrer, Iavor Diatchki, Levent Erkok, Matthew Sottile and Rob Zinkov

We introduce Passage, an EDSL (Embedded Domain Specific Language) for hi- erarchical Bayesian modeling. Passage is hosted by the functional programming language Haskell, and inherits Haskell’s infrastructure for rapid code develop- ment and higher-order functions. We expect Passage to be interesting to both casual users interested in Bayesian modeling, and also to researchers in the field who want to explore computational aspects of statistical modeling. In particular, our framework allows easy manipulation of the model and the generated code to explore alternative execution strategies, including targets that can make use of the heavy parallelism as afforded by many modern machines. Passage is open-source software, available for free download.

### Equational reasoning for conditioning as disintegration

Chung-chieh Shan and Dylan Thurston.

Conditional distributions are widely used for practical inference, even when the condition has zero probability. This popularity contrasts with the scary pitfalls (such as Borel's paradox) that beset rigorous treatments of conditioning. In general, conditional expectations may arise that do not correspond to any conditional distribution at all. This gap between theory and practice has made it difficult to automate conditioning calculations soundly. In particular, given a generative model, we want to produce its conditional expectations mechanically, so as to optimize them numerically, simplify them symbolically, and so on.

Disintegration is a rigorous approach to conditioning that covers a wide range of applications yet admits intuitive conditional distributions and allows their guilt-free manipulation'. We mechanize this approach by adding a Lebesgue measure' operation to the usual monadic representation of stochastic experiments. We show how to compute conditional distributions by equating expressions in this representation.

### Constraints for Probabilistic Logic Programming

Daan Fierens, Guy Van den Broeck, Maurice Bruynooghe, Luc De Raedt, and Joris Renkens

In knowledge representation, one commonly distinguishes definitions of predicates from constraints. This distinction is also useful for probabilistic programming and statistical relational learning as it explains the key differences between probabilistic programming languages such as ICL, ProbLog and Prism (which are based on definitions) and statistical relational learning languages such as Markov Logic (based on constraints). This motivates us to extend ProbLog with constraints; the resulting cProbLog in a sense unifies ProbLog and Markov Logic and is strictly more expressive than either of them.

### Analysis by inverse stochastic synthesis: Bayesian interpretation of simple 2D scenes

Max Siegel, Tejas Kulkarni, Josh Tenenbaum, and Vikash Mansinghka

Many vision researchers have tried to formulate visual perception as the inverse problem of graphics - working backwards from pixels to scene descriptions - and cast it as MAP inversion of a generative model of the imaging process. This proposal has usually seemed to intractable to directly implement. Here we explore a stochastic Bayesian formulation of inverse imaging that, for simple scenes, appears to overcome this difficulty. We examine the performance of this approach on image interpretation problems involving collections of simple shapes, such as boxes and letters. Finally, we provide evidence that our approach allows for quick mixing that overcome the essential curse of dimensionality in our model.

### Learning Stochastic Inverses for Adaptive Inference in Probabilistic Programs

Andreas Stuhlmuller and Noah Goodman

We describe an algorithm for adaptive inference in probabilistic programs. During sampling, the algorithm accumulates information about the local probability distributions that compose the program's overall distribution. We use this information to construct targeted samples: given a value for an intermediate expression, we stochastically invert each of the steps giving rise to this value, sampling "backwards" in order to assign values to random choices such that we get a likely parse of the intermediate value. We propose this algorithm for importance sampling and as a means of constructing blocked proposals for a Metropolis-Hastings sampler.

### Probabilistic Programming at Microsoft Research Cambridge: Infer.NET for Real-world Applications

Chris Bishop

The explosive growth of interest in machine learning over the last few years highlights the opportunity for new software tools and techniques to accelerate the development and increase the impact of machine learning applications. In this talk we emphasise the advantages of probabilistic programming as a paradigm for creating bespoke machine learning solutions, including rapid prototyping, transparency of functionality, and the segregation of modelling code from inference code. We also describe the major investment being made by Microsoft Research Cambridge in the development of Infer.NET, and we illustrate the merits of probabilistic programming for creating large-scale applications.

### Open-universe probability models: idea, theory, and applications

Stuart Russell*

The first part of the talk will briefly summarize the historical development and current status of formal languages for open-universe probability models, which allow for uncertain reasoning about unknown objects and events. The second part will focus on an application: global seismic monitoring for the Comprehensive Nuclear-Test-Ban Treaty.

* The speaker is supported by, and this talk is given under the auspices of, the Blaise Pascal International Research Chair, funded by l'Etat et la Region d'Ile de France and administered by the Fondation de l'Ecole Normale Superieure.

### Stan, scalable software for Bayesian modeling

Matt Hoffman, Bob Carpenter, and Andrew Gelman

Stan is a general compiler for Bayesian directed graphical models in the mold of BUGS. Users provide a model definition in a high-level BUGS-like language and specify some variables as data, and Stan then draws samples of the unknown variables from the posterior distribution. Stan differs from previous BUGS-like systems in that it uses the no-U-turn sampler (a tuning-free variant of Hamiltonian/Hybrid Monte Carlo) instead of Gibbs sampling or random-walk Metropolis, allowing for much faster mixing. This talk will discuss the various issues and choices that came up on the journey from the original motivating gripe ("BUGS is too slow") to the current version (and future versions) of Stan.

### A short introduction to probabilistic soft logic

Angelika Kimmig, Stephen Bach, Matthias Broecheler, Bert Huang, Lise Getoor

Probabilistic soft logic (PSL) is a framework for collective, probabilistic reasoning in relational domains. PSL uses first order logic rules as a template language for graphical models over random variables with soft truth values from the interval [0,1]. Inference in this setting is a continuous optimization task, which can be solved efficiently. This paper provides an overview of the PSL language and its techniques for inference and weight learning. An implementation of PSL is available at http://psl.umiacs.umd.edu/.

### What we gain by representing models using code, illustrated in a new, practical Church system

Vikash Mansinghka

Probabilistic programming languages let users represent models and data using executable code rather than math. In this talk I will show results suggesting this is a step forwards, making it easier to build useful machine learning systems and understand their behavior.

I will present short probabilistic programs that solve classic statistics and machine learning problems, as well as new kinds of models based on computer systems such as graphics engines and databases. I will illustrate how the Church language in particular generalizes key mathematical ideas from graphical models --- conditional independence and referential locality --- far beyond their original setting, enabling efficient incremental algorithms and automatic parallelization of inference and learning.

I will also include a live demonstration of Venture, a new, practical, scalable, Turing-universal stochastic inference engine with a web-based interactive modeling environment.

This talk is based on joint work with Yura Perov, Jeff Wu, Max Siegel and Tejas Kulkarni.

### Exploiting Conditional Independence for Efficient, Automatic Multicore Inference for Church

Yura Perov and Vikash Mansinghka

We introduce new architectural techniques that enable e fficient, locality-sensitive, incremental, parallel inference in Church. Specifically, we introduce (1) envelopes, which generalize the notion of Markov blankets to the setting of Church inference; (2) envelope-based computation trace updates, that ensure each Markov chain iteration has predictable complexity and improved asymptotics over "lightweight" engines; and (3) conservative locking and aggressive scheduling techniques which exploit envelopes to automatically parallelize inference using multiple cores.

Our approach leverages the concise representation of conditional independencies a orded by the functional skeleton of Church and embodied in the dynamic structure of a Church computation trace. We measure the performance of a prototype Clojure-based Church interpreter using this architecture, and show that our approach delivers significant speedups and parallel e fficiency on HMMs, lattice Isings, and probabilistic topic models over 50,000 word instance corpora.

We also identify new questions about Markov chain convergence and automatic load balancing for stochastic recursions exposed by our approach.

### Probabilistic Decision Programming with Figaro

Brian Ruttenberg and Avi Pfeffer

Decision networks are models of probabilistic decision making processes, where the intent of such modeling is to determine an optimal strategy that maximizes the expected utility of the decision maker. Similar to other graphical models, building large decision networks with traditional programming languages is prohibitive. Hence, there are many benefits to combining the rich representational power of probabilistic programming with decision making. Several existing probabilistic languages have integrated decision making capabilities; however, they are generally limited in their expressive power.

In this work, we extend Figaro, an object-oriented, probabilistic programming language, with decision making capabilities. Decisions are represented in probabilistic models by extending Figaro data structures, allowing the use of existing Figaro inference algorithms to compute optimal strategies. Strategy optimization using discrete and continuous chance variables is supported, as well as optimization over multiple decisions using backward induction. In addition, strategy optimization with chance variables of arbitrary classes is supported (such as graphs), using a nearest neighbor algorithm to compute optimal strategies from previously generated inference samples. These new features add a powerful capability to the Figaro language and allow a user to quickly build complex probabilistic decision networks.

### Memory efficient Church inference with exchangeable foreign probabilistic procedures and JITing via reduced traces

Jeff Wu and Vikash Mansinghka

We introduce a new Church architecture with several unusual features. First, it uses reduced traces that only store the entropy and dynamic dependence structure, not the full computation traces used by other efficient incremental engines. This means our engine only requires memory that scales with the entropy cost of the underlying model and data, not the entire runtime cost, combining the space efficiency of lightweight engines and the efficiency of trace-based ones. Second, it supports a fully general exchangeable random procedure interface for interfacing foreign forward-simulation and inference code with an interactive Church interpreter. Third, it is written in RPython, which supports automatic generation of trace-based JITting on a per-proposal basis, and delivers 100x+ speedups over previous Church engines on HMMs, topic models (collapsed and uncollapsed), Dirichlet process mixtures, and several other statistical benchmarks.

### Efficiently Sampling Probabilistic Programs via Program Analysis

Arun Chaganty, Aditya Nori, and Sriram Rajamani

Probabilistic programs are intuitive and succinct representations of complex probability distributions. A natural approach to performing inference over these programs is to execute them and compute statistics over the resulting samples. Indeed, this approach has been taken before in a number of probabilistic programming tools. In this work, we address two key challenges of this paradigm: (i) ensuring samples are well distributed in the combinatorial space of the program, and (ii) efficiently generating samples with minimal rejection. We present a new sampling algorithm ESP that addresses these challenges using concepts from the field of program analysis.

### ProbLog2: From Probabilistic Programming to Statistical Relational Learning

Joris Renkens, Dimitar Shterionov, Guy Van den Broeck, Jonas Vlasselaer, Daan Fierens, Wannes Meert, Gerda Janssens, Luc De Raedt

ProbLog is a probabilistic programming language based on Prolog. The new ProbLog system---called ProbLog2---can solve a range of inference and learning tasks typical for the Probabilistic Graphical Models (PGM) and Statistical Relational Learning (SRL) communities. The main mechanism behind ProbLog2 is a conversion of the given program to a weighted Boolean formula. We argue that this conversion approach can also be applied---with certain restrictions---to other probabilistic programming languages such as Church and Figaro.

### Compositional Stochastic Modeling and Probabilistic Programming

Eric Mjolsness

Probabilistic programming is related to a compositional approach to stochastic modeling by switching from discrete to continuous time dynamics. In continuous time, an operator-algebra semantics is available in which processes proceeding in parallel (and possibly interacting) have summed time-evolution operators. From this foundation, algorithms for simulation, inference and model reduction may be systematically derived. The useful consequences are potentially far-reaching in computational science, machine learning and beyond. Hybrid compositional stochastic modeling/probabilistic programming approaches may also be possible.

### Compiling Relational Database Schemata into Probabilistic Graphical Models

Sameer Singh and Thore Graepel

Instead of requiring a domain expert to specify the probabilistic dependencies of the data, in this work we present an approach that uses the relational DB schema to automatically construct a Bayesian graphical model for a database. This resulting model contains customized distributions for columns, latent variables that cluster the data, and factors that reflect and represent the foreign key links. Experiments demonstrate the accuracy of the model and the scalability of inference on synthetic and real-world data.

### Accelerating Inference: Towards a Full Language, Compiler and Hardware Stack

Jeff Bernstein, Bill Bradley, Shawn Hershey, Ben Vigoda, Theo Weber, Andrew Schweitzer, and Noah Stein

We introduce Dimple, a fully open-source API for probabilistic modeling. Dimple allows the user to specify probabilistic models in the form of graphical models, bayesian networks, or factor graphs, and performs inference (by automatically deriving an inference engine from a variety of algorithms) on the model. Dimple also serves as a compiler for GP5, a hardware accelerator for inference.

### The flexible mind: probabilistic programming as a substrate for cognitive modeling

Noah Goodman

How can we describe rich commonsense knowledge and the flexible inferences that people draw using this knowledge? I will argue that probabilistic programs offer expressivity that is needed to capture human cognition. However, I will argue that human cognition requires high-level probabilistic programming languages. With examples drawn from the psychology of reasoning and language I will show the utility of function abstraction, memoization, lexically scoped query, and meta-syntax (eval). I will close with some thoughts about what psychology suggests about inference algorithms for probabilistic programs.

### A model-learner pattern for Bayesian reasoning

Andy Gordon, Mihhail Aizatulin, Johannes Borgstroem, Guillaume Claret, Thore Graepel, Aditya Nori, Sriram Rajamani, Claudio Russo

We propose a new pattern of probabilistic programming based on models and learners. A model consists of a pair of probabilistic expressions for the prior and sampling distributions of a Bayesian model, together with a hyperparameter. A learner derives from a model and an inference algorithm, and may be trained on data so as to compute posterior and predictive distributions. We describe a range of different models and learners, and also combinators for generic idioms such as model testing, mixture models, and evidence-based model averaging.

### Probabilistic computation for information security

Piotr Mardziel and Kasturi Raghavan

Probabilistic computation is a convenient means of mechanically reasoning about a variety of information security problems. At its core, information security concerns itself with measuring or limiting the knowledge an adversary might attain from interacting with a protected system. Probabilistic inference lets one compute this knowledge explicitly as long as the interaction can be described as a program in a suitable probabilistic language that supports conditioning. Security concerns, however, require soundness guarantees on probabilistic inference which are generally not present in machine learning applications. We summarize some recent work on probabilistic computing for information security and highlight challenging aspects that still need to be addressed.