Graphical Models
23 Jan 2012 10:53
[Update, 12 March 2010: On re-reading I am less than happy with this, because I have come appreciate the uses of graphical models in non-causal modeling more, and this slights them unnecessarily. I will try to re-write this soon.]
A.k.a. causal models, causal graphs, Bayes graphs, Bayes networks, Bayesian networks. (Here "Bayes" is a metonym for "conditional probability". There are perfectly good frequentist interpretations of these models.) I'm sticking latent-variable and path-analysis models in here, too, because they all pretty much work the same way.
Everyone who takes basic statistics has it drilled into them that "correlation is not causation." (When I took psych. 1, the professor said he hoped that, if he were to come to us on our death-beds and prompt us with "Correlation is," we would all respond "not causation.") This is a problem, because one can infer correlation from data, and would like to be able to make inferences about causation. There are typically two ways out of this. One is to perform an experiment, preferably a randomized double-blind experiment, to eliminate accidental sources of correlation, common causes, etc. That's nice when you can do it, but impossible with supernovae, and not even easy with people. The other out is to look for correlations, say that of course they don't equal causations, and then act as if they did anyway. The technical names for this latter course of action are "linear regression" and "analysis of variance," and they form the core of applied quantitative social science, e.g., The Bell Curve.
Graphical models are, in part, a way of escaping from this impasse.
The basic idea is as follows. You have a bunch of variables, and you want to represent the causal relationships, or at least the probabilistic dependencies, between them. You do so by means of a graph. Each node in the graph stands for a variable. If variable A is a cause of B, then an arrow runs from A to B. If A is a cause of B, we also say that A is one of B's parents, and B one of A's children. If there is a causal path from A to B, then A is an ancestor of B, and B is a descendant of A. If a variable has no parents in the graph, it is exogenous, otherwise it is endogenous.
Part of what we mean by "cause" is that, when we know the immediate causes, the remoter causes are irrelevant --- given the parents, remoter ancestors don't matter. The standard example is that applying a flame to a piece of cotton will cause it to burn, whether the flame came from a match, spark, lighter or what-not. Probabilistically, this is a conditional indepedence property, or a Markov property: a variable is independent of its ancestors conditional on its parents. In fact, given its parents, its children, and its childrens' other parents, a variable is conditionally independent of all other variables. This is called the graphical or causal Markov property. When this holds, we can factor the joint probability distribution for all the variables into the product of the distribution of the exogenous variables, and the conditional distribution for each endogenous variable given its parents.
(You may be wondering what happens if A is a parent of B and B is a parent of A, as can happen when there is feedback between the variables. This leads to difficulties, traditionally dealt with by explicitly limiting the discussion to acyclic graphs. I shall follow this wise precedent here.)
Now, there are certain rules which let us infer conditional independence relations from each other. For instance, if X is independent of the combination of Y and W, given Z, then X is indepdent of Y alone given Z. So, if we have a graph which obeys the causal Markov condition, there are generally other conditional independence relations which follow from the basic ones. If these are the only conditional indepences which hold in the distribution, it is said to be faithful to the graph (or vice versa); otherwise it is unfaithful. For a graph to be Markov and unfaithful, there must (as it were) be an elaborate conspiracy among the conditional distributions, so elaborate that it will generally be destroyed by any change in any of those distributions. So faithfulness is a robust property.
This may sound pretty arcane, but that's just because it is arcane. The point, however, is that if you can make the three assumptions above (no causal cycles, Markov property, faithfulness), you're in business in a really remarkable way. There are very powerful statistical techniques that will let you infer the causal structure connecting your variables. This comes in two flavors. One is the Bayesian way: cook up a prior distribution over all possible causal graphs; compute the likelihood of the data under each graph; update your distribution over graphs; iterate. This is generally computationally intractable, assuming you can come up with a meaningful prior in the first place. The other approach is to use tests for conditional independence to eliminate possible connections between variables, and so to narrow down the range of candidate structures; it is basically frequentist, and can be shown, under a broad range of circumstances, to be asymptotically reliable.
Once you have your causal graph --- whether through estimation or through simply being handed one --- you can do lots of great things with it, like predict the effects of manipulating some of the variables, or make backward inferences from effects to causes. Of course, if the graph is big, doing the necessary calculations can be very troublesome in itself, and so people work on approximation methods and even ways of doing statistical inference on models of statistical distributions...
It's probably obvious I think this is incredibly neat, and even one of the most important ideas to come out of machine learning. Of course it doesn't really solve the problem of establishing causal relations, in the way Hume objected to; it says, assuming there are causal relations, of a certain stochastic form, and that these are stable, then they can be learned. But that, and the more general questions of what we ought to mean by "cause", deserve a notebook of their own.
Things I want to understand better: frequentist inference procedures. Computational learning theory for graphical models (the paper by Janzing and Herrmann is good). How to treat systems with feedback? How to treat dynamical systems and time series? How does all of this fit together with computational mechanics?
Not even a conjecture. Back in the 1960s, Chow and Liu (reference below) gave a polynomial algorithm for finding the best approximation to a global joint probability distribution using only pairwise interactions among the variables, i.e., the one which minimized the Kullback-Leibler divergence between the true and the approximating distribution. I have read that extending this to even three-way interactions is NP, though I don't know if it's NP-complete. (1) How is the intractability result established? (2) Is this the same as the computational phase transition one finds in going from 2-SAT to 3-SAT, where the critical point is at two-point-something SAT? (Presumably the answer to (1) would shed some light on this.) (3) Even if not, is there an analogous phase transition, perhaps in a different universality class? (Update in 2009, several years later: Bento and Montanari, below, sounds relevant, but I haven't read it yet.)
- Recommended, more general:
- Clark Glymour, The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology [Mini-review]
- Michael Irwin Jordan (ed.), Learning in Graphical Models
- Jordan and Sejnowski (eds.), Graphical Models [Nice collection of papers from Neural Computation]
- Judea Pearl
- "Causal Inference in Statistics: An Overview", forthcoming in Statistics Surveys 3 (2009): 96--146 [PDF]
- Causality: Models, Reasoning and Inference
- Peter Spirtes, Clark Glymour and Richard Scheines, Causation, Prediction, and Search [Comments]
- Recommended, more specialized:
- C. K. Chow and C. N. Liu, "Approximating Discrete Probability Distributions with Dependence Trees", IEEE Transactions on Information Theory 14 (1968): 462--467 [An old but very nice result on how to get the optimal approximation to a global probability distribution using only pairwise interactions among the variables]
- Finale Doshe-Velez, David Wingate, Joshua Tenenbaum and Nicholas Roy, "Infinite Dynamic Bayesian Networks", ICML 2011 [PDF]
- Ghahramani, "Learning Dynamic Bayesian Networks," in Giles and Gori (eds.), Adaptive Processing of Sequences and Data Structures
- Ghahramani and Jordan, "Factorial Hidden Markov Models," Machine Learning 29 (1997): 245--273
- Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research 11 (2010): 1709--1731
- Dominik Janzing and Daniel J. L. Herrmann, "Reliable and Efficient Inference of Bayesian Networks from Sparse Data by Statistical Learning Theory", cs.LG/0309015
- Steffen Lauritzen, Graphical Models [A fairly abstract probabilistic/mathematical-statistical treatment; elegant, if you're into such things.]
- Han Liu, John Lafferty and Larry Wasserman, "The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs", Journal of Machine Learning Research 10 (2009): 2295--2328 = arxiv:0903.0649
- John C. Loehlin, Latent Variable Models: An Introduction to Factor, Path, and Structural Analysis [An intro. to old-school linear latent-variable models, especially of the sort used by psychologists. Good in its own domain, but does not make enough contact with modern graphical models.]
- Eric Mjolsness, "Stochastic Process Semantics for Dynamical Grammar Syntax: An Overview", cs.AI/0511073
- Christopher J. Quinn, Todd P. Coleman, Negar Kiyavash, "Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes", arxiv:1101.5108
- Thomas Richardson
- "A Discovery Algorithm for Directed Cyclic Graphs", UAI 1996 [Short version without details]
- "A Discovery Algorithm for Directed Cyclic Graphs", Technical Report 68, CMU department of philosophy (postscript) [Long version with details]
- Pawel Wocjan, Dominik Janzing, and Thomas Beth, "Required sample size for learning sparse Bayesian networks with many variables," cs.LG/0204052
- To read:
- Edoardo M. Airoldi, "Getting started in probabilistic graphical models", arxiv:0706.2040 [Tutorial aimed at biologists]
- Animashree Anandkumar, Kamalika Chaudhuri, Daniel Hsu, Sham M. Kakade, Le Song, Tong Zhang, "Spectral Methods for Learning Multivariate Latent Tree Structure", arxiv:1107.1283 [This sounds very much like Spearman's "tetrad equations" from 100 years ago!]
- Animashree Anandkumar, Vincent Y.F. Tan, Alan. S. Willsky, "High-Dimensional Gaussian Graphical Model Selection: Tractable Graph Families", arxiv:1107.1270
- Erik Aurell, Magnus Ekeberg, "Inverse Ising inference using all the data", arxiv:1107.3536 ["We show that a method based on logistic regression, using all the data, solves the inverse Ising problem far better than mean-field calculations relying only on sample pairwise correlation functions, while still computationally feasible for numbers of nodes in the range of hundreds. The largest improvement in reconstruction occurs for strong interactions..."]
- Francis R. Bach and Michael I. Jordan, "Learning Graphical Models for Stationary Time Series", UCB Statistics Tech. Rep. 650
- Onureena Banerjee, Laurent El Ghaoui, Alexandre d'Aspremont, "Model Selection Through Sparse Maximum Likelihood Estimation", arxiv:0707.0704
- Jose Bento, Morteza Ibrahimi and Andrea Montanari, "Learning Networks of Stochastic Differential Equations", NIPS 23 (2010) [PDF]
- Jose Bento, Andrea Montanari, "Which graphical models are difficult to learn?", arxiv:0910.5761
- Danny Bickson, Carlos Guestrin, "Linear Characteristic Graphical Models: Representation, Inference and Applications", arxiv:1008.5325
- David Brillinger, "Remarks Concerning Graphical Models for Time Series and Point Processes," Revista de Econometria 16 (1996): 1--23 [PS]
- Clive G. Bowsher, "Stochastic kinetic models: Dynamic independence, modularity and graphs", Annals of Statistics 38 (2010): 2242--2281
- Matthias Brocheler and Lise Getoor, "Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning", NIPS 23 (2010) [PDF]
- Elias Chaibub Neto, Mark P. Keller, Alan D. Attie, and Brian S. Yandell, "Causal graphical models in systems genetics: A unified framework for joint inference of causal network and genetic architecture for correlated phenotypes", Annals of Applied Statistics 4 (2010): 320--339
- Venkat Chandrasekaran, Pablo A. Parrilo, Alan S. Willsky, "Latent Variable Graphical Model Selection via Convex Optimization", arxiv:1008.1290
- Michael Chertkov and Vladimir Y. Chernyak
- "Loop calculus in statistical physics and information science", Physical Review E 73 (2006): 065102 = cond-mat/0601487
- "Loop series for discrete statistical models on graphs", cond-mat/0603189
- David Maxwell Chickering, "Optimal Structure Identification With Greedy Search," Journal of Machine Learning Research 3 (2002): 507--554
- Diego Colombo, Marloes H. Maathuis, Markus Kalisch, Thomas S. Richardson, "Learning high-dimensional directed acyclic graphs with latent and selection variables", arxiv:1104.5617
- Robert G. Cowell, A. Philip Dawid, Steffen L. Lauritzen and David J. Spiegelhalter, Probabilistic Networks and Expert Systems
- David Cox and Nanny Warmuth, Multivariate Dependcencies: Models, Analysis, and Interpretation
- Mihai Cucuringu, Jesus Puente, David Shue, "Model Selection in Undirected Graphical Models with the Elastic Net", arxiv:1111.0559
- Rainer Dahlhaus, "Graphical interaction models for multivariate time series," Metrika 51 (2000): 157--172
- Gabriel C. G. de Abreu, Rodrigo Labouriau, David Edwards, "High-dimensional Graphical Model Search with gRapHD R Package", Journal of Statistical Software 37 (2010), arxiv:0909.1234
- Luis M. de Campos, "A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests", Journal of Machine Learning Research 7 (2006): 2149--2187 [Sounds damn cool]
- Amir Dembo and Andrea Montanari, "Gibbs Measures and Phase Transitions on Sparse Random Graphs", arxiv:0910.5460
- Amir Dembo, Andrea Montanari, Nike Sun, "Factor models on locally tree-like graphs", arxiv:1110.4821
- Pedro Domingos and Daniel Lowd, Markov Logic: An Interface Layer for Artificial Intelligence
- Vanessa Didelez, "Graphical models for marked point processes based on local independence", arxiv:0710.5874
- Mathias Drton, Rina Foygel, and Seth Sullivant, "Global identifiability of linear structural equation models", Annals of Statistics 39 (2011): 865--886
- Michael Eichler
- "Graphical modelling of multivariate time series", math.ST/0610654
- "A note on global Markov properties for mixed graphs", arxiv:1107.3036
- Seif Eldawlatly, Yang Zhou, Rong Jin and Karim G. Oweiss, "On the Use of Dynamic Bayesian Networks in Reconstructing Functional Neuronal Networks from Spike Train Ensembles", Neural Computation 22 (2010): 158--189
- Gal Elidan, Iftach Nachman and Nir Friedman, "'Ideal Parent' Structure Learning for Continuous Variable Bayesian Networks", Journal of Machine Learning Research 8 (2007): 1799--1833
- Sergi Elizalde and Kevin Woods, "Bounds on the number of inference functions of a graphical model", math.CO/0610233
- Juan Ferrándiz, Enrique F. Castillo and Pilar Sanmartin, "Temporal aggregation in chain graph models", Journal of Statistical Planning and Inference 133 (2005): 69--93
- G. David Forney, Jr., Pascal O. Vontobel, "Partition Functions of Normal Factor Graphs", arxiv:1102.0316
- Freedman, "On Specifying Graphical Models for Causation," UCB Stat. Tech. Rep. 601 [abstract, pdf]
- Frey, Graphical Models in Machine Learning and Data Communication
- Roland Fried and Vanessa Didelez, "Latent variable analysis and partial correlation graphs for multivariate time series", Statistics and Probability Letters 73 (2005): 287--296
- Cyril Furtlehner, Jean-Marc Lasgouttes, Arnaud De La Fortelle, "Belief Propagation and Bethe approximation for Traffic Prediction", physics/0703159
- Dan Geiger, David Heckerman, Henry King, and Christopher Meek, "Stratified exponential families: Graphical models and model selection", Annals of Statistics 29 (2001): 505--529
- Christophe Giraud, "Estimation of Gaussian graphs by model selection", arxiv:0710.2044
- Glymour and Cooper (eds.), Computation, Causation and Discovery
- Vibhav Gogate, William Austin Webb, and Pedro Domingos, "Learning Efficient Markov Networks", NIPS 23 (2010) [PDF]
- Jorge Goncalves and Sean Warnick, "Dynamical Structure Functions for the Estimation of LTI Networks with Limited Information", q-bio.MN/0610008 [LTI = "linear, time-invariant"]
- Green, Hjort and Richardson (eds.), Highly Structured Stochastic Systems
- Vikas Hamine and Paul Helman, "Learning Optimal Augmented Bayes Networks", cs.LG/0509055
- Patrick L. Harrington, Jr., and Alfred O. Hero III, "Spatio-Temporal Graphical Model Selection", arxiv:1004.2304
- Alain Hauser, Peter Bühlmann, "Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs", arxiv:1104.2808
- Raymond Hemmecke, Silvia Lindner, Milan Studeny, "Learning restricted Bayesian network structures", arxiv:1011.6664
- Holger Höfling and Robert Tibshirani, "Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods", Journal of Machine Learning Research 10 (2009): 883--906
- Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, "Stochastic Reasoning, Free Energy, and Information Geometry", Neural Computation 16 (2004): 1779--1810
- Manfred Jaeger & co., Primula [Java implementation of a modeling language for relational Bayesian networks; released under GPL]
- Ali Jalali, Chris Johnson, Pradeep Ravikumar, "On Learning Discrete Graphical Models Using Greedy Methods", arxiv:1107.3258
- Markus Kalisch and Peter Bühlmnann
- "Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm", Journal of Machine Learning Research 8 (2007): 616--636
- "Robustification of the PC-Algorithm for Directed Acyclic Graphs", Journal of Computational and Graphical Statistics 17 (2008): 773--789
- Mladen Kolar, Eric P. Xing, "Estimating Networks With Jumps", arxiv:1012.3795
- Nicole Kraemer, Juliane Schaefer, Anne-Laure Boulesteix, "Regularized estimation of large-sacle gene association networks using graphical Gaussian models", arxiv:0905.0603
- John Lafferty, Han Liu, Larry Wasserman, "Sparse Nonparametric Graphical Models", arxiv:1201.0794
- Sanjiang Li, "Causal models have no complete axiomatic characterization", arxiv:0804.2401
- Shai Litvak and Shimon Ullman, "Cortical Circuitry Implementing Graphical Models", Neural Computation 21 (2009): 3010--3056 [or rather, models of cortical circuitry implementing graphical models]
- Han Liu, Xi Chen, John Lafferty, Larry Wasserman, "Graph-Valued Regression", arxiv:1006.3972
- Han Liu, John Lafferty and Larry Wasserman, "Tree Density Estimation", arxiv:1001.1557
- Han Liu, Kathryn Roeder, Larry Wasserman, "Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models", arxiv:1006.3316 == NIPS 23 (2010)
- Han Liu, Min Xu, Haijie Gu, Anupam Gupta, John Lafferty, Larry Wasserman, "Forest Density Estimation", Journal of Machine Learning Research 12 (2011): 907--951
- Stephen Luttrell, "Adaptive Cluster Expansion (ACE): A Hierarchical Bayesian Network", cs.NE/0410020
- Marloes H. Maathuis, Markus Kalisch, Peter Bühlmann, "Estimating high-dimensional intervention effects from observational data", Annals of Statistics 37 (2009): 3133--31654, arxiv:0810.4214
- Dhafer Malouche and Sylvie Sevestre-Ghalila, "Estimating High dimensional faithful Gaussian graphical Models : uPC-algorithm", arxiv:0705.1613
- Giovanni M. Marchetti, Nanny Wermuth, "Matrix representations and independencies in directed acyclic graphs", arxiv:0904.0333
- Lilyana Mihalkova, Lise Getoor, "Lifted Graphical Models: A Survey", arxiv:1107.4966
- Eric Mjolsness, "Labeled graph notations for graphical models", UCI School of Information and Computer science Technical Report 04-03 [PDF]
- Andrea Montanari, "Graphical Models Concepts in Compressed Sensing", arxiv:1011.4328
- Jennifer Neville and David Jensen, "Relational Dependency Networks", Journal of Machine Learning Research 8 (2007): 653--692
- Lior Pachter and Bernd Sturmfels
- "Tropical Geometry of Statistical Models", q-bio.QM/0311009
- "Parametric Inference for Biological Sequence Analysis", q-bio.GN/0401033
- Alessandro Pelizzola, "Cluster variation method in statistical physics and probabilistic graphical models", Journal of Physics A: Mathematical and General 38 (2005): R309--R339 = cond-mat/0508216
- Jose M. Pena, "Reading Dependencies from Covariance Graphs", arxiv:1010.4504
- Tapani Raiko, Harri Valpola, Markus Harva and Juha Karhunen, "Building Blocks for Variational Bayesian Learning of Latent Variable Models", Journal of Machine Learning Research 8 (2007): 155--201
- Johannes Rauh, Nihat Ay, "Robustness and Conditional Independence Ideals", arxiv:1110.1338
- Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty, "High-dimensional Ising model selection using $\ell_1$-regularized logistic regression", Annals of Statistics 38 (2010): 1287--1319, arxiv:0804.4202, also arxiv:1010.0311
- Marco Reale, A Graphical Modelling Approach to Time Series, Ph.D. thesis, Lancaster University, 1998 [Reale's website]
- Marco Reale and Granville Tunnicliffe Wilson
- "Identification of vector AR models with recursive structural errors using conditional independence graphs"
- "The Sampling Properties of Conditional Independence Graphs for Structural Vector Autoregressions"
- T. Rizzo, B. Wemmenhove, H.J. Kappen, "On Cavity Approximations for Graphical Models", cond-mat/0608312
- Joshua W. Robinson, Alexander J. Hartemink, "Learning Non-Stationary Dynamic Bayesian Networks", Journal of Machine Learning Research 11 (2010): 3647--3680
- Philipp Rütimann and Peter Bühlmann, "High dimensional sparse covariance estimation via directed acyclic graphs", arxiv:0911.2375 = Electronic Journal of Statistics 3 (2009): 1133--1160
- Kayvan Sadeghi, Steffen L. Lauritzen, "Markov Properties for Loopless Mixed Graphs", arxiv:1109.5909
- Federico Schlüter, "A survey on independence-based Markov networks learning", arxiv:1108.2283
- Marco Scutari
- "Learning Bayesian Networks with the bnlearn Package", arxiv:0908.3817
- "Measures of Variability for Bayesian Network Graphical Structures", arxiv:1005.4214
- Bill Shipley, Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations and Causal Inference
- Tomi Silander, Teemu Roos, Petri Myllymaki, "Locally Minimax Optimal Predictive Modeling with Bayesian Networks", Journal of Machine Learning Research Workshop and Conference Proceedings 5 (AISTATS 2009): 504--511
- Milan Studeny, Probabilistic Conditional Independence Structures ["The main topic of the monograph is a non-graphical algebraic method for describing probabilistic CI structures. However, one of the first two chapters in the book recalls and gathers basic mathematical tools for study of probabilistic conditional independence (CI) and the other one is a sketchy overview of recent advanced graphical approaches to the desciption of CI structures. The next four chapters develop the non-graphical method. The last standard chapter is an attempt to apply the method in practice: it is devoted to learning Bayesian nets and it is more mathematical (and 'philosophical') revision of some methods for learning Bayesian networks. The main aim of that chapter is to indicate that an algebraic approach can also be applied in this area."]
- Charles Sutton, Andrew McCallum, "An Introduction to Conditional Random Fields", arxiv:1011.4088
- Charles Sutton, Andrew McCallum and Khashayar Rohanimanesh, "Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data", Journal of Machine Learning Research 8 (2007): 693--723
- Vincent Y. F. Tan, Animashree Anandkumar, Lang Tong and Alan S. Willsky, "A Large-Deviation Analysis of the Maximum-Likelihood Learning of Markov Tree Structures", IEEE Transactions on Information Theory 57 (2011): 1714--1735, arxiv:0905.0940 [Large deviations for Chow-Liu trees]
- Robert E. Tillman, Arthur Gretton and Peter Spirtes, "Nonlinear directed acyclic structure learning with weakly additive noise models" [Thanks to Prof. Spirtes for a preprint]
- Achim Tresch, Florian Markowetz, "Structure Learning in Nested Effects Models", 0710.4481
- Vivian Viallon, Onureena Banerjee, Gregoire Rey, Eric Jougla, Joel Coste, "An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates", arxiv:1004.2287
- Martin J. Wainwright and Michael I. Jordan, "Graphical Models, Exponential Families, and Variational Inference", Foundations and Trends in Machine Learning 1 (2008): 1--305 [PDF reprint via Prof. Jordan]
- Nanny Wermuth, "Probability distributions with summary graph structure", Bernoulli 17 (2011): 845--879, arxiv:1003.3259
- Xianchao Xie, Zhi Geng, "A Recursive Method for Structural Learning of Directed Acyclic Graphs", Journal of Machine Learning Research 9 (2008): 459--483
- Gui-Bo Ye, Yuanfeng Wang, Yifei Chen, Xiaohui Xie, "Efficient Latent Variable Graphical Model Selection via Split Bregman Method", arxiv:1110.3076
- Jonathan S. Yedidia, William T. Freeman and Yair Weiss, "Understanding Belief Propagation and its Generalizations", Mitsubshi Electric Research Laboratories Tech. Rep. 2001-22
- Marco Zaffalon and Marcus Hutter, "Robust Inference of Trees", cs.LG/0511087
- Shuheng Zhou, John Lafferty, Larry Wasserman, "Time Varying Undirected Graphs", arxiv:0802.2758
- Piotr Zwiernik, "An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models", Journal of Machine Learning Research 12 (2011): 3283--3310 [where "general Markov models" == "binary graphical tree models where all the inner nodes of a tree represent binary hidden variables"]
(Thanks to Gustavo Lacerda for pointing out a goof.)
