This document was prepared by Seth Sullivant.
Problems suggested during talks
Persi Diaconis
Fixed First-Order Summaries.
The basic question is to try to understand how to understand ``permutation data.'' Suppose that voters rank statements in order of preference. The data from such a survey consists of a function
from the symmetric group on letters to he nonnegative integers. Note that The first order summary of is a table of nonnegative integers. The entry of , is equal to the number of people who ranked candidate in position . If we assume no higher-order interactions between rankings, this describes a log-linear model.
In a purely mathematical language: describe a generating set for the toric ideal given by the matrix whose columns are the permutation matrices.
Contingency tables with quadratic statistics.
For tables, the matrix that computes the sufficient statistics (i.e. the matrix which represents the log-linear model) is the following:
Polyhedral Cones and Testing.
Observe
Test against with fixed.
or
There should be a nice interaction between statistics and polyhedral combinatorics in this problem.
Compare ``Competing'' Techniques.
How do these different techniques for estimating -values relate to each other?
With respect to sequential importance sampling: what are possible multi-way generalizations of the Gale-Ryser theorem which gives necessary and sufficient conditions for the existence of a matrix with fixed row and column sums?
Stephen Fienberg
Questions about Odds-Ratios.
: does it makes sense to fix values of two of the odds ratios and and look at the locus of possible values for the third odds-ratio ?
Describe the curve (in the probability simplex) obtained by intersecting two of these quadric surfaces.
Existence of Maximum Likelihood Estimates.
As Alessandro Rinaldo, later pointed out in his talk, this is equivalent to giving a combinatorial description of the facets of the cone of all feasible margins. This polyhedral cone is obtained by taking of the positive hull of the columns of the matrices which appear in the working questions on graphical models by Chris Meek and Serkan Hosten.
Later in the week, Nicholas Eriksson showed that this conjecture is false already for tables.
Mathias Drton asked "How can we incorporate structural zeroes into this problem?"
Stephen Roehrig asked "Is it possible to relate the MLE solutions for decomposable models to decomposable network flow problems?"
Akimichi Takemura
Bernd Sturmfels asked, ``What about problems for 3-way tables that have structural zeros coming from a very regular design?"
Henry Wynn asked, ``Does the group structure of circuits play a role in the solution of this problem?"
Emily Gamundi
Ruriko Yoshida
Ruriko Yoshida discussed short encodings of the set of points in polytopes by means of rational generating functions, and the applications of rational generating functions for giving short encodings of Markov bases.
Ian Dinwoodie asked, ``Is there a way to extract (random) monomials from the generating functions?"
Akimichi Takemura asked, ``Is there a way to extract only the absolutely essential Markov basis elements that are needed to connect a given fiber of the form ?
Ruriko answered,``We are still attempting to solve these problems."
Henry Wynn asked, `` Can these techniques be used to approximate the number of integral points inside an arbitrary convex set?''
Mathias Drton
Mathias Drton discussed multimodality of the likelihood function for the model of seemingly unrelated regression (SUR). These can be thought of as conditional independence models induced by the chain graph with two sets of nodes, and and directed edges from and undirected edges between any pair .
Akimichi Takemura asked,``As the number of samples grows, how quickly does the likelihood converge to a unimodal function in the bivariate SUR model?''
Aleksandra Slavkovic
Aleksandra Slavkovic spoke about disclosure limitation problems associated with the releasse of marginals and conditionals. Since fixing conditionals is a linear constraint, the theory of Markov bases can be applied to the problem of deciding how many tables have the given fixed conditionals.
Jon Forster asked, ``Is it safe to release full conditionals that have been perturbed?"
Aleksandra replied, ``Probably not.''
Luis Garcia
Luis Garcia spoke about the algebraic structure of the probability distributions which arise from graphical models with hidden variables.
Chris Meek asked, ``What is the algebraic characterization of the Verma constraints? (non-independence constraints)''
Mathias Drton asked, ``Are these the same constraints described by Richardson and Spirtes (Annals of Statistics 2002)?''
Henry Wynn asked, ``How do inequalities play a role in this problem?''
Henry Wynn
Henry Wynn spoke about formulae relating cumulants to moments.
Russell Steele
Russell Steele spoke about mixture models, model selection, and the Bayesian Information Criterion (BIC).
This question raised some controversy during the discussion following the talk which was later resolved. The main issue that is interesting in this situation is that the likelihood function for mixture models does not satisfy the necessary regularity conditions to apply the classical results about the BIC.
Frantisek Matus
Frantisek Matus spoke about the representability of conditional independence (CI) structures.
Suppose independence vatieties are properly defined, also for the CI constraints involving functional independences, over the field of complex numbers.
Donald Richards
Donald Richards spoke on the problem of determining the number of solutions to generic maximum likelihood problems and the use of computational algebra, homotopy continuation, and mixed volume in these calculations.
Seth Sullivant
Seth Sullivant spoke about Markov bases for hierarchical models and theoretical results about the structure of minimal Markov basis elements.
Elizabeth Allman
Shmuel Onn
Shmuel Onn spoke about the spectra of hierarchical models. The spectra is defined to be the set of all sets of possible cell entries in for a multiway nonnegative integral table with fixed marginal totals.
Joseph Landsberg
Ilias Kotsireas
Challenge problem in Gröbner bases of polynomial ideals.
The system of polynomial equations given in 3 different formats at
http://www.cargo.wlu.ca/hi/had7.cocoa
http://www.cargo.wlu.ca/hi/had7.maple
http://www.cargo.wlu.ca/hi/had7.reduce
has equations in the variables , , .
Prove that this system of polynomial equations, does not have any solutions.
This would constitute an algebraic proof of the combinatorial result of L. D. Baumert: there are no Hadamard matrices of order with one circulant core.
Moreover, it is of interest to write down the effective Nullstellensatz for this ideal, i.e. write down explicitly a linear combination of (some of) the generators of the ideal, which is equal to 1.
Similar algebraic proofs could then be devised for Hadamard matrices with one circulant core of bigger order.
Reference: Ilias Kotsireas, Christos Koukouvinos, Jennifer Seberry, Hadamard ideals and Hadamard matrices with one circulant core. Submitted, November 2003.
Ideas that were suggested during the workshop:
Design efficient heuristics in binary trees.
Given sets of binary vectors in a set of binary trees with given depths, design efficient heuristics to predict the positions of the analogous vectors in binary trees of bigger depth. The adjective "analogous" can take on various meanings. In our context, it designates solutions of the associated polynomial system. Ideas that were suggested during the workshop: (Persi Diaconis) first of all obtain graphical visualizations and then analyze the data with DFT and other transforms. Work in progress.
Indicator function approach for Hadamard
Equivalence.
Check data sets of Hadamard matrices of big orders to identify the inequivalent Hadamard matrices. Ideas that were suggested during the workshop: (Maria Piera Rogantin) A JSPI paper by Fontana-Pistone-Rogantin contains a description of the indicator function approach to check Hadamard equivalence. In general, one needs to compute the indicator function which has exponentially many monomials. However, this approach is especially efficient for Hadamard matrices with some structure, e.g. with one circulant core. The main reason for this is that the exponentially many monomials fall naturally in a few categories and we only need to check equality of the coefficients of one representative from these categories. Work in progress.
First Open Problem Session
Meek: It depends on the type of problems you are interested in studying.
Roehrig: You can exchange a solution to problems on binary random variables to a problem on nonbinary random variables.
Dinwoodie: How might you use algebra to pursue Bayesian directions in model fit criteria?
Sturmfels: How to you compute this in Macaulay 2? What is Bayesian algebraic geometry?
Fienberg: Taking mixtures of things the different models we have discussed at this conference.
Meek: Finding the modes of the posterior distributions could be an interesting research problem.
Pistone: You could use moment generating functions and cumulant generating functions for these problems.
Fienberg: Identifiability is a big issue in the Bayesian context.
Wynn: How may Edgeworth corrections play a role in this problem?
Suppose that . Since the monomials are a basis for a quotient space, there is a unique representation of the as linear combinations of the . This produces polynomials which belong to . Question: are the only points which vanish on these polynomials the design points; that is, does ?
Question was answered in the negative by Bernd Sturmfels at the conference.
Sullivant: Is the number of symmetry classes a polynomial in the size of the problem?
Dinwoodie: Can we try to apply algebraic techniques to microarray data?
Meek: Gaussian (or similar) assumptions on the random variables might be possible to handle with algebraic techniques.
Karr: can we take advantage of sparse representations of problems?
Second Open Problem Session
Dinwoodie: Are there examples of this phenomenon in logistic regression?
Fienberg: First approach the problem using a block structure.
Steele: Look at a sequence of regressions as a sequence of polynomials. Can you solve the structure explicitly?
MLE Project
On Thursday afternoon, Bernd Sturmfels, Serkan Hosten, and Mathias Drton led a section on algebraic approaches to maximum likelihood estimations. The purpose was to introduce some algebraic techniques and possible small instances of statistical models to which one could apply these techniques. The eventual goal of this project is to try to come to a better understanding of the maximum likelihood estimation problem using algebraic means.
Algebraic Methods for Optimization:
Small Instances of Statistical Models
Questions from the Working Sessions
Graphical Models
Hosten: Look at the leading coefficients of polynomials in the elimination ideals (with parameters) and determine when these are zero.
Linear Polynomial Models
A design is a finite subset of . The design ideal is the vanishing ideal of polynomial functions vanishing on . The set of standard monomials modulo given a term order is denoted . A linear polynomial model is a list of monomials.
Richards: Could you use random search or genetic algorithms to solve this problem?
Software for Algebraic Statistics
Stillman: Not likely to work since those other systems (Mathematica, Maple, etc.) do not give access to their kernels.
Dinwoodie: Toric Markov bases
Riccomagno: Ideal of points
Pachter: A general R interface with algebra packages
Riccomagno: Gröbner bases over fraction fields, with algebraic numbers and parameters.
Pachter: When do we find out when these things have been implemented?
Allman: Improve the documentation!
Back to the
main index
for Computational Algebraic Statistics.