Monday Discussion on Combinatorial Issues

The disussion was moderated by Michelle Wachs. She started off the discussion by asking:

Q. What combinatorial questions arise in biology that might be of interest to biologists? Or, what problems motivated by biology would be of interest to mathematicians, to give us some interesting problems?

Huelsenbeck: In putting priors on the set of all possible trees, we sometimes need to count trees with particular characteristics, such as ones with a particular kind of edge (split, bipartition), or a list of either this or that kind of edge. The constraints may be multiple or conflicting. The count is needed so that we can sum probabilities over these trees.
Holmes pointed out that inclusion/exclusion issues are involved here. There was some discussion about NP-hard problems, but Billera pointed out that complexity issues are not that relevant for the basic question of how to count these things.

Q. Given a set of edges, and a subset of , how many leaf labeled binary trees have some subset of but not the edges in ? Do this for all .

Evans asked why we can't just use MCMC to solve this problem? Holmes pointed out that the combinatorics may give some insight.
Another example of a question that arises: the number of trees that are distance from a particular set of trees (e.g., using a distance such as nearest-neighbor interchange)?

Q. What are good codings for trees?

Wachs described a bijection between phylogenetic trees and type B permutations whose left-to-right minimum is unbarred. (Type B permutations are permutations in which each number can be barred or unbarred.) This can be used as a coding for trees.
There was a question about Penny's method for coding trees (in which each time you add an edge there are places to add it, so that you get of them). Billera said that this is the same as what was on the board.
Penny mentioned the existence of another scheme for coding each tree by a unique number. Holmes pointed out that the biologists use a Newick notation (New Hampshire format) that uses lots of parentheses. Another coding is a matrix where the columns are edges, and the rows are vertices, and you put a 1 if the edge is an ancestor of the vertex. (Brooks used it in biology, and Graham-Winkler used for addressing a network).
Holmes described a matching representation is one due to Diaconis-Holmes. When there is one matching, when there are 3 matchings, when there are 15 matchings, corresponding to 4 leaf trees. The matchings are the sibling pairs. Example: . Call the numbers from 1 to 4 available numbers, and the other numbers unavailable. Given a matching , pick the first available pair of numbers. By pigeonhole argument, there is some pair, that's a sibling pair in a tree. Then the smallest non-available number (5, in this case) is the parent, and the other member of the pair (3) is the sibling of 5. Continue.
Diaconis pointed out their matching method is related to Billera's method using Prufer codes.

Q. Is there an efficient representation for coding trees? (Billera)

Q. Is there a nice neat notation for trees that is continuous when doing a random walk on trees? In other words, do small changes in notation correspond to close trees? (Diaconis)

The type B permutation coding might be very amenable to doing a random walk. Bridson pointed out that this representation is related to subsets of the Cayley graph of the Weyl group.

Q. Are there questions that combinatorialists might have for biologists? Combinatorial structures that biologists might be interesting? (Wachs)

Q. If we took trees and instead of putting real numbers of them, putting discrete values, like 0-1, would that be of interest in biologists? (Diaconis)

Huelsenbeck: this maybe useful in character-mapping, mapping characters that are on or off. Perhaps some substitutions are more likely when characters are on.
Epstein: it certainly seems reasonable to associate to each edge a vector, rather than a real number.
Evans: as in Penny's talk, characters of each of the vertices can be thought of as elements of the Klein 4-group... the edges can represent differences of elements at the vertices.
Wachs: what about -ary trees?
Penny: people find cycles useful for representing uncertainty. Referred to some who tried to define how tree-like the data was, something tree-like would have only small cycles.
Holmes referred to a program called splitstree that makes such diagrams.
Penny: biologist would like structures that represents distances accurately.
Shareshian: described a space that arose in his work, a space in which cycles correspond to associahedra.

Back to the main index for Geometric models of biological phenomena.