Stein's method and applications in high-dimensional statistics
American Institute of Mathematics, San Jose, California
Jay Bartroff, Larry Goldstein, Stanislav Minsker, and Gesine Reinert
The main topics of the workshop are the following.
- Concentration of measure inequalities and sparse recovery problems
- Empirical measures and dimension reduction
- Sequential analysis and change-point detection
- Connections between distributed statistical estimation and rates of convergence to normal approximation
Concentration of measure inequalities are among the most valuable tools in the study of high-dimensional statistics, and are most often employed under the assumption that observations are Gaussian or have light-tailed distributions. Some form of independence is usually assumed as well. Typically, though, there is no reason to believe that real-world data sets can be modeled by such mathematically convenient distributions, as heavy-tailed models exhibiting dependence offer better approximations to reality. We will explore how Stein's method may be used to weaken assumptions such as independence, and how it may inform recent promising advances that produce performance guarantees for heavy-tailed distributions comparable to those for the Gaussian.
High-dimensional data are often represented through empirical measures, as they provide a flexible view that allows focus to be placed on different aspects of the data. Stein's method can be used to describe the asymptotic behavior of empirical measures even when the observations are heterogeneous and not independent of each other. In particular, the error in the projection of empirical measures onto subspaces can be bounded. For high-dimensional data one often seeks informative low-dimensional summaries. We shall investigate how Stein's method can be used to quantify the accuracy of such dimension reduction techniques.
Sequential hypothesis testing, estimation, and change-point detection are also fertile ground for Stein techniques. One open problem is to obtain explicit bounds on the distance between the distribution of a stopped sequential test statistic and its limiting distribution, a problem connected to the excess over the boundary of a stopped random walk. A related problem is to explore the distributional effect of early stopping rules in Markov chain Monte Carlo methods for the analysis of high-dimensional data sets. Here the interest lies in stopping the Markov chain Monte Carlo run when it deviates too much from the target, which requires quantifying the distributional distance from the stopped chain to the limit, a main strength of Stein's method.
Large data sets are often processed in distributed systems consisting of several nodes, each of which can access only its own sub-sample of the data. Because communication between nodes is expensive or time-consuming, each node works independently, and the results are merged only at the final step to produce the output. We want to understand how to design "optimal" merging strategies, and to study connections between divide-and-conquer algorithms and the rates of convergence in normal approximation.
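To make the divide-and-conquer setting concrete, here is a minimal Python sketch of the pipeline described above: the data are split across nodes, each node computes a local estimate without communicating, and the local estimates are merged once at the end. The function names, the round-robin split, and the two merging rules (plain averaging and the median of the local estimates) are illustrative assumptions rather than strategies prescribed by the workshop; in particular, neither rule is claimed to be "optimal".

```python
import statistics


def split_into_nodes(data, num_nodes):
    """Partition data into num_nodes disjoint sub-samples (round-robin).

    Hypothetical helper: a real system would shard data by storage location.
    """
    return [data[i::num_nodes] for i in range(num_nodes)]


def local_estimate(sub_sample):
    """Estimate computed by one node from its own sub-sample only."""
    return statistics.fmean(sub_sample)


def merge_by_average(local_estimates):
    """Merge step 1 (assumed example): average the local estimates."""
    return statistics.fmean(local_estimates)


def merge_by_median(local_estimates):
    """Merge step 2 (assumed example): take the median of the local estimates."""
    return statistics.median(local_estimates)


def distributed_mean(data, num_nodes, merge):
    """Run the whole pipeline: split, estimate locally, merge once at the end."""
    sub_samples = split_into_nodes(data, num_nodes)
    return merge([local_estimate(s) for s in sub_samples])
```

For example, on a sample of 99 zeros and a single value of 1000 split over 10 nodes, `distributed_mean(data, 10, merge_by_average)` is pulled toward the outlier, while `distributed_mean(data, 10, merge_by_median)` is not, since only one node sees the outlier. How the choice of merge rule affects the accuracy of the final estimate is the kind of question the paragraph above raises.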
The overarching theme of the workshop will be the development of new methods in high-dimensional data analysis by applying recent advances in probabilistic methods.
The workshop will differ from typical conferences in some regards. Participants will be invited to suggest open problems and questions before the workshop begins, and these will be posted on the workshop website. These include specific problems on which there is hope of making some progress during the workshop, as well as more ambitious problems which may influence the future activity of the field. Lectures at the workshop will be focused on familiarizing the participants with the background material leading up to specific problems, and the schedule will include discussion and parallel working sessions.
The deadline to apply for support to participate in this workshop has passed.
For more information email firstname.lastname@example.org