All Tutorials, Plenary Talks, and Parallel Talk I sessions will be located in the Assembly Hall.
Parallel Talk II sessions will be located in the Home Room.
Monday, June 1
Breakfast (8:00-8:20)
Welcome Speech by Steve Smale (8:20-8:30)
Tutorial (8:30-12:00)
Title: Kernel Methods and Support Vector Machines

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Learning Phenomena Beyond Learning

Parallel Talk I (2:00-3:00)
Title: Geometric Inference for Probability Distributions
Parallel Talk II (2:00-3:00)
Title: Zigzag Persistence
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: PAC-Bayes Analysis: Background and Applications
Parallel Talk II (3:30-4:30)
Title: Harmonic and Geometric Multiscale Analysis of Data Sets in High Dimensions
Parallel Talk I (4:30-5:30)
Title: TBA
Parallel Talk II (4:30-5:30)
Title: Cut Locus and Topology from Point Data
Tuesday, June 2
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: Graphical Models and Applications

Lunch Break (12:00-1:00)
Parallel Talk I (1:00-2:00)
Title: Euler Calculus and Topological Data Management
Parallel Talk II (1:00-2:00)
Title: Visibility Constraints on Features of 3D Objects
Coffee Break (2:00-2:30)
Parallel Talk I (2:30-3:30)
Title: Seeking Interpretable Models for High Dimensional Data
Parallel Talk II (2:30-3:30)
Title: Learning Compressed Sensing
Parallel Talk I (3:30-4:30)
Title: Learning Dictionaries for Image Analysis and Sensing
Parallel Talk II (3:30-4:30)
Title: Learning Submanifolds of Euclidean Space
Poster Session (5:30-7:30)
Wednesday, June 3
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: Geometric Methods and Manifold Learning

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Examining the Relative Influence of Familial, Genetic, and Environmental Covariate Information in Flexible Risk Models

Parallel Talk I (2:00-3:00)
Title: Vision and Hodge Theory
Parallel Talk II (2:00-3:00)
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: Learning Deformable Models
Parallel Talk II (3:30-4:30)
Title: Implementation of Hodge Theory for Simplicial Complexes
Parallel Talk I (4:30-5:30)
Title: Statistical Classification and Cluster Processes
Parallel Talk II (4:30-5:30)
Title: Analysis of Reproducing Kernel Spaces for Learning
Thursday, June 4
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: Theory, Methods and Applications of Active Learning

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Sparse Representations from Inverse Problems to Pattern Recognition

Parallel Talk I (2:00-3:00)
Title: Optimization Algorithms in Support Vector Machines
Parallel Talk II (2:00-3:00)
Title: Community Structure in Large Social and Information Networks
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: Unsupervised Learning for Stereo Vision
Parallel Talk II (3:30-4:30)
Title: Active Learning and Selective Sensing
Parallel Talk I (4:30-5:30)
Title: Learning Feature Hierarchies
Parallel Talk II (4:30-5:30)
Title: On the Dimension Complexity of DNFs
Friday, June 5
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: Theory and Applications of Boosting

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Generative Models for Image Analysis

Parallel Talk I (2:00-3:00)
Title: On Surrogate Loss Functions, f-Divergences and Decentralized Detection
Parallel Talk II (2:00-3:00)
Title: TBA
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: Similarity-based Classifiers: Problems and Solutions
Parallel Talk II (3:30-4:30)
Title: Object Detection with Discriminatively Trained Part Based Models
Parallel Talk I (4:30-5:30)
Title: TBA
Parallel Talk II (4:30-5:30)
Title: What Do Unique Games, Structural Biology and the Low-Rank Matrix Completion Problem Have In Common?
Saturday, June 6
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: Bounding Excess Risk in Machine Learning

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Learning in Hierarchical Architectures: from Neuroscience to Derived Kernels

Parallel Talk I (2:00-3:00)
Title: More Data Less Work: Runtime As A Monotonically Decreasing Function of Data Set Size
Parallel Talk II (2:00-3:00)
Title: Consistent Robust Methods for Logarithmic Time Prediction
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: On Finding Low Error Clusterings
Parallel Talk II (3:30-4:30)
Title: Multi-manifold Data Modeling Via Spectral Curvature Clustering
Parallel Talk I (4:30-5:30)
Parallel Talk II (4:30-5:30)
Title: TBA
Sunday, June 7 
Monday, June 8
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: Analysis of Clustering Procedures

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: The Stability of the Contour of an Orientable 2-Manifold

Parallel Talk I (2:00-3:00)
Title: On a Theory of Similarity Functions for Learning and Clustering
Parallel Talk II (2:00-3:00)
Title: Data Spectroscopy: Eigenspaces of Convolution Operators and Clustering
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: A Bahadur Type Representation of the Linear Support Vector Machine and its Relative Efficiency
Parallel Talk II (3:30-4:30)
Title: Two Models for Bayesian Supervised Dimension Reduction
Parallel Talk I (4:30-5:30)
Title: Fitting a Graph to Vector Data
Parallel Talk II (4:30-5:30)
Title: Semi-supervised and Active Learning with Dual Supervision
Tuesday, June 9
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: An Overview of Compressed Sensing and Sparse Signal Recovery via L1 Minimization

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Spectral Graph Theory, Linear Solvers, and Applications

Parallel Talk I (2:00-3:00)
Title: Cheeger Cuts and p-Spectral Clustering
Parallel Talk II (2:00-3:00)
Title: TBA
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: Multiscale Geometry and Harmonic Analysis of Data Bases
Parallel Talk II (3:30-4:30)
Title: A Multi-View Approach to Learning
Parallel Talk I (4:30-5:30)
Title: Graphical Models for Speech Recognition: Articulatory and Audio-Visual Models
Parallel Talk II (4:30-5:30)
Title: TBA
Wednesday, June 10
Breakfast (8:00-8:30)
Tutorial (8:30-12:00)
Title: Semi-supervised Learning

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Matrix Completion via Convex Optimization: Theory and Algorithms

Parallel Talk I (2:00-3:00)
Title: Drifting Games, Boosting and Online Learning
Parallel Talk II (2:00-3:00)
Title: Testing for Stationarity in Acoustic Signals: Parametric and Nonparametric Approaches
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: Recent Progress in Combinatorial Statistics
Parallel Talk II (3:30-4:30)
Title: An Inverse Problems Perspective on Learning Theory
Parallel Talk I (4:30-5:30)
Title: MAP Estimation with Perfect Graphs
Parallel Talk II (4:30-5:30)
Title: Statistical Methods for Cortical Control of Hand Movement
Thursday, June 11
Breakfast (8:00-8:30)
Poster Session (9:00-11:00)

Lunch Break (12:00-1:00)
Plenary Talk (1:00-2:00)
Title: Inference for Networks

Parallel Talk I (2:00-3:00)
Title: Cocktail Party Problem as Binary Classification
Parallel Talk II (2:00-3:00)
Title: Variable-Selection Bayesian Regression: A Novel Prior Distribution and Applications to Genetic Association Studies
Coffee Break (3:00-3:30)
Parallel Talk I (3:30-4:30)
Title: Machine Learning in Acoustic Signal Processing
Parallel Talk II (3:30-4:30)
Title: Supervised Spectral Latent Variable Models
Parallel Talk I (4:30-5:30)
Title: Nonlinear Dimension Reduction by Spectral Connectivity Analysis and Diffusion Coarse-Graining
Parallel Talk II (4:30-5:30)
Title: Speech As A Pattern of Points In Time
List of Abstracts

Yali Amit, University of Chicago
Title: Learning Deformable Models
Abstract: It is widely recognized that the fundamental building block in high-level computer vision is the deformable template, which represents realizations of an object class in the image as noisy geometric instantiations of an underlying model. The instantiations typically come from a subset of some group, centered at the identity, which acts on the model or template. Thus, in contrast to some machine learning applications where one tries to discover some unspecified manifold structure, here the structure is entirely determined by the group action and the model. Given a choice of group action and a family of template models, a major challenge is to use a sample of images of the object to estimate the model and the distribution on the group. The primary obstacle is that the instantiations or group elements that produced each image are unobserved. I will describe a general formulation of this problem and then show some practical applications to object detection and recognition.

Nina Balcan, Microsoft
Title: On Finding Low Error Clusterings
Abstract: There has been substantial work on approximation algorithms for clustering data under distance-based objective functions such as the k-median, k-means, and min-sum objectives. This work is fueled in part by the hope that approximating these objectives well will indeed yield more accurate solutions. That is, for problems such as clustering proteins by function, or clustering images by subject, there is some unknown correct "target" clustering, and the implicit assumption is that clusterings that are approximately optimal in terms of these distance-based measures are also approximately correct in terms of error with respect to the target.
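The distance-based objectives named in this abstract are easy to state concretely. Below is a minimal sketch of the k-means and k-median costs; the function names and toy data are mine, not code from the talk:

```python
import numpy as np

def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kmedian_cost(points, centers):
    """k-median objective: sum of distances to the nearest center."""
    d = np.sqrt(((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1).sum()

# Two well-separated blobs: the true centers give a far lower cost
# than a deliberately bad placement between the blobs.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
good = np.array([[0.0, 0.0], [5.0, 5.0]])
bad = np.array([[2.5, 2.5], [2.6, 2.6]])
```

The abstract's point is precisely about the gap between these surrogate costs and the error relative to the unknown target clustering.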
In this work we show that if we make this implicit assumption explicit, that is, if we assume that any c-approximation to the given clustering objective Phi is epsilon-close to the target, then we can produce clusterings that are O(epsilon)-close to the target, even for values of c for which obtaining a c-approximation is NP-hard. In particular, for the k-median, k-means, and min-sum objectives, we show that we can achieve this guarantee for any constant c > 1. Our results show how, for these clustering objectives, one can get much better guarantees on accuracy than those implied by results obtained so far in the approximation literature, by wisely using all the available information for the problem at hand.

Ronen Basri, TTIC
Title: Visibility Constraints on Features of 3D Objects
Abstract: To recognize three-dimensional objects it is important to model how their appearances can change due to changes in viewpoint. A key aspect of this involves understanding which object features can be simultaneously visible under different viewpoints. We address this problem in an image-based framework, in which we use a limited number of images of an object taken from unknown viewpoints to determine which subsets of features might be simultaneously visible in other views. This leads to the problem of determining whether a set of images, each containing a set of features, is consistent with a single 3D object. We assume that each feature is visible from a disk of viewpoints on the viewing sphere. In this case we show that the problem is NP-hard in general, but can be solved efficiently when all views come from a circle on the viewing sphere. We also give iterative algorithms that can handle noisy data and converge to locally optimal solutions in the general case. Our techniques can also be used to recover viewpoint information from the set of features that are visible in different images. We show that these algorithms perform well both on synthetic data and images from the COIL dataset.
Joint work with Pedro Felzenszwalb, Ross Girshick, David Jacobs, and Caroline Klivans.

Misha Belkin, Ohio State University
Title:
Abstract:

Peter Bickel, University of California, Berkeley
Title: Inference for Networks
Abstract: A great deal of attention has recently been paid to determining subcommunities, on the basis of relations (corresponding to edges) between individuals (corresponding to vertices), of an unlabelled graph (Newman, SIAM Review 2003; Airoldi et al., JMLR 2008; Leskovec, Kleinberg et al., SIGKDD 2005), for probabilistic ergodic models of infinite unlabelled graphs. We derive consistency properties of the Newman-Girvan index, and develop an index with better consistency properties and better performance on simulated data sets. This is joint work with Aiyou Chen.

Avrim Blum, Carnegie Mellon University
Title: On a Theory of Similarity Functions for Learning and Clustering
Abstract: Kernel methods have become powerful tools in machine learning. They perform well in many applications, and there is also a well-developed theory of what makes a given kernel useful for a given learning problem. However, this theory requires viewing kernels as implicit (and often difficult to characterize) maps into high-dimensional spaces. In this talk I will describe work on developing a theory that views a kernel simply as a measure of similarity between data objects, and describes the usefulness of a given kernel (or more general similarity function) in terms of fairly intuitive, direct properties of how the similarity function relates to the task at hand, without the need to refer to any implicit spaces. I will also talk about an extension of this framework to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function should be (in terms of its relation to the unknown desired clustering) so that it can be used to cluster well: to learn well without any label information at all.
We find that if we are willing to relax the objective a bit (for example, allow the algorithm to produce a hierarchical clustering that we will call successful if some pruning is close to the desired clustering), then this question leads to a number of interesting graph-theoretic and game-theoretic properties that are sufficient to cluster well. This work can be viewed as defining a kind of PAC model for clustering. (This talk is based on joint work with Maria-Florina Balcan, Santosh Vempala, and Nati Srebro.)

Emmanuel Candes, California Institute of Technology
Tutorial Title: An Overview of Compressed Sensing and Sparse Signal Recovery via L1 Minimization
Abstract: In many applications, one often has fewer equations than unknowns. While this seems hopeless, the premise that the object we wish to recover is sparse or nearly sparse radically changes the problem, making the search for solutions feasible. This lecture will introduce sparsity as a key modeling tool together with a series of little miracles touching on many areas of data processing. These examples show that finding the solution with minimum L1 norm to an underdetermined system of linear equations often returns the "right" answer. Further, there is by now a well-established body of work going by the name of compressed sensing, which asserts that one can exploit sparsity or compressibility when acquiring signals of general interest, and that one can design non-adaptive sampling techniques that condense the information in a compressible signal into a small amount of data, in fewer data points than were thought necessary. We will survey some of these theories and trace some of their origins back to early work done in the 1950s. Because these theories are broadly applicable in nature, the tutorial will move through several application areas that may be impacted, such as signal processing, biomedical imaging, machine learning, and so on.
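The L1-based sparse recovery described in this tutorial abstract can be illustrated with iterative soft-thresholding (ISTA), one standard proximal-gradient solver for the Lagrangian form of the problem. This is an illustrative sketch under my own naming and parameters, not the algorithm from the lecture:

```python
import numpy as np

def ista(A, b, lam=0.01, iters=5000):
    """Iterative soft-thresholding for min 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - step * (A.T @ (A @ x - b))                        # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

# Underdetermined system: 30 equations, 100 unknowns, 4-sparse ground truth.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 100)) / np.sqrt(30)
x_true = np.zeros(100)
x_true[[5, 20, 40, 77]] = [1.0, -2.0, 1.5, 0.8]
b = A @ x_true
x_hat = ista(A, b)
```

With far fewer equations than unknowns, the sparse solution is nonetheless recovered to good accuracy, which is the "little miracle" the abstract refers to.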
Finally, we will discuss how these theories and methods have far-reaching implications for sensor design and other types of design.

Plenary Title: Matrix Completion via Convex Optimization: Theory and Algorithms
Abstract: This talk considers a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. In partially filled-out surveys, for instance, we would like to infer the many missing entries. In the area of recommender systems, users submit ratings on a subset of entries in a database, and the vendor provides recommendations based on the users' preferences. Because users only rate a few items, we would like to infer their preferences for unrated items (this is the famous Netflix problem). Formally, suppose that we observe m entries selected uniformly at random from a matrix. Can we complete the matrix and recover the entries that we have not seen? We show that, perhaps surprisingly, one can recover low-rank matrices exactly from what appear to be highly incomplete sets of sampled entries; that is, from a minimally sampled set of entries. Further, perfect recovery is possible by solving a simple convex optimization program, namely, a convenient semidefinite program. A surprise is that our methods are optimal and succeed as soon as recovery is possible by any method whatsoever, no matter how intractable; this result hinges on powerful techniques in probability theory. Time permitting, we will also present a very efficient algorithm based on iterative singular value thresholding, which can complete matrices with about a billion entries in a matter of minutes on a personal computer.

Rui Castro, Columbia University
Tutorial Title: Theory, Methods and Applications of Active Learning
Abstract: This tutorial will cover a wide range of topics related to active learning.
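The matrix completion problem in the plenary abstract above can be illustrated with a deliberately simpler stand-in for the semidefinite program: alternating between a low-rank SVD projection and re-imposing the observed entries, assuming the rank is known. All names here are mine:

```python
import numpy as np

def complete_lowrank(M_obs, mask, rank, iters=300):
    """Fill the unobserved entries of M_obs (mask == 1 where observed) by
    repeatedly projecting to rank r and restoring the observed entries."""
    X = mask * M_obs
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
        X = mask * M_obs + (1 - mask) * X          # keep observed entries fixed
    return X

# Rank-2 ground truth, roughly 60% of entries observed uniformly at random.
rng = np.random.default_rng(2)
M = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 20))
mask = (rng.random((20, 20)) < 0.6).astype(float)
M_hat = complete_lowrank(M * mask, mask, rank=2)
```

Even this crude heuristic recovers the full matrix well from a partial sample, which is the phenomenon the convex-optimization theory in the abstract explains and makes rigorous.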
Broadly speaking, active learning refers to sequential data selection and inference procedures that actively seek out highly informative data, rather than relying solely on random training examples or non-adaptively selected data. We will review basic theory and algorithms for active learning, and discuss applications in classification, regression, and sparse signal recovery. The tutorial will be organized into three main parts. The first part will introduce the problem of active learning, describe a few standard problems and algorithms, and pose the fundamental challenges of active learning. The second part is an in-depth case study of active learning for binary classification, the problem where perhaps the most is known. In this context, we will show how theoretical analysis can quantify the capabilities and limitations of active learning. The final part will discuss future directions and emerging applications of active learning, including applications in the cognitive and network sciences.

Frederic Chazal, INRIA, France
Title: Geometric Inference for Probability Distributions
Abstract: Data often comes in the form of a point cloud sampled from an unknown compact subset of Euclidean space. The general goal of geometric inference is then to recover geometric and topological features (Betti numbers, curvatures, ...) of this subset from the approximating point cloud data. In recent years, it has become apparent that the study of distance functions makes it possible to address many of these questions successfully. However, one of the main limitations of this framework is that it does not cope well with outliers or with background noise.
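The robust distance functions in Chazal's abstract have a simple empirical counterpart, the distance-to-measure, which averages over a fraction of nearest sample points instead of taking the single nearest one. The sketch below, with my own naming, shows why this helps with outliers:

```python
import numpy as np

def dist_to_measure(query, sample, m=0.1):
    """Empirical distance-to-measure: root mean squared distance from each
    query point to its k = ceil(m*n) nearest sample points."""
    k = max(1, int(np.ceil(m * len(sample))))
    d2 = ((query[:, None, :] - sample[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(np.sort(d2, axis=1)[:, :k].mean(axis=1))

# Points on the unit circle plus a single far-away outlier.
rng = np.random.default_rng(3)
theta = rng.uniform(0, 2 * np.pi, 100)
cloud = np.vstack([np.c_[np.cos(theta), np.sin(theta)], [[10.0, 10.0]]])
q = np.array([[10.0, 10.0]])   # probe exactly at the outlier

# The plain distance-to-the-sample is fooled by the outlier (it is ~0 there);
# the distance-to-measure stays large, reflecting the true shape.
plain = np.sqrt(((q[:, None, :] - cloud[None, :, :]) ** 2).sum(axis=2)).min()
robust = dist_to_measure(q, cloud, m=0.1)[0]
```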
In this talk, we will show how to extend the framework of distance functions to overcome this problem. Replacing compact subsets by measures, we will introduce a notion of a distance function to a probability distribution in $\R^n$.

Ronald Coifman, Yale University
Title: Multiscale Geometry and Harmonic Analysis of Data Bases
Abstract: We describe a method for the geometrization of databases such as questionnaires or lists of sensor outputs. Interlacing multiscale diffusion geometries of the rows and columns of a data matrix results in a pair of "language ontologies" which are mutually supportive (certain words are used in certain contexts). This mutual geometry serves as a structure for harmonic analysis and signal processing on the database. We will illustrate this on databases of audio (music), psychological questionnaires, science documents, images, and many others. Joint work with Matan Gavish, Yale University.

Sanjoy Dasgupta, University of California, San Diego
Title: Analysis of Clustering Procedures
Abstract: Clustering procedures are notoriously short on rigorous guarantees. In this tutorial, I will cover some of the types of analysis that have been applied to clustering, and emphasize open problems that remain.
Part I. Approximation algorithms for clustering. Two popular cost functions for clustering are k-center and k-means. Both are NP-hard to optimize exactly. (a) Algorithms for approximately optimizing these cost functions. (b) Hierarchical versions of such clusterings. (c) Clustering when data is arriving in a streaming or online manner.
Part II. Analysis of popular heuristics. (a) How good is k-means? How fast is it? (b) Probabilistic analysis of EM. (c) What approximation ratio is achieved by agglomerative heuristics for hierarchical clustering?
Part III. Statistical theory in clustering. What aspects of the underlying data distribution are captured by the clustering of a finite sample from that distribution? (a) Consistency of k-means.
(b) The cluster tree and linkage algorithms. (c) Rates for vector quantization.

Vin de Silva, Pomona College
Title: Zigzag Persistence
Abstract: Zigzag persistence is a new methodology for studying the persistence of topological features across a family of spaces or point-cloud data sets. Building on classical results about quiver representations, zigzag persistence generalizes the highly successful theory of persistent homology and addresses several situations which are not covered by that theory. I will present theoretical and algorithmic foundations with a view towards applications in topological statistics. As an important example, I discuss a particular zigzag sequence derived from the level sets of a real-valued function on a topological space. A powerful structure theorem, called the Pyramid Theorem, establishes a connection between this "level-set zigzag persistence" and the extended persistence of Cohen-Steiner, Edelsbrunner, and Harer. This theorem resolves an open question concerning the symmetry of extended persistence. Moreover, the interval persistence of Dey and Wenger can be understood in this context; in some sense it carries three-quarters of the information produced by the other two theories. This is joint work with Gunnar Carlsson and Dmitriy Morozov.

Tamal Dey, Ohio State University
Title: Cut Locus and Topology from Point Data
Abstract: The cut locus of a point p in a compact Riemannian manifold M is defined as the set of points where minimizing geodesics issued from p stop being minimizing. It is known that a cut locus contains most of the topological information of M. Our goal is to utilize this property of cut loci to decipher the topology of M from a point sample. Recently it has been shown that Rips complexes can be built from a point sample P of M systematically to compute the Betti numbers, the ranks of the homology groups of M.
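The Rips-complex construction in Dey's abstract is easy to sketch in its simplest use: reading off the number of connected components (the zeroth Betti number) from the 1-skeleton at a given scale. The function below is my own illustration, not code from the talk:

```python
import numpy as np

def rips_betti0(points, r):
    """Betti_0 (connected components) of the Rips complex at scale r,
    computed from its 1-skeleton with union-find."""
    n = len(points)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) <= r:
                parent[find(i)] = find(j)   # merge the two components
    return len({find(i) for i in range(n)})

# Two well-separated clusters: two components at a small scale,
# one component once the scale exceeds the gap.
rng = np.random.default_rng(4)
pts = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(8, 0.2, (30, 2))])
```

Higher Betti numbers require the higher-dimensional simplices of the complex, which is exactly where the size issue discussed next in the abstract arises.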
Rips complexes can be computed easily and are therefore favored over others such as the restricted Delaunay, alpha, Cech, and witness complexes. However, the sizes of the Rips complexes tend to be large. Since the dimension of a cut locus is lower than that of the manifold M, a subsample of P approximating the cut locus is usually much smaller in size and hence admits a relatively small Rips complex.

Herbert Edelsbrunner, Duke University
Title: The Stability of the Contour of an Orientable 2-Manifold
Abstract: Think of the view of the boundary of a solid shape as a projection of a 2-manifold to R^2. Its apparent contour is the projection of the critical points. Generalizing the projection to smooth mappings of a 2-manifold to R^2, we get the contour as the image of the points at which the derivative is not surjective. Measuring difference with the erosion distance (the Hausdorff distance between the complements), we prove that the contour is stable. Along the way, we introduce the by now well-established method of persistent homology, including the stability of its diagrams, as well as an extension using zigzag modules. Joint work with Dmitriy Morozov and Amit Patel.

Pedro Felzenszwalb, University of Chicago
Title: Object Detection with Discriminatively Trained Part Based Models
Abstract: Object recognition is one of the fundamental challenges in computer vision. I will consider the problem of detecting objects of a generic category, such as people or cars, in static images. This is a difficult problem because objects in such categories can vary greatly in appearance. For example, people wear different clothes and take a variety of poses, while cars come in various shapes and colors. We have developed an object detection system based on mixtures of multiscale deformable part models. The system is both highly efficient and accurate, achieving state-of-the-art results on the PASCAL object detection challenges.
We train our models using a discriminative procedure that we call latent SVM. A latent SVM leads to a non-convex training problem. However, a latent SVM is semi-convex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function. I will also discuss how this framework may eventually make possible the effective use of models based on rich visual grammars. Joint work with Ross Girshick, David McAllester and Deva Ramanan.

Yoav Freund, University of California, San Diego
Title: Drifting Games, Boosting and Online Learning
Abstract: Drifting games provide a new and useful framework for analyzing learning algorithms. In this talk I will present the framework and show how it is used to derive a new boosting algorithm, called RobustBoost, and a new online prediction algorithm, called NormalHedge. I will present two sets of experiments using these algorithms on synthetic and real-world data. The first set demonstrates that RobustBoost can learn from mislabeled training data. The second demonstrates an application of NormalHedge to tracking moving objects.

Stuart Geman, Brown University
Title: Generative Models for Image Analysis
Abstract: A probabilistic grammar for the grouping and labeling of parts and objects, when taken together with pose- and part-dependent appearance models, constitutes a generative scene model and a Bayesian framework for image analysis. To the extent that the generative model generates features, as opposed to pixel intensities, the "inverse" or "posterior distribution" on interpretations given images is based on incomplete information; feature vectors are generally insufficient to recover the original intensities. I will argue for fully generative scene models, meaning models that in principle generate actual digital pictures.
I will outline an approach to the construction of fully generative models through an extension of context-sensitive grammars and a reformulation of the popular template models for image fragments. Mostly I will focus on the problem of learning template models from image data. Since the model is fully specified (generative) at the pixel level, the templates can be learned by maximum likelihood. A training set of eyes, for example, yields an ensemble of left and right eyes, of familiar and natural character, but not actually coming from any particular individuals in the training set. The upshot is a mixture distribution on image patches, consisting of a set of templates and a set of conditional patch distributions, one for each template. One way to test the model is to examine samples. I will show how to sample from the mixture distribution, and I will show sample sets of eyes, mouths, and generic "background". Another way to test the model is to use it for detection, recognition, or classification. I will show the results of a test on ethnic classification based on the eye region of faces.

Robert Ghrist, University of Pennsylvania
Title: Euler Calculus and Topological Data Management
Abstract: This talk covers the basics of an integral calculus based on the Euler characteristic, and its utility in data problems, particularly in the aggregation of redundant data and inverse problems over networks. This calculus is a blend of integral-geometric and sheaf-theoretic techniques, and leads to surprisingly practical algorithms and computations. Qualitative versions of integral transforms for signal processing will be stressed.

Maya Gupta, University of Washington, Seattle
Title: Similarity-based Classifiers: Problems and Solutions
Abstract: Similarity-based learning assumes one is given similarities between samples to learn from, and can be considered a special case of graph-based learning where the graph is given and fully connected.
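The weighted nearest-neighbor framing in Gupta's abstract can be sketched directly: classification uses only a given similarity matrix, with no feature vectors and no positive-definite kernel. The function name and toy data below are my own illustration:

```python
import numpy as np

def similarity_knn(S, train_labels, k=3):
    """Similarity-weighted k-NN. S[i, j] is the given similarity between
    test item i and training item j; votes are weighted by similarity."""
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    preds = []
    for sims in S:
        top = np.argsort(sims)[-k:]   # indices of the k most similar training items
        votes = [sims[top][train_labels[top] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(votes))])
    return np.array(preds)

# Toy similarity matrix: each test item is more similar to one group of
# training items than to the other (the matrix need not be PSD).
S = np.array([[0.9, 0.8, 0.7, 0.1, 0.2, 0.1],
              [0.1, 0.2, 0.1, 0.9, 0.8, 0.9]])
labels = [0, 0, 0, 1, 1, 1]
preds = similarity_knn(S, labels, k=3)
```

Since only the k most similar items are consulted, indefiniteness of the full similarity matrix never enters, which is the intuition behind the local methods the abstract advocates.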
Such problems arise frequently in computer vision, bioinformatics, and problems involving human judgment. We will review the field of similarity-based classification and describe the main problems encountered in adapting standard algorithms for this problem, including different approaches to approximating indefinite similarities by kernels. We will motivate why local methods lessen the indefinite-similarity problem, and show that kernelized linear interpolation and local kernel ridge regression can be profitably applied to such similarity-based classification problems by framing them as weighted nearest-neighbor classifiers. Eight real datasets will be used to compare state-of-the-art methods and illustrate the open challenges in this field.

Mark Hasegawa-Johnson, University of Illinois
Title: Machine Learning in Acoustic Signal Processing
Abstract: This tutorial presents a framework for understanding and comparing applications of pattern recognition in acoustic signal processing. Representative applications will be delimited by two binary features: (1) regression vs. (2) classification (inferred variables are continuous vs. discrete), and (A) instantaneous vs. (B) dynamic. (1. Regression) problems include imaging and sound-source tracking using a device with unknown properties, and inverse problems, e.g., articulatory estimation from speech audio. (2. Classification) problems include, e.g., the detection of syllable onsets and offsets in a speech signal, and the classification of non-speech audio events. (A. Instantaneous) inference is performed using a universal approximator (neural network, Gaussian mixture, kernel regression), constrained or regularized, if necessary, to reduce generalization error (resulting in a support vector machine, shrunk net, pruned tree, or boosted classifier combination). (B. Dynamic) inference methods apply prior knowledge of state transition probabilities, either in the form of a regularization term (e.g., using Bayesian inference) or in the form of set constraints (e.g., using linear programming) or both; examples include speech-to-text transcription, acoustic-to-articulatory inversion using a switching Kalman filter, and computation of the query presence probability in an audio information retrieval task.

Matthias Hein, Saarland University
Title: Cheeger Cuts and p-Spectral Clustering
Abstract: Spectral clustering has become in recent years one of the most popular clustering algorithms. In this talk I discuss a generalized version of spectral clustering based on the second eigenvector of the graph p-Laplacian, a nonlinear generalization of the graph Laplacian. The clustering obtained for 1 < p <= 2 can be seen as an interpolation between the relaxation of the normalized cut (p = 2) and the Cheeger cut (p = 1). However, the main motivation for p-spectral clustering is the fact that one can show that the cut value obtained by thresholding the second eigenvector of the p-Laplacian converges towards the optimal Cheeger cut as p tends to 1. I will also present an efficient implementation which makes it possible to do p-spectral clustering for large-scale datasets.

Anil Hirani, University of Illinois, Urbana-Champaign
Title: Implementation of Hodge Theory for Simplicial Complexes
Abstract: Hodge decomposition is the main ingredient of Hodge theory. It may be familiar as the Helmholtz-Hodge decomposition of vector fields into curl-free, divergence-free, and harmonic parts. How can this be efficiently implemented for arbitrary-dimensional simplicial complexes embedded in a high-dimensional Euclidean space? What are even the basic objects to be manipulated by a computer implementation of Hodge theory? How should the differential, its adjoint, and the Laplacian be implemented?
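For the simplest case of Hirani's question, a graph (a one-dimensional simplicial complex), the Hodge decomposition of an edge flow splits into a gradient part and a divergence-free remainder, computable by least squares with the incidence matrix playing the role of the discrete differential. A small sketch of my own (this is not PyDEC code):

```python
import numpy as np

# Edges of a small directed graph on 4 nodes (a 4-cycle plus a chord).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_nodes = 4
B = np.zeros((len(edges), n_nodes))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0       # incidence matrix = discrete gradient

f = np.array([2.0, 1.0, -1.0, 0.5, 3.0])   # an arbitrary flow on the edges

phi, *_ = np.linalg.lstsq(B, f, rcond=None)  # node potential, least squares
grad_part = B @ phi                          # gradient (curl-free) component
cycle_part = f - grad_part                   # divergence-free remainder
```

By the normal equations, the remainder satisfies B^T(cycle_part) = 0, i.e., it has zero divergence at every node; on a graph there are no 2-simplices, so the curl and harmonic parts are not further separated.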
I will describe what we use in PyDEC, our Python-based software library, which can compute Hodge decompositions (it can also compute finite element solutions for certain partial differential equations). I will also make my own humble attempt to guess what this Hodge machinery may have to do with machine learning, and may be able to show a demo or two relevant to machine learning. Much of this is joint work with Nathan Bell.

Ali Jadbabaie, University of Pennsylvania
Title: Distributed Computation of Sparse Generators of Homology
Abstract: In this talk I will show how we can use discrete Hodge theory to compute a generator of homology for a simplicial complex in a distributed fashion. I will then demonstrate how one can refine the computed generator to find the sparsest generator in the same class in a distributed, scalable fashion. The proposed approach has connections to l1-norm minimization and compressed sensing, and can be thought of as a generalization of network flow problems from graphs to simplicial complexes. The approach is then demonstrated by finding the sparsest cover of a sensor network and finding network coverage holes. Joint work with Alireza Tahbaz-Salehi.

Tony Jebara, Columbia University
Title: MAP Estimation with Perfect Graphs
Abstract: Efficiently finding the maximum a posteriori (MAP) configuration of a graphical model is an important problem which is often approached using message passing algorithms and linear programming. The optimality of such algorithms is only well established for singly-connected graphs such as trees. Recently, along with others, we have shown that matching and b-matching also admit exact MAP estimation under max-product belief propagation. This leads us to consider a generalization of trees, matchings, and b-matchings: the fascinating family of so-called perfect graphs. While MAP estimation in general loopy graphical models is NP-hard, for perfect graphs of a particular form the problem is in P.
This result leverages recent progress in characterizing perfect graphs (the strong perfect graph theorem, resolved after four decades), linear programming relaxations of MAP estimation, and recent convergent message passing schemes. In particular, we convert any graphical model into a so-called nand Markov random field. This model is straightforward to relax into a linear program whose integrality can be established in general by testing for graph perfection. This perfection test is performed efficiently using a polynomial-time algorithm. Alternatively, known decomposition tools from perfect graph theory may be used to prove perfection for certain graphs. Thus, a general graph framework is provided for determining when MAP estimation in any graphical model is in P, has an integral linear programming relaxation, and is exactly recoverable by message passing.

Michael Jordan, University of California, Berkeley
Title: On Surrogate Loss Functions, f-divergences and Decentralized Detection
Abstract: In 1951, David Blackwell published a seminal paper, widely cited in economics, in which a link was established between the risk based on 0-1 loss and a class of functionals known as f-divergences. The latter functionals have since come to play an important role in several areas of signal processing and information theory, including decentralized detection. Yet their role in these fields has largely been heuristic. We show that an extension of Blackwell's programme provides a solid foundation for the use of f-divergences in decentralized detection, as well as in more general problems of experimental design. Our extension is based on a connection between f-divergences and the class of so-called surrogate loss functions: computationally-inspired upper bounds on the 0-1 loss that have become central in the machine learning literature on classification.
(Joint work with Xuan-Long Nguyen and Martin Wainwright.)

Sham Kakade, TTIC
Title: A Multi-View Approach to Learning
Abstract: Extracting information relevant to a task in an unsupervised manner is one of the most promising directions for reducing the burden of collecting labeled data, which is often prohibitively expensive, as it can involve procedures like human hand-labeling, clinical trials, and costly experiments. The central questions here are: what structure in the unlabeled data can be exploited, and how can this structure be utilized? Often, our input X can be partitioned into two 'views' X1 and X2, where each view provides a natural source of information for predicting our target Y. For example, for the task of speaker identification, two input sources are an audio stream and a video stream; or, for a webpage classification task, our 'views' could be the word content and the hyperlink structure. We show that, under certain natural independence assumptions, canonical correlation analysis (a generalization of PCA to pairs of random variables) provides a dimensionality reduction technique such that we can project our input X to a lower-dimensional space without losing predictive power for the target Y. Such a reduction could significantly reduce the number of labeled samples required. We then discuss how this idea can be utilized in a number of settings, including supervised learning, (hierarchical) clustering, time-series modeling, and domain adaptation (where our training and test distributions could differ substantially).

Vladimir Koltchinskii, Georgia Institute of Technology
Tutorial Title: Bounding Excess Risk in Machine Learning
Abstract: We will discuss a general approach to the problem of bounding the excess risk of learning algorithms based on empirical risk minimization (possibly penalized). This approach has been developed in recent years by several authors (among others: Massart; Bartlett, Bousquet and Mendelson; Koltchinskii).
It is based on powerful concentration inequalities due to Talagrand, as well as on a variety of tools from empirical process theory (comparison inequalities; entropy and generic chaining bounds on Gaussian, empirical, and Rademacher processes; etc.). It provides a way to obtain sharp excess risk bounds in a number of problems such as regression, density estimation, and classification, and for many different classes of learning methods (kernel machines, ensemble methods, sparse recovery). It also provides a general way to construct sharp data-dependent bounds on excess risk that can be used in model selection and adaptation problems.

John Langford, Yahoo Research
Title: Consistent Robust Methods for Logarithmic Time Prediction
Abstract: A natural goal when predicting one of n choices is to do it in O(log(n)) time. Existing algorithms for doing this suffer from several deficiencies: they are often otherwise inefficient in training, sometimes inconsistent, and not very robust when they are consistent. I will present a family of techniques which addresses all of these problems, yielding mechanisms for prediction which are online, logarithmic-time, consistent, and robust, for a wide variety of prediction problems.

Yann LeCun, New York University
Title: Learning Feature Hierarchies
Abstract:

Ann Lee, Carnegie Mellon University
Title: Nonlinear Dimension Reduction by Spectral Connectivity Analysis and Diffusion Coarse-Graining
Abstract: For naturally occurring data, the dimension of the given input space is often very large while the data themselves have a low intrinsic dimensionality. Spectral kernel methods are nonlinear techniques for transforming data into a coordinate system that efficiently reveals the geometric structure, in particular the connectivity, of the data.
In this talk, we will focus on one particular technique: diffusion maps and diffusion coarse-graining. The construction is based on a Markov random walk on the data and offers a general scheme for simultaneously reorganizing and quantizing graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators, and that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification and a precise measure of performance for k-means clustering in spectral embedding spaces. We will discuss two particular applications of diffusion coarse-graining. One application is choosing an appropriate set of prototype simple stellar population (SSP) spectra for parameter estimation of star formation history in galaxies. The other example is texture discrimination by a novel geometry-based metric on distributions. (Part of this work is joint with R.R. Coifman, S. Lafon, J. Richards and C. Schafer.)

Yoonkyung Lee, Ohio State University
Title: A Bahadur Type Representation of the Linear Support Vector Machine and its Relative Efficiency
Abstract: The support vector machine has been used successfully in a variety of applications. On the theoretical front, its statistical properties, including Bayes risk consistency, have also been examined rather extensively. Taking another look at the method, we investigate the asymptotic behavior of the linear support vector machine through a Bahadur type representation of the coefficients, established under appropriate conditions. Their asymptotic normality and statistical variability are derived on the basis of the representation.
Furthermore, a direct theoretical comparison is made with likelihood-based approaches to classification, such as linear discriminant analysis and logistic regression, in terms of asymptotic relative efficiency, where the efficiency of a classification procedure is defined using the excess risk over the Bayes risk.

Gilad Lerman, University of Minnesota
Title: Multi-Manifold Data Modeling via Spectral Curvature Clustering
Abstract: We propose a fast multi-way spectral clustering algorithm for multi-manifold data modeling, i.e., modeling data by mixtures of manifolds (possibly intersecting). We describe the supporting theory as well as the practical choices guided by it. We first develop the case of hybrid linear modeling, i.e., when the underlying manifolds are affine subspaces in a Euclidean space, and then we extend this setting to more general manifolds. We exemplify the practical use of the algorithm by demonstrating its successful application to problems of motion segmentation.

Karen Livescu, TTIC
Title: Graphical Models for Speech Recognition: Articulatory and Audio-Visual Models
Abstract: Since the 1980s, the main approach to automatic speech recognition has been based on hidden Markov models (HMMs), in which each state corresponds to a phoneme, or part of a phoneme, in the context of the neighboring phonemes. Despite their crude approximation of the speech signal, and the large margin for improvement still remaining, HMMs have proven difficult to beat. In the last few years, there has been increasing interest in more complex graphical models for speech recognition involving multiple streams of states. I will describe two such approaches: one models pronunciation variation as the result of the "sloppy" behavior of articulatory variables (the states of the lips, tongue, etc.), and the other models the audio and visual states in audio-visual speech recognition (i.e., recognition enhanced by "lipreading").
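The HMM machinery referenced in this abstract can be illustrated with a minimal forward-algorithm sketch. All parameter values below are invented toy numbers, not taken from any speech system; the two states could stand for two phoneme classes.

```python
import numpy as np

# Toy 2-state, 3-symbol HMM. All numbers are illustrative only.
init = np.array([0.6, 0.4])            # initial state distribution
trans = np.array([[0.7, 0.3],          # state transition probabilities
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],      # P(observation symbol | state)
                 [0.1, 0.3, 0.6]])

def forward(obs):
    """Return P(observation sequence) under the HMM via the forward algorithm."""
    alpha = init * emit[:, obs[0]]     # joint prob. of state and first symbol
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

print(forward([0, 1, 2]))              # likelihood of a short toy sequence
```

A real recognizer would additionally run Viterbi decoding over word-level state graphs, but the recursion above is the core computation shared by all such models.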
Mauro Maggioni, Duke University
Title: Harmonic and Geometric Multiscale Analysis of Data Sets in High Dimensions
Abstract: In many applications one is faced with the task of analyzing large amounts of data, typically embedded in a high-dimensional space but with a lower effective dimensionality, due to physical or statistical constraints. We are interested in studying the geometry of such data sets, modeled as noisy manifolds or graphs, in particular in estimating their intrinsic dimensionality and finding intrinsic coordinate systems on the data. We discuss recent results in these directions, where eigenfunctions of a Laplacian on the data, or of the associated heat kernel, can be used to introduce coordinates with provable guarantees on their bi-Lipschitz distortion. We also discuss ways of studying, fitting, denoising, and regularizing functions defined on the data, using Fourier or wavelet-like multiscale analysis on the data. We present applications to nonlinear image denoising, semi-supervised learning on a family of benchmark datasets, and, if time permits, Markov decision processes.

Michael Mahoney, Stanford University
Title: Community Structure in Large Social and Information Networks
Abstract: The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics, a set of people who share common interests, or, more generally, as a set of nodes in a network more similar amongst themselves than with the remainder of the network. Motivated by difficulties we experienced in actually finding meaningful communities in large real-world networks, we have performed a large-scale analysis of a wide range of social and information networks.
Our main methodology uses local spectral methods and involves computing isoperimetric properties of the networks at various size scales, a novel application of ideas from statistics and machine learning to internet data analysis. Our empirical results suggest a significantly more refined picture of community structure than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, while at larger size scales the best possible communities gradually ``blend in'' with the rest of the network and thus become less ``community-like.'' This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds for community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis in modern large social and information networks.

Stephane Mallat, Ecole Polytechnique
Title: Sparse Representations from Inverse Problems to Pattern Recognition
Abstract: Sparse representations are at the core of many low-level signal processing procedures and are used by most pattern recognition algorithms to reduce the dimension of the search space. Structuring sparse representations for pattern recognition applications requires taking into account invariance to physical deformations such as rotation, scaling, or illumination. Sparsity, invariance, and stability are conflicting requirements, which is a source of open problems.
Structured sparse representations with locally linear vector spaces are introduced for super-resolution inverse problems and pattern recognition.

David McAllester, TTIC
Title: Unsupervised Learning for Stereo Vision
Abstract: We consider the problem of learning to estimate depth from stereo image pairs. This can be formulated as unsupervised learning: the training pairs are not labeled with depth. We have formulated an algorithm which maximizes the conditional likelihood of the left image given the right image in a model that involves latent information (depth). This unsupervised learning algorithm implicitly trains shape-from-texture and shape-from-shading monocular depth cues. The talk will present pragmatic results on the stereo vision problem as well as a general formulation of models and methods for maximizing conditional likelihood in a latent variable model where we wish to interpret the latent information as "labels".

Peter McCullagh, University of Chicago
Title: Statistical Classifications and Cluster Processes
Abstract: After an introduction to the notion of an exchangeable random partition, we continue with a more detailed discussion of the Ewens process and some of its antecedents. The concept of an exchangeable cluster process will be described, the main example being the Gauss-Ewens process. Some applications of cluster processes will be discussed, including problems of classification or supervised learning, and cluster analysis (unsupervised learning). A second type of probabilistic model, based on point processes, is also described. By contrast with the Gauss-Ewens cluster process, the domain associated with each class is more diffuse and not localized in the feature space. For both models, the classification problem is interpreted as the problem of computing the predictive distribution for the class of a new object having a given feature vector.
In one case this is a conditional distribution given the observed features; in the other, a Papangelou conditional intensity.

Gary Miller, Carnegie Mellon University
Title: Spectral Graph Theory, Linear Solvers, and Applications
Abstract: We discuss the development of combinatorial methods for solving symmetric diagonally dominant (SDD) linear systems. Over the last fifteen years the computer science community has made substantial progress on fast solvers for SDD systems. For general SDD systems the upper bound is $O(m \log^k n)$ for some constant $k$, where $m$ is the number of nonzero entries, due to Spielman and Teng. Newer methods, such as combinatorial multigrid, have a linear time guarantee for the planar case and work very well in practice. Critical to the use of these new solvers has been the reduction of problems to the solution of SDD systems. We present some of these reductions, including several from image processing.

Elchanan Mossel, University of California, Berkeley
Title: Recent Progress in Combinatorial Statistics
Abstract: I will discuss some recent progress in combinatorial statistics. In particular, I will describe progress in the areas of reconstructing graphical models, ML estimation of the Mallows model, and diagnostics of MCMC.

Sayan Mukherjee, Duke University
Title: Two Models for Bayesian Supervised Dimension Reduction
Abstract: We study and develop two Bayesian frameworks for supervised dimension reduction that apply to nonlinear manifolds: Bayesian mixtures of inverse regressions and gradient-based methods. Formal probabilistic models with likelihoods and priors are given for both methods, and efficient posterior estimates of the effective dimension reduction space and predictive factors can be obtained by a Gibbs sampling procedure. In the case of the gradient-based methods, estimates of conditional dependence between covariates predictive of the response can also be inferred. Relations to manifold learning and Bayesian factor models are made explicit.
The utility of the approach is illustrated on simulated and real examples.

Partha Niyogi, University of Chicago
Title: Speech As A Pattern of Points in Time
Abstract: This talk presents techniques for nonstationarity detection in the context of speech and audio waveforms, with broad application to any class of time series that exhibits locally stationary behavior. Many such waveforms, in particular information-carrying natural sound signals, exhibit a degree of controlled nonstationarity and are often well modeled as slowly time-varying systems. The talk first describes the basic concepts of such systems and their analysis via local Fourier methods. Parametric approaches appropriate for speech are then introduced by way of time-varying autoregressive models, along with nonparametric approaches based on variation of time-localized estimates of the power spectral density of an observed random process, and an efficient offline bootstrap procedure based on the Wold representation. Several real-world examples are given.

Rob Nowak, University of Wisconsin-Madison
Title: Active Learning and Selective Sensing
Abstract: Traditional approaches to machine learning and statistical inference are passive, in the sense that all data are collected prior to analysis in a non-adaptive fashion. One can envision, however, more active strategies in which information gleaned from previously collected data is used to guide the selection of new data. This talk discusses the emerging theory of such "active learning" methods. I will show that feedback between data analysis and data collection can be crucial for effective learning and inference. The talk will describe two active learning problems. First, I will consider binary-valued prediction (classification) problems, for which the prediction errors of passive learning methods can be exponentially larger than those of active learning. Second, I will discuss the role of active learning in the recovery of sparse vectors in noise.
I will show that certain weak, sparse patterns are imperceptible from passive measurements but can be recovered perfectly using selective sensing.

Tommy Poggio, MIT
Title: Learning in Hierarchical Architectures: from Neuroscience to Derived Kernels
Abstract: Understanding the processing of information in our cortex is a significant part of understanding how the brain works, arguably one of the greatest problems in science today. In particular, our visual abilities are computationally amazing: computer science is still far from being able to create a vision engine that imitates them. Thus, visual cortex, and the problem of the computations it performs, may well be a good proxy for the rest of the cortex and for intelligence itself.

Alexander Razborov, University of Chicago
Title: On the Dimension Complexity of DNFs
Abstract: The dimension complexity of a given concept class is defined as the smallest dimension of its halfspace embedding. The best currently known algorithm for learning DNF formulas, discovered by A. Klivans and R. Servedio, proceeds precisely by placing an upper bound on the dimension complexity of the concept class consisting of all (poly-size) DNFs. In this talk we give a matching lower bound, thereby ruling out the possibility of improving the Klivans-Servedio algorithm along similar lines. Joint work with Alexander Sherstov (University of Texas).

Lorenzo Rosasco, MIT
Title: An Inverse Problems Perspective on Learning Theory
Abstract: Learning machines are systems trained, instead of programmed, with a set of examples to perform a task. At a high level, the problem of learning can be seen as an inverse problem, where one is interested in recovering a model from measurements. However, a closer look reveals that the mathematical frameworks for learning and inverse problems seem quite different, as are the ultimate goals: generalization in learning and stability in inverse problems.
In this talk we show that in fact the connection between inverse problems and learning goes beyond an informal analogy. If we assume the functions of interest belong to a Hilbert space where functions can be pointwise evaluated, we can cast the problem of learning as an inverse problem defined by a bounded linear operator. Building on this connection, we can view different learning algorithms, such as early stopping, supervised dimensionality reduction, and penalized empirical risk minimization, as different ways of finding stable solutions to a stochastic inverse problem. New as well as old algorithms are described in a unifying functional framework close to that used in other fields such as signal processing. The relation between generalization and stability can be formalized using operator-theoretic results from inverse problems and concentration inequalities for random matrices.

Guillermo Sapiro, University of Minnesota
Title: Learning Dictionaries for Image Analysis and Sensing
Abstract: Sparse representations have recently drawn much attention from the signal processing and learning communities. The basic underlying model considers that natural images, or signals in general, admit a sparse decomposition in some redundant dictionary. This means that we can find a linear combination of a few atoms from the dictionary that leads to an efficient representation of the original signal. Recent results have shown that learning overcomplete non-parametric dictionaries for image representations, instead of using off-the-shelf ones, significantly improves numerous image and video processing tasks.

Rob Schapire, Princeton University
Tutorial Title: Theory and Applications of Boosting
Abstract: Boosting is a general method for producing a very accurate classification rule by combining rough and moderately inaccurate "rules of thumb". While rooted in a theoretical framework of machine learning, boosting has been found to perform quite well empirically.
This tutorial will introduce the boosting algorithm AdaBoost and explain the underlying theory of boosting, including explanations that have been given as to why boosting often does not suffer from overfitting, as well as some of the myriad other theoretical points of view that have been taken on this algorithm. Some practical applications and extensions of boosting will also be described.

Greg Shakhnarovich, TTIC
Title: Statistical Methods for Cortical Control of Hand Movement
Abstract: I will talk about statistical methods for decoding neural activity in the cortex to control dexterous hand activity, as part of a brain-machine interface setup, with the eventual goal of enabling functional neural prostheses for the arm and hand. The human hand is a complex biomechanical device with many degrees of freedom. However, movement involved in natural tasks, such as reaching, grasping, and object manipulation, exhibits many constraints on the joint behavior of these degrees of freedom, due to a variety of factors. This may impose an inherently low-dimensional spatial and temporal structure on the movement and allow for efficient decoding, even with a noisy neural signal from a small cell population. I will describe recent results with statistical models that capture such structure, and its relationship to neural activity, to obtain state-of-the-art open-loop reconstruction of motion from multi-electrode cortical recordings. Joint work with C. Vargas-Irwin, P. Yadollahpour, M. Black, and J. Donoghue.

Tao Shi, Ohio State University
Title: Data Spectroscopy: Eigenspaces of Convolution Operators and Clustering
Abstract: In this talk, we focus on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use the clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay.
In particular, we provide a population analysis to gain insight into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We caution that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the Data Spectroscopic clustering (DaSpec) algorithm, which utilizes properly selected eigenvectors to determine the number of clusters and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and Kernel Principal Components Analysis, and provide new understanding of their usability and modes of failure. Simulation studies and experiments on real-world data are conducted to show the potential of our algorithm. DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than competing methods. This is joint work with Prof. Mikhail Belkin (CSE, Ohio State University) and Prof. Bin Yu (Statistics, UC Berkeley).

Vikas Sindhwani, IBM
Title: Semi-supervised and Active Learning with Dual Supervision
Abstract: As the capability to collect and store data moves into the petascale regime, it is clear that for typical machine learning tasks human annotation effort can only touch a minuscule fraction of the total data available. The problem of learning from small labeled sets by utilizing large amounts of unlabeled examples has attracted significant interest in recent years. In this talk, we consider a different, complementary mechanism to potentially reduce the burden of labeling. We consider tasks where it is frequently possible to label smaller data entities more cheaply, for example by associating words, as opposed to documents, with classes in text categorization tasks.
We present new semi-supervised learning algorithms, based on spectral bipartite graph partitioning and matrix approximation formulations for co-clustering, that support dual supervision in the form of labels for both examples and/or features. We then consider the novel problem of active dual supervision: how to optimally query an example- and feature-labeling oracle to simultaneously collect two different forms of supervision, with the objective of building the best classifier in the most cost-effective manner. We present empirical studies demonstrating the potential of simple active dual supervision schemes to significantly reduce the overall cost of acquiring labeled data. This is joint work with Prem Melville and Jianying Hu.

Amit Singer, Princeton University
Title: What Do Unique Games, Structural Biology and the Low-Rank Matrix Completion Problem Have In Common?
Abstract: We will formulate several data-driven applications as MAX-2-LIN and d-to-1 games, and show how to (approximately) solve them using efficient spectral and semidefinite program relaxations. The relaxations perform remarkably well in the presence of a large number of outlier measurements that cannot be satisfied. We use random matrix theory to prove that the algorithms almost achieve the information-theoretic Shannon bound. The underlying group structure of the different applications (such as SO(2), SO(3), GL(n), etc.) is heavily exploited. Applications include cryo-electron microscopy and NMR spectroscopy for 3D protein structure determination, low-rank matrix completion, clock synchronization, and surface reconstruction in computer vision and optics. Partly joint work with Yoel Shkolnisky, Ronald Coifman and Fred Sigworth (Yale); Mihai Cucuringu and Yaron Lipman (Princeton); and Yosi Keller (Bar Ilan).

Stephen Smale, TTIC
Title: Vision and Hodge Theory
Abstract: A general mathematical Hodge theory will be presented, together with its relationship to spaces of images.
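Several talks in this program (Hirani, Jadbabaie, Smale) revolve around Hodge decomposition. As a minimal illustration of the idea in its simplest graph setting, the sketch below splits an edge flow on an invented toy 4-cycle into a gradient part and a divergence-free part by least squares; on a triangle-free graph the divergence-free part is the harmonic part. This is only a one-dimensional special case, not the general simplicial machinery of the talks.

```python
import numpy as np

# Toy graph: a 4-cycle 0-1-2-3-0, edges oriented as listed.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n, m = 4, len(edges)

# Incidence matrix B: (B x)_e = x[head] - x[tail],
# i.e. B applies the discrete gradient to a vertex potential x.
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0

f = np.array([2.0, 1.0, -1.0, 3.0])    # an arbitrary edge flow

# Gradient component: least-squares projection of f onto im(B).
x, *_ = np.linalg.lstsq(B, f, rcond=None)
grad_part = B @ x
residual = f - grad_part               # orthogonal to im(B), hence divergence-free

print(grad_part, residual)
```

On this cycle the residual is a constant circulation around the loop, the discrete analogue of the harmonic component of a vector field.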
Cristian Sminchisescu, University of Bonn
Title: Supervised Spectral Latent Variable Models
Abstract: We present probabilistic structured prediction methods for learning input-output dependencies where correlations between outputs are modeled as low-dimensional manifolds constrained both by geometric, distance-preserving relations in the output distribution and by the predictive power of inputs. Technically this reduces to learning a probabilistic, input-conditional model over latent (manifold) and output variables using an alternation scheme. In one round, we optimize the parameters of an input-driven manifold predictor using latent targets given by preimages (conditional expectations) of the current manifold-to-output model. In the next round, we use the distribution given by the manifold predictor to maximize the probability of the outputs, with an additional, implicit geometry-preserving constraint on the manifold. The resulting Supervised Spectral Latent Variable Model (SSLVM) combines the properties of geometric manifold learning (it accommodates geometric constraints corresponding to any spectral embedding method, including PCA, ISOMAP, or Laplacian Eigenmaps) with additional supervisory information to further constrain these for predictive tasks in a probabilistic setting. We discuss the gains of this methodology over baseline PPCA + regression frameworks and demonstrate its potential on real-world computer vision benchmarks for recognition and three-dimensional human pose reconstruction. Joint work with Liefeng Bo, TTIC.

Dan Spielman, Yale University
Title: Fitting a Graph to Vector Data
Abstract: We ask: "What is the right graph to fit to a set of vectors?" We propose one solution that provides good answers to standard machine learning problems, that has interesting combinatorial properties, and that we can compute efficiently. Joint work with Jonathan Kelner and Samuel Daitch.
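Spielman's question of which graph to fit to a set of vectors can be made concrete with the most common baseline, a symmetrized k-nearest-neighbor graph and its Laplacian. The sketch below only illustrates the problem setup; it is not the construction proposed in the talk.

```python
import numpy as np

def knn_graph_laplacian(X, k=2):
    """Unnormalized graph Laplacian of a symmetrized k-NN graph on rows of X."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]   # skip index 0: the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                  # keep an edge if either endpoint picks it
    return np.diag(W.sum(1)) - W            # L = D - W

X = np.random.default_rng(0).normal(size=(10, 3))
L = knn_graph_laplacian(X, k=3)
print(L)
```

Standard spectral methods (clustering, semi-supervised label propagation) then operate on the eigenvectors of L; the talk's point is that better-behaved graphs than this naive k-NN choice can be computed.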
John Shawe-Taylor, University College London
Tutorial Title: Kernel Methods and Support Vector Machines
Abstract: Kernel methods have become a standard tool for pattern analysis during the fifteen years since the introduction of support vector machines. We will introduce the key ideas and indicate how this approach to pattern analysis enables a relatively easy plug-and-play application of different tools. The problem of choosing and designing a kernel for specific types of data will also be considered, and an overview of different kernels will be given.

Nati Srebro, TTIC
Title: More Data Less Work: Runtime as a Monotonically Decreasing Function of Data Set Size
Abstract: We are used to studying runtime as an increasing function of data set size, and are happy when this increase is not so bad (e.g. when the runtime increases linearly, or even polynomially, with the data set size). Traditional runtime analysis of learning is also viewed this way, and studies how training runtime increases as more data becomes available. However, considering the true objective of training, which is to obtain a good predictor, I will argue that training runtime should actually be studied as a *decreasing* function of training set size. Focusing on training Support Vector Machines (SVMs), and combining ideas from optimization, statistical learning theory, and online methods, I will then present both theoretical and empirical results demonstrating how a simple stochastic subgradient descent approach indeed displays such monotonically decreasing behavior. I will also discuss a similar phenomenon in the context of Gaussian mixture clustering, where it appears that excess data turns the problem from computationally intractable to computationally tractable. Joint work with Shai Shalev-Shwartz, Karthik Sridharan, Yoram Singer, Greg Shakhnarovich and Sam Roweis.
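The stochastic subgradient approach Srebro describes fits in a few lines. The following Pegasos-style update is a minimal illustration; the toy data, hyperparameters and step-size details are assumptions for the sketch, not the exact algorithm from the talk.

```python
import random

def pegasos_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Train a linear SVM by stochastic subgradient descent on the objective
    (lam/2) * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>),
    with the Pegasos step size eta_t = 1 / (lam * t).
    X: list of feature lists; y: labels in {-1, +1}. Returns weights w."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for _ in range(n):
            t += 1
            i = rng.randrange(n)          # one random example per step
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # regularization shrink; add the hinge subgradient if it is active
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w
```

Each step touches a single example, which is what makes runtime scale with the optimization accuracy needed for good prediction rather than with the full data set size.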
Matthew Stephens, University of Chicago
Title: Variable-Selection Bayesian Regression: A Novel Prior Distribution, and Applications to Genetic Association Studies
Abstract: Ongoing large-scale genetic association studies, in an attempt to identify variants and genes affecting susceptibility to common diseases, are typing hundreds of thousands of genetic variants (SNPs) in thousands of individuals, with the aim of identifying which SNPs are associated with disease status. Standard statistical analyses consider each SNP in turn, performing hundreds of thousands of univariate tests. Here we consider analyzing multiple SNPs simultaneously using Bayesian variable-selection multiple regression (in the context of a continuous response). We suggest a novel prior distribution for the effect sizes of associated SNPs, based on the overall proportion of variance explained. We assess the computational feasibility of this approach through simulation, and investigate its merits relative to both standard univariate analyses and penalised regression approaches such as the LASSO.

Leslie Valiant, Harvard University
Title: Learning Phenomena Beyond Learning
Abstract: We shall argue that the study of computational learning is fundamental and has implications well beyond the phenomenon of learning itself. We shall give two illustrations. First, we shall offer a formulation of biological evolution as a restricted form of PAC learning, and shall describe some results for this formulation. Second, we shall discuss the phenomenon of intelligent data processing in terms of the question of how reasoning can be performed on learned data with some guarantees of the validity of conclusions. The framework of robust logic that addresses this phenomenon will be described, as well as the results of some experiments that implement this framework on a natural language dataset.
Grace Wahba, University of Wisconsin-Madison
Title: Examining the Relative Influence of Familial, Genetic, and Environmental Covariate Information in Flexible Risk Models
Abstract: We present a novel method for examining the relative influence of familial, genetic and environmental covariate information in flexible nonparametric risk models. Our goal is to investigate the relative importance of these three sources of information as they are associated with a particular outcome. To that end, we developed a method for incorporating arbitrary pedigree information in a smoothing spline ANOVA (SS-ANOVA) model. By expressing pedigree data as a positive semidefinite kernel matrix, the SS-ANOVA model is able to estimate a log-odds ratio as a multi-component function of several variables: one or more functional components representing information from environmental covariates and/or genetic marker data, and another representing pedigree relationships. We report a case study on models for retinal pigmentary abnormalities in the Beaver Dam Eye Study (BDES). Our model verifies known facts about the epidemiology of this eye lesion (found in eyes with early age-related macular degeneration, AMD) and shows significantly increased predictive ability in models that include all three of the genetic, environmental and familial data sources. The case study also shows that models containing only two of these data sources (pedigree and environmental covariates, pedigree and genetic markers, or environmental covariates and genetic markers) have comparable predictive ability, though less than that of the model with all three. This result is consistent with the notions that genetic marker data at least partly encodes pedigree data, and that familial correlations encode shared environment data as well.

DeLiang Wang, Ohio State University
Title: Cocktail Party Problem as Binary Classification
Abstract: Speech segregation, or the cocktail party problem, has proven to be extremely challenging.
Part of the challenge stems from the lack of a carefully analyzed computational goal. While the separation of every sound source in a mixture is considered the gold standard, I argue that such an objective is neither realistic nor what the human auditory system does. Motivated by the auditory masking phenomenon, we have instead suggested the ideal time-frequency (TF) binary mask as a main goal for computational auditory scene analysis. Ideal binary masking retains the mixture energy in TF units where the local signal-to-noise ratio exceeds a certain threshold, and rejects the mixture energy in other TF units. Recent psychophysical evidence shows that ideal binary masking leads to large speech intelligibility improvements in noisy environments for both normal-hearing and hearing-impaired listeners. The effectiveness of the ideal binary mask implies that sound separation may be formulated as a case of binary classification, which opens the cocktail party problem to a variety of pattern classification and clustering methods. As an example, I discuss a recent system that segregates unvoiced speech by supervised classification of acoustic-phonetic features.

Shmuel Weinberger, University of Chicago
Title: Learning Submanifolds of Euclidean Space
Abstract: I will explain some of the issues involved in deducing aspects of the topology of submanifolds of Euclidean space from random and noisy samples. Joint work with Partha Niyogi and Steve Smale.

Yair Weiss, Hebrew University
Title: Learning Compressed Sensing
Abstract: Compressed sensing is a recent set of mathematical results showing that sparse signals can be exactly reconstructed from a small number of linear measurements. Interestingly, for ideal sparse signals with no measurement noise, random measurements allow perfect reconstruction while measurements based on principal component analysis (PCA) or independent component analysis (ICA) do not.
At the same time, for other signal and noise distributions, PCA and ICA can significantly outperform random projections in terms of enabling reconstruction from a small number of measurements. In this paper we ask: given a training set typical of the signals we wish to measure, what is the optimal set of linear projections for compressed sensing? We show that the optimal projections are in general neither the principal components nor the independent components of the data, but rather a seemingly novel set of projections that capture what is still uncertain about the signal, given the training set. We also show that the projections onto the learned uncertain components may far outperform random projections. This is particularly true in the case of natural images, where random projections have vanishingly small signal-to-noise ratio as the number of pixels becomes large. Joint work with HyunSung Chang and Bill Freeman.

Tutorial Title: Graphical Models and Applications
Abstract: I will give a brief introduction to questions of representation, learning and inference in probabilistic graphical models and illustrate these ideas in applications from our own work in computational biology and computer vision.

Patrick Wolfe, Harvard University
Title: Testing for Stationarity in Acoustic Signals: Parametric and Nonparametric Approaches
Abstract: This talk presents techniques for nonstationarity detection in the context of speech and audio waveforms, with broad application to any class of time series that exhibits locally stationary behavior. Many such waveforms, in particular information-carrying natural sound signals, exhibit a degree of controlled nonstationarity, and are often well modeled as slowly time-varying systems. The talk first describes the basic concepts of such systems and their analysis via local Fourier methods.
Parametric approaches appropriate for speech are then introduced by way of time-varying autoregressive models, along with nonparametric approaches based on variation of time-localized estimates of the power spectral density of an observed random process, together with an efficient offline bootstrap procedure based on the Wold representation. Several real-world examples are given.

Stephen Wright, University of Wisconsin-Madison
Title: Optimization Algorithms in Support Vector Machines

Bin Yu, University of California, Berkeley
Title: Seeking Interpretable Models for High Dimensional Data
Abstract: Extracting useful information from high-dimensional data is the focus of today's statistical research and practice. After the broad success of statistical machine learning on prediction through regularization, interpretability is gaining attention and sparsity has been used as its proxy. With the virtues of both regularization and sparsity, the Lasso (L1-penalized L2 minimization) has recently been very popular. In this talk, I would like to discuss the theory and practice of sparse modeling.
First, I will give an overview of recent research on sparsity and explain what useful insights have been learned from theoretical analyses of the Lasso. Second, I will present collaborative research with the Gallant Lab at Berkeley on building sparse models (linear, nonlinear, and graphical) that describe fMRI responses in primary visual cortex area V1 to natural images.

DingXuan Zhou, City University of Hong Kong
Title: Analysis of Reproducing Kernel Spaces for Learning
Abstract: Reproducing kernel Hilbert spaces play an important role in learning theory. Their mathematical properties determine the approximation powers and learning ability of the algorithms they generate. In this talk we provide some analysis of reproducing kernel Hilbert spaces, including the smoothness of functions ensured by the regularity of kernels. In particular, we show that a Gaussian kernel with a fixed variance has weak learning ability, while allowing flexible variances of the Gaussian kernels increases the approximation powers. It is verified that the union of unit balls of reproducing kernel Hilbert spaces generated by Gaussian kernels with flexible variances is a uniform Glivenko-Cantelli class. This confirms the learnability of Gaussian kernels with changing variances. Some concrete algorithms for regression and online classification are discussed, and learning rates are demonstrated.

Jerry Zhu, University of Wisconsin-Madison
Tutorial Title: Semi-Supervised Learning
Abstract: This tutorial covers classification approaches that utilize both labeled and unlabeled data. We will review self-training, Gaussian mixture models, co-training, multi-view learning, graph transduction and manifold regularization, transductive SVMs, and a PAC bound for semi-supervised learning. We then discuss some new developments, including online semi-supervised learning, multi-manifold learning, and human semi-supervised learning.

Afra Zomorodian, Dartmouth College
Title:
Abstract:
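To make the central object of Zhou's abstract concrete: each fixed variance sigma**2 of a Gaussian kernel determines one reproducing kernel Hilbert space via Gram matrices like the one below, and the "flexible variances" setting lets sigma range over a family. This helper is an illustrative sketch only, not code from the talk.

```python
import math

def gaussian_gram(points, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma**2)) of the
    Gaussian kernel with fixed variance sigma**2. K is symmetric and
    positive semidefinite, with ones on the diagonal."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-dist2(p, q) / (2.0 * sigma ** 2)) for q in points]
            for p in points]
```

A kernel regression or online classification algorithm of the kind the abstract mentions would operate on such a matrix; the abstract's point is that learning over the union of these spaces across all sigma remains statistically well behaved (uniform Glivenko-Cantelli).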