Geometric Methods in Machine Learning (Spring 2019)

Introduction [slides]
Notes in progress.
Slides on SVM's
Slides on Metric Learning

Projects (due May 2nd, 2019)

You can choose either of the projects below. You can complete this assignment alone or as a group of two, but I will grade them accordingly (I will have higher expectations for an assignment completed jointly by two students). Please send me, in an email to marcocuturicameto+assignment@gmail.com, a zip file containing your repord in pdf (not a .doc) and code in whatever format (you can send a notebook).

Studying Time Series using Dynamic Time Warping

Download the Sales Transactions Weekly Dataset. You will recover 800 time series of sales of different products with 52 timestamps (one for each week). Alternatively, you can also download any other dataset of your liking, with the same type, i.e. labelled time series, with a few hundreds or thousands of time series of moderate length.

Implement the dynamic time warping distance in your programming language of choice.
Compute the dynamic time warping distance matrix between all 800 instances and try to embed these points in 2 dimensions using Isomap (you should implement it, not use scikitlean or other public implementations). Display these results.
Embed now these points in a higher dimension using still MDS (e.g. 5). Check that the stress you have found is indeed smaller. Use a simple k-means algorithm on these representations to gather these 800 observations in k subgroups (set k as you wish).
In order to visualize representative examples for each class in these samples, we consider time series averaging. We will compute an average for each of the k clusters computed in the previous question. Code to compute such averages is available in R or Python. Plot these averages and compare them with the usual Euclidean averaging you would obtain by simply summing these time-series.

Sinkhorn Emebddings

Read and summarize the findings provided in a recent paper on Sinkhorn embeddings, a nice idea published very recently to visualize datapoints as point clouds. You can use this idea to embed any arbitrary family of datapoints (e.g. that introduced in the question abos) to define a simple MDS type criterion whose aim is to compute point cloud representations in 2D. You need to use backpropagation (e.g. using autograd, tensorflow or pytorch) to achieve this.

Low rank factorization of kernel matrices

Read, summarize the method (not the theorertical guarantees) and implement the approach presented in this paper to carry out low-rank factorization of kernel matrices. Compare the method proposed by the authors with one baseline, the Nyström method, on the ijcnn1 dataset.