Stochastic Optimization and Automatic Differentiation for Machine Learning (Spring 2018)
Coding sessions
IPython notebooks
Projects (due May 16th, extended from May 4th and then May 12th; by email)
You can choose any of the projects below. You can complete this assignment alone or as a group of two, but I will grade accordingly (I will have higher expectations for an assignment completed jointly by two students). Please send me, in an email to marcocuturicameto+assignment@gmail.com, a zip file containing your report in pdf (not a .doc) and your code in whatever format (you can send a notebook).
SDCA
Implement the SDCA algorithm to estimate Support Vector Machines. Test the algorithm on datasets of your choice and compare it with a subgradient descent approach such as this one.
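As a starting point, here is a minimal numpy sketch of one way to implement SDCA for a linear SVM with hinge loss, using the closed-form coordinate update; the synthetic dataset, regularization strength, and epoch count below are illustrative placeholders, not part of the assignment:

```python
import numpy as np

def sdca_svm(X, y, lam=0.01, n_epochs=20, seed=0):
    """Sketch of SDCA for a linear SVM with hinge loss.

    Minimizes (lam/2)*||w||^2 + (1/n)*sum_i max(0, 1 - y_i <w, x_i>)
    via the closed-form dual coordinate update for the hinge loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)   # dual variables, each kept in [0, 1]
    w = np.zeros(d)       # primal iterate: w = (1/(lam*n)) * sum_i alpha_i y_i x_i
    sq_norms = (X ** 2).sum(axis=1)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            if sq_norms[i] == 0.0:
                continue
            # closed-form maximization of the dual in coordinate i
            candidate = alpha[i] + lam * n * (1.0 - y[i] * (X[i] @ w)) / sq_norms[i]
            delta = np.clip(candidate, 0.0, 1.0) - alpha[i]
            alpha[i] += delta
            w += delta * y[i] * X[i] / (lam * n)
    return w

# illustrative usage on synthetic, linearly separable data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.sign(X @ np.array([1.0, -2.0]))
w = sdca_svm(X, y)
train_acc = np.mean(np.sign(X @ w) == y)
```

Keeping the primal iterate `w` in sync with the dual variables makes each coordinate step O(d), which is what makes SDCA competitive with subgradient descent per pass over the data.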
Incremental methods with second order information
Implement the algorithm proposed in this paper and discuss its efficiency by benchmarking it against SVRG (only) on a problem of your choice.
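For the SVRG side of the benchmark, a compact reference implementation can be sketched as follows; the l2-regularized logistic-regression objective, step size, and toy data are illustrative choices (you are free to pick any problem):

```python
import numpy as np

def svrg_logreg(X, y, lam=0.01, step=0.1, n_outer=15, seed=0):
    """Sketch of SVRG on l2-regularized logistic regression:
    (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) + (lam/2)*||w||^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape

    def grad(w, idx):
        # gradient of the objective restricted to examples `idx`
        z = np.clip(X[idx] @ w, -30.0, 30.0)   # clip for numerical stability
        s = -y[idx] / (1.0 + np.exp(y[idx] * z))
        return X[idx].T @ s / len(idx) + lam * w

    w = np.zeros(d)
    for _ in range(n_outer):
        w_snap = w.copy()
        g_full = grad(w_snap, np.arange(n))    # full gradient at the snapshot
        for _ in range(n):                     # inner loop of length n
            idx = rng.integers(n, size=1)
            # variance-reduced stochastic gradient
            v = grad(w, idx) - grad(w_snap, idx) + g_full
            w -= step * v
    return w

# illustrative benchmark problem
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.sign(X @ np.array([1.0, -1.0, 0.5]))
w_hat = svrg_logreg(X, y)
acc = np.mean(np.sign(X @ w_hat) == y)
```

For a fair benchmark, plot the suboptimality of both methods against the number of individual gradient evaluations (each SVRG outer iteration costs n evaluations for the snapshot plus 2 per inner step), not against wall-clock iterations.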
Predicting heatmaps using a 2-layer neural net and a Wasserstein loss
Download the Geographical Origin of Music dataset.
Your task will be to predict where in the world a given piece of music comes from. The dataset you are given contains features for 1059 songs, with 116 features per song (described in the default_plus_chromatic_features_1059_tracks file). In the following task, please report both the training error and the validation error of your network as a function of the number of updates you perform (you can choose to split the dataset as you see fit). Use a simple gradient descent scheme to update your network by minimizing the loss.
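A plain gradient descent training loop of this kind might look as follows; the hidden size, learning rate, split, and synthetic stand-in data are placeholder choices (on the real dataset you would load the 116 features and the two location coordinates):

```python
import numpy as np

def train_two_layer(X, Y, hidden=32, steps=500, lr=1e-2, val_frac=0.2, seed=0):
    """Sketch: 2-layer net (tanh hidden layer, squared loss on the output
    coordinates), trained by full-batch gradient descent. Records the train
    and validation error at every update."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    n_val = int(val_frac * len(perm))
    va, tr = perm[:n_val], perm[n_val:]       # train/validation split
    W1 = rng.normal(0.0, 0.1, (X.shape[1], hidden))
    W2 = rng.normal(0.0, 0.1, (hidden, Y.shape[1]))
    history = []
    for _ in range(steps):
        H = np.tanh(X[tr] @ W1)               # forward pass on the train split
        err = H @ W2 - Y[tr]
        val_err = np.tanh(X[va] @ W1) @ W2 - Y[va]
        history.append((np.mean(err ** 2), np.mean(val_err ** 2)))
        gW2 = H.T @ err / len(tr)             # backprop for the squared loss
        gH = (err @ W2.T) * (1.0 - H ** 2)    # tanh'(z) = 1 - tanh(z)^2
        gW1 = X[tr].T @ gH / len(tr)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2, history

# illustrative usage on synthetic data standing in for the audio features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.stack([np.sin(X[:, 0]), 0.5 * X[:, 1]], axis=1)
W1, W2, history = train_two_layer(X, Y)
```

The `history` list is exactly what you need for the requested plot of train and validation error as a function of the number of updates.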
Knowing that it is usually difficult to pinpoint exactly the location associated with a song, I have created a dataset where the location is no longer given as a point but, instead, as a heatmap on the world (discretized on a 20x20 grid). This information is stored in this text file. That text file has 1059 lines, each of which contains 400 numbers, which correspond to the 20x20 heatmap matrix (enumerated in column order, as is standard).

Your task is now to fit a neural network whose output is no longer two numbers, but an entire heatmap covering the world. The function you should fit is now x ↦ softmax(W_2 σ(W_1 x)), where W_1 is now a d×116 matrix and W_2 a 400×d matrix (for a hidden-layer size d of your choosing), σ is an activation function, and softmax is the function which ensures that the output of your network is a normalized histogram of size 400, which can be reshaped as a heatmap.

For the loss function, the simplest choice is a Kullback-Leibler divergence, which you can easily put in your code. BONUS: another (more involved) option is the earth mover's distance between these two histograms, using a simple squared-Euclidean distance on the grid. An efficient way to do this would be to use the Sinkhorn function in the POT toolbox, which approximates the earth mover's distance with a function that can be backpropped through autograd.
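For the KL option, the loss and its gradient with respect to the pre-softmax activations can be sketched in a few lines; this assumes the target heatmap q is normalized (sums to 1), in which case the gradient conveniently reduces to p − q:

```python
import numpy as np

def softmax(z):
    """Numerically stabilized softmax over a flat vector of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_loss_and_grad(logits, q, eps=1e-12):
    """KL(q || p) with p = softmax(logits), and its gradient w.r.t. logits.

    Since the KL equals -sum_i q_i log p_i plus a constant in the logits,
    and q sums to 1, the gradient w.r.t. the logits simplifies to p - q."""
    p = softmax(logits)
    loss = float(np.sum(q * (np.log(q + eps) - np.log(p + eps))))
    return loss, p - q

# illustrative usage on a 400-bin heatmap (a flattened 20x20 grid)
rng = np.random.default_rng(0)
logits = rng.normal(size=400)            # pre-softmax network output
q = softmax(rng.normal(size=400))        # a normalized target heatmap
loss, grad = kl_loss_and_grad(logits, q)
# sanity check: when the prediction matches the target, loss and grad vanish
loss0, grad0 = kl_loss_and_grad(logits, softmax(logits))
```

The p − q simplification is why the softmax output layer pairs so naturally with the KL loss: you can backprop through the rest of the network exactly as in the coordinate-regression task, just starting from this gradient.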
