Deep Learning (Spring 2020)

Project (Latest Instructions Update: March 11 2020)

The deadline for submission is Apr. 19.

Please send your

  • pdf

  • colab link with experiments/code

to my email address (if you use a different email alias, your assignment risks ending up “lost” in my inbox).

Choose one topic from those presented below. If you wish to explore a different direction, send me a proposal by email.

Generative Adversarial Networks and Cycle-GAN

Generative Adversarial Networks are neural network architectures trained to produce a generative model for images, that is, a sampling mechanism that lets you produce images from a code. In this assignment, after a short bibliographical review in which you will summarize two papers, DC-GAN and the Wasserstein GAN (ignore the “math-y” theoretical results, focus on the equations/definitions), you will consider a more advanced architecture, the CycleGAN, which combines two GANs and can be used to “translate” between images with different characteristics. I give you two choices: either adapt this detailed walk-through into your assignment, or implement it yourselves. In both cases, and in order to keep things simple, I ask you to translate from the MNIST dataset to the USPS dataset.
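As a reminder of the objectives you will encounter in those papers, here is a minimal NumPy sketch of the standard GAN discriminator loss, the Wasserstein critic loss, and CycleGAN's cycle-consistency term. The function names are my own, for illustration; a real implementation would of course compute these from network outputs inside an autodiff framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gan_d_loss(real_logits, fake_logits):
    # Standard GAN discriminator loss: maximize
    # log D(x) + log(1 - D(G(z))), written here as a loss to minimize.
    return -np.mean(np.log(sigmoid(real_logits))
                    + np.log(1.0 - sigmoid(fake_logits)))

def wgan_critic_loss(real_scores, fake_scores):
    # Wasserstein critic loss: E[f(G(z))] - E[f(x)].
    # Scores are unbounded (no sigmoid); the critic must be kept
    # (approximately) 1-Lipschitz, e.g. by weight clipping.
    return np.mean(fake_scores) - np.mean(real_scores)

def cycle_consistency_loss(x, x_cycled):
    # CycleGAN's L1 cycle loss: mean |F(G(x)) - x|, where G translates
    # one domain to the other and F translates back.
    return np.mean(np.abs(x_cycled - x))
```

A confident discriminator (large positive logits on real images, large negative on fakes) drives `gan_d_loss` toward zero, and a perfect cycle (`x_cycled == x`) makes the cycle loss vanish; these are the quantities your training curves should track.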

Automatic differentiation of optimization solutions

A common criticism formulated against deep learning is the lack of interpretability of “black-box” networks. The alternative that is often proposed is that of more classic statistical procedures, which typically rely on specifying a simpler model (e.g. linear) and training it with convex solvers (e.g. least-squares, lasso).

However, an important advantage of deep learning over “training simple models with a convex loss” is that deep networks can be trained end-to-end: all intermediate representations of the data are learned in a single data-processing pass, as opposed to fragmenting the ML pipeline into a well-posed but “broken” flow (e.g. run k-means, then PCA on those clusters, then linear regression).

There has been an important research effort in recent years to differentiate through the output of these “well-posed” convex solvers: one can solve the implicit functions that characterize solutions of convex programs to obtain Jacobians for the lasso, quadratic programs, conic programs, and disciplined convex problems. For discrete problems, SAT solvers, ranking problems, and more generic mixed-integer programs (also here) have been handled using various flavors of relaxation and smoothing.
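To see what “differentiating through a solver” means concretely, here is a minimal NumPy sketch, my own illustrative example rather than one from the papers. It differentiates the ridge-regression solution with respect to the regularization weight via the implicit function theorem, and checks the Jacobian against finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
theta = 0.5

def ridge_solution(theta):
    # x*(theta) = argmin_x ||Ax - b||^2 + theta * ||x||^2.
    # Stationarity condition: (A^T A + theta I) x = A^T b.
    return np.linalg.solve(A.T @ A + theta * np.eye(A.shape[1]), A.T @ b)

x_star = ridge_solution(theta)

# Implicit function theorem: differentiating the stationarity condition
# (A^T A + theta I) x*(theta) = A^T b with respect to theta gives
#   x* + (A^T A + theta I) dx/dtheta = 0,
# so the Jacobian is obtained by one more linear solve.
dx_dtheta = np.linalg.solve(A.T @ A + theta * np.eye(A.shape[1]), -x_star)

# Sanity check against central finite differences.
eps = 1e-6
fd = (ridge_solution(theta + eps) - ridge_solution(theta - eps)) / (2 * eps)
assert np.allclose(dx_dtheta, fd, atol=1e-5)
```

The papers generalize exactly this recipe: differentiate the optimality conditions (KKT, fixed-point, or conic) of the solver instead of unrolling its iterations, so the optimization layer can sit inside a network trained end-to-end.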

In this assignment, I ask you to consider two of the papers described above, summarize them, and propose experimental results on a different dataset.