Neural Aesthetic Class Notes


  • data points are combinations of feature values, i.e. positions in feature space
  • embeddings give us relationships between data points (closer points are more similar)
  • magnitude and direction have meaning, which enables many basic retrieval applications
  • feature vectors and latent spaces are examples of embeddings
  • the difference vector between a pair of points has meaning; the same offset between two different pairs encodes the same relationship
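A minimal sketch of the retrieval idea above: rank items by how close their embeddings are to a query. The 3-dimensional vectors and the helper names (cosine_similarity, nearest) are made up for illustration; real embeddings would come from a trained model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, items):
    """Return item names sorted from most to least similar to the query."""
    return sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )

# Hypothetical embeddings, invented for this example.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

ranking = nearest(embeddings["cat"], embeddings)
print(ranking)  # ['cat', 'dog', 'car']
```

Cosine similarity compares direction only; Euclidean distance would also take magnitude into account, which may or may not be wanted depending on the embedding.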

features are patterns of activations

  • each successive layer responds to more abstract, higher-level patterns: edges, then parallel lines, then shapes, then categories
  • take the last layer of activations as a feature vector; compare images by the distance or correlation between these vectors

transfer learning with images

  • dimensionality reduction: tries to preserve the geometry of the data
  • linearly independent components


  • man>woman; country>capital; singular>plural
  • words are discrete units; the set of possible sentences is infinite, but sentences and paragraphs can still be embedded in feature space
    • word vectors are learned implicitly
    • question-inversion vector
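The analogy relationships above (man>woman and so on) can be sketched as vector arithmetic: take the offset between one pair and apply it to a third word. The 2-dimensional vectors below are invented so that the arithmetic works out exactly; learned word vectors only satisfy such relationships approximately. The function name analogy is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using the offset b - a applied to c."""
    target = [vc + (vb - va) for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words, then pick the closest remaining word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hypothetical toy vectors: axis 0 loosely "royalty", axis 1 loosely "gender".
vectors = {
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.1],
}

print(analogy("man", "woman", "king", vectors))  # queen
```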

principal component analysis (PCA) to reduce dimensionality
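One common way to compute PCA is through the singular value decomposition of the mean-centered data. The sketch below assumes NumPy is available; the function name pca and the synthetic data are illustrative assumptions, not something from the class.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value (and therefore by decreasing variance explained).
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 200 points lying close to a single line in 3-d space.
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

reduced, components, variance = pca(X, n_components=1)
print(reduced.shape)  # (200, 1)
```

Because the synthetic points vary almost entirely along one direction, a single component captures nearly all of the variance, and the first principal direction lines up with that direction up to sign.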

t-SNE is better for visualization and for discovering similar neighbors, but it is suited to smaller datasets

Neural Aesthetic W3 Class Notes

convolutional neural networks allow you to look for patterns anywhere in the image

Measuring cost

  • look at the shape of the loss function over all combinations of m (slope) and b (intercept)
  • bottom of the “bowl” is the best fit
  • gradient descent
    • start at a random point
    • calculate its gradient (generalization of a slope in multiple dimensions; which direction is the slope going?)
    • go down the gradient until the loss stops decreasing
  • gradient descent for NN
    • backpropagation
    • calculate gradient using chain rule
    • relates the gradient to the individual activations, so the error is distributed back to the weights
    • problem: local minima; no guarantee of finding the global minimum (full “batch gradient descent” is rarely used, partly because of this and partly because of its cost on large datasets)
      • how to deal:
        • calculate the gradient on subsets of the dataset: stochastic gradient descent, mini-batch gradient descent
        • momentum: able to roll out of a local minimum to the next
          • Nesterov momentum
        • adaptive methods: AdaGrad, Adadelta, RMSprop, Adam (when in doubt, use Adam)
    • overfitting
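The gradient descent steps above can be sketched on the line-fitting problem: start at a random (m, b), compute the gradient of the mean squared error on a mini-batch, and step downhill with momentum. The function name fit_line, the learning rate, the momentum value, and the data are all assumptions chosen for illustration, not settings from the class.

```python
import random

def fit_line(xs, ys, lr=0.02, epochs=300, batch_size=4, momentum=0.9, seed=0):
    """Fit y = m*x + b by mini-batch gradient descent with momentum on MSE."""
    rng = random.Random(seed)
    m, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # start at a random point
    vm, vb = 0.0, 0.0  # velocity terms used by momentum
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random mini-batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch mean squared error with respect to m and b.
            grad_m = sum(2 * (m * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (m * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # Momentum: keep part of the previous step, add the new downhill step.
            vm = momentum * vm - lr * grad_m
            vb = momentum * vb - lr * grad_b
            m += vm
            b += vb
    return m, b

# Hypothetical noise-free data generated from y = 3x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
ys = [3.0 * x + 1.0 for x in xs]

m, b = fit_line(xs, ys)
print(round(m, 2), round(b, 2))
```

For a straight-line fit the loss surface is a single bowl, so there are no local minima to escape here; mini-batches and momentum are included only to show the mechanics that matter for neural networks.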

Neural Aesthetic W2 Class Notes

Features are:

  1. patterns in data
  2. implicit
  3. indicative of salient aspects of objects
  4. closely related to bias


  • Linear regression doesn’t give much flexibility; you can give a neuron more by passing its output through a non-linearity, e.g. a sigmoid function
    • ReLU (rectified linear unit) is often preferred over a sigmoid function in hidden layers
  • adding a hidden layer gives y (the output) even more flexibility
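A small sketch of the points above: a neuron is a weighted sum passed through a non-linearity, and a hidden layer of such neurons can represent shapes a single line cannot. The weights below are hand-picked (not learned) so that two ReLU units compute the absolute value of x; all function names are illustrative.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Rectified linear unit: pass positives through, clamp negatives to 0."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

def tiny_network(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """One input, one hidden layer of ReLU neurons, one linear output."""
    hidden = [neuron([x], [w], b, relu) for w, b in zip(hidden_weights, hidden_biases)]
    return neuron(hidden, out_weights, out_bias, lambda z: z)

# Hypothetical hand-picked weights: relu(x) + relu(-x) equals |x|.
params = dict(hidden_weights=[1.0, -1.0], hidden_biases=[0.0, 0.0],
              out_weights=[1.0, 1.0], out_bias=0.0)

print(tiny_network(-3.0, **params))  # 3.0
print(tiny_network(2.0, **params))   # 2.0
print(sigmoid(0.0))                  # 0.5
```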

Convolutional NNs: scan for certain patterns throughout the entire image
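The scanning idea can be sketched with a single small filter slid over every position of an image. The image and the vertical-edge kernel below are made up for illustration; in a real convolutional network the kernel values are learned, and the function name convolve2d is an assumption.

```python
def convolve2d(image, kernel):
    """Slide the kernel over every position (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x5 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hypothetical filter that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d(image, kernel)
print(response)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
```

The same kernel weights are reused at every position, so the edge is detected wherever it appears; the response is non-zero only in the column where the dark-to-bright transition sits.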

activation = value of the neuron

weights on the connections