Convolutional neural networks allow you to look for patterns anywhere in the image (the same filter weights are applied at every position)
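For intuition, a minimal NumPy sketch (the `correlate2d` helper here is hypothetical, not from the notes): one shared filter is slid over every position of the image, so it responds to the pattern wherever it appears.

```python
import numpy as np

def correlate2d(image, kernel):
    """Slide one shared kernel over every position of the image
    (valid padding), so the same weights respond to the pattern
    wherever it appears."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny vertical-edge detector defined once fires at both sides
# of the line, no matter where the line is placed.
image = np.zeros((6, 6))
image[:, 2] = 1.0                  # one vertical line
kernel = np.array([[1., -1.],
                   [1., -1.]])
print(correlate2d(image, kernel))
```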
Measuring cost
- look at the shape of the loss function over all combinations of the slope m and the intercept b
- bottom of the “bowl” is the best fit
- gradient descent
- start at a random point
- calculate its gradient (the generalization of slope to multiple dimensions; it points in the direction of steepest ascent)
- step in the direction opposite the gradient, repeating until the loss stops decreasing (sketched below)
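A minimal sketch of these steps in NumPy, fitting m and b by descending the MSE "bowl"; the toy data, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

# Toy data roughly on the line y = 2x + 1.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = 2 * x + 1 + rng.normal(0, 0.1, 50)

def loss(m, b):
    """Mean squared error; as a function of (m, b) its surface is a bowl."""
    return np.mean((m * x + b - y) ** 2)

def gradient(m, b):
    """Partial derivatives of the MSE with respect to m and b."""
    err = m * x + b - y
    return np.mean(2 * err * x), np.mean(2 * err)

m, b = rng.normal(), rng.normal()   # start at a random point
lr = 0.1                            # step size (hypothetical choice)
for step in range(200):
    gm, gb = gradient(m, b)
    m -= lr * gm                    # step against the gradient,
    b -= lr * gb                    # i.e. downhill on the bowl
print(m, b, loss(m, b))             # ~2, ~1, near the bottom of the bowl
```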
- gradient descent for NN
- backpropagation
- calculate gradient using chain rule
- expresses the gradient in terms of the individual activations; the output error is distributed backward to the weights (sketched below)
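A minimal chain-rule sketch for a hypothetical one-hidden-unit network; all weights and inputs are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: x --w1--> h = sigmoid(w1*x) --w2--> yhat = w2*h
# Loss L = (yhat - y)^2. Backprop applies the chain rule layer by
# layer, distributing the output error back to each weight.
x, y = 0.5, 1.0
w1, w2 = 0.3, -0.2

# forward pass, keeping the activations for reuse
z = w1 * x
h = sigmoid(z)
yhat = w2 * h
L = (yhat - y) ** 2

# backward pass: chain rule, reusing the upstream gradients
dL_dyhat = 2 * (yhat - y)
dL_dw2 = dL_dyhat * h              # dyhat/dw2 = h
dL_dh = dL_dyhat * w2              # dyhat/dh  = w2
dL_dz = dL_dh * h * (1 - h)        # sigmoid'(z) = h * (1 - h)
dL_dw1 = dL_dz * x                 # dz/dw1 = x
print(dL_dw1, dL_dw2)
```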
- problem: local minima; there is no general way to find the global minimum (one reason plain "batch gradient descent" is rarely used)
- how to deal:
- calculate the gradient on subsets of the dataset: stochastic gradient descent, mini-batch gradient descent
- momentum: an accumulated velocity term lets the update roll out of a local minimum and keep moving
- Nesterov momentum: evaluates the gradient at the look-ahead position before stepping
- adaptive methods: AdaGrad, AdaDelta, RMSprop, Adam (when in doubt, use Adam); update rules sketched below
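Minimal sketches of these update rules in NumPy; the function names, the `grad(w, batch)` interface, and the hyperparameter defaults are illustrative assumptions, not from the notes:

```python
import numpy as np

def sgd_minibatch_step(w, grad, data, lr=0.01, batch_size=32, rng=None):
    """Estimate the gradient on a random subset of the data
    (mini-batch SGD): noisy but cheap, and the noise helps
    escape shallow local minima."""
    rng = rng or np.random.default_rng()
    batch = data[rng.choice(len(data), batch_size, replace=False)]
    return w - lr * grad(w, batch)

def momentum_step(w, v, g, lr=0.01, beta=0.9):
    """Keep a running velocity v so the update can 'roll' through
    flat regions and out of small local minima."""
    v = beta * v - lr * g
    return w + v, v

def adam_step(w, m, v, g, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from running estimates of the
    gradient's first moment (m) and second moment (v); t counts
    steps from 1 for the bias correction."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)      # correct the bias of early estimates
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Nesterov momentum differs from plain momentum only in where the gradient is evaluated: at the look-ahead point `w + beta * v` rather than at `w`.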
- problem: overfitting
- how to deal: