Learning Machine W4: Class Notes

(Grapher: macOS app)

Multilayer Perceptron:

  • A perceptron changes its weights to get better answers; a multilayer perceptron gets better outputs with hidden weights that represent facets of the problem that the machine determines on its own
    • rather than inputs going straight into the output, there is a hidden layer of factors of indeterminate size that sits between input and output
    • inputs are what is visible to the computer; the computer assumes there’s more than what is visible and compensates with the hidden layer to explain what’s visible
      • “wires” are layers; each layer has inputs and outputs
      • a layer’s “input” is the output of the previous layer
  • backpropagation: the “error” at one layer is a function of the error at the layer after it (the previous one in the backward pass), so a for loop runs backward through the layers to adjust the weights (see the sketch after this list)
    • relates to gradient descent: moving toward the lowest point of the error surface one step at a time
      • HW: visualizing the error rate over epochs (aka training iterations) is helpful
      • if error rate flattens out well before zero, it’s hit a local minimum or is overfit
      • if error rate nears zero then suddenly increases, it’s overfit to your training examples
        • solutions: give more examples, lower the learning rate; the most effective are dropout (on every training pass, block some neurons from changing) and regularization (Occam’s razor: the simplest solution is usually the correct one; penalizes extreme conclusions)
          • can attach an unsupervised learner to the supervised learner
  • learning rate is just a multiplier so we don’t learn too much too quickly (learning faster would need fewer examples/iterations, but the network would form conclusions too early that are hard to back out of)
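
A minimal sketch of the ideas above (hidden layer, backpropagation, gradient descent, learning rate), assuming Python with NumPy and a made-up XOR toy problem; this is an illustration, not the class’s code:

```python
# One-hidden-layer perceptron trained with backpropagation (sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# toy XOR dataset: the inputs are what is visible; the hidden layer explains them
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights
learning_rate = 0.5   # multiplier so we don't learn too much too quickly

for epoch in range(10000):
    # forward pass: each layer's input is the previous layer's output
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # backward pass: each layer's error is computed from the layer after it,
    # so the loop effectively walks backward through the layers (gradient descent)
    output_error = (output - y) * output * (1 - output)
    hidden_error = (output_error @ W2.T) * hidden * (1 - hidden)

    W2 -= learning_rate * hidden.T @ output_error
    b2 -= learning_rate * output_error.sum(axis=0)
    W1 -= learning_rate * X.T @ hidden_error
    b1 -= learning_rate * hidden_error.sum(axis=0)

    if epoch % 1000 == 0:
        # watching this value over epochs is the error-rate plot from the HW
        print(epoch, float(np.mean((output - y) ** 2)))

print(output.round(2))   # should approach [0, 1, 1, 0]
```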

Supervised Learning:

  • two categories of problems: regression and classification
    • regression: fitting a function (a curve, e.g. stock prices)
      • linear regression: take the data and fit it to a straight line that is the best approximation
      • nonlinear regression: try to fit the trend to a curve that is the best approximation
      • deals with a continuous function
    • classification: discrete (as opposed to continuous) outputs
      • one-hot encoding: each category has its own dimension
        • as many output dimensions as there are categories; training data = 1 for the category it’s representing, 0 for the ones it’s not (see the sketch after this list)
      • one-cold encoding: inverse of one-hot
  • 90-95% accuracy is typically the best you can expect
  • can create sub-datasets within the historic data set and loop over those subsets as training examples: e.g. three-day input / one-day output (see the sketches after this list)
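
A minimal sketch of one-hot (and one-cold) encoding, assuming NumPy and a made-up three-category example:

```python
import numpy as np

categories = ["cat", "dog", "bird"]      # one output dimension per category
labels = ["dog", "bird", "dog", "cat"]   # made-up training labels

one_hot = np.zeros((len(labels), len(categories)))
for row, label in enumerate(labels):
    one_hot[row, categories.index(label)] = 1.0   # 1 for the category it represents

one_cold = 1.0 - one_hot                          # inverse of one-hot
print(one_hot)
```

And a minimal sketch of slicing a historic series into three-day-input / one-day-output training pairs; the price numbers are invented for illustration:

```python
prices = [101.2, 102.5, 101.9, 103.0, 104.1, 103.7, 105.2]

pairs = []
for i in range(len(prices) - 3):
    pairs.append((prices[i:i + 3], prices[i + 3]))   # three days in, next day out

for inputs, target in pairs:
    print(inputs, "->", target)
```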

Activation functions:

  • Sigmoid (0 to 1) and tanh (-1 to 1)
    • squashes any values above the max or below the min
    • training outputs need to be mapped into the range of the activation function (use a mapping function; see the sketch below)
    • tanh gives twice as much precision (i.e. twice as many numbers, since its range is twice as wide)
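
A minimal sketch of mapping target values into an activation function’s output range, assuming NumPy; map_range and the price numbers are made up for illustration:

```python
import numpy as np

def map_range(value, in_min, in_max, out_min, out_max):
    # linearly remap values so they land inside the activation's output range
    return (value - in_min) / (in_max - in_min) * (out_max - out_min) + out_min

prices = np.array([90.0, 105.0, 120.0])

# sigmoid outputs live in (0, 1); tanh outputs live in (-1, 1)
sigmoid_targets = map_range(prices, 90.0, 120.0, 0.0, 1.0)
tanh_targets = map_range(prices, 90.0, 120.0, -1.0, 1.0)

print(sigmoid_targets)   # [0.  0.5 1. ]
print(tanh_targets)      # [-1.  0.  1.]
```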

HW data sets: http://archive.ics.uci.edu/ml/index.php

  • Just use one hidden layer; the variation can be how many nodes it has
    • number of nodes = somewhere between the average of the input and output dimensions and twice the largest dimension, i.e. between (x + y)/2 and 2x (a worked sketch follows)
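
A minimal worked example of that rule of thumb, where x is the number of input dimensions and y the number of output dimensions; the example counts (8 inputs, 3 outputs) are made up:

```python
def hidden_node_range(x, y):
    low = (x + y) / 2      # average of input and output dimensions
    high = 2 * max(x, y)   # twice the largest dimension
    return low, high

print(hidden_node_range(8, 3))   # (5.5, 16) -> try roughly 6 to 16 hidden nodes
```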
