# Learning Machines: W3 Class Notes

Graph theory describes relationships among elements:

• vertices (nodes) are entities; edges represent the relationship between nodes
• can represent grouped paths in Photoshop, or a computer program
• perceptron is a directed graph (info flows in one direction)
• recurrent neural networks allow cycles (loops) in their flow of info
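The directed-graph idea above can be sketched as a plain adjacency list; this is a minimal illustration (node names are hypothetical), with edges pointing the way information flows in a perceptron:

```python
# Directed graph as an adjacency list: each vertex maps to the
# vertices its outgoing edges point at.
graph = {
    "x1": ["output"],    # input vertices feed the output vertex
    "x2": ["output"],
    "bias": ["output"],
    "output": [],        # no outgoing edges: info flows one way (no cycles)
}

def successors(node):
    """Return the vertices reachable in one step from `node`."""
    return graph.get(node, [])
```

A recurrent network would differ only in that some vertex's successor list eventually leads back to itself, forming a cycle.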

Perceptron Implementation Notes:

• sign activation function: if the weighted sum is greater than zero, output is 1; otherwise, output is -1
• bias input: always equal to one
• supervised training procedure:
• initialize weights to random values
• have the perceptron guess outputs; compare them to the actual known outputs
• compute the error; adjust all weights accordingly
• repeat
• HW:
• construct the data sets; train the perceptron on all three (AND and OR should reach 100% accuracy)
• extra column of 1’s for bias input
• input is 2 columns (3 for bias input), output is 1 column
• AND set: true = 1, false = -1; two column pairs
• input [1, 1], output [1]; input [1, -1], output [-1], etc
• OR set
• XOR set will not reach 100% accuracy (probably around 50%)
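The data sets above can be written out as numpy arrays; this is a sketch, and putting the bias column of 1's last is an assumption (the notes don't specify column order):

```python
import numpy as np

# 2 input columns plus a bias column of 1s; true = 1, false = -1.
inputs = np.array([
    [ 1,  1, 1],
    [ 1, -1, 1],
    [-1,  1, 1],
    [-1, -1, 1],
])

# One label per input row, in the same order.
AND_outputs = np.array([ 1, -1, -1, -1])
OR_outputs  = np.array([ 1,  1,  1, -1])
XOR_outputs = np.array([-1,  1,  1, -1])  # not linearly separable
```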
• class Perceptron
• initializer function (number_of_input_dimensions, num_of_output_dimensions)
• weights = np.random.rand(num_inputs)
• predict(inputs)
• return array of output predictions
• training function(iterations, inputs, known outputs)
• for iter in range(num_iters):
• predictions = self.predict(inputs); compute the error; adjust weights
• myPerceptron = Perceptron()
• myPerceptron.train()
• myPerceptron.predict()
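Putting the outline above together, one possible sketch of the class; the learning rate and the per-sample update rule (`weights += lr * error * x`) are assumptions not spelled out in the notes:

```python
import numpy as np

class Perceptron:
    """Sketch of the class outlined in the notes."""

    def __init__(self, num_inputs):
        # one weight per input column (the bias column of 1s counts as an input)
        self.weights = np.random.rand(num_inputs)

    def predict(self, inputs):
        # sign activation: weighted sum > 0 -> 1, otherwise -1
        return np.where(inputs @ self.weights > 0, 1, -1)

    def train(self, iterations, inputs, known_outputs, lr=0.1):
        for _ in range(iterations):
            for x, target in zip(inputs, known_outputs):
                guess = 1 if x @ self.weights > 0 else -1
                error = target - guess                 # compute the error
                self.weights += lr * error * x         # adjust all weights
```

Usage follows the notes: `p = Perceptron(3)`, then `p.train(100, inputs, AND_outputs)`, then `p.predict(inputs)`; on the AND and OR sets this should reach 100% accuracy, while on XOR it cannot.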

Linear separation:

• Exclusive Or: (a OR b) AND (NOT (a AND b))
• The output depends on both variables jointly, whereas in the AND and OR models neither input needs to know about the other
• not linearly separated like AND and OR
• Example: machine-learning whether pixels compose a picture of a person. The perceptron effectively asks each pixel whether it might be part of such a picture, and if more than 50% say yes, the output is yes, the picture is of a person
• does not account for interdependency of pixels
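The XOR decomposition above can be checked directly. Note that each sub-expression (OR, and the negated AND) is itself linearly separable, which is why stacking perceptrons in layers can compute XOR even though a single one cannot:

```python
def xor(a, b):
    """XOR written exactly as in the notes: (a OR b) AND (NOT (a AND b))."""
    return (a or b) and not (a and b)
```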

Calculus Primer:

• Calculus is about approximating the analog world
• derivative: rate of change in some phenomenon
• power rule: multiply the coefficient by the power, then reduce the power by 1
• derivative of x^2 is 2x
• chain rule: the derivative of f(g(x)) is f'(g(x))g'(x) >> for a nested function, you can compute the derivative by splaying it out
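Both rules can be sanity-checked numerically with a central-difference approximation; the example functions and evaluation points below are ours, chosen for illustration:

```python
def numeric_derivative(f, x, h=1e-6):
    # central difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# Power rule: d/dx x^2 = 2x, so the derivative at x = 3 should be 6.
assert abs(numeric_derivative(lambda x: x**2, 3.0) - 6.0) < 1e-4

# Chain rule: d/dx f(g(x)) = f'(g(x)) * g'(x).
# Take f(u) = u^2 and g(x) = 3x + 1, so f(g(x)) = (3x + 1)^2
# and the chain rule gives 2 * (3x + 1) * 3.
g = lambda x: 3 * x + 1
fg = lambda x: g(x) ** 2
x = 2.0
analytic = 2 * g(x) * 3   # 2 * (3*2 + 1) * 3 = 42
assert abs(numeric_derivative(fg, x) - analytic) < 1e-3
```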