**Task T:**

- Classification: output is a category
- Regression: output is a continuous value (continuous output vs. the discrete output of classification)
- Anomaly detection: e.g., EEG data
- Synthesis and sampling
- Density estimation or probability mass function estimation
    - i.e., clusters of the distribution

**Performance Measure, P:**

- **Accuracy** of predicted labels compared to true labels
- **Mean squared error** between targets and predictions
- **Likelihood**: probability of predicting the true label for each sample (classification)
    - a function of the model parameters
    - a conditional probability
    - i.i.d. = independent and identically distributed
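Under the i.i.d. assumption, the likelihood of a dataset factorizes into a product of per-sample probabilities, and in practice one sums log-probabilities instead. A minimal sketch, assuming some hypothetical per-sample probabilities a model might assign:

```python
import math

# Hypothetical probabilities a model assigns to the true labels
# of four i.i.d. samples (illustrative values only).
probs = [0.9, 0.8, 0.95, 0.7]

# Likelihood: product of per-sample probabilities (i.i.d. assumption).
likelihood = math.prod(probs)

# Log-likelihood: sum of logs, numerically safer for many samples.
log_likelihood = sum(math.log(p) for p in probs)

print(likelihood)        # equals exp(log_likelihood)
print(log_likelihood)
```

Note the likelihood is a function of the model parameters: change the model and these per-sample probabilities change with it.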

Machine learning formulation:

- input: X
- output: Y
- task: f_θ(x)
- evaluation: loss
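The X → f_θ(x) → loss formulation can be sketched with a deliberately tiny example; the 1-D linear model f_θ(x) = θ·x and the squared-error loss below are illustrative choices, not part of the notes:

```python
# Minimal sketch of the X -> f_theta(x) -> loss formulation,
# assuming a 1-D linear model f_theta(x) = theta * x.
def f(theta, x):
    return theta * x

def loss(theta, xs, ys):
    # Mean squared-error loss over the dataset.
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]      # generated by y = 2x, so theta = 2 is optimal
print(loss(2.0, xs, ys))  # 0.0 at the optimum
print(loss(1.0, xs, ys))  # larger for a worse theta
```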

**Supervised Learning- Classification**

- input X: continuous or categorical vector or Matrix or Tensor
- output Y: categorical label
- task f_θ(x): some function f that computes the probability of each class for each sample
- evaluation loss
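A classifier's per-class probabilities are commonly produced with a softmax over raw scores, and a common evaluation loss is cross-entropy (the negative log-probability of the true class); both are sketched here as assumptions, since the notes don't name a specific loss:

```python
import math

def softmax(scores):
    # Convert raw scores into class probabilities that sum to 1.
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    # Negative log-probability of the true class.
    return -math.log(probs[true_class])

probs = softmax([2.0, 1.0, 0.1])  # scores for 3 hypothetical classes
print(probs)
print(cross_entropy(probs, 0))
```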

**Supervised Learning- Regression**

- input X: continuous or categorical vector or Matrix or Tensor
- output Y: continuous target
- task f_θ(x): some function f that computes the target for each sample
- evaluation loss: mean squared error loss, adversarial loss, etc.
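The mean squared error mentioned above is simple enough to write out directly:

```python
def mse(targets, predictions):
    # Mean squared error between continuous targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

print(mse([3.0, 5.0], [2.5, 5.5]))  # 0.25
```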

**Supervised Learning- Structured Output**

- input X: continuous or categorical vector or Matrix or Tensor
- output Y: continuous or categorical vector or Matrix or Tensor
- task f_θ(x): some function f that computes a vector/matrix/tensor for each sample
- evaluation loss: often a combination of losses

**Unsupervised Learning- Density Estimation**

- input X: continuous or categorical vector or Matrix or Tensor
- output Y: P(X)
- task f_θ(x): estimates the density P(X)
- evaluation loss: log-likelihood of observing the Xs as they are
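A minimal sketch of density estimation, assuming a 1-D Gaussian model fitted by maximum likelihood (mean and variance are the MLE estimates), scored by the log-likelihood of the observed Xs:

```python
import math

# Toy observed samples (illustrative values only).
xs = [1.0, 2.0, 2.5, 3.0, 4.0]

# MLE estimates for a 1-D Gaussian.
mu = sum(xs) / len(xs)
var = sum((x - mu) ** 2 for x in xs) / len(xs)

def log_density(x):
    # Log of the Gaussian pdf N(mu, var) at x.
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# Log-likelihood of observing the Xs as they are.
log_likelihood = sum(log_density(x) for x in xs)
print(mu, var, log_likelihood)
```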

**Unsupervised Learning- Denoising**

- input X (noisy): continuous or categorical vector or Matrix or Tensor
- output X: denoised continuous or categorical vector or Matrix or Tensor
- task f_θ(x): some function that returns an output of the same shape as the input, subject to constraints (e.g., smoothness)
- evaluation loss
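A denoiser would normally be learned, but a hand-written moving-average filter (an assumption here, not from the notes) illustrates the key property: the output has the same shape as the input, only smoother:

```python
def denoise(xs, k=1):
    # Moving-average denoiser: each output value averages a window
    # of up to 2k+1 neighboring input values; output length == input length.
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - k): i + k + 1]
        out.append(sum(window) / len(window))
    return out

noisy = [1.0, 5.0, 1.0, 5.0, 1.0]
print(denoise(noisy))  # same length, smoother values
```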

Gradient descent: used to find a local minimum of the loss function
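Gradient descent repeatedly steps against the gradient; a minimal sketch, minimizing the illustrative function f(θ) = (θ − 3)², whose gradient is 2(θ − 3):

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    # Step opposite the gradient to approach a local minimum.
    theta = theta0
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta

# Minimize f(theta) = (theta - 3)^2; the gradient is 2 * (theta - 3).
theta = gradient_descent(lambda t: 2 * (t - 3), theta0=0.0)
print(theta)  # converges toward 3, the minimum
```

For a model, `grad` would be the gradient of the training loss with respect to the parameters.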

Solution to under/overfitting:

- randomly sample and set aside a test set; the rest of the data becomes the training set
- optimize the loss function on the training set only
- better: keep 3 sets: a training set, a validation set (for model selection and hyperparameters), and a test set (for the final performance estimate)
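The splitting procedure above can be sketched directly; the fractions and helper name here are illustrative assumptions:

```python
import random

def split_dataset(data, val_frac=0.15, test_frac=0.15, seed=0):
    # Shuffle, then carve off held-out validation and test sets;
    # only the training set is used to optimize the loss.
    data = data[:]                       # avoid mutating the caller's list
    random.Random(seed).shuffle(data)
    n_test = int(len(data) * test_frac)
    n_val = int(len(data) * val_frac)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```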