
Machine Learning: Linear regression with one variable

Posted by Evictor

Model representation

Notation:

  • m = number of training examples
  • x's = "input" variable / features
  • y's = "output" variable / "target" variable
  • (x, y) = one training example
  • $(x^{(i)}, y^{(i)})$ = the i-th training example
  • $h$ = hypothesis, the function produced by the learning algorithm that maps from an input x to a predicted y
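As a quick illustration of the notation, here is a minimal Python sketch with a made-up toy dataset (the values are placeholders, not data from this post):

```python
# Toy training set -- the values are illustrative only.
x = [1.0, 2.0, 3.0, 4.0]        # x's: "input" variable / feature
y = [1.5, 3.1, 4.4, 6.2]        # y's: "output" / "target" variable

m = len(x)                      # m = number of training examples (4 here)
i = 2                           # the course notation indexes examples from 1
x_i, y_i = x[i - 1], y[i - 1]   # (x^(i), y^(i)) = the i-th training example
print(m, (x_i, y_i))            # -> 4 (2.0, 3.1)
```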

Cost function

  • Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
  • $\theta_0, \theta_1$: parameters; but how do we choose them?
  • Minimize the modeling error, i.e. the cost function (a NumPy sketch follows this list):
  • $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, with the goal $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
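A minimal NumPy sketch of the hypothesis and the squared-error cost above (the function names and toy data are mine, for illustration only):

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x (univariate linear regression)."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = (1 / (2m)) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Toy data (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 3.1, 4.4, 6.2])
print(cost(0.0, 1.5, x, y))   # J for one particular choice of theta0, theta1
```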

Gradient descent

  • Repeat until convergence: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (update $j = 0$ and $j = 1$ simultaneously)
  • $\alpha$ = learning rate
  • Batch gradient descent: each step of gradient descent uses all the training examples.
  • Stochastic gradient descent (SGD): use one example in each iteration.
  • Mini-batch gradient descent: use a small subset of the examples in each iteration (a NumPy sketch of these variants follows this list).
  • Momentum:
    Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations.
  • AdaGrad:
    AdaGrad is an algorithm for gradient-based optimization that adapts the learning rate to the parameters, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones. For this reason, it is well-suited for dealing with sparse data.
  • Adam:
    Adaptive Moment Estimation (Adam) is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients $v_t$, like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients $m_t$, similar to momentum (the per-step updates are sketched below).
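The update rule and the batch / stochastic / mini-batch variants can be sketched in a few lines of NumPy. This reuses the toy data from the cost-function sketch; the parameter defaults and function names are mine, not from the post:

```python
import numpy as np

def gradients(theta0, theta1, x, y):
    """Partial derivatives of J(theta0, theta1) for the squared-error cost."""
    m = len(x)
    errors = theta0 + theta1 * x - y            # h_theta(x^(i)) - y^(i)
    return np.sum(errors) / m, np.sum(errors * x) / m

def gradient_descent(x, y, alpha=0.01, iters=1000, batch_size=None, seed=0):
    """batch_size=None -> batch GD, 1 -> SGD, k -> mini-batch GD."""
    rng = np.random.default_rng(seed)
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(iters):
        if batch_size is None:                  # batch: use all m examples
            xb, yb = x, y
        else:                                   # SGD / mini-batch: sample a subset
            idx = rng.choice(m, size=batch_size, replace=False)
            xb, yb = x[idx], y[idx]
        d0, d1 = gradients(theta0, theta1, xb, yb)
        # Simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j
        theta0, theta1 = theta0 - alpha * d0, theta1 - alpha * d1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 3.1, 4.4, 6.2])
print(gradient_descent(x, y))                   # batch gradient descent
print(gradient_descent(x, y, batch_size=1))     # stochastic gradient descent
print(gradient_descent(x, y, batch_size=2))     # mini-batch gradient descent
```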
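For the momentum, AdaGrad, and Adam descriptions above, the per-step updates look roughly like this: a sketch of the standard update rules applied to a generic gradient, with the usual default hyperparameters (not values specified in this post):

```python
import numpy as np

def momentum_step(theta, grad, v, alpha=0.01, gamma=0.9):
    """Momentum: accumulate a velocity term that dampens oscillations."""
    v = gamma * v + alpha * grad
    return theta - v, v

def adagrad_step(theta, grad, G, alpha=0.01, eps=1e-8):
    """AdaGrad: per-parameter rate shrinks with the accumulated squared gradient."""
    G = G + grad ** 2
    return theta - alpha * grad / (np.sqrt(G) + eps), G

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: decaying averages of past gradients (m_t) and squared gradients (v_t)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    return theta - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v
```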