Optimisation

Terry Benzschawel

Although optimisation is typically treated separately from regularisation, the two are used in combination to address undersampling and local-minimum problems. Interest in optimisation techniques stems mainly from their role in deep learning neural networks, where optimisation methods are used to choose parameters that are optimal for a given problem. The demonstrated successes of these methods have inspired research aimed at modelling more challenging machine learning problems and at designing new methods.

8.1 OPTIMISATION ALGORITHMS

This chapter describes the popular optimisation methods stochastic gradient descent (SGD), RMSProp, AdaGrad, AdaDelta, Adam and AdaMax.

Stochastic gradient descent

SGD was introduced earlier, in Chapter 6, in the context of backpropagation, and many of the optimisation methods described here build on it. First-order optimisation algorithms minimise or maximise a loss function E(x) using its gradient values with respect to the parameters. The most widely used first-order optimisation algorithm is gradient descent. The first-order derivative tells us whether the error function is decreasing or increasing at a particular point; geometrically, it gives the slope of the line tangent to the error surface at that point.
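
As an illustration of the idea, and not code from the chapter itself, the following minimal Python sketch applies stochastic gradient descent to a least-squares loss: the gradient of the mini-batch loss is computed with respect to the parameters and a small step is taken against it. The function names (loss_grad, sgd), the synthetic data and the learning-rate and batch-size values are assumptions made for this example.

```python
import numpy as np

# Stochastic gradient descent on a linear least-squares loss
# E(w) = (1/2N) * sum_i (x_i . w - y_i)^2.  Data and names are illustrative.

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                  # synthetic features
true_w = np.array([1.5, -2.0, 0.5])             # "true" parameters to recover
y = X @ true_w + 0.1 * rng.normal(size=1000)    # noisy targets

def loss_grad(w, xb, yb):
    """Gradient of the mini-batch least-squares loss with respect to w."""
    residual = xb @ w - yb
    return xb.T @ residual / len(yb)

def sgd(w, lr=0.1, epochs=20, batch_size=32):
    """Plain SGD: on each mini-batch, step against the loss gradient."""
    n = len(y)
    for _ in range(epochs):
        idx = rng.permutation(n)                # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w = w - lr * loss_grad(w, X[batch], y[batch])
    return w

w_hat = sgd(np.zeros(3))
print(w_hat)   # should lie close to true_w
```

The same update rule underlies the adaptive methods discussed in this section; RMSProp, AdaGrad, AdaDelta, Adam and AdaMax differ mainly in how they scale the step taken against each gradient component.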
