Friday, July 28, 2023

Activation Functions

An activation function is a crucial component of artificial neural networks, a type of machine learning model inspired by the functioning of the human brain. It is applied to the weighted sum computed by each neuron (node) to produce that neuron's output, introducing non-linearity into the model. Without this non-linearity, a stack of layers would collapse into a single linear transformation; with it, the network can learn and approximate complex, non-linear relationships within the data it processes.


When information flows through a neural network, each neuron receives inputs, computes a weighted sum of those inputs (plus a bias term), and passes the result through the activation function before forwarding it to the next layer. The activation function thus determines how strongly, if at all, the neuron "activates" for the input it received.

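As a concrete illustration, here is a minimal NumPy sketch of a single neuron's forward step. The input, weight, and bias values are made-up placeholders, and sigmoid (defined inline) stands in for whichever activation function the layer actually uses.

import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1): f(x) = 1 / (1 + exp(-x)).
    return 1.0 / (1.0 + np.exp(-x))

# Made-up example values; in a real network the weights and bias are learned.
inputs = np.array([0.5, -1.2, 3.0])   # outputs from the previous layer
weights = np.array([0.4, 0.1, -0.7])  # this neuron's weights
bias = 0.2

pre_activation = np.dot(weights, inputs) + bias  # weighted sum plus bias
output = sigmoid(pre_activation)                 # non-linearity applied here
print(output)  # about 0.14 for these values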

There are several types of activation functions, but some of the most commonly used ones include the following (each is sketched in code after this list):

Sigmoid function: The sigmoid function maps the input to a value between 0 and 1, which can be interpreted as a probability. It is given by the formula: f(x) = 1 / (1 + exp(-x)). However, it has drawbacks: its gradient vanishes for large positive or negative inputs, which can lead to slow convergence during training.

Rectified Linear Unit (ReLU): The ReLU activation function is simple and widely used. It returns the input value if it is positive and zero otherwise (f(x) = max(0, x)). ReLU helps mitigate the vanishing gradient problem and accelerates training, but it can suffer from the "dying ReLU" problem: if a neuron's pre-activation stays negative, its gradient is zero and the neuron stops learning.

Leaky ReLU: To address the "dying ReLU" issue, Leaky ReLU allows a small, non-zero gradient for negative input values. The function is defined as f(x) = max(a*x, x), where 'a' is a small positive constant (typically around 0.01), so negative inputs still propagate a small gradient.

Hyperbolic Tangent (tanh): The tanh function is similar to the sigmoid but maps the input to a range between -1 and 1. Its outputs are zero-centered and its gradient is steeper around zero, both of which can help with convergence compared to the sigmoid function.

Softmax: The softmax activation function is commonly used in the output layer of classification neural networks. It converts the raw scores (logits) of the output layer into a probability distribution, with non-negative values that sum to 1, allowing the network to predict the class with the highest probability.

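To make the formulas above concrete, here is a minimal NumPy sketch of each function. The input scores are illustrative, and in practice you would usually rely on a deep learning framework's built-in implementations rather than hand-rolled versions like these.

import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)); output in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # f(x) = max(0, x); zero for negative inputs.
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # f(x) = max(a*x, x); 'a' is a small positive slope for negative inputs.
    return np.maximum(a * x, x)

def tanh(x):
    # f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)); output in (-1, 1).
    return np.tanh(x)

def softmax(x):
    # Subtract the max for numerical stability, then exponentiate and
    # normalize so the outputs are non-negative and sum to 1.
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

scores = np.array([-2.0, 0.0, 3.0])  # illustrative raw scores
print(sigmoid(scores))     # approx. [0.119 0.5   0.953]
print(relu(scores))        # [0. 0. 3.]
print(leaky_relu(scores))  # [-0.02  0.    3.  ]
print(tanh(scores))        # approx. [-0.964  0.     0.995]
print(softmax(scores))     # approx. [0.006 0.047 0.946], sums to 1

Note the max-subtraction trick in softmax: it does not change the result, but it prevents overflow when the raw scores are large.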

Different activation functions suit different problems and architectures, and choosing the right one can significantly affect a network's performance and training. The choice depends on factors such as the type of data, the architecture of the network, and the specific task the network is designed for.
