Sunday, July 30, 2023

Hyperbolic Tangent (tanh) and Python Implementation

Hyperbolic Tangent, commonly referred to as tanh, is an activation function frequently used in artificial neural networks. It is closely related to the sigmoid function (it is a scaled and shifted version of it) but maps its input to the range between -1 and 1, making it zero-centered and able to produce both negative and positive outputs. The tanh function also exhibits stronger gradients around the origin than the sigmoid function, which can be advantageous during training.


The mathematical definition of the tanh activation function is as follows:

f(x) = (2 / (1 + exp(-2x))) - 1

Where:

x is the input to the function, which can be a single value or a vector (in the case of neural networks, it is usually the weighted sum of inputs to a neuron).

exp denotes the exponential function, which raises the mathematical constant "e" (approximately 2.71828) to the power of the argument.
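
As a quick sanity check, the short sketch below (using only Python's standard math module; the name tanh_from_formula is just an illustrative choice) evaluates this formula and compares it against the built-in math.tanh:

import math

def tanh_from_formula(x):
    # f(x) = (2 / (1 + exp(-2x))) - 1, as defined above
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

# Compare against Python's built-in math.tanh for a few inputs
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"x = {x:+.1f}  formula = {tanh_from_formula(x):+.6f}  math.tanh = {math.tanh(x):+.6f}")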

Key characteristics of the tanh function:

  1. Range: The output of tanh ranges from -1 to 1. When 'x' is large and positive, the function approaches 1, and when 'x' is large and negative, the function approaches -1. When 'x' is close to zero, the output is also close to zero (tanh(0) = 0).
  2. Zero-centered: Unlike the sigmoid function, which has its midpoint at 0.5, tanh is zero-centered, meaning that its midpoint is at 0. This can be beneficial for optimization, because the gradients reaching a layer's weights are less likely to all share the same sign. Both properties are easy to verify numerically, as the short sketch after this list shows.
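
Assuming NumPy is installed, a minimal check of the output range and of the odd symmetry tanh(-x) = -tanh(x) looks like this:

import numpy as np

x = np.linspace(-10.0, 10.0, 2001)
y = np.tanh(x)

print("min:", y.min(), "max:", y.max())    # stays within (-1, 1)
print("tanh(0) =", np.tanh(0.0))           # exactly 0.0
print("tanh(-3) =", np.tanh(-3.0))         # equals -tanh(3): the curve is symmetric about the origin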


Advantages of tanh activation function:

  1. Stronger gradients: The tanh function has steeper gradients around zero compared to the sigmoid function. This can facilitate faster learning and convergence during training (a numerical comparison follows this list).
  2. Zero-centered output: Having a zero-centered output can help neural networks converge faster, especially in situations where the data distribution is centered around zero.
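
To make the gradient comparison concrete, here is a small sketch (the helper names tanh_grad and sigmoid_grad are purely illustrative) that evaluates both derivatives near zero; at x = 0 the derivative of tanh is 1.0, while the derivative of the sigmoid is only 0.25:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 0.5, 1.0]:
    print(f"x = {x:.1f}  tanh' = {tanh_grad(x):.4f}  sigmoid' = {sigmoid_grad(x):.4f}")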


Despite its advantages, tanh shares some drawbacks with the sigmoid function, most notably vanishing gradients when the input is large in magnitude (strongly positive or strongly negative), where the function saturates. In many cases, ReLU and its variants are preferred over tanh as activation functions in deep learning architectures due to their simplicity and better behavior with respect to vanishing gradients.


However, tanh can still be useful in specific cases, especially when a zero-centered output is desired or for certain network architectures where it performs well. As with all activation functions, the choice of tanh or other alternatives depends on the specific problem, the network's structure, and empirical experimentation to find the most suitable activation function for the given task.


Python Implementation

To implement the tanh function in the Python programming language, all you have to do is write a few lines of code.
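
A minimal sketch, assuming NumPy and Matplotlib are installed, evaluates the formula from the definition above and plots the resulting curve:

import numpy as np
import matplotlib.pyplot as plt

def tanh(x):
    # Hyperbolic tangent via the formula above: (2 / (1 + exp(-2x))) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

# Evaluate the function over a symmetric range of inputs
x = np.linspace(-6.0, 6.0, 500)
y = tanh(x)

# Plot the activation curve
plt.plot(x, y)
plt.title("Hyperbolic Tangent (tanh) Activation Function")
plt.xlabel("x")
plt.ylabel("tanh(x)")
plt.grid(True)
plt.show()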

The result of running this code is shown in the original image at the top of the article. You can run the code in the following Colab notebook: tanh activation function
