Multiple-choice questions in machine learning. Interview and quiz questions for data scientists, with answers explained: What is the potential problem with the Sigmoid activation function in a neural network? Which activation function causes the vanishing gradient problem? How is the vanishing gradient problem caused? What does the Sigmoid function do? How can the vanishing gradient problem be solved in a neural network?
Machine Learning MCQ - Potential problem with Sigmoid activation function in neural network
1. Which among the following is one of the major problems with the Sigmoid activation function in a neural network?
a) It is convex, and convex functions cannot solve non-convex problems
b) It does not work well with the entropy loss function
c) Gradients are small for values away from 0, leading to the "Vanishing Gradient" problem for large or recurrent neural nets
d) It can have negative values
Answer: (c) Gradients are small for values away from 0, leading to the "Vanishing Gradient" problem for large or recurrent neural nets
The biggest disadvantage of the Sigmoid activation function is the vanishing gradient problem. During backpropagation through a deep network, the gradient becomes very close to 0, so the weights are barely updated and convergence is very slow. If the gradient reaches 0, no learning happens at all. The output of the sigmoid saturates (i.e. the curve becomes nearly parallel to the x-axis) for large positive or large negative inputs, so the gradient in these regions is almost zero. During backpropagation, this local gradient is multiplied by the gradient arriving at the gate's output; if the local gradient is very small, it kills the incoming gradient and the network will not learn. [Refer here for more]
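As a quick illustration (not from the original post; a minimal NumPy sketch with a few hand-picked input values), the sigmoid's local gradient σ(x)(1 − σ(x)) can be evaluated to see how close to zero it becomes once the input moves away from 0:

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: maps any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: sigmoid(x) * (1 - sigmoid(x)), at most 0.25 (at x = 0)
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  output = {sigmoid(x):.6f}  local gradient = {sigmoid_grad(x):.6f}")
# The local gradient peaks at 0.25 at x = 0 and is already about 0.000045 at x = 10,
# so any gradient multiplied by it during backpropagation is almost wiped out.
```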
Sigmoid activation function
The sigmoid is a logistic function, σ(x) = 1 / (1 + e^(-x)), with a characteristic S shape. Its output value lies between 0 and 1. The main purpose of the activation function is to keep the output or predicted value within a particular range, which helps the efficiency and accuracy of the model.
What is the “vanishing gradient” problem? [Source: here]
The vanishing gradient problem is a situation in which a deep multilayer feed-forward network or a recurrent neural network (RNN) cannot propagate useful gradient information from the output end of the model back to the layers near the input end. As the backpropagation algorithm moves backward from the output layer toward the input layer, the gradients tend to shrink, becoming smaller and smaller until they approach zero, which leaves the weights of the initial (lower) layers practically unchanged. In this situation, gradient descent never converges to the optimum. A vanishing gradient does not necessarily mean that the gradient vector is exactly zero (except in the case of numerical underflow); it means the gradients are minuscule, which makes learning very slow.
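To make the shrinking concrete, here is a hedged toy sketch (the 20-layer, width-64 network, the Xavier-style random weights, and the random input are all assumptions chosen purely for illustration) that backpropagates a unit gradient through a stack of sigmoid layers and prints how its norm collapses by many orders of magnitude on the way back to the input:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy network (all sizes are assumptions): 20 dense sigmoid layers of width 64,
# Xavier-style random weights, and a random input vector.
depth, width = 20, 64
weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width)) for _ in range(depth)]

# Forward pass, caching every layer's activation for use in the backward pass.
a = rng.normal(size=width)
activations = []
for W in weights:
    a = sigmoid(W @ a)
    activations.append(a)

# Backward pass: start from a unit gradient at the output and apply the chain
# rule layer by layer; the local sigmoid gradient a * (1 - a) is at most 0.25,
# so the backpropagated signal shrinks at every layer.
grad = np.ones(width)
norms = []
for W, a_out in zip(reversed(weights), reversed(activations)):
    grad = W.T @ (grad * a_out * (1.0 - a_out))
    norms.append(np.linalg.norm(grad))

print("gradient norm just below the output layer:", norms[0])
print("gradient norm reaching the first hidden layer:", norms[-1])
```

Running this shows the gradient norm near the input layers is vanishingly small compared with the norm near the output, which is exactly the behaviour described above.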
How to solve the “vanishing gradient” problem?
The vanishing gradient problem can be mitigated by using the ReLU (Rectified Linear Unit) activation function. ReLU produces a linear (identity) output for positive input values; if the input is negative, the function returns zero.
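A minimal sketch, again assuming NumPy and a few arbitrary test inputs, comparing the ReLU gradient with the sigmoid gradient; for positive inputs the ReLU gradient stays at 1 and never saturates:

```python
import numpy as np

def relu(x):
    # ReLU: identity for positive inputs, zero for negative inputs
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient of ReLU: 1 where the input is positive, 0 where it is negative
    return (x > 0).astype(float)

def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

xs = np.array([-5.0, -1.0, 2.0, 10.0])
for x, rg, sg in zip(xs, relu_grad(xs), sigmoid_grad(xs)):
    print(f"x = {x:5.1f}  ReLU = {relu(x):5.1f}  ReLU grad = {rg:.1f}  sigmoid grad = {sg:.6f}")
# For positive inputs the ReLU gradient stays at 1 however large x gets, so it does
# not shrink the backpropagated signal; for negative inputs the gradient is 0
# (the separate "dying ReLU" issue).
```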