Wednesday, July 13, 2022

Machine Learning MCQ - Potential problem with Sigmoid activation function

Multiple choice questions in machine learning, with answers explained for data science interviews and quizzes: What is the potential problem with the Sigmoid activation function in a neural net? Which activation function causes the vanishing gradient problem? How is the vanishing gradient problem caused? What does Sigmoid do? How can the "vanishing gradient" problem be solved in a neural network?


1. Which among the following is one of the major problems with the Sigmoid activation function in a neural network?

a) It is convex, and convex functions cannot solve non-convex problems

b) It does not work well with the entropy loss function

c) Gradients are small for values away from 0, leading to the "Vanishing Gradient" problem for large or recurrent neural nets

d) It can have negative values

Answer: (c) Gradients are small for values away from 0, leading to the "Vanishing Gradient" problem for large or recurrent neural nets

 

The biggest disadvantage of the Sigmoid activation function is the vanishing gradient problem. During backpropagation in a deep network, as we move backward towards the earlier layers, the gradient becomes very close to 0. As a result, the weights barely get updated, which leads to very slow convergence. If the gradient reaches 0, no learning happens at all.

The output of sigmoid saturates (i.e. the curve becomes nearly parallel to the x-axis) for large positive or large negative inputs, so the gradient in these regions is almost zero. During backpropagation, this local gradient is multiplied by the gradient flowing back into the gate's output. Thus, if the local gradient is very small, it effectively kills the gradient and the network will not learn.
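
To see the saturation numerically, here is a minimal Python/NumPy sketch (my own illustration, not part of the original post) that prints the sigmoid output and its derivative at a few input values; the derivative peaks at 0.25 at x = 0 and is practically zero once |x| is large:

import numpy as np

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: sigma(x) * (1 - sigma(x)), never larger than 0.25
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(x, sigmoid(x), sigmoid_grad(x))
# At x = +/-10 the local gradient is about 4.5e-05: the unit has saturated.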

 

Sigmoid activation function

It is a logistic function with a characteristic S shape. The output value of the function is between 0 and 1. The main purpose of the activation function is to keep the output or predicted value within a particular range, which helps the efficiency and accuracy of the model.
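
In formula form, sigmoid(x) = 1 / (1 + e^(-x)); for large negative x the output approaches 0, and for large positive x it approaches 1.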

 

What is the “vanishing gradient” problem?

The vanishing gradient problem is essentially a situation in which a deep multilayer feed-forward network or a recurrent neural network (RNN) does not have the ability to propagate useful gradient information from the output end of the model back to the layers near the input end of the model.

As the backpropagation algorithm advances downwards (or backward), going from the output layer to the input layer, the gradients tend to shrink, becoming smaller and smaller until they approach zero. This ends up leaving the weights of the initial (lower) layers practically unchanged. In this situation, gradient descent never ends up converging to the optimum.

A vanishing gradient does not necessarily mean that the gradient vector is all zeros (except in cases of numerical underflow). It means that the gradients are minuscule, which causes learning to be very slow.
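
As a rough back-of-the-envelope illustration (my own sketch, not from the cited source): the chain rule multiplies the local gradients of successive layers, and each sigmoid layer contributes a factor of at most 0.25, so after only 10 layers the gradient reaching the first layer has shrunk by a factor of about a million:

grad = 1.0           # gradient arriving from the loss at the output layer
for layer in range(10):
    grad *= 0.25     # best-case local gradient of a sigmoid layer
print(grad)          # 0.25 ** 10 is roughly 9.5e-07 -- almost no signal left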

 

How to solve the “vanishing gradient” problem?

The “vanishing gradient” problem can be mitigated by using the ReLU (Rectified Linear Unit) activation function. ReLU generates a positive linear output when applied to positive input values (it returns the input itself), and returns zero if the input is negative. Its gradient is therefore exactly 1 for every positive input, so it does not saturate and the gradient does not shrink as it is propagated back through many layers.
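
A minimal sketch of ReLU and its gradient (again just an illustration in plain NumPy, not code from the post):

import numpy as np

def relu(x):
    # ReLU: passes positive inputs through unchanged, returns 0 for negatives
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative of ReLU: exactly 1 for positive inputs, 0 for negative inputs
    return (np.asarray(x) > 0).astype(float)

for x in [-10.0, -2.0, 0.5, 2.0, 10.0]:
    print(x, relu(x), relu_grad(x))
# The gradient stays at 1 for every positive input, so it does not shrink
# as it is multiplied back through many layers the way sigmoid gradients do.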

 

 

  


************************

Related links:

What is an activation function in neural network?

Define sigmoid activation function

How to solve "vanishing gradient" problem in neural network

What causes "vanishing gradient" problem?

Explain "vanishing gradient" problem in layman terms

Why Sigmoid causes "vanishing gradient" problem?

