Question
CS-7643-O01, OAN, OSZ Quiz #4: Module 3
Numerical question
Context: Let's look at a simple example of why vanishing and exploding gradients occur in RNNs. Consider a univariate RNN with the following update rules:

z(t) = u x(t) + w h(t−1)
h(t) = σ(z(t))

To keep things simple, assume σ is the identity function, i.e., σ(a) = a. Suppose we have a final loss L and have computed the derivative ∂L/∂h_T for some t = T. Using the update rules, the value of ∂h_T/∂h_1 comes out to be w^(aT + b).

Main Question: What is the value of a?
Analysis
We start by restating the problem setup and what is being asked.
Context: In a simple RNN with univariate x and identity activation, z(t) = u x(t) + w h(t−1) and h(t) = σ(z(t)). Since σ is the identity, h(t) = z(t) for all t. We consider a final loss L and its derivative ∂L/∂h_T at time T. The problem states that ∂h_T/∂h_1 takes the form w^(aT + b), where a and b are terms that accumulate from the backpropagation through...
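Independently of the truncated explanation, the chain rule gives a direct answer. With the identity activation, h(t) = u x(t) + w h(t−1), so ∂h(t)/∂h(t−1) = w at every step. Going from h_1 to h_T takes T − 1 steps, so ∂h_T/∂h_1 = w^(T − 1). Reading the target form as w^(aT + b), this suggests a = 1 and b = −1. The following is a minimal Python sketch that checks this numerically with a finite difference. The function names, the inputs `xs`, and the values of `u`, `w`, and `T` are illustrative assumptions, not part of the original quiz.

```python
def rnn_final_state(h1, xs, u, w):
    """Run the linear RNN h(t) = u*x(t) + w*h(t-1) for t = 2..T, starting from h(1) = h1."""
    h = h1
    for x in xs:  # xs holds x(2), ..., x(T), i.e. T - 1 values
        h = u * x + w * h
    return h


def dhT_dh1(xs, u, w, h1=0.3, eps=1e-6):
    """Central finite-difference estimate of dh(T)/dh(1)."""
    upper = rnn_final_state(h1 + eps, xs, u, w)
    lower = rnn_final_state(h1 - eps, xs, u, w)
    return (upper - lower) / (2 * eps)


# Hypothetical values chosen only for illustration.
T = 6
xs = [0.5, -1.0, 2.0, 0.1, 0.7]  # x(2), ..., x(6)
u = 0.8

for w in (0.5, 1.0, 1.5):
    numeric = dhT_dh1(xs, u, w)
    analytic = w ** (T - 1)
    print(f"w={w}: numeric={numeric:.6f}, w^(T-1)={analytic:.6f}")
```

The derivative does not depend on the inputs or on u. It depends only on w and T, which is why |w| < 1 makes it shrink towards zero as T grows (vanishing gradients) and |w| > 1 makes it grow without bound (exploding gradients).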
Similar questions
What are the drawbacks of Recurrent Neural Networks (RNNs)?
I. RNNs can only solve regression problems.
II. RNNs can only produce single-valued outputs.
III. RNNs suffer from vanishing gradients, which make it difficult to know which direction the parameters should move, and exploding gradients, which can make learning unstable.
IV. One can only use the sigmoid function as the activation function for its hidden layers.
Select all that apply to the figure below.
I. The h dots represent the intermediate output of the sequential operation.
II. It is the unrolling of a Recurrent Neural Network module.
III. It represents a feedforward layer where each module A is a neuron.
IV. It represents how a specific type of neural network can use sequential information.
Suppose that you are trying to fit a neural network to data sampled from a sine-curve function. Your network has only one input (phase). Which neural network is best suited for this?
Given an n-character word, we want to predict which character would be the (n+1)th character in the sequence. For example, our input is "predictio" (a 9-character word) and we have to predict the 10th character. Which of the following neural network architectures would be best suited to this task?