้ข˜็›ฎ
้ข˜็›ฎ

CS-7643-O01, OAN, OSZ Quiz #4: Module 3

ๆ•ฐๅ€ผ้ข˜

Context: Let's look at a simple example of why vanishing and exploding gradients occur in RNNs. Consider a univariate version of an RNN with the following update rules:

z(t) = u x(t) + w h(t−1)
h(t) = φ(z(t))

To keep things simple, assume φ is the identity function, i.e., φ(i) = i. Suppose we have a final loss L and have computed the derivative ∂L/∂h_T for some t = T. Using the update rules, the value of ∂h_T/∂h_1 comes out to be w^(aT + b).

Main Question: What is the value of a?
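A quick numerical sanity check of the setup (not part of the original quiz): with the identity activation, h(t) = u x(t) + w h(t−1) is linear in h(t−1), so ∂h_T/∂h_1 should equal w^(T−1). The values of u, w, T and the inputs x(t) below are assumed purely for illustration.

```python
# With the identity activation, h(t) = u*x(t) + w*h(t-1), so each step
# contributes dh(t)/dh(t-1) = w and the chain rule gives dh_T/dh_1 = w**(T-1).
# Sample values for u, w, T and the inputs x(t) are assumed for illustration.

def dh_T_dh_1(w, T):
    """Analytic derivative: the product of dh(t)/dh(t-1) = w over T-1 steps."""
    deriv = 1.0
    for _ in range(T - 1):
        deriv *= w  # each unrolled step multiplies in another factor of w
    return deriv

def finite_difference(u, w, x, eps=1e-6):
    """Perturb h(1) and roll the recurrence forward to estimate dh_T/dh_1."""
    def roll(h1):
        h = h1
        for xt in x[1:]:          # steps t = 2 .. T
            h = u * xt + w * h
        return h
    h1 = u * x[0]                 # h(1) from the first input
    return (roll(h1 + eps) - roll(h1 - eps)) / (2 * eps)

u, w, T = 0.5, 0.9, 6
x = [1.0, -0.3, 0.7, 0.2, -1.1, 0.4]   # T = 6 inputs, arbitrary values
print(dh_T_dh_1(w, T))                  # w**(T-1) = 0.9**5
print(finite_difference(u, w, x))       # agrees, since the map is linear in h(1)
```

Matching w^(T−1) against the given form w^(aT + b) identifies a = 1 and b = −1.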

ๆŸฅ็œ‹่งฃๆž

ๆŸฅ็œ‹่งฃๆž

ๆ ‡ๅ‡†็ญ”ๆกˆ
Please login to view
ๆ€่ทฏๅˆ†ๆž
We start by restating the problem setup. In a simple RNN with univariate x and identity activation, z(t) = u x(t) + w h(t−1) and h(t) = φ(z(t)). Since φ is the identity, h(t) = z(t) = u x(t) + w h(t−1) for all t. By the chain rule, ∂h(t)/∂h(t−1) = w at every step, so

∂h_T/∂h_1 = ∏_{t=2}^{T} ∂h(t)/∂h(t−1) = w^(T−1).

Matching w^(T−1) against the given form w^(aT + b) yields a = 1 (and b = −1). This same factorization explains vanishing and exploding gradients: the identical factor w is multiplied in at every time step, so the gradient shrinks toward zero when |w| < 1 and blows up when |w| > 1.

็™ปๅฝ•ๅณๅฏๆŸฅ็œ‹ๅฎŒๆ•ด็ญ”ๆกˆ

ๆˆ‘ไปฌๆ”ถๅฝ•ไบ†ๅ…จ็ƒ่ถ…50000้“่€ƒ่ฏ•ๅŽŸ้ข˜ไธŽ่ฏฆ็ป†่งฃๆž,็Žฐๅœจ็™ปๅฝ•,็ซ‹ๅณ่Žทๅพ—็ญ”ๆกˆใ€‚

็ฑปไผผ้—ฎ้ข˜

ย  What are the drawbacks of Recurrent Neural Networks (RNNs)? ย  I RNNs can only solve regression problems. II RNNs can only produce single-valued outputs. III RNNs suffer from vanishing gradients, which make it difficult to know which direction the parameters should move, and exploding gradients, which can make learning unstable. IV One can only use the sigmoid function as the activation function for its hidden layers. ย 

ย  Select all that apply to the figure below ย  I The h dots represents the intermediate output of the sequential operation. II It is the unrolling of a Recurrent Neural Network module. III It represents a feedforward layer where each module A is a neuron. IV It represents how a specific type of neural network can use sequential information. ย 

ๅ‡่ฎพไฝ ๅฐ่ฏ•ๅฐ†็ฅž็ป็ฝ‘็ปœๆ‹ŸๅˆๅˆฐไปŽๆญฃๅผฆๆ›ฒ็บฟๅ‡ฝๆ•ฐ้‡‡ๆ ท็š„ๆ•ฐๆฎไธญใ€‚ไฝ ็š„็ฝ‘็ปœๅชๆœ‰ไธ€ไธช่พ“ๅ…ฅ๏ผˆ็›ธ๏ผ‰ใ€‚ๅ“ช็ง็ฅž็ป็ฝ‘็ปœๆœ€้€‚ๅˆ๏ผŸ Suppose that you are trying to fit a neural network into data that were sampled from a sine-curve function. Your network has only one input (phase). Which neural network is best suited for this?

Given an n-character word, we want to predict which character would be the (n+1)th character in the sequence. For example, our input is “predictio” (which is a 9-character word) and we have to predict what would be the 10th character. Which of the following neural network architectures would be best suited to complete this task?
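The task above points at a recurrent architecture, since the input length n varies. A minimal, untrained sketch of the forward pass is below: the RNN consumes one character at a time and scores the (n+1)th character from its final hidden state. All names and weights here are assumed for illustration; a real model would learn the weights by backpropagation through time.

```python
# Minimal sketch (assumed, untrained) of an RNN scoring the next character.
# Weights are arbitrary deterministic values so the example is self-contained.
import math

VOCAB = "abcdefghijklmnopqrstuvwxyz"
H = 8  # hidden size, chosen arbitrarily

def one_hot(ch):
    v = [0.0] * len(VOCAB)
    v[VOCAB.index(ch)] = 1.0
    return v

def weight(i, j, seed):
    # Deterministic pseudo-random weight in place of learned parameters.
    return math.sin(seed + 7 * i + 13 * j) * 0.1

def step(h, x):
    # Standard Elman update: h(t) = tanh(W_hh h(t-1) + W_xh x(t)).
    return [math.tanh(
        sum(weight(i, j, 1) * h[j] for j in range(H)) +
        sum(weight(i, k, 2) * x[k] for k in range(len(VOCAB))))
        for i in range(H)]

def predict_next(word):
    h = [0.0] * H
    for ch in word:               # unroll the RNN over the input characters
        h = step(h, one_hot(ch))
    # Output layer: one score per vocabulary character from the final state.
    scores = [sum(weight(c, j, 3) * h[j] for j in range(H))
              for c in range(len(VOCAB))]
    return VOCAB[scores.index(max(scores))]

print(predict_next("predictio"))  # some character; untrained, so arbitrary
```

The key property is that the same `step` weights are reused at every position, which is what lets one network handle words of any length n.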

ๆ›ดๅคš็•™ๅญฆ็”Ÿๅฎž็”จๅทฅๅ…ท

ๅŠ ๅ…ฅๆˆ‘ไปฌ๏ผŒ็ซ‹ๅณ่งฃ้” ๆตท้‡็œŸ้ข˜ ไธŽ ็‹ฌๅฎถ่งฃๆž๏ผŒ่ฎฉๅคไน ๅฟซไบบไธ€ๆญฅ๏ผ