
11785/11685/11485 Quiz-14

Multiple choice

Consider a single-headed attention layer. What happens to the dimensions of the value weight matrix Wv when we double the maximum input sequence length? Select all that apply.

Options
A. None of the above
B. Half the number of columns
C. Half the number of rows
D. Double the number of rows
E. Double the number of columns
Verified Answer
A. None of the above
Step-by-Step Analysis
Start by recalling the role of the value weight matrix Wv in a single-headed attention layer: Wv is the projection that maps the input feature dimension (commonly denoted d_model) to the value space (often denoted d_v), so its shape is d_model × d_v. Crucially, this matrix is defined by feature dimensions, not by the sequence length. Doubling the maximum input sequence length doubles the number of rows in the input matrix X (one row per token), but X multiplies Wv along the feature axis, so Wv's dimensions are unchanged. Since none of the resizing options (halving or doubling rows or columns) applies, the correct choice is A, None of the above.
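The point above can be checked with a minimal NumPy sketch (the dimensions d_model = d_v = 64 are illustrative, not taken from the course): the same Wv matrix projects inputs of any sequence length, so doubling the length leaves its shape untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative feature dimensions (hypothetical choices, not from the quiz)
d_model, d_v = 64, 64

# W_v maps the model dimension to the value dimension;
# its shape (d_model, d_v) involves no sequence length at all.
W_v = rng.standard_normal((d_model, d_v))

for seq_len in (128, 256):  # doubling the maximum sequence length
    X = rng.standard_normal((seq_len, d_model))  # (T, d_model) token inputs
    V = X @ W_v                                  # (T, d_v) value vectors
    assert V.shape == (seq_len, d_v)             # only the row count of V grows

# W_v itself is unchanged: (64, 64) for both sequence lengths.
assert W_v.shape == (d_model, d_v)
```

Only the input and output matrices gain rows when the sequence grows; the projection weights stay fixed, which is exactly why none of the resizing options can be correct.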

