Still buried in exam prep? We've got you covered!
We know it's exam month and you're cramming until you're worn out. To make the exam and study season easier for more international students, we've decided to make Gold membership free for a limited time, through December 31, 2025. Normally £29.99 per month, it's yours as soon as you log in, with no strings attached.
Here's to an efficient final sprint before your exams!
Question
11785/11685/11485 Quiz-10
Numerical question
We want to find the self-attention weights assigned to the tokens in the sequence "Attention is everything" using scaled dot-product attention. A single head is used. The sequence is of length 3, and the dimensionality of the transformer is 4. Below is the input embedding of shape (3, 4). Note that this embedding is the sum of the token embedding and the position embedding.

X =
[1, 2, 3, 4]
[5, 0, 7, 0]
[9, 0, 1, 2]

The weight matrices for Q, K, and V are:

Wq =
[0.3, 0.2, 0.8, 0.9]
[0.4, 0.1, 0.4, 0.5]
[0.5, 0.7, 0.2, 0.8]
[0.8, 0.8, 0.7, 0.4]

Wk =
[0.3, 0.9, 0.2, 0.7]
[0.5, 0.4, 0.2, 0.2]
[0.1, 0.7, 0.3, 0.6]
[0.8, 0.4, 0.5, 0.9]

Wv =
[0.2, 0.2, 0.3, 0.9]
[0.2, 0.3, 0.8, 0.6]
[0.7, 0.5, 0.9, 0.9]
[1.0, 0.4, 0.2, 0.5]

If a causal mask is applied, what attention weight does "is" assign to "everything" in the sequence "Attention is everything"? Give the answer to 2 dp.

Hint: Lecture 19, slides 17 - 27
Options
A. 0
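For reference, the single-head scaled dot-product attention the question refers to can be written, in the usual formulation, with the causal mask applied as an additive term before the softmax (a standard presentation, assumed here to match the lecture slides):

```latex
% Projections: Q = X W_q, K = X W_k, V = X W_v; d_k is the key dimension (4 here).
\[
\mathrm{Attention}(Q, K, V)
  = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
  0       & \text{if } j \le i,\\
  -\infty & \text{if } j > i.
\end{cases}
\]
```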
Correct answer
A. 0 (to 2 dp: 0.00)
Solution approach
We are asked to find the self-attention weight that the token 'is' (the second token in the sequence) assigns to the token 'everything' (the third token) under scaled dot-product attention with a causal mask.
First, restating the setup: we have a 3-token sequence 'Attention is everything' with transformer dimension 4. The model uses a single attention head, with the given Q, K, and V weight matrices. A causal mask is applied, which means each token may attend only to itself and to earlier tokens: the scores for future positions are set to negative infinity before the softmax, so the corresponding attention weights become exactly zero. Because 'everything' (the third token) comes after 'is' (the second token), the causal mask forces the attention weight that 'is' assigns to 'everything' to be 0, i.e. 0.00 to 2 dp, regardless of the particular values of Q and K.
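To make this concrete, below is a minimal NumPy sketch (not official course code) of single-head scaled dot-product attention with a causal mask, using the X, Wq, Wk, and Wv values from the question. The masking convention, setting future positions to negative infinity before the softmax, is the standard one and is assumed to match the lecture's definition.

```python
import numpy as np

# Values copied from the question above.
X = np.array([[1., 2., 3., 4.],
              [5., 0., 7., 0.],
              [9., 0., 1., 2.]])

Wq = np.array([[0.3, 0.2, 0.8, 0.9],
               [0.4, 0.1, 0.4, 0.5],
               [0.5, 0.7, 0.2, 0.8],
               [0.8, 0.8, 0.7, 0.4]])

Wk = np.array([[0.3, 0.9, 0.2, 0.7],
               [0.5, 0.4, 0.2, 0.2],
               [0.1, 0.7, 0.3, 0.6],
               [0.8, 0.4, 0.5, 0.9]])

Wv = np.array([[0.2, 0.2, 0.3, 0.9],
               [0.2, 0.3, 0.8, 0.6],
               [0.7, 0.5, 0.9, 0.9],
               [1.0, 0.4, 0.2, 0.5]])

d_k = X.shape[1]  # 4

# Linear projections: each row of X is one token's embedding.
Q = X @ Wq
K = X @ Wk
V = X @ Wv

# Scaled dot-product scores, shape (3, 3): scores[i, j] is how strongly token i attends to token j.
scores = Q @ K.T / np.sqrt(d_k)

# Causal mask: token i may only attend to tokens j <= i, so mask the strict upper triangle.
mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)

# Row-wise softmax turns the masked scores into attention weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.round(weights, 2))
```

Row 1 of the printed matrix is the query for 'is'; its entry in column 2 ('everything') is 0.00, while its weights on 'Attention' and 'is' sum to 1.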
Similar questions
As defined in Attention is All You Need, what is the size of the self-attention matrix in the encoder, given the following English-to-Spanish translation: I am very handsome -> Soy muy guapo? Please assume the following: d_k = d_q = 64, d_v = 32. Please ignore the <SOS> and <EOS> tokens; here self-attention means Attention(Q, K, V). NOTE: Please round to the nearest integer. [Fill in the blank] rows, [Fill in the blank] columns. (A shape-propagation sketch follows this list of similar questions.)
What key mechanism do transformers use to process sequential data effectively?
What is the primary role of the self-attention mechanism in Transformer-based language models?
Consider the sentence “Mary went to the mall because she wanted a new pair of shoes.” This sentence is passed through an encoder-only transformer model. What model component enables it to learn that “she” refers to “Mary”? Hint: Lec 19.
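Regarding the matrix-size question above, the following is a shape-propagation sketch only, with random placeholder values rather than exam data, assuming a single head, 4 source tokens once <SOS>/<EOS> are ignored, and d_k = d_q = 64, d_v = 32 as stated. It prints both the shape of the score matrix and the shape of Attention(Q, K, V); the question's own definition of self-attention as Attention(Q, K, V) indicates which one it is asking about.

```python
import numpy as np

# Shape check only: placeholder random values, not exam data.
# "I am very handsome" -> 4 source tokens after ignoring <SOS>/<EOS>;
# d_k = d_q = 64 and d_v = 32 as stated in the question.
n, d_k, d_v = 4, 64, 32
Q = np.random.rand(n, d_k)
K = np.random.rand(n, d_k)
V = np.random.rand(n, d_v)

scores = Q @ K.T / np.sqrt(d_k)                  # score matrix: (n, n) = (4, 4)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax keeps the (4, 4) shape
attention = weights @ V                          # Attention(Q, K, V): (n, d_v) = (4, 32)

print(scores.shape, attention.shape)             # (4, 4) (4, 32)
```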
More handy tools for international students
We hope to make your studying easier