Still frazzled over exams? You've come to the right place!

We know it's exam month and you're buried in revision. To make exam and study season easier for more international students, we have decided to make Gold membership free for a limited time, until 31 December 2025! Normally £29.99 per month, it's yours as soon as you log in, with no conditions attached.

Helping you study efficiently in the final sprint before exams!

Question

11785/11685/11485 Quiz-10

Numerical question

We want to find the self-attention weights assigned to the tokens in the sequence “Attention is everything” using scaled dot-product attention. A single head is used. The sequence is of length 3, and the dimensionality of the transformer is 4. Below is the input embedding of shape (3, 4). Note that this embedding is the sum of the token embedding and the position embedding.

X = [1, 2, 3, 4]
    [5, 0, 7, 0]
    [9, 0, 1, 2]

The weight matrices for Q, K, and V are:

Wq = [0.3, 0.2, 0.8, 0.9]
     [0.4, 0.1, 0.4, 0.5]
     [0.5, 0.7, 0.2, 0.8]
     [0.8, 0.8, 0.7, 0.4]

Wk = [0.3, 0.9, 0.2, 0.7]
     [0.5, 0.4, 0.2, 0.2]
     [0.1, 0.7, 0.3, 0.6]
     [0.8, 0.4, 0.5, 0.9]

Wv = [0.2, 0.2, 0.3, 0.9]
     [0.2, 0.3, 0.8, 0.6]
     [0.7, 0.5, 0.9, 0.9]
     [1.0, 0.4, 0.2, 0.5]

If a causal mask is applied, what attention weight does “is” assign to “everything” in the sequence “Attention is everything”? Give your answer to 2 dp.

Hint: Lecture 19, slides 17 - 27

Options
A. 0

Standard answer
A. 0 (i.e., 0.00 to 2 dp)
Analysis
We are asked to find the self-attention weight that the token 'is' (the second token in the sequence) assigns to the token 'everything' (the third token) under scaled dot-product attention with a causal mask. Restating the setup: we have the 3-token sequence 'Attention is everything', a transformer dimensionality of 4, a single attention head, and the given Q, K, and V weight matrices. A causal mask is applied, which means each query position may only attend to key positions at or before it; the scores for all later positions are set to negative infinity before the softmax, so their normalised weights are exactly 0. Since 'everything' (position 3) comes after 'is' (position 2), the weight that 'is' assigns to 'everything' is 0, i.e. 0.00 to 2 dp, and no matrix arithmetic is needed to see this.
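For readers who want to verify the reasoning numerically (or inspect the unmasked weights), below is a minimal NumPy sketch, not the official course solution, that carries out the full calculation with the matrices from the question. It assumes the row-vector convention Q = X·Wq, K = X·Wk, V = X·Wv; note that the causal mask forces the queried weight to 0 regardless of which projection convention is used.

```python
import numpy as np

d = 4  # transformer dimensionality

# Input embeddings (token + position), one row per token.
X = np.array([[1., 2., 3., 4.],
              [5., 0., 7., 0.],
              [9., 0., 1., 2.]])

Wq = np.array([[0.3, 0.2, 0.8, 0.9],
               [0.4, 0.1, 0.4, 0.5],
               [0.5, 0.7, 0.2, 0.8],
               [0.8, 0.8, 0.7, 0.4]])

Wk = np.array([[0.3, 0.9, 0.2, 0.7],
               [0.5, 0.4, 0.2, 0.2],
               [0.1, 0.7, 0.3, 0.6],
               [0.8, 0.4, 0.5, 0.9]])

Wv = np.array([[0.2, 0.2, 0.3, 0.9],
               [0.2, 0.3, 0.8, 0.6],
               [0.7, 0.5, 0.9, 0.9],
               [1.0, 0.4, 0.2, 0.5]])

# Project embeddings into queries, keys, values (assumed convention: X @ W).
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product scores, shape (3, 3): scores[i, j] is the raw score
# that query token i gives to key token j.
scores = Q @ K.T / np.sqrt(d)

# Causal mask: query i may only attend to keys j <= i, so entries with j > i
# are set to -inf before the softmax.
mask = np.triu(np.ones((3, 3), dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)

# Row-wise softmax gives the attention weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)

# Row 1 is "is", column 2 is "everything"; the mask forces this entry to 0.
print(round(float(weights[1, 2]), 2))  # -> 0.0
```

Removing the mask (skipping the two masking lines) yields the full, unmasked attention matrix, which is a useful sanity check against the lecture-slide worked examples.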

Log in to view complete answers

We have collected more than 50,000 original exam questions from around the world, together with detailed explanations. Log in now to get the answers right away.

More handy tools for international students

To make exam and study season easier for more international students, we have decided to make Gold membership free for a limited time, until 31 December 2025!