Still swamped by exam prep? We've got you covered!

We know it's exam month and you're buried in revision. To make the exam and study season easier for more international students, we've decided to make Gold membership free for a limited time, through 31 December 2025! Normally £29.99 per month, it's now yours as soon as you log in, with no strings attached.

Here to help you power through your final revision!

Question

EE699-009/EE599-009 Quiz 5 - Swin

True/False

Cyclic Shift + Masked MSA are necessary for the correct operation of SW-MSA; their absence would either render SW-MSA non-functional or produce incorrect results.

Options
A. True
B. False

Analysis
This statement concerns the role of cyclic shift and masking in SW-MSA. Option A (True) proposes that cyclic shift plus masked MSA are strictly necessary for SW-MSA to function, which implies there is no workable alternative configuration. In practice, SW-MSA can operate in variants that do not use cyclic shift: the partial windows produced at the feature-map boundary by the shifted partition can instead be padded up to the full window size, with the padded positions masked out during attention. That route costs more computation but still gives correct results, so cyclic shift with masked MSA is best seen as an efficiency optimization rather than a strict requirement.
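To see what the cyclic shift and the accompanying mask actually look like, here is a short sketch of both steps. It is a simplified PyTorch illustration rather than the official Swin Transformer implementation, and the 8x8 feature map, single channel, window size M = 4 and shift M/2 = 2 are arbitrary toy values chosen only to keep the shapes small.

```python
# Minimal sketch of SW-MSA's cyclic shift and attention mask (not official Swin code).
import torch

M = 4                          # window size (toy value)
shift = M // 2                 # shift used by SW-MSA
H = W = 8                      # toy feature-map height/width
x = torch.arange(H * W, dtype=torch.float32).reshape(1, H, W, 1)   # (B, H, W, C)

# 1) Cyclic shift: roll the map up-left so the shifted windows line up with an
#    ordinary, equal-size window partition again.
x_shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))

# 2) Attention mask: inside a rolled window, positions that came from different
#    regions of the original map must not attend to each other.
img_mask = torch.zeros(1, H, W, 1)
h_slices = (slice(0, -M), slice(-M, -shift), slice(-shift, None))
w_slices = (slice(0, -M), slice(-M, -shift), slice(-shift, None))
region = 0
for hs in h_slices:
    for ws in w_slices:
        img_mask[:, hs, ws, :] = region
        region += 1

# Partition the region labels into windows of M*M positions each.
mask_windows = img_mask.reshape(1, H // M, M, W // M, M, 1)
mask_windows = mask_windows.permute(0, 1, 3, 2, 4, 5).reshape(-1, M * M)

# Pairs of positions with different region labels get -inf added to their logits.
attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
attn_mask = attn_mask.masked_fill(attn_mask != 0, float("-inf"))

print(attn_mask.shape)   # torch.Size([4, 16, 16]): one (M*M, M*M) mask per window

# In the full block, window attention runs on x_shifted with attn_mask added to
# the logits, and torch.roll(..., shifts=(shift, shift), dims=(1, 2)) undoes the shift.
```

The mask is what lets every window be processed in one batched attention call after the roll; the padding-based variant mentioned above reaches the same result without the roll, just less efficiently.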

Log in to view the full answer

Our bank includes more than 50,000 original exam questions from around the world, each with a detailed explanation. Log in now to get the answers instantly.

Similar questions

As defined in Attention Is All You Need, what is the size of the self-attention matrix in the encoder for the following English-to-Spanish translation: "I am very handsome" -> "Soy muy guapo"? Assume d_k = d_q = 64 and d_v = 32, and ignore the <SOS> and <EOS> tokens. Here, self-attention means Attention(Q, K, V). NOTE: round to the nearest integer. [Fill in the blank] rows, [Fill in the blank] columns.
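For questions of this type, the shape falls straight out of the definition Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V: if Q and K are (n, d_k) and V is (n, d_v), the softmax term is (n, n) and its product with V is (n, d_v). The snippet below is only a shape sanity check with random tensors; the choice n = 4 reflects the four source-language tokens under the question's stated assumption that <SOS> and <EOS> are ignored.

```python
# Shape-only sanity check for Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
# n = 4 encoder tokens, d_k = d_q = 64, d_v = 32, as stated in the question.
import torch

n, d_k, d_v = 4, 64, 32
Q = torch.randn(n, d_k)
K = torch.randn(n, d_k)
V = torch.randn(n, d_v)

weights = torch.softmax(Q @ K.T / d_k ** 0.5, dim=-1)   # (n, n) attention weights
out = weights @ V                                       # Attention(Q, K, V)

print(weights.shape)   # torch.Size([4, 4])
print(out.shape)       # torch.Size([4, 32])
```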

What key mechanism do transformers use to process sequential data effectively? 

What is the primary role of the self-attention mechanism in Transformer-based language models?

We want to find the self-attention weights assigned to the tokens in the sequence "Attention is everything" using scaled dot-product attention. A single head is used. The sequence is of length 3, and the dimensionality of the transformer is 4. Below is the input embedding of shape (3, 4); note that this embedding is the sum of the token embedding and the position embedding.

X = [[1, 2, 3, 4],
     [5, 0, 7, 0],
     [9, 0, 1, 2]]

The weights of Q, K, and V are:

Wq = [[0.3, 0.2, 0.8, 0.9],
      [0.4, 0.1, 0.4, 0.5],
      [0.5, 0.7, 0.2, 0.8],
      [0.8, 0.8, 0.7, 0.4]]

Wk = [[0.3, 0.9, 0.2, 0.7],
      [0.5, 0.4, 0.2, 0.2],
      [0.1, 0.7, 0.3, 0.6],
      [0.8, 0.4, 0.5, 0.9]]

Wv = [[0.2, 0.2, 0.3, 0.9],
      [0.2, 0.3, 0.8, 0.6],
      [0.7, 0.5, 0.9, 0.9],
      [1.0, 0.4, 0.2, 0.5]]

If a causal mask is applied, what attention weight does "is" assign to "everything" in the sequence "Attention is everything"? Give the answer to 2 dp. Hint: Lecture 19, slides 17-27.
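A quick way to check answers to this kind of question is to run the numbers. The sketch below assumes standard single-head scaled dot-product attention with no bias terms and a lower-triangular causal mask, which matches how the question is phrased; row 1 of the resulting weight matrix corresponds to "is" and column 2 to "everything".

```python
# Scaled dot-product attention with a causal mask for the 3-token example above.
import torch

X = torch.tensor([[1., 2., 3., 4.],
                  [5., 0., 7., 0.],
                  [9., 0., 1., 2.]])          # (3, 4) input embeddings

Wq = torch.tensor([[0.3, 0.2, 0.8, 0.9],
                   [0.4, 0.1, 0.4, 0.5],
                   [0.5, 0.7, 0.2, 0.8],
                   [0.8, 0.8, 0.7, 0.4]])
Wk = torch.tensor([[0.3, 0.9, 0.2, 0.7],
                   [0.5, 0.4, 0.2, 0.2],
                   [0.1, 0.7, 0.3, 0.6],
                   [0.8, 0.4, 0.5, 0.9]])
Wv = torch.tensor([[0.2, 0.2, 0.3, 0.9],
                   [0.2, 0.3, 0.8, 0.6],
                   [0.7, 0.5, 0.9, 0.9],
                   [1.0, 0.4, 0.2, 0.5]])

Q, K, V = X @ Wq, X @ Wk, X @ Wv
d_k = Q.shape[-1]

scores = Q @ K.T / d_k ** 0.5                             # (3, 3) scaled logits
causal = torch.tril(torch.ones(3, 3, dtype=torch.bool))   # token i attends to j <= i only
scores = scores.masked_fill(~causal, float("-inf"))
weights = torch.softmax(scores, dim=-1)                   # rows: Attention, is, everything

print(weights)
# weights[1, 2] is the weight "is" assigns to "everything"; the causal mask sets
# every above-diagonal entry to exactly zero.
```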

More handy tools for international students

To make the exam and study season easier for more international students, we've decided to make Gold membership free for a limited time, through 31 December 2025!