Still buried in exam prep? You've come to the right place!

We know it's exam month and you're deep in revision. To make exam and study season easier for more international students, we've decided to make Gold membership free for a limited time, through December 31, 2025! Normally £29.99 per month, it's now yours just by logging in, with no strings attached.

Helping you sprint through exam prep efficiently!

Question

BUSML 4382 SP2025 (36004) Final Exam - Requires Respondus LockDown Browser

Multiple-choice question (single answer)

What is the primary role of the self-attention mechanism in Transformer-based language models?

Options
A. To assign weights to output layers based on model depth.
B. To evaluate the importance of each word in a sentence relative to every other word.
C. To randomly initialize new tokens during training.
D. To reduce the training time by compressing input sequences.

Standard answer
Please log in to view
Reasoning
First, let's lay out the core question and the given options clearly to ground the discussion. Question: What is the primary role of the self-attention mechanism in Transformer-based language models? Answer options: 1) To assign weights to output layers based on model depth. 2) To evaluate the importance of each word in a sentence relative to every other word. 3) To randomly initialize new tokens during training. 4) To reduce the training time by co… (Log in to view the full explanation.)
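To make the mechanism in this explanation concrete, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. The toy sequence length, model width, and random weights are illustrative assumptions, not values from the exam question; the point is only that the attention computation scores every token in the sequence against every other token and turns those scores into per-row weights.

```python
# Minimal single-head scaled dot-product self-attention (illustrative sketch).
# The toy embeddings, dimensions, and random weights below are assumptions for
# demonstration only; they are not taken from the exam question.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (n, d_model) token embeddings; Wq/Wk/Wv: projection matrices.
    Returns the (n, n) attention weights and the (n, d_v) output."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n, n): every token scored against every other token
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights, weights @ V

# Toy example: 3 tokens, model width 4.
rng = np.random.default_rng(0)
n, d_model = 3, 4
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

weights, output = self_attention(X, Wq, Wk, Wv)
print(weights)  # row i shows how much token i attends to each token in the sequence
```

Row i of `weights` sums to 1 and records how strongly token i attends to every token in the sequence, including itself, which is exactly the behaviour the multiple-choice options are probing.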

Log in to view the complete answer

We have collected more than 50,000 original exam questions from around the world, each with a detailed explanation. Log in now to get the answers right away.

Similar questions

As defined in Attention is All You Need, what is the size of the self-attention matrix in the encoder for the following English-to-Spanish translation: I am very handsome -> Soy muy guapo? Assume d_k = d_q = 64 and d_v = 32, ignore the <SOS> and <EOS> tokens, and take self-attention to mean Attention(Q, K, V). Please round to the nearest integer. [Fill in the blank] rows, [Fill in the blank] columns. (A shape-check sketch follows this list of questions.)

What key mechanism do transformers use to process sequential data effectively? 

We want to find the self-attention weights assigned to the tokens in the sequence “Attention is everything” using scaled dot-product attention. A single head is used. The sequence is of length 3, and the dimensionality of the transformer is 4. Below is the input embedding of shape (3, 4). Note that this embedding is the sum of the token embedding and the position embedding.

X = [1, 2, 3, 4]
    [5, 0, 7, 0]
    [9, 0, 1, 2]

The weights of Q, K, and V are:

Wq = [0.3, 0.2, 0.8, 0.9]
     [0.4, 0.1, 0.4, 0.5]
     [0.5, 0.7, 0.2, 0.8]
     [0.8, 0.8, 0.7, 0.4]

Wk = [0.3, 0.9, 0.2, 0.7]
     [0.5, 0.4, 0.2, 0.2]
     [0.1, 0.7, 0.3, 0.6]
     [0.8, 0.4, 0.5, 0.9]

Wv = [0.2, 0.2, 0.3, 0.9]
     [0.2, 0.3, 0.8, 0.6]
     [0.7, 0.5, 0.9, 0.9]
     [1.0, 0.4, 0.2, 0.5]

If a causal mask is applied, what attention weight does “is” assign to “everything” in the sequence “Attention is everything”? Give the answer to 2 dp. Hint: Lecture 19, slides 17-27. (A worked setup in code follows this list of questions.)

Consider the sentence “Mary went to the mall because she wanted a new pair of shoes.” This sentence is passed through an encoder-only transformer model. What model component enables it to learn that “she” refers to “Mary”? Hint: Lec 19. 
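For the first similar question (the encoder matrix size), the shape bookkeeping can be checked mechanically. The sketch below is deliberately assumption-heavy: it tokenises the English source sentence on whitespace, picks an arbitrary model width, and fills the projection matrices with random numbers; only d_q = d_k = 64 and d_v = 32 come from the question. It prints the shapes of the intermediate matrices rather than asserting the graded answer.

```python
# Shape check for the encoder self-attention question (illustrative sketch).
# Whitespace tokenisation, the model width, and the random weights are assumptions;
# only d_q = d_k = 64 and d_v = 32 come from the question itself.
import numpy as np

tokens = "I am very handsome".split()    # the encoder self-attends over the source sentence
n = len(tokens)                          # sequence length, with <SOS>/<EOS> ignored as instructed
d_model, d_k, d_v = 512, 64, 32          # d_model is an assumption; d_k and d_v are given

rng = np.random.default_rng(1)
X = rng.normal(size=(n, d_model))
Q = X @ rng.normal(size=(d_model, d_k))
K = X @ rng.normal(size=(d_model, d_k))
V = X @ rng.normal(size=(d_model, d_v))

scores = Q @ K.T / np.sqrt(d_k)                                   # (n, n)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)                    # row-wise softmax
attention = weights @ V                                           # Attention(Q, K, V), per the question's definition

print("attention weight matrix:", weights.shape)    # (n, n)
print("Attention(Q, K, V):", attention.shape)       # (n, d_v)
```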
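For the causal-mask question, the sketch below reproduces the matrices given in the prompt and applies a standard causal mask (scores above the diagonal set to -inf before the softmax). That masking convention and the token-to-row ordering are assumptions here; the code prints the full rounded weight matrix so the relevant entry can be read off.

```python
# Setup for the causal-mask question above (a sketch, not the graded solution).
# X, Wq, Wk, Wv are copied from the question; the additive mask convention
# (-inf above the diagonal before the softmax) is the standard one and is assumed here.
import numpy as np

X = np.array([[1., 2., 3., 4.],    # "Attention"
              [5., 0., 7., 0.],    # "is"
              [9., 0., 1., 2.]])   # "everything"

Wq = np.array([[0.3, 0.2, 0.8, 0.9],
               [0.4, 0.1, 0.4, 0.5],
               [0.5, 0.7, 0.2, 0.8],
               [0.8, 0.8, 0.7, 0.4]])
Wk = np.array([[0.3, 0.9, 0.2, 0.7],
               [0.5, 0.4, 0.2, 0.2],
               [0.1, 0.7, 0.3, 0.6],
               [0.8, 0.4, 0.5, 0.9]])
Wv = np.array([[0.2, 0.2, 0.3, 0.9],
               [0.2, 0.3, 0.8, 0.6],
               [0.7, 0.5, 0.9, 0.9],
               [1.0, 0.4, 0.2, 0.5]])

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(Q.shape[-1])

# Causal mask: position i may only attend to positions j <= i.
mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# weights[1, 2] is the entry the question asks about: the weight "is" assigns to
# "everything". Under this masking convention that position is masked out, so the
# softmax drives it to zero.
print(np.round(weights, 2))
```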

More useful tools for international students

To make exam and study season easier for more international students, we've decided to make Gold membership free for a limited time, through December 31, 2025!