Still buried in exam prep? We've got you covered!

We know it's exam month and you're cramming until you're worn out. To make the revision and study season easier for more international students, we've decided to make Gold membership free for a limited time, through 31 December 2025! Normally £29.99 per month, it's now yours simply by logging in, no strings attached.

Helping you power through your final exam prep!

Question

Learning AI Through Visualization 4 Module 5 Quiz

Multiple choice (single answer)

What key mechanism do transformers use to process sequential data effectively? 

Options
A. Convolutional operations
B. Recurrent connections
C. Self-attention

Standard answer
Please log in to view.
Analysis
When evaluating how transformers handle sequential data, it helps to compare the listed options with the core design of the architecture. Option A, convolutional operations: while convolutions are powerful for capturing local patterns and are… (log in to view the full explanation)
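For background (this is the standard definition from Attention is All You Need, not the gated explanation above), the mechanism named in option C, scaled dot-product self-attention, is usually written as:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

Here Q, K and V are the query, key and value matrices projected from the input embeddings and d_k is the key dimension; the similar questions below all ask you to apply this same formula.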

Log in to view the complete answer

Our question bank includes over 50,000 original exam questions from around the world, each with detailed explanations. Log in now to get the answers instantly.

Similar questions

As defined in Attention is All You Need, what is the size of the self-attention matrix in the encoder for the following English-to-Spanish translation: I am very handsome -> Soy muy guapo?
Assume the following: d_k = d_q = 64, d_v = 32. Ignore the <SOS> and <EOS> tokens. Here self-attention means Attention(Q, K, V). NOTE: Round to the nearest integer.
[Fill in the blank] rows, [Fill in the blank] columns
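As a rough illustration of how the shapes in this question fall out (a sketch only, not the graded answer), the snippet below runs single-head attention with random weights of the stated sizes. The word-level split of "I am very handsome" into 4 tokens and the d_model value are assumptions made just for this sketch.

```python
import numpy as np

# Sketch only: random projections with the dimensions stated in the question.
n = 4                 # assumed token count for "I am very handsome" (no <SOS>/<EOS>)
d_model = 512         # assumed embedding size; it does not affect the output shapes
d_q = d_k = 64
d_v = 32

X  = np.random.randn(n, d_model)
Wq = np.random.randn(d_model, d_q)
Wk = np.random.randn(d_model, d_k)
Wv = np.random.randn(d_model, d_v)

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores  = Q @ K.T / np.sqrt(d_k)                        # (n, n) attention logits
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)               # row-wise softmax
out = weights @ V                                       # Attention(Q, K, V)

print(weights.shape)   # (4, 4)  -- the token-by-token attention-weight matrix
print(out.shape)       # (4, 32) -- the output of Attention(Q, K, V), i.e. n x d_v
```

Whether the question wants the n x n weight matrix or the n x d_v output of Attention(Q, K, V) is exactly what the gated solution settles; the sketch simply prints both shapes.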

What is the primary role of the self-attention mechanism in Transformer-based language models?

We want to find the self-attention weights assigned to the tokens in the sequence “Attention is everything” using scaled dot-product attention. A single head is used. The sequence is of length 3, and the dimensionality of the transformer is 4. Below is the input embedding of shape (3, 4). Note that this embedding is the sum of the token embedding and the position embedding.

X = [1, 2, 3, 4]
    [5, 0, 7, 0]
    [9, 0, 1, 2]

The weights of Q, K, and V are:

Wq = [0.3, 0.2, 0.8, 0.9]
     [0.4, 0.1, 0.4, 0.5]
     [0.5, 0.7, 0.2, 0.8]
     [0.8, 0.8, 0.7, 0.4]

Wk = [0.3, 0.9, 0.2, 0.7]
     [0.5, 0.4, 0.2, 0.2]
     [0.1, 0.7, 0.3, 0.6]
     [0.8, 0.4, 0.5, 0.9]

Wv = [0.2, 0.2, 0.3, 0.9]
     [0.2, 0.3, 0.8, 0.6]
     [0.7, 0.5, 0.9, 0.9]
     [1.0, 0.4, 0.2, 0.5]

If a causal mask is applied, what attention weight does “is” assign to “everything” in the sequence “Attention is everything”? Give the answer to 2 dp. Hint: Lecture 19, slides 17-27.
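For readers who want to see how such a calculation could be set up, here is a minimal NumPy sketch using the X, Wq, Wk and Wv given in the question. It is one possible working under the standard causal-mask convention (row i may only attend to columns j <= i), not the course's official solution.

```python
import numpy as np

# Input embeddings for "Attention is everything" (3 tokens, d_model = 4),
# copied from the question.
X = np.array([[1, 2, 3, 4],
              [5, 0, 7, 0],
              [9, 0, 1, 2]], dtype=float)

Wq = np.array([[0.3, 0.2, 0.8, 0.9],
               [0.4, 0.1, 0.4, 0.5],
               [0.5, 0.7, 0.2, 0.8],
               [0.8, 0.8, 0.7, 0.4]])
Wk = np.array([[0.3, 0.9, 0.2, 0.7],
               [0.5, 0.4, 0.2, 0.2],
               [0.1, 0.7, 0.3, 0.6],
               [0.8, 0.4, 0.5, 0.9]])
Wv = np.array([[0.2, 0.2, 0.3, 0.9],
               [0.2, 0.3, 0.8, 0.6],
               [0.7, 0.5, 0.9, 0.9],
               [1.0, 0.4, 0.2, 0.5]])

Q, K, V = X @ Wq, X @ Wk, X @ Wv
d_k = K.shape[-1]

# Scaled dot-product logits, shape (3, 3).
scores = Q @ K.T / np.sqrt(d_k)

# Causal mask (assumed convention): token i attends only to tokens j <= i,
# so entries strictly above the diagonal are excluded from the softmax.
mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)

# Row-wise softmax gives the attention-weight matrix.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Row 1 is "is", column 2 is "everything" (0-indexed).
print(round(float(weights[1, 2]), 2))
```

Note that under a causal mask the query "is" cannot look ahead to the later token "everything", which is what the printed weight reflects.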

Consider the sentence “Mary went to the mall because she wanted a new pair of shoes.” This sentence is passed through an encoder-only transformer model. What model component enables it to learn that “she” refers to “Mary”? Hint: Lec 19. 

More useful tools for international students

To make the revision and study season easier for more international students, we've decided to make Gold membership free for a limited time, through 31 December 2025!