Question
Business Analysis with Unstructured Data - DAT-7471 - BMBAN2 In-class knowledge check #2 (Remotely Proctored)
Multiple-choice question
When analyzing articles, the tf-idf framework is used to:
Options
A. identify tokens or terms that are most frequent to each article
B. identify tokens or terms that are most important/specific to each article
C. identify tokens or terms that are both most frequent and most important/specific to each article
Analysis
To analyze how tf-idf works, we need to consider what the metric is designed to do across a collection of documents.
Option A: 'identify tokens or terms that are most frequent to each article'. Term frequency within a document can be high for some words, but tf-idf specifically downweights terms that are merely frequent across many documents and emphasizes terms that are distinctive for that documen...
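The contrast between "most frequent" and "most specific" can be illustrated with a minimal sketch. It assumes the common textbook weighting tf × log(N / df), with whitespace tokenization and a made-up three-article corpus; the names `articles` and `tf_idf` are hypothetical, and libraries such as scikit-learn apply a smoothed variant of the idf term, so their exact scores would differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus: three short "articles", tokenized by whitespace.
articles = {
    "sports": "the team won the match and the fans cheered",
    "finance": "the market fell and the bank cut rates",
    "cooking": "the chef seared the steak and the sauce simmered",
}

def tf_idf(docs):
    """Return {doc_name: {term: score}} using tf * log(N / df)."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: number of articles that contain each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[name] = {
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores

scores = tf_idf(articles)
for name, term_scores in scores.items():
    most_frequent = Counter(articles[name].split()).most_common(1)[0][0]
    top_tfidf = max(term_scores, key=term_scores.get)
    print(name, "| most frequent:", most_frequent, "| top tf-idf:", top_tfidf)
```

In this corpus the word "the" is the most frequent token in every article, yet it occurs in all three documents, so its idf is log(3/3) = 0 and its tf-idf score is zero. Terms that occur in only one article receive positive scores, which is the behavior the analysis of option A describes.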
Similar questions
By vectorizing text using TF-IDF approach we lose some information contained in the raw document:
The TF-IDF approach considers information about the occurrences of tokens in all documents of a text corpus:
The term frequency - inverse document frequency (TF-IDF) approach to text vectorization is based on the bag-of-words representation:
In a consumer society, many adults channel creativity into buying things