File name: 20_newsgroups.tar
Description: all downloadable content comes from the Internet; please study and use it at your own discretion.
20NewsGroups is one of the international standard datasets for research in text classification, text mining, and information retrieval. The 20 Newsgroups corpus contains 19,997 articles in 20 categories taken from the Usenet newsgroups collection. We used only the subject and the body of each message. Some of the newsgroups are very closely related to each other (e.g., IBM computer system hardware / Macintosh computer system hardware), while others are highly unrelated (e.g., misc.forsale / soc.religion.christian). This corpus differs from the previous corpora in that it has a larger vocabulary and its words typically carry more meanings. Moreover, its writing style (e-mail dialogues) is far removed from that of the other, more technical collections.
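The "subject and body only" preprocessing mentioned above could be sketched roughly as follows. This is a minimal illustration, not the procedure actually used to prepare the corpus: the function names `subject_and_body` and `load_corpus` are hypothetical, and the sketch assumes the archive extracts to one directory per newsgroup with one message per file, readable as Latin-1 text.

```python
import email
from pathlib import Path


def subject_and_body(raw_message: str) -> str:
    """Keep only the Subject header and the body of one Usenet message."""
    msg = email.message_from_string(raw_message)
    subject = msg.get("Subject", "")
    body = msg.get_payload()
    if not isinstance(body, str):
        # Multipart messages are not expected here; ignore their parts.
        body = ""
    return f"{subject}\n{body}".strip()


def load_corpus(root: str):
    """Yield (category, text) pairs.

    Assumption: `root` contains one sub-directory per newsgroup,
    each holding one plain-text message per file.
    """
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue
        for path in sorted(category_dir.iterdir()):
            if path.is_file():
                # Latin-1 never fails to decode, so odd bytes do not abort the run.
                raw = path.read_text(encoding="latin-1")
                yield category_dir.name, subject_and_body(raw)
```

After extracting the archive, something like `for category, text in load_corpus("20_newsgroups"): ...` would then iterate over the labeled documents; the directory name here is likewise an assumption.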
(Generated automatically by the system; you can review the contents before downloading)
Download file list
61549823 20_newsgroups.tar