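File name: 1
Description (all downloadable content comes from the Internet; please study and use it at your own discretion)
Research and Implementation of Text Clustering Based on the WEKA Platform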
Text clustering is an important research branch of text mining and applies clustering methods to text processing. This paper discusses and summarizes in some depth the text clustering process based on the vector space model, and uses a text corpus and a data mining tool to study and implement that process. It first presents the idea and procedure of text clustering, reviews existing results in the field, and lists foundational work on feature representation, feature extraction, and related topics. It also reviews existing text clustering algorithms and the metrics commonly used to evaluate clustering quality. Building on this prior work, the paper uses the 20 Newsgroup corpus and the vector space representation model to implement text preprocessing and the k-means clustering algorithm on the open-source data mining platform WEKA and, based on the observed clustering results, proposes optimizations for text representation, feature selection, feature dimensionality reduction, and related aspects.
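The pipeline named in the abstract (vector space representation followed by k-means) can be illustrated with a minimal sketch. This is not the paper's WEKA implementation: it is plain Python using only the standard library, and every name in it (`tokenize`, `tfidf_vectors`, `kmeans`, `STOP_WORDS`) is hypothetical. The smoothed IDF formula, the tiny stop-word list, and the deterministic farthest-first choice of initial centroids are assumptions made to keep the example small and repeatable.

```python
import math
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "in", "is"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return [t for t in cleaned.split() if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Map each document to a unit-length TF-IDF vector over a shared vocabulary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumption: smoothed IDF (log(n / df) + 1) so that no term weight is zero.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, max_iter=20):
    """Lloyd's k-means; returns one cluster label per input vector."""
    # Assumption: farthest-first initialization instead of random seeding,
    # chosen here only so that the sketch is deterministic.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "The graphics card renders the image",
    "A fast graphics card and a sharp image",
    "The hockey team won the game",
    "The hockey team lost a game in overtime",
]
vocab, vectors = tfidf_vectors(docs)
print(kmeans(vectors, k=2))
```

On these four toy sentences the two graphics posts receive one label and the two hockey posts receive the other. A real run on the 20 Newsgroup corpus would need stronger preprocessing (stemming, a full stop-word list, feature selection), which is the area the abstract says the paper optimizes.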
Download file list
基于WEKA平台的文本聚类研究与实现.pdf