File name: PMl-IR
Description: the downloadable content was collected from the internet; please study and use it at your own discretion.
The wide growth in the number of blog information sources and in the volume of blog content has brought new challenges to Chinese text classification. This paper proposes four sentiment classification methods based on the PMI-IR algorithm for classifying the sentiment of blog texts. Centered on sentiment words, the method uses the results returned by a search engine to compute the pointwise mutual information (PMI) between the sentiment elements in a text and a set of background sentiment words, and classifies the text's sentiment accordingly. The method was tested on two corpora: the 2008 Blog corpus of the Network Media Language Sub-center of the National Language Resources Monitoring and Research Center, and the COAE2008 corpus. Compared with traditional methods, both precision and recall improved considerably.

The growth of blog text on the internet has brought new challenges to Chinese text classification. Aiming to solve the semantic-deficiency problem of traditional methods for Chinese text classification, this paper implements a method that classifies a blog as joy, angry, sad, or bar using a simple unsupervised learning algorithm. The class of a blog text is predicted by the maximum semantic orientation (SO) of the phrases in the text that contain adjectives or adverbs. In this paper, the SO of a phrase is calculated as the mutual information between the given phrase and the polar words. The SO of the blog text is then determined by the maximum mutual information value. A blog text is classified as joy if the SO of its phrases is joy. Two different corpora are used to test the method: one is the Blog corpus collected by the Monitor and Research Center for National Language Resource Network Multimedia Sub-branch Center, and the other is the Chinese dataset provided by COAE200
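The abstract describes the PMI computation only in words, so a minimal Python sketch may help. It rests on several assumptions not stated in the source: search-engine hit counts are replaced by hypothetical in-memory tables (`hits`, `near_hits`); PMI is estimated as log2(hits(phrase NEAR word) * N / (hits(phrase) * hits(word))), the usual PMI-IR formulation; the smoothing constant 0.01, the total `N`, and all words and counts are illustrative; and the text takes the class of the polar word that reaches the maximum PMI with any of its phrases, which is one reading of "determined by the max mutual information value". It is not the paper's implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical stand-ins for search-engine results (assumption):
# hits[q] is the hit count for query q, near_hits[(a, b)] for "a NEAR b".
HITS: Dict[str, float] = {
    "very good": 1000, "happy": 5000, "furious": 4000, "heartbroken": 3000,
}
NEAR_HITS: Dict[Tuple[str, str], float] = {
    ("very good", "happy"): 200,
    ("very good", "furious"): 5,
    ("very good", "heartbroken"): 10,
}
TOTAL = 1_000_000  # hypothetical size N of the indexed collection

# Hypothetical polar (seed) words for each emotion class.
POLAR_WORDS: Dict[str, List[str]] = {
    "joy": ["happy"], "angry": ["furious"], "sad": ["heartbroken"],
}


def pmi(phrase: str, word: str, hits: Dict[str, float],
        near_hits: Dict[Tuple[str, str], float], total: float,
        smoothing: float = 0.01) -> float:
    """Estimate PMI(phrase, word) from hit counts.

    The smoothing term (an assumption) keeps the ratio positive when a
    count is zero, so log2 is always defined.
    """
    joint = near_hits.get((phrase, word), 0.0) + smoothing
    p_phrase = hits.get(phrase, 0.0) + smoothing
    p_word = hits.get(word, 0.0) + smoothing
    return math.log2(joint * total / (p_phrase * p_word))


def classify(phrases: List[str], polar_words: Dict[str, List[str]],
             hits: Dict[str, float],
             near_hits: Dict[Tuple[str, str], float],
             total: float) -> Tuple[str, float]:
    """Return the class whose polar word reaches the maximum PMI with any phrase."""
    best_label, best_score = "", float("-inf")
    for label, words in polar_words.items():
        for phrase in phrases:
            for word in words:
                score = pmi(phrase, word, hits, near_hits, total)
                if score > best_score:
                    best_label, best_score = label, score
    return best_label, best_score


label, score = classify(["very good"], POLAR_WORDS, HITS, NEAR_HITS, TOTAL)
print(label, round(score, 2))
```

With these toy counts the phrase co-occurs far more often with the "joy" seed word than chance would predict, so the text is labeled joy. Taking the maximum over phrase/word pairs follows the abstract's wording; summing or averaging PMI per class would be a reasonable alternative when a text has many phrases.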
Download file list
PMl-IR.pdf