File name: LJClusterDemo
Description -- all downloadable content comes from the Internet; please evaluate and use it at your own discretion
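Text clustering is an automatic clustering technique based on similarity algorithms. It groups large numbers of unlabeled documents automatically, placing documents with similar content in the same class, and generates characteristic topic words for each class. It is suited to applications such as automatically generating hot public-opinion topics, tracking major news events, and visual analysis of intelligence.
Lingjoin (灵玖, www.lingjoin.com) builds on core feature discovery technology to overcome the bottlenecks of traditional clustering methods: large space consumption and long processing time. It clusters quickly and accurately with low memory consumption, which makes it particularly suitable for clustering very large corpora and short-text corpora.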
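The main features of the Lingjoin document clustering component are:
1. Fast: it can handle massive volumes of web text, processing at least 500,000 documents per hour on average;
2. Accurate clustering: the top N cluster centers usually reflect the current news hot spots, which suits hot-topic computation for public-opinion monitoring. The vendor claims that its metrics are far ahead of the technology of Autonomy, an international company known for clustering, and suggests this may be because Lingjoin understands Chinese better;
3. Accurate ranking: classes are sorted by their influence weight, and the documents within each class are sorted by importance;
4. Customizable: the number of classes and the class centers can be customized;
5. Open interface: the component is part of LJParser and exposes a flexible development interface, so it can be integrated easily into a user's business systems and supports a variety of operating systems and calling languages.
Lingjoin document clustering can be applied to text mining, knowledge management, search-result clustering, public-opinion monitoring, and other applications.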
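The general idea described above (group documents by similarity, generate topic words per class, rank classes by weight and documents by importance) can be illustrated with a minimal sketch. This is NOT Lingjoin's algorithm or the LJCluster.dll interface, neither of which is documented here. It is a generic single-pass cosine-similarity clustering, and every name in it (vectorize, cosine, cluster, threshold, top_terms) is hypothetical. It also assumes whitespace-separated tokens; Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a whitespace-tokenized text into a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.3, top_terms=3):
    """Single-pass clustering (an illustrative assumption, not the vendor's method).

    Each document joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each item: {"centroid": Counter, "members": [doc index, ...]}
    for i, doc in enumerate(docs):
        vec = vectorize(doc)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            best["centroid"].update(vec)  # accumulate term counts
            best["members"].append(i)

    results = []
    for c in clusters:
        centroid = c["centroid"]
        # Rank documents inside the class by similarity to the class center.
        ranked = sorted(
            c["members"],
            key=lambda i: cosine(vectorize(docs[i]), centroid),
            reverse=True,
        )
        # Use the most frequent terms of the class as its topic words.
        keywords = [term for term, _ in centroid.most_common(top_terms)]
        results.append({"keywords": keywords, "docs": ranked, "weight": len(ranked)})

    # Rank classes by weight, here simply the number of member documents.
    results.sort(key=lambda r: r["weight"], reverse=True)
    return results


if __name__ == "__main__":
    sample = [
        "stock market falls sharply",
        "stock market rallies",
        "football match tonight",
        "market stock prices fall",
        "football match result",
    ]
    for rank, group in enumerate(cluster(sample, top_terms=2), start=1):
        print(rank, group["keywords"], group["docs"])
```

In this sketch the class weight is just the member count and document importance is closeness to the class center; a production system would likely use stop-word filtering, term weighting such as TF-IDF, and a scalable index instead of comparing against every cluster.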
(Generated automatically by the system; you can review the contents before downloading)
Downloaded file list
LJClusterDemo\Dict.pdat
.............\Dict.wordlist
.............\LJCluster.dll
.............\LJClusterDemo.exe
.............\LJCluster说明.txt
.............\stop.ung