File name: IR
- Category: JSP source code/Java
- Resource attributes: [Java] [Source code]
- Upload date: 2012-11-26
- File size: 3.64 MB
- Downloads: 0
- Provider: 赵 (Zhao)
- Related links: None
Description:
Selection of index terms
1. Word segmentation and term-frequency counting: segment each document with the chosen word-segmentation software, count term frequencies, and write a DocIndex file whose records are: document number, frequency, term. Keep the intermediate results and choose a suitable data structure to store them.
2. Assigning term weights: weight terms in two ways, normalized term frequency (tf_i / max(tf), where max(tf) is the highest term frequency in the same document) and tf*idf. Generate the DocIndex(tf) and DocIndex(tf*idf) files from the DocIndex file. Choose the threshold carefully, since it decides which terms are kept and which are discarded.
3. Building the inverted files: convert the DocIndex(tf) and DocIndex(tf*idf) files into the DocInvert(tf) and DocInvert(tf*idf) files.
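The three steps above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code shipped in the archive: it assumes documents are already segmented into whitespace-separated words (standing in for the segmenter's output), computes tf*idf as normalized tf multiplied by idf = ln(N / df), and keeps everything in memory instead of writing files. The class name `IndexSketch` and the threshold parameter `minWeight` are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: DocIndex -> DocIndex(tf) / DocIndex(tf*idf) -> DocInvert.
// Assumption: each input string is an already-segmented document, words separated by whitespace.
class IndexSketch {

    // Step 1: DocIndex, stored as docId -> (term -> raw frequency).
    static Map<Integer, Map<String, Integer>> buildDocIndex(List<String> docs) {
        Map<Integer, Map<String, Integer>> docIndex = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String term : docs.get(id).trim().split("\\s+")) {
                if (!term.isEmpty()) {
                    counts.merge(term, 1, Integer::sum); // increment this term's count
                }
            }
            docIndex.put(id, counts);
        }
        return docIndex;
    }

    // Step 2a: normalized tf = tf_i / max(tf), where max(tf) is taken within each document.
    static Map<Integer, Map<String, Double>> normalizedTf(Map<Integer, Map<String, Integer>> docIndex) {
        Map<Integer, Map<String, Double>> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Integer>> doc : docIndex.entrySet()) {
            int max = 0;
            for (int f : doc.getValue().values()) {
                max = Math.max(max, f);
            }
            Map<String, Double> weights = new TreeMap<>();
            for (Map.Entry<String, Integer> e : doc.getValue().entrySet()) {
                weights.put(e.getKey(), (double) e.getValue() / max);
            }
            result.put(doc.getKey(), weights);
        }
        return result;
    }

    // Step 2b: tf*idf using normalized tf and idf = ln(N / df) (an assumed idf variant).
    // Terms whose weight is below minWeight are dropped (the threshold step).
    static Map<Integer, Map<String, Double>> tfIdf(Map<Integer, Map<String, Integer>> docIndex, double minWeight) {
        int n = docIndex.size();
        Map<String, Integer> df = new HashMap<>(); // number of documents containing each term
        for (Map<String, Integer> counts : docIndex.values()) {
            for (String term : counts.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        Map<Integer, Map<String, Double>> result = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Double>> doc : normalizedTf(docIndex).entrySet()) {
            Map<String, Double> weights = new TreeMap<>();
            for (Map.Entry<String, Double> e : doc.getValue().entrySet()) {
                double w = e.getValue() * Math.log((double) n / df.get(e.getKey()));
                if (w >= minWeight) {
                    weights.put(e.getKey(), w);
                }
            }
            result.put(doc.getKey(), weights);
        }
        return result;
    }

    // Step 3: invert docId -> (term -> weight) into term -> (docId -> weight).
    static Map<String, Map<Integer, Double>> invert(Map<Integer, Map<String, Double>> weighted) {
        Map<String, Map<Integer, Double>> inverted = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Double>> doc : weighted.entrySet()) {
            for (Map.Entry<String, Double> e : doc.getValue().entrySet()) {
                inverted.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                        .put(doc.getKey(), e.getValue());
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("information retrieval information", "retrieval system");
        Map<Integer, Map<String, Integer>> docIndex = buildDocIndex(docs);
        System.out.println("DocIndex:          " + docIndex);
        System.out.println("DocInvert(tf):     " + invert(normalizedTf(docIndex)));
        System.out.println("DocInvert(tf*idf): " + invert(tfIdf(docIndex, 0.1)));
    }
}
```

In this example "retrieval" occurs in both documents, so its idf is ln(2/2) = 0 and the 0.1 threshold removes it from the tf*idf inverted index. Using `TreeMap` keeps terms and document numbers sorted, which would make writing the DocIndex and DocInvert files in order straightforward.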
File list:
信息检索\ir_work1\.classpath
........\........\.project
........\........\.settings\org.eclipse.core.resources.prefs
........\........\.........\org.eclipse.jdt.core.prefs
........\........\bin\org\main\CreateIndexDocument.class
........\........\...\...\....\CreateInvertDocument.class
........\........\...\...\....\Util$1.class
........\........\...\...\....\Util.class
........\........\...\pojo\Token.class
........\........\src\org\main\CreateIndexDocument.java
........\........\...\...\....\CreateInvertDocument.java
........\........\...\...\....\Util.java
........\........\...\pojo\Token.java
........\paoding\analyzer.bat
........\.......\analyzer.sh
........\.......\build.bat
........\.......\build.xml
........\.......\classes\net\paoding\analysis\analyzer\estimate\Estimate$CToken.class
........\.......\.......\...\.......\........\........\........\Estimate$LinePrintGate.class
........\.......\.......\...\.......\........\........\........\Estimate$PrintGate.class
........\.......\.......\...\.......\........\........\........\Estimate$PrintGateToken.class
........\.......\.......\...\.......\........\........\........\Estimate$StringReaderEx.class
........\.......\.......\...\.......\........\........\........\Estimate.class
........\.......\.......\...\.......\........\........\........\TryPaodingAnalyzer.class
........\.......\.......\...\.......\........\........\impl\CompiledFileDictionaries$1.class
........\.......\.......\...\.......\........\........\....\CompiledFileDictionaries.class
........\.......\.......\...\.......\........\........\....\MaxWordLengthTokenCollector.class
........\.......\.......\...\.......\........\........\....\MostWordsModeDictionariesCompiler$1.class
........\.......\.......\...\.......\........\........\....\MostWordsModeDictionariesCompiler.class
........\.......\.......\...\.......\........\........\....\MostWordsTokenCollector$LinkedToken.class
........\.......\.......\...\.......\........\........\....\MostWordsTokenCollector.class
........\.......\.......\...\.......\........\........\....\SortingDictionariesCompiler.class
........\.......\.......\...\.......\........\........\PaodingAnalyzer.class
........\.......\.......\...\.......\........\........\PaodingAnalyzerBean.class
........\.......\.......\...\.......\........\........\PaodingTokenizer.class
........\.......\.......\...\.......\........\........\TokenCollector.class
........\.......\.......\...\.......\........\Constants.class
........\.......\.......\...\.......\........\dictionary\BinaryDictionary.class
........\.......\.......\...\.......\........\..........\Dictionary.class
........\.......\.......\...\.......\........\..........\DictionaryDelegate.class
........\.......\.......\...\.......\........\..........\HashBinaryDictionary$SubDictionaryWrap.class
........\.......\.......\...\.......\........\..........\HashBinaryDictionary.class
........\.......\.......\...\.......\........\..........\Hit.class
........\.......\.......\...\.......\........\..........\support\detection\Detector$1.class
........\.......\.......\...\.......\........\..........\.......\.........\Detector.class
........\.......\.......\...\.......\........\..........\.......\.........\Difference.class
........\.......\.......\...\.......\........\..........\.......\.........\DifferenceListener.class
........\.......\.......\...\.......\........\..........\.......\.........\ExtensionFileFilter.class
........\.......\.......\...\.......\........\..........\.......\.........\Node.class
........\.......\.......\...\.......\........\..........\.......\.........\Snapshot$InnerNode.class
........\.......\.......\...\.......\........\..........\.......\.........\Snapshot.class
........\.......\.......\...\.......\........\..........\.......\filewords\FileWordsReader.class
........\.......\.......\...\.......\........\..........\.......\.........\ReadListener.class
........\.......\.......\...\.......\........\..........\.......\.........\SimpleReadListener.class
........\.......\.......\...\.......\........\..........\.......\.........\SimpleRea