Resource search results
ShinglingSimhash
- A study of the Shingling and Simhash algorithms for web page duplicate detection. Useful for beginners.
SimHash
- Web crawler related: computes SimHash values and finds approximately matching SimHash values. Written in Java.
simhash
- A simple simhash example, useful for beginners.
simhash.tar
- A BKDR-based string hash algorithm and the SimHash algorithm.
simhash-f624c65.tar
- simhash.c. Bibliography from the source comment: Mark Manasse, Microsoft Research Silicon Valley, "Finding similar things quickly in large collections", http://research.microsoft.com/research/sv/PageTurner/similarity.htm
deduplication
- A simhash implementation in C, used for duplicate detection of articles.
SimHash
- An implementation of the simhash algorithm that can quickly compare the similarity of texts.
SimHash
- Usage of the simhash algorithm, for example for web page deduplication and text similarity computation.
simhash
- An implementation of the simhash algorithm, written in Python.
simhash-java-master
- Hasher-master, a Java program.
simhash-java-master
- A simple implementation of the simhash algorithm in Java.
simHash
- A C# implementation of Google's simhash algorithm. In extensive testing, simhash works quite well on relatively long texts, such as those over 500 characters: texts at a distance of less than 3 are basically all similar, and the false positive rate is fairly low.
simhash
- Deduplicates and filters text fetched by a web crawler, removing repeated text while preserving the diversity of the samples.