File name: Hadoop-based-distributed-crawler
Description (downloaded content comes from the internet; please evaluate and use it at your own discretion)
This article discusses the basic technology of search engines and the basic principles of web crawlers, and analyzes Nutch, a technical prototype of a distributed crawler.
Download file list
Hadoop-based distributed crawler.nh