File name: WebCollector-master
- Category: JSP source code/Java
- Resource attributes: [Java] [Source code]
- Upload date: 2014-10-16
- File size: 3.71 MB
- Downloads: 0
- Provider: Wu*
- Related links: None
Description (all downloadable content comes from the internet; please evaluate and use it yourself)
With the WebCollector kernel, you can write your own crawler modules, such as the HTTP requester, link parser, crawl-information updater, and fetcher. WebCollector calls these kernel-based modules plugins. By combining different plugins, you can assemble WebCollector into a completely new crawler within one minute.
WebCollector ships with a built-in set of plugins (cn.edu.hfut.dmic.webcollector.plugin.redis). With this set of plugins, WebCollector's task management can be moved onto a Redis database, which lets WebCollector crawl massive amounts of data (on the scale of hundreds of millions).
Users can develop plugins suited to their own needs.
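The plugin idea can be illustrated with a minimal Java sketch. Note that `Requester`, `LinkParser`, `Crawler`, and `demoCrawler` are hypothetical names invented for this illustration; they are not WebCollector's actual classes or signatures. The sketch only shows how a crawler might be assembled from interchangeable modules, using a tiny in-memory "web" so that no network access is needed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types are part of WebCollector's real API.
class PluginCrawlerSketch {

    /** Plugin that returns the content of a URL (stands in for the HTTP request module). */
    interface Requester {
        String fetch(String url);
    }

    /** Plugin that extracts outgoing links from fetched content. */
    interface LinkParser {
        List<String> parse(String content);
    }

    /** A crawler assembled from two plugins; swapping either one gives a different crawler. */
    static final class Crawler {
        private final Requester requester;
        private final LinkParser parser;

        Crawler(Requester requester, LinkParser parser) {
            this.requester = requester;
            this.parser = parser;
        }

        /** Breadth-first crawl from the seed; returns URLs in visit order, at most maxPages. */
        List<String> crawl(String seed, int maxPages) {
            Set<String> seen = new HashSet<>();
            Deque<String> queue = new ArrayDeque<>();
            List<String> visited = new ArrayList<>();
            seen.add(seed);
            queue.add(seed);
            while (!queue.isEmpty() && visited.size() < maxPages) {
                String url = queue.poll();
                String content = requester.fetch(url);
                visited.add(url);
                if (content == null) {
                    continue; // nothing fetched, so there are no links to follow
                }
                for (String link : parser.parse(content)) {
                    if (seen.add(link)) { // add() is false for URLs already queued
                        queue.add(link);
                    }
                }
            }
            return visited;
        }
    }

    /** Builds a demo crawler over an in-memory map: page name -> space-separated links. */
    static Crawler demoCrawler() {
        Map<String, String> web = Map.of(
                "a", "b c",
                "b", "c d",
                "c", "a",
                "d", "");
        Requester requester = web::get;
        LinkParser parser = content -> {
            List<String> links = new ArrayList<>();
            for (String token : content.split("\\s+")) {
                if (!token.isEmpty()) {
                    links.add(token);
                }
            }
            return links;
        };
        return new Crawler(requester, parser);
    }

    public static void main(String[] args) {
        System.out.println(demoCrawler().crawl("a", 10)); // prints [a, b, c, d]
    }
}
```

Replacing `requester` with an implementation that performs real HTTP requests, or `parser` with an HTML link extractor, would change the crawler's behavior without touching the `Crawler` class itself.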
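To make the idea of externalized task management more concrete, here is a small Java sketch of a pluggable task store. The `TaskStore` interface and `InMemoryTaskStore` class are assumptions made for this example and do not reflect how the cn.edu.hfut.dmic.webcollector.plugin.redis package is actually implemented. The in-memory version below stands in for an external store; one conceivable Redis-backed variant could keep the "seen" set in a Redis set (`SADD`) and the pending queue in a Redis list (`LPUSH`/`RPOP`), so that the task state no longer has to fit in the crawler's own memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: not the actual WebCollector Redis plugin.
class TaskStoreSketch {

    /** Abstraction over where crawl tasks live (memory, Redis, or another database). */
    interface TaskStore {
        /** Queues a URL unless it was seen before; returns true if it was newly queued. */
        boolean offer(String url);

        /** Returns the next pending URL, or null when no tasks remain. */
        String poll();
    }

    /** In-memory stand-in; a Redis-backed class could implement the same interface. */
    static final class InMemoryTaskStore implements TaskStore {
        private final Set<String> seen = new HashSet<>();
        private final Deque<String> pending = new ArrayDeque<>();

        @Override
        public boolean offer(String url) {
            if (!seen.add(url)) {
                return false; // duplicate: already queued or crawled
            }
            pending.addLast(url);
            return true;
        }

        @Override
        public String poll() {
            return pending.pollFirst(); // FIFO order; null if empty
        }
    }

    public static void main(String[] args) {
        TaskStore store = new InMemoryTaskStore();
        store.offer("http://example.com/a");
        store.offer("http://example.com/a"); // ignored as a duplicate
        store.offer("http://example.com/b");
        for (String url = store.poll(); url != null; url = store.poll()) {
            System.out.println(url);
        }
    }
}
```

Because the crawler would depend only on the `TaskStore` interface, the storage backend could be changed without modifying the crawling logic.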
Download file list (generated automatically by the system; you can review the contents before downloading)
WebCollector-master
...................\.gitignore
...................\LICENSE
...................\README.md
...................\README.zh-cn.md
...................\WebCollector
...................\............\pom.xml
...................\............\src
...................\............\...\main
...................\............\...\....\java
...................\............\...\....\....\cn
...................\............\...\....\....\..\edu
...................\............\...\....\....\..\...\hfut
...................\............\...\....\....\..\...\....\dmic
...................\............\...\....\....\..\...\....\....\webcollector
...................\............\...\....\....\..\...\....\....\............\crawler
...................\............\...\....\....\..\...\....\....\............\.......\BreadthCrawler.java
...................\............\...\....\....\..\...\....\....\............\fetcher
...................\............\...\....\....\..\...\....\....\............\.......\Fetcher.java
...................\............\...\....\....\..\...\....\....\............\generator
...................\............\...\....\....\..\...\....\....\............\.........\DbUpdater.java
...................\............\...\....\....\..\...\....\....\............\.........\Generator.java
...................\............\...\....\....\..\...\....\....\............\.........\Injector.java
...................\............\...\....\....\..\...\....\....\............\.........\ParamURLGenerator.java
...................\............\...\....\....\..\...\....\....\............\.........\StandardGenerator.java
...................\............\...\....\....\..\...\....\....\............\.........\filter
...................\............\...\....\....\..\...\....\....\............\.........\......\Filter.java
...................\............\...\....\....\..\...\....\....\............\.........\......\IntervalFilter.java
...................\............\...\....\....\..\...\....\....\............\.........\......\URLRegexFilter.java
...................\............\...\....\....\..\...\....\....\............\.........\......\UniqueFilter.java
...................\............\...\....\....\..\...\....\....\............\handler
...................\............\...\....\....\..\...\....\....\............\.......\Handler.java
...................\............\...\....\....\..\...\....\....\............\.......\Message.java
...................\............\...\....\....\..\...\....\....\............\model
...................\............\...\....\....\..\...\....\....\............\.....\AvroModel.java
...................\............\...\....\....\..\...\....\....\............\.....\CommonURL.java
...................\............\...\....\....\..\...\....\....\............\.....\CrawlDatum.java
...................\............\...\....\....\..\...\....\....\............\.....\Link.java
...................\............\...\....\....\..\...\....\....\............\.....\Page.java
...................\............\...\....\....\..\...\....\....\............\output
...................\............\...\....\....\..\...\....\....\............\......\FileSystemOutput.java
...................\............\...\....\....\..\...\....\....\............\parser
...................\............\...\....\....\..\...\....\....\............\......\HtmlParser.java
...................\............\...\....\....\..\...\....\....\............\......\LinkUtils.java
...................\............\...\....\....\..\...\....\....\............\......\ParseResult.java
...................\............\...\....\....\..\...\....\....\............\......\Parser.java
...................\............\...\....\....\..\...\....\....\............\ui
...................\............\...\....\....\..\...\....\....\............\..\BreadthCrawlerUI.form
...................\............\...\....\....\..\...\....\....\............\..\BreadthCrawlerUI.java
...................\............\...\....\....\..\...\....\....\............\util
...................\............\...\