File name: YukiSpider
Description (all downloadable content is collected from the Internet; please study and use it at your own discretion)
A basic web crawler framework built on HttpClient 4.0 (implemented in Java).
Simulating HTTP requests: HttpClient 4.0
Target page structure and HTTP request header analysis: Firefox + Firebug, or Chrome (F12 developer tools)
HTML parsing: Jsoup
(Generated automatically by the system; you can review the contents before downloading)
Downloaded file list
YukiSpider\.classpath
..........\.project
..........\.settings\org.eclipse.core.resources.prefs
..........\.........\org.eclipse.jdt.core.prefs
..........\conf\log4j.properties
..........\....\spider.properties
..........\lib\apache-mime4j-0.6.jar
..........\...\commons-codec-1.4.jar
..........\...\commons-logging-1.1.1.jar
..........\...\httpclient-4.0.1.jar
..........\...\httpcore-4.0.1.jar
..........\...\httpmime-4.0.1.jar
..........\...\json.jar
..........\...\jsoup-1.5.2.jar
..........\...\log4j-1.2.17.jar
..........\...\mysql-connector-java-5.1.10-bin.jar
..........\src\cn\zju\yuki\spider\fetcher\PageFetcher.java
..........\...\..\...\....\......\handler\ContentHandler.java
..........\...\..\...\....\......\model\FetchedPage.java
..........\...\..\...\....\......\.....\SpiderParams.java
..........\...\..\...\....\......\parser\ContentParser.java
..........\...\..\...\....\......\queue\UrlQueue.java
..........\...\..\...\....\......\.....\VisitedUrlQueue.java
..........\...\..\...\....\......\SpiderStarter.java
..........\...\..\...\....\......\storage\DataStorage.java
..........\...\..\...\....\......\worker\SpiderWorker.java
..........\...\..\...\....\......\fetcher
..........\...\..\...\....\......\handler
..........\...\..\...\....\......\model
..........\...\..\...\....\......\parser
..........\...\..\...\....\......\queue
..........\...\..\...\....\......\storage
..........\...\..\...\....\......\worker
..........\...\..\...\....\spider
..........\...\..\...\yuki
..........\...\..\zju
..........\...\cn
..........\.settings
..........\bin
..........\conf
..........\lib
..........\src
YukiSpider
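
The package names in the listing (fetcher, parser, queue, worker) suggest the usual crawler loop: take a URL from a queue, fetch it, parse out links, and enqueue the ones not yet visited. The project's source is not shown here, so the following is only a minimal sketch of that loop using the JDK alone. The class name `CrawlSketch`, the method `crawl`, and the `fetchLinks` parameter are all hypothetical and are not taken from YukiSpider.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration; not code from the YukiSpider archive.
class CrawlSketch {

    /**
     * Breadth-first crawl starting at {@code seed}.
     *
     * @param seed       first URL to visit
     * @param maxPages   upper bound on the number of pages visited
     * @param fetchLinks assumed stand-in for "fetch the page, then parse out its links"
     * @return the URLs in the order they were visited
     */
    static List<String> crawl(String seed, int maxPages,
                              Function<String, List<String>> fetchLinks) {
        Queue<String> frontier = new ArrayDeque<>(); // URLs waiting to be fetched
        Set<String> seen = new HashSet<>();          // URLs already queued or fetched
        List<String> visited = new ArrayList<>();    // visit order, for inspection

        frontier.add(seed);
        seen.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : fetchLinks.apply(url)) {
                // Set.add returns false for duplicates, so each URL is queued only once.
                if (seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

In a real crawler, `fetchLinks` would presumably wrap an HTTP client call and an HTML parser, such as the HttpClient and Jsoup libraries shipped in the lib directory. Passing it in as a function keeps the queue logic testable without network access.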