File name: JavaCrawler
Description -- all downloadable content comes from the Internet; study and use it at your own discretion
A BFS-based web crawler with robots.txt support
(Generated automatically by the system; you can review the contents before downloading)
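The description above names two ideas: breadth-first traversal of links and honoring robots.txt. The following is a minimal sketch of how those two pieces can fit together; it is not the code from BFSCrawler.java or LinkFilter.java, which this listing does not show. The class name BfsCrawlSketch, its method names, and the in-memory link map (used in place of real HTTP fetching and HTML parsing) are all assumptions made for illustration. The robots.txt handling is deliberately simplified to Disallow prefixes under "User-agent: *". It assumes Java 9 or later for List.of and Map.of.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch only; not the implementation shipped in this archive.
class BfsCrawlSketch {

    // Collects Disallow path prefixes from groups declared as "User-agent: *".
    // Simplification: ignores Allow rules, wildcards, and grouped User-agent lines.
    static List<String> parseDisallow(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean appliesToAll = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            if (line.startsWith("#") || colon < 0) continue;
            String key = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                appliesToAll = value.equals("*");
            } else if (key.equals("disallow") && appliesToAll && !value.isEmpty()) {
                prefixes.add(value);
            }
        }
        return prefixes;
    }

    // A path is allowed when it starts with none of the disallowed prefixes.
    static boolean isAllowed(String path, List<String> disallowed) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    // Breadth-first traversal: a FIFO queue holds the frontier and a set
    // records every path already queued, so each page is visited once.
    // The links map stands in for "fetch the page and extract its links".
    static List<String> crawl(String seed, Map<String, List<String>> links,
                              List<String> disallowed, int maxPages) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        if (isAllowed(seed, disallowed)) {
            frontier.add(seed);
            seen.add(seed);
        }
        while (!frontier.isEmpty() && order.size() < maxPages) {
            String page = frontier.remove();
            order.add(page);
            for (String next : links.getOrDefault(page, List.of())) {
                if (isAllowed(next, disallowed) && seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Made-up site graph for demonstration.
        Map<String, List<String>> links = Map.of(
            "/", List.of("/a", "/private/x", "/b"),
            "/a", List.of("/c", "/"),
            "/b", List.of("/c"));
        List<String> disallowed = parseDisallow("User-agent: *\nDisallow: /private/\n");
        System.out.println(crawl("/", links, disallowed, 10)); // prints [/, /a, /b, /c]
    }
}
```

In this sketch the queue guarantees that pages one link away from the seed are visited before pages two links away, and /private/x is never queued because it matches the Disallow prefix. A real crawler would replace the link map with an HTTP fetch and an HTML link extractor.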
List of files in the download
JavaCrawler
...........\.classpath
...........\.DS_Store
__MACOSX
........\JavaCrawler
........\...........\._.DS_Store
JavaCrawler\.project
...........\.settings
...........\.........\org.eclipse.jdt.core.prefs
...........\bin
...........\...\Controller
...........\...\..........\BFSCrawler$1.class
...........\...\..........\BFSCrawler.class
...........\...\..........\GetWebsitePage.class
...........\...\..........\HtmlParser$1.class
...........\...\..........\HtmlParser.class
...........\...\Main.class
...........\...\Models
...........\...\......\LinkFilter.class
...........\...\......\MyQueue.class
...........\...\PracHttpParser.class
...........\DownloadFiles
...........\.............\.DS_Store
__MACOSX\JavaCrawler\DownloadFiles
........\...........\.............\._.DS_Store
JavaCrawler\DownloadFiles\duke.edu.html
...........\.............\today.duke.edu_2014_09_sept11memorial.html
...........\.............\today.duke.edu_alumni-features.html
...........\.............\today.duke.edu_alumni.html
...........\.............\today.duke.edu_topic_alumni#other.html
...........\.............\today.duke.edu_topic_alumni.html
...........\.............\Xinran_Luo_urls
...........\Jars
...........\....\commons-codec-1.3.jar
...........\....\commons-httpclient-3.1.jar
...........\....\commons-lang.jar
...........\....\commons-logging-1.1.1.jar
...........\....\htmllexer.jar
...........\....\htmlparser.jar
...........\....\je-4.0.92.jar
...........\MyQueue
...........\src
...........\...\Controller
...........\...\..........\BFSCrawler.java
...........\...\..........\GetWebsitePage.java
...........\...\..........\HtmlParser.java
...........\...\Main.java
...........\...\Models
...........\...\......\LinkFilter.java
...........\...\......\MyQueue.java
...........\...\PracHttpParser.java