File name: NWebCrawler
- Category: Search engines
- Resource attributes: [Windows] [Visual.Net] [Source code]
- Upload date: 2013-12-19
- File size: 378 KB
- Downloads: 0
- Provider: w*
- Related links: none
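Description (all download content comes from the Internet; evaluate it yourself before use)
A web crawler with the following features:
* Configurable: thread count, thread wait time, connection timeout, crawlable file types and their priorities, download directory, and so on.
* Status bar statistics: number of URLs queued, number of files downloaded, total bytes downloaded, CPU usage, available memory, and so on.
* Preferential crawling: different priorities can be assigned to the resource types being crawled.
* Robustness: more than ten URL normalization rules to eliminate redundant downloads, crawler-trap avoidance strategies, and several strategies for resolving relative paths.
* Good performance: regular-expression-based page parsing, moderate locking, and persistent (keep-alive) HTTP connections.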
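The combination of URL normalization and type-based priorities described above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# code: the names `normalize_url`, `UrlFrontier`, and `TYPE_PRIORITY`, the priority values, and the three normalization rules shown are all assumptions chosen for the example, and the archive's actual rule list is not reproduced here.

```python
import heapq
import itertools
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical priority table: a lower number is crawled sooner.
# An empty extension (e.g. "/" or "/news") is treated like a page.
TYPE_PRIORITY = {"": 0, ".html": 0, ".htm": 0, ".pdf": 1, ".jpg": 2}
DEFAULT_PRIORITY = 3
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Apply a few common normalization rules (assumed, not the project's list)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # host names are case-insensitive
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"  # an empty path is equivalent to "/"
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, host, path, parts.query, ""))


class UrlFrontier:
    """Queue of URLs to crawl, ordered by file-type priority, without duplicates."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, url):
        """Queue a URL; return False if its normalized form was already queued."""
        key = normalize_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        ext = posixpath.splitext(urlsplit(key).path.lower())[1]
        priority = TYPE_PRIORITY.get(ext, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (priority, next(self._counter), key))
        return True

    def pop(self):
        """Return the queued URL with the best (lowest) priority."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Normalizing before the duplicate check is what removes redundant downloads: two spellings of the same address map to one key, so only the first is queued.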
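Regular-expression-based page parsing together with relative path resolution might look like the sketch below. It is written in Python for brevity and is only an assumption about the general technique: the pattern `HREF_RE` and the function `extract_links` are hypothetical names, and the pattern handles quoted `href` attributes only, which is far less than a real parser has to cope with.

```python
import re
from urllib.parse import urljoin

# Matches quoted href values inside <a ...> tags and stops at a fragment.
# Simplified on purpose: unquoted attributes are not handled.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"'#>]+)""", re.IGNORECASE)


def extract_links(html, base_url):
    """Return absolute link URLs found in html, in order, without duplicates."""
    links = []
    seen = set()
    for match in HREF_RE.finditer(html):
        href = match.group(1).strip()
        # Skip pseudo-links that cannot be downloaded.
        if href.lower().startswith(("javascript:", "mailto:")):
            continue
        # Resolve relative paths such as "../news/1.html" against the page URL.
        absolute = urljoin(base_url, href)
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A regular expression is faster than building a full document tree but can miss or misread unusual markup, so this trades some accuracy for speed.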
Download file list
NWebCrawler
...........\bin
...........\...\download
...........\...\dump.txt
...........\...\NWebCrawler.exe
...........\...\NWebCrawler.pdb
...........\...\NWebCrawler.vshost.exe
...........\...\NWebCrawler.vshost.exe.manifest
...........\...\NWebCrawlerLib.exe
...........\...\NWebCrawlerLib.pdb
...........\...\NWebCrawlerLib_4_13_2013.log
...........\...\NWebCrawlerLib_4_14_2013.log
...........\C# 编写的网络爬虫.txt
...........\data
...........\....\pdc_09.txt
...........\....\sina_12_28.txt
...........\src
...........\...\Backup
...........\...\......\NWebCrawler
...........\...\......\NWebCrawler.sln
...........\...\......\NWebCrawlerLib
...........\...\......\..............\Common
...........\...\......\..............\......\Logger.cs
...........\...\......\..............\......\PriorityQueue.cs
...........\...\......\..............\CrawleHistroyEntry.cs
...........\...\......\..............\CrawlerThread.cs
...........\...\......\..............\Downloader.cs
...........\...\......\..............\NWebCrawlerLib.csproj
...........\...\......\..............\Parser.cs
...........\...\......\..............\Program.cs
...........\...\......\..............\Properties
...........\...\......\..............\..........\AssemblyInfo.cs
...........\...\......\..............\Settings.cs
...........\...\......\..............\UrlFrontierQueueManager.cs
...........\...\......\..............\Utility.cs
...........\...\......\...........\config.ini
...........\...\......\...........\MainForm.cs
...........\...\......\...........\MainForm.Designer.cs
...........\...\......\...........\MainForm.resx
...........\...\......\...........\NWebCrawler.csproj
...........\...\......\...........\Program.cs
...........\...\......\...........\Properties
...........\...\......\...........\..........\AssemblyInfo.cs
...........\...\......\...........\..........\Resources.Designer.cs
...........\...\......\...........\..........\Resources.resx
...........\...\......\...........\..........\Settings.Designer.cs
...........\...\......\...........\..........\Settings.settings
...........\...\......\...........\SettingsForm.cs
...........\...\......\...........\SettingsForm.Designer.cs
...........\...\......\...........\SettingsForm.resx
...........\...\NWebCrawler
...........\...\NWebCrawler.sln
...........\...\NWebCrawler.suo
...........\...\NWebCrawlerLib
...........\...\..............\bin
...........\...\..............\...\Release
...........\...\..............\Common
...........\...\..............\......\Logger.cs
...........\...\..............\......\PriorityQueue.cs
...........\...\..............\CrawleHistroyEntry.cs
...........\...\..............\CrawlerThread.cs
...........\...\..............\Downloader.cs
...........\...\..............\NWebCrawlerLib.csproj
...........\...\..............\NWebCrawlerLib.csproj.user
...........\...\..............\obj
...........\...\..............\...\Debug
...........\...\..............\...\.....\DesignTimeResolveAssemblyReferencesInput.cache
...........\...\..............\...\.....\NWebCrawlerLib.csproj.FileListAbsolute.txt
...........\...\..............\...\.....\NWebCrawlerLib.exe
...........\...\..............\...\.....\NWebCrawlerLib.pdb
...........\...\..............\...\.....\TempPE
...........\...\..............\Parser.cs
...........\...\..............\Program.cs
...........\...\..............\Properties
...........\...\..............\..........\AssemblyInfo.cs
...........\...\..............\Settings.cs
...........\...\..............\UrlFrontierQueueManager.cs
...........\...\..............\Utility.cs
...........\...\...........\bin
...........\...\...........\...\Release
...........\...\...........\config.ini
...........\...\...........\MainForm.cs
...........\...\...........\MainForm.Designer.cs
...........\...\...........\MainForm.resx
...........\...\...........\NWebCrawler.csproj
...........\...\...........\NWebCrawler.csproj.user
...........\...\...........\obj
...........\...\...........\...\Debug
...........\...\...........\...\.....\DesignTimeResolveAssemblyReferencesInput.cache
..........