File name: PACHONG
- Category:
- C# programming
- Resource attributes:
- [Windows] [Visual.Net] [Source code]
- Upload date:
- 2012-11-26
- File size:
- 780 KB
- Downloads:
- 0
- Uploaded by:
- 谭*
- Related links:
- None
Description:
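Web crawler source code
This is a web crawler written in C#.
Main features:
Configurable: number of threads, thread wait time, connection timeout, crawlable file types and their priorities, download directory, and more.
Status-bar statistics: number of queued URLs, number of downloaded files, total bytes downloaded, CPU usage, available memory, and more.
Preferential crawling: different priorities can be assigned to different types of crawled resources.
Robustness: more than ten URL normalization rules to eliminate redundant downloads, crawler-trap avoidance strategies, and several strategies for resolving relative paths.
Good performance: regex-based page parsing, moderate locking, persistent HTTP connections, and more.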
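The robustness and parsing features above (URL normalization, relative-path resolution, crawler-trap avoidance, regex-based parsing) can be illustrated with a short sketch. It is written in Python rather than the project's C#, and it is not the logic in the project's Parser.cs or Utility.cs: the link regex, the handful of normalization rules, and the `MAX_PATH_SEGMENTS` depth limit are assumptions chosen for illustration.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical pattern: grabs quoted href/src attribute values. A regex is not a
# full HTML parser; it misses unquoted attributes, <base> tags, and script-built links.
LINK_RE = re.compile(r"""(?:href|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

DEFAULT_PORTS = {"http": 80, "https": 443}
MAX_PATH_SEGMENTS = 10  # hypothetical crawler-trap guard against endlessly deep paths


def normalize_url(url):
    """Apply a few common normalization rules so equivalent URLs compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()            # urlsplit already lowercases; kept explicit
    host = (parts.hostname or "").lower()    # host names are case-insensitive
    netloc = host
    if parts.port is not None and DEFAULT_PORTS.get(scheme) != parts.port:
        netloc = f"{host}:{parts.port}"      # keep only non-default ports
    path = parts.path or "/"                 # an empty path is equivalent to "/"
    # Drop the fragment: it never changes what the server returns.
    # (User info and IPv6 literals are not handled in this sketch.)
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def extract_links(page_url, html):
    """Return normalized, de-duplicated absolute links found in one page."""
    seen = set()
    links = []
    for raw in LINK_RE.findall(html):
        # urljoin resolves relative references, including ./ and ../ segments.
        absolute = urljoin(page_url, raw.strip())
        if urlsplit(absolute).scheme not in DEFAULT_PORTS:
            continue                         # skips mailto:, javascript:, etc.
        url = normalize_url(absolute)
        depth = len([s for s in urlsplit(url).path.split("/") if s])
        if depth > MAX_PATH_SEGMENTS or url in seen:
            continue
        seen.add(url)
        links.append(url)
    return links
```

For example, on the page `http://example.com/a/b/page.html`, the link `../c.html#top` and the link `HTTP://Example.COM:80/a/c.html` both normalize to `http://example.com/a/c.html`, so only one copy is kept.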
Features that may be added in the future:
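Berkeley DB storage for crawled files: improves performance, because common operating systems handle large numbers of small files poorly.
URL-ranking priority queue: a focused (topical) crawler in which a machine-learning algorithm scores each link's relevance to the topic, and links are crawled in the resulting priority order.
Crawler etiquette: obey the Robots Exclusion Protocol and avoid overusing server resources.
Performance optimizations: replace the wrapped HttpWebRequest/Response classes with UDP; DNS caching; asynchronous DNS resolution; a disk cache or in-memory database to avoid frequent disk seeks.
Distributed crawling: scale beyond the limits of a single machine (CPU, memory, and disk access).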
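Both the existing type-based priorities and the planned URL-ranking queue come down to a priority-ordered URL frontier. The following Python sketch assumes a hypothetical extension-to-priority table (`TYPE_PRIORITY`, lower numbers crawled first) and lets a caller pass its own score in place of the type priority, as a focused crawler might. It is not the code in UrlFrontierQueueManager.cs or PriorityQueue.cs.

```python
import heapq
import itertools
import posixpath
from urllib.parse import urlsplit

# Hypothetical priorities keyed by file extension: lower numbers are crawled first.
TYPE_PRIORITY = {"": 0, ".html": 0, ".htm": 0, ".aspx": 0, ".js": 2, ".gif": 3, ".jpg": 3}
DEFAULT_PRIORITY = 5


class UrlFrontier:
    """A priority-ordered URL frontier that never queues the same URL twice."""

    def __init__(self, type_priority=None):
        self._type_priority = type_priority if type_priority is not None else TYPE_PRIORITY
        self._heap = []                     # entries are (priority, sequence, url)
        self._sequence = itertools.count()  # FIFO tie-breaker for equal priorities
        self._seen = set()                  # every URL ever enqueued

    def priority_for(self, url):
        ext = posixpath.splitext(urlsplit(url).path)[1].lower()
        return self._type_priority.get(ext, DEFAULT_PRIORITY)

    def push(self, url, score=None):
        """Enqueue url; a focused crawler could pass its own relevance score instead."""
        if url in self._seen:
            return False
        priority = self.priority_for(url) if score is None else score
        heapq.heappush(self._heap, (priority, next(self._sequence), url))
        self._seen.add(url)
        return True

    def pop(self):
        """Remove and return the URL with the lowest priority value."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)
```

With these assumed priorities, pages (including directory-style URLs with no extension) are popped before scripts and images, and URLs with equal priority come out in insertion order.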
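Crawler etiquette is listed as a planned feature. A minimal Python sketch could use the standard library's `urllib.robotparser`, assuming robots.txt text has already been downloaded for each host and that a fixed per-host delay is acceptable. The `PolitenessPolicy` class, the `min_delay` value, and any user-agent string passed in are hypothetical.

```python
import time
import urllib.robotparser
from urllib.parse import urlsplit


class PolitenessPolicy:
    """Per-host robots.txt checks plus a minimum delay between requests to a host."""

    def __init__(self, user_agent, min_delay=1.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.min_delay = min_delay   # hypothetical seconds between hits on one host
        self._clock = clock          # injectable for testing
        self._robots = {}            # host -> RobotFileParser
        self._last_fetch = {}        # host -> clock() value of the last request

    def add_robots_txt(self, host, text):
        parser = urllib.robotparser.RobotFileParser()
        parser.parse(text.splitlines())
        self._robots[host] = parser

    def allowed(self, url):
        parser = self._robots.get(urlsplit(url).netloc)
        # No rules recorded for this host: allowed here, but a real crawler
        # should fetch /robots.txt before its first request to the host.
        return True if parser is None else parser.can_fetch(self.user_agent, url)

    def wait_time(self, url):
        last = self._last_fetch.get(urlsplit(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.min_delay - (self._clock() - last))

    def record_fetch(self, url):
        self._last_fetch[urlsplit(url).netloc] = self._clock()
```

A crawler thread would check `allowed()` before enqueueing a URL and sleep for `wait_time()` before each request, calling `record_fetch()` afterwards.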
Download file list (generated automatically; review it before downloading):
NWebCrawler\config.ini
...........\MainForm.cs
...........\MainForm.Designer.cs
...........\MainForm.resx
...........\NWebCrawler.csproj
...........\obj\Debug\NWebCrawler.csproj.FileListAbsolute.txt
...........\...\.....\NWebCrawler.csproj.GenerateResource.Cache
...........\...\.....\NWebCrawler.exe
...........\...\.....\NWebCrawler.MainForm.resources
...........\...\.....\NWebCrawler.pdb
...........\...\.....\NWebCrawler.Properties.Resources.resources
...........\...\.....\NWebCrawler.SettingsForm.resources
...........\...\.....\ResolveAssemblyReference.cache
...........\Program.cs
...........\...perties\AssemblyInfo.cs
...........\..........\Resources.Designer.cs
...........\..........\Resources.resx
...........\..........\Settings.Designer.cs
...........\..........\Settings.settings
...........\SettingsForm.cs
...........\SettingsForm.Designer.cs
...........\SettingsForm.resx
...........Lib\Common\Logger.cs
..............\......\PriorityQueue.cs
..............\CrawleHistroyEntry.cs
..............\CrawlerThread.cs
..............\Downloader.cs
..............\NWebCrawlerLib.csproj
..............\obj\Debug\NWebCrawlerLib.csproj.FileListAbsolute.txt
..............\...\.....\NWebCrawlerLib.exe
..............\...\.....\NWebCrawlerLib.pdb
..............\Parser.cs
..............\Program.cs
..............\...perties\AssemblyInfo.cs
..............\Settings.cs
..............\UrlFrontierQueueManager.cs
..............\Utility.cs
NWebCrawler.sln
NWebCrawler.suo
51aspx源码必读.txt
bin\config.ini
...\download\0003be8238c8302e17c799d9f5d65876.gif
...\........\0718ad68487fa12de0cc75b20f7be03c.html; charset=utf-8
...\........\082e9d970f371da4f6e74dbe2c97f6e2.html; charset=utf-8
...\........\132949602460dfebc35da092329cba0c.gif
...\........\1695505243ceaa9c68e5a00061d1763f.javascript
...\........\1df7133090a0d07c5cec8fccbf6fd8dd.html; charset=utf-8
...\........\203557adfb69f0b4da4e237df2c0899a.html; charset=gb2312
...\........\23e5f50b0b42662c6694e574e74835cd.html; charset=utf-8
...\........\24eebf7019dc355f064372d6a889c60a.html; charset=gb2312
...\........\27439efce81b9ca84182d54aa411418e.html; charset=gb2312
...\........\2a2f02ca86459cde185fc8e8e9045bed.html; charset=utf-8
...\........\349427e49e96cbca35651e55ef94353d.gif
...\........\3891570720e771c847e5ac23e28aa6cc.html
...\........\3ff2932f670fc24203b1290df195dabf.gif
...\........\417d9e708c95da24b75705338598087f.html
...\........\44b19dec343bee7540d2e563399518f6.html; charset=gb2312
...\........\46e1c646c9965ce2581be0e2baa182cf.html; charset=utf-8
...\........\48bfe5c4818bc6d7d0a86b7c5d5a963a.javascript
...\........\4cef95f512517e118d0427cdf40d8d91.javascript
...\........\54cd270476c08dc49137cc587d5420e7.html; charset=utf-8
...\........\5ae7c8b442091b3c740b5f89f2202977.gif
...\........\5f194c03340af2c82af0806b4cd95f44.html; charset=gb2312
...\........\6a78a05748d064e4491b674a391174c7.javascript
...\........\6ba086f85f3602a364dae60f740138c5.html; charset=gb2312
...\........\73e9259e079ac68519bd2cf67af06c13.html; charset=utf-8
...\........\753a67d9417f20f83e1dce17d6146f85.gif
...\........\767223508f1bd57304d84720065f9ee8.x-javascript
...\........\7780c2d0134fad8b7a05a95d0f7b3378.html; charset=gb2312
...\........\7a6721fd05029de13a9df0e2a0948f25.html; charset=UTF-8
...\........\7eedab1d5fa988b034a32f14e08a97c0.gif
...\........\84675a6817fc8715e33bc1c631154b5d.html
...\........\857c3c382495ba1593a316498236e4f8.html; charset=gb2312
...\........\8769fd41800599144d3fffb49173cf71.x-icon
...\........\89253cefeda362f9b403341ccec22420.gif
...\........\8d52d7ccdc272a6bcaf36ae22d856dfc.html; charset=utf-8
...\........\9339d79eed585c1e0b126588c50477a8.javascript
...\........\93c0e58661019bd4a98aa3790a400cdf.x-javascript
...\........\94f1e7adbd48cf364b19771319db6b3f.gif
...\........\956119ce46fe84d5c1e240ef7d417bdb.html; charset=gb2312
...\........\9d71e4ab781e1b9bf3eccf2a47568d6e.html; charset=utf-8
...\........\a2418875c3955a694b18cf795764164a.html; charset=gb2312
...\........\a490c2a29b5986e5cd4e114a0b50d394.html; charset=gb2312
...\........\a6275663cfbb6142241df064c6f249f9.html; charset=gb2312
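The file names in the download folder look like 32 hexadecimal digits (the length of an MD5 digest) followed by the Content-Type subtype, and some of them keep the media-type parameters, e.g. `.html; charset=utf-8`. The Python sketch below strips those parameters before using the subtype as the extension. Hashing the URL with MD5 is an assumption, since the listing does not say what is hashed, and `download_file_name` is a hypothetical helper.

```python
import hashlib


def download_file_name(url, content_type):
    """Build '<md5 of URL>.<subtype>' without Content-Type parameters."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()      # 32 hex characters
    media_type = content_type.split(";", 1)[0].strip().lower()  # drop "; charset=..."
    subtype = media_type.partition("/")[2] or "bin"             # fallback when missing
    return f"{digest}.{subtype}"
```

With this approach `text/html; charset=UTF-8` yields a `.html` file rather than one ending in `.html; charset=utf-8`.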