File name: search
Description -- all downloadable content comes from the internet; please evaluate and use it at your own discretion
A Uniform Resource Locator (URL) is how a web page's address is identified, and it is also the path a search engine spider follows to fetch a site's pages. So how does a spider crawl a site's pages through URL links? A search engine works in roughly three stages: crawling and fetching (the spider visits a page and stores its HTML in a database); preprocessing (extracting the page text, segmenting it into words, removing noise, deduplicating, and building an index); and ranking (showing results to the user according to each page's relevance and the site's weight).
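The three stages described above can be illustrated with a minimal Python sketch. It is not taken from search.doc. The in-memory `PAGES` dictionary, the `example.test` URLs, the `STOP_WORDS` list, and the function names `crawl`, `preprocess`, and `rank` are all assumptions made for illustration. A real spider would fetch pages over HTTP, and real ranking would also take site weight into account, whereas this sketch scores by term frequency only.

```python
import hashlib
import re
from collections import Counter, defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML) standing in for real HTTP fetches.
PAGES = {
    "http://example.test/": '<a href="/a">page one</a> <a href="/b">page two</a>'
                            "<p>search engine basics</p>",
    "http://example.test/a": "<p>spiders crawl pages by following URL links</p>",
    "http://example.test/b": "<p>spiders crawl pages by following URL links</p>",
}
STOP_WORDS = {"a", "by", "the", "of", "and"}  # illustrative noise-word list


class PageParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]

    def handle_data(self, data):
        self.text.append(data)


def crawl(seed):
    """Stage 1: follow URL links breadth-first and store each page's HTML."""
    store, queue, seen = {}, deque([seed]), {seed}
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:  # unknown URL: nothing to fetch
            continue
        store[url] = html
        parser = PageParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return store


def preprocess(store):
    """Stage 2: extract text, tokenize, drop noise, deduplicate, build index."""
    index, fingerprints = defaultdict(Counter), set()
    for url, html in store.items():
        parser = PageParser()
        parser.feed(html)
        words = re.findall(r"[a-z0-9]+", " ".join(parser.text).lower())
        tokens = [w for w in words if w not in STOP_WORDS]
        digest = hashlib.sha1(" ".join(tokens).encode()).hexdigest()
        if digest in fingerprints:  # same content already indexed: skip it
            continue
        fingerprints.add(digest)
        for token in tokens:
            index[token][url] += 1  # inverted index: term -> {url: count}
    return index


def rank(index, query):
    """Stage 3: order URLs by summed term frequency of the query words."""
    scores = Counter()
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    return [url for url, _ in scores.most_common()]
```

Calling `rank(preprocess(crawl("http://example.test/")), "crawl links")` returns only `http://example.test/a`, because the page at `/b` has identical text and is dropped in the deduplication step.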
(Generated automatically by the system; you can preview the contents before downloading)
Download file list
search.doc