Hybrid Page Scoring Algorithm Based on Centrality and PageRank
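Abstract: To score Web pages accurately and efficiently, a page scoring method named CentralRank is proposed, based on centrality measures (degree, betweenness and closeness) and the PageRank algorithm. CentralRank computes page scores with the PageRank algorithm and uses centrality measures to compute the importance of each page within the Web social network. To verify the performance advantage of CentralRank, a Web crawler was designed that downloads Web page information automatically and accurately. The crawler integrates three techniques: Web information collection, page content analysis and duplicate page removal. Experimental results on a large amount of real data show that, while maintaining the time performance of page scoring, CentralRank is more accurate and effective than a purely centrality-based page scoring algorithm and the PageRank algorithm, improving prediction accuracy by about 14.2% and 7.5%, respectively.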
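The abstract says CentralRank scores pages with PageRank and uses degree, betweenness and closeness centrality to gauge each page's importance in the Web social network, but it does not give the rule that combines them. The Python sketch below is a minimal, hypothetical illustration only: it computes the four measures from scratch on a toy link graph (PageRank by power iteration, betweenness by Brandes' algorithm on an undirected view of the links), max-normalizes each one, and blends them with an assumed weight `alpha`. The function names, the toy graph, the undirected treatment for centrality and the linear blend are all assumptions, not the authors' formula.

```python
from collections import deque

# Toy directed link graph: page -> pages it links to (hypothetical format).
# Every linked page must also appear as a key.
Graph = dict[str, list[str]]


def undirected(g: Graph) -> dict[str, set[str]]:
    """Symmetrize links; centrality is computed on this view (an assumption)."""
    u: dict[str, set[str]] = {v: set() for v in g}
    for v, outs in g.items():
        for w in outs:
            if w != v:
                u[v].add(w)
                u[w].add(v)
    return u


def pagerank(g: Graph, d: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Standard PageRank by power iteration; dangling pages spread rank uniformly."""
    n = len(g)
    pr = {v: 1.0 / n for v in g}
    for _ in range(iters):
        dangling = sum(pr[v] for v in g if not g[v])
        new = {v: (1 - d) / n + d * dangling / n for v in g}
        for v, outs in g.items():
            for w in outs:
                new[w] += d * pr[v] / len(outs)
        pr = new
    return pr


def closeness(u: dict[str, set[str]]) -> dict[str, float]:
    """(reachable nodes - 1) / sum of BFS distances to them."""
    out = {}
    for s in u:
        dist = {s: 0}
        q = deque([s])
        while q:
            v = q.popleft()
            for w in u[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
        total = sum(dist.values())
        out[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return out


def betweenness(u: dict[str, set[str]]) -> dict[str, float]:
    """Brandes' algorithm, normalized for an undirected graph."""
    bc = dict.fromkeys(u, 0.0)
    for s in u:
        stack, preds = [], {v: [] for v in u}
        sigma = dict.fromkeys(u, 0)
        sigma[s] = 1
        dist = dict.fromkeys(u, -1)
        dist[s] = 0
        q = deque([s])
        while q:  # BFS counting shortest paths from s
            v = q.popleft()
            stack.append(v)
            for w in u[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(u, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(u)
    # Each unordered pair was counted twice, hence (n-1)(n-2) instead of (n-1)(n-2)/2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}


def max_norm(s: dict[str, float]) -> dict[str, float]:
    m = max(s.values())
    return {v: x / m if m > 0 else 0.0 for v, x in s.items()}


def central_rank_sketch(g: Graph, alpha: float = 0.5) -> dict[str, float]:
    """Hypothetical blend: alpha * PageRank + (1 - alpha) * mean of three centralities."""
    u = undirected(g)
    n = len(u)
    deg = {v: len(nb) / max(n - 1, 1) for v, nb in u.items()}
    parts = [max_norm(deg), max_norm(closeness(u)), max_norm(betweenness(u))]
    pr = max_norm(pagerank(g))
    return {v: alpha * pr[v] + (1 - alpha) * sum(p[v] for p in parts) / 3 for v in g}


if __name__ == "__main__":
    web = {"home": ["a", "b", "c"], "a": ["home"], "b": ["home", "c"],
           "c": ["home"], "d": ["home"]}
    for page, score in sorted(central_rank_sketch(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

Setting `alpha=1.0` reduces the sketch to max-normalized PageRank and `alpha=0.0` to a purely centrality-based score, which loosely mirrors the two baselines the abstract compares against; the actual weighting would have to come from the paper itself.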
Abstract: To score Web pages accurately and efficiently, a new page scoring algorithm, CentralRank, was proposed based on centrality measures (degree, betweenness and closeness) and the PageRank algorithm. CentralRank computes the importance of pages in Web social networks with the centrality measures and employs the PageRank algorithm to score Web pages. To verify the performance of CentralRank, a Web crawler was developed to crawl Web pages automatically and accurately. The crawler integrates three essential techniques: Web data collection, page content analysis and duplicate page detection. Experiments on a large amount of real data show that CentralRank maintains the time performance of page scoring while scoring Web pages more accurately than a purely centrality-based page ranking algorithm and the PageRank algorithm, with average improvements in prediction accuracy of about 14.2% and 7.5%, respectively.
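The crawler is said to integrate duplicate page detection, but the abstract does not name the method. As one common, hypothetical stand-in (not necessarily what the authors used), the sketch below strips tags, normalizes case and whitespace, rejects exact repeats by SHA-1 digest, and flags near-duplicates whose word 3-shingle sets reach a Jaccard similarity threshold. The class name `DuplicateFilter`, the crude regex tag stripping and the 0.9 default threshold are assumptions chosen for illustration.

```python
import hashlib
import re


def normalize(html: str) -> str:
    """Crude text extraction (an assumption): drop tags, lowercase, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.lower().split())


def shingles(text: str, k: int = 3) -> set[str]:
    """Set of overlapping k-word sequences; texts shorter than k become one shingle."""
    words = text.split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class DuplicateFilter:
    """Hypothetical crawler-side filter: exact digest check, then near-duplicate check."""

    def __init__(self, threshold: float = 0.9, k: int = 3) -> None:
        self.threshold = threshold
        self.k = k
        self.digests: set[str] = set()
        self.seen: list[set[str]] = []

    def is_duplicate(self, html: str) -> bool:
        """Return True if the page repeats a stored page; otherwise store it and return False."""
        text = normalize(html)
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in self.digests:
            return True
        sh = shingles(text, self.k)
        if any(jaccard(sh, old) >= self.threshold for old in self.seen):
            return True
        self.digests.add(digest)
        self.seen.append(sh)
        return False


if __name__ == "__main__":
    f = DuplicateFilter(threshold=0.8)
    pages = [
        "<p>CentralRank scores web pages using centrality and PageRank</p>",
        "<div>centralrank  scores web pages using centrality and pagerank</div>",
        "<p>CentralRank scores web pages using centrality and PageRank today</p>",
        "<p>An unrelated page about crawling news sites</p>",
    ]
    for p in pages:
        print(f.is_duplicate(p))
```

The pairwise Jaccard scan grows linearly with the number of stored pages; a crawler at scale would typically replace it with MinHash or SimHash signatures, which approximate the same similarity far more cheaply.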
Key words:
- social network analysis
- Web social network
- centrality
- PageRank algorithm
- Web page scoring