6

基于HITS算法的关键威胁资产发现

 3 years ago
source link: https://zhuanlan.zhihu.com/p/339728760
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

HITS本来是大名鼎鼎的网页搜索引擎的算法,它定义一个网页有2个属性,一个是Hub属性(枢纽网页),一个是Aut属性(权威网页)。

  • Hub值越高,说明这个网页里包含了大量高Aut值的页面
  • Aut值越高,说明这个网页里被大量的高Hub值的页面包含

简单来说,一个网页的

  • Hub值=它指向的页面的Aut值之和
  • Aut值=指向它的页面的Hub值之和

这和网络安全有啥关系呢?别急,我们可以把不同主机之间的攻击关系往上靠,这样我们也得到了一张攻击图。任何一个风险主机都有2个属性,一个是Hub属性,也就是它攻击了谁,另一个是Aut属性,表示它被谁攻击~

  • HUB属性:攻击性
  • AUT属性:受害性
77vYbey.jpg!mobile
  • 初始状态:A、B主机中毒了,攻击了C主机
  • 进行了2轮HITS迭代,达到稳态(这里状态2和状态3值已经相同)
  • 最终状态:A的攻击性0.7,B的攻击性0.7,C的受害性1

这样我们就可以类比网页搜索引擎识别重要网页的过程,迅速识别重要的风险主机~

使用Python代码进行验证,和我们推理的一模一样~

Ez6FJvy.png!mobile
from pygraph.classes.digraph import digraph
from math import sqrt
class HITSIterator:
    def __init__(self, dg):
        self.max_iterations = 100
        self.min_delta = 0.01
        self.graph = dg

        self.hub = {}
        self.authority = {}
        for node in self.graph.nodes():
            self.hub[node] = 1
            self.authority[node] = 1

    def hits(self):
        if not self.graph:
            return
        flag = False
        for i in range(self.max_iterations):
            change = 0.0
            self.get_hub(change)
            self.get_aut(change)
            print("This is NO.%s iteration" % (i + 1))

            if change < self.min_delta:
                flag = True
                break
        if flag:
            print("finished in %s iterations!" % (i + 1))
        else:
            print("finished out of 100 iterations!")

        for node in self.graph.nodes():
            print("节点{}的攻击性{} 受害性{}".format(node,round(self.hub[node],1),round(self.authority[node]),1))

    def get_aut(self,change):
        norm = 0
        tmp = self.authority.copy()
        for node in self.graph.nodes():
            self.authority[node] = 0
            for incident_page in self.graph.incidents(node):
                self.authority[node] += self.hub[incident_page]
            norm += pow(self.authority[node], 2)
        norm = sqrt(norm)
        for node in self.graph.nodes():
            self.authority[node] /= norm
            change += abs(tmp[node] - self.authority[node])

    def get_hub(self,change):
        norm = 0
        tmp = self.hub.copy()
        for node in self.graph.nodes():
            self.hub[node] = 0
            for neighbor_page in self.graph.neighbors(node):
                self.hub[node] += self.authority[neighbor_page]
            norm += pow(self.hub[node], 2)
        norm = sqrt(norm)
        for node in self.graph.nodes():
            self.hub[node] /= norm
            change += abs(tmp[node] - self.hub[node])


if __name__ == '__main__':
    dg = digraph()

    dg.add_nodes(["A", "B", "C"])
    dg.add_edge(("A", "C"))
    dg.add_edge(("B", "C"))

    hits = HITSIterator(dg)
    hits.hits()

参考

  1. HITS算法--从原理到实现   https://www.cnblogs.com/rubinorth/p/5800624.html

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK