17

Sweet Snippet 之 方差计算

 3 years ago
source link: https://blog.csdn.net/tkokof1/article/details/104344680
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Sweet Snippet 之 方差计算

tkokof1 2020-02-16 16:43:18 515

方差计算的简单实现

在概率统计中,方差用于衡量一组数据的离散程度,相关的计算公式如下(总体方差):

μ = 1 N ∑ i = 1 N x i σ 2 = 1 N ∑ i = 1 N ( x i − μ ) 2

μ=1N∑i=1Nxiσ2=1N∑i=1N(xi−μ)2μ=1N∑i=1Nxiσ2=1N∑i=1N(xi−μ)2
​μ=N1​i=1∑N​xi​σ2=N1​i=1∑N​(xi​−μ)2​

其中 μ \mu μ 为数据的平均值, 而 σ 2 \sigma^2 σ2 即是(总体)方差.

相应的实现代码如下:

-- Lua
function average(values, count)
    local sum = 0
    
    for i = 1, count do
        sum = sum + values[i]
    end
    
    return sum / count
end

function variance(values, count)
    local average = average(values, count)
    local variance = 0
    
    for i = 1, count do
        local delta = values[i] - average 
        variance = variance + delta * delta
    end
    
    return variance / count
end

通常我们需要在获取新样本数据时更新方差,简单的方法就是按照上述公式重新计算一遍,我们可以通过计算数据子集方差的方式来模拟这个过程:

-- Lua
function variance_list(values)
    local ret = {}
    
    for i = 1, #values do
        ret[i] = variance(values, i)
    end
    
    return ret
end

更好的一种方式是通过递推来计算数据子集的方差,这需要对方差的计算公式做一些变形:

σ 2 = 1 N ∑ i = 1 N ( x i − μ ) 2    ⟹    σ 2 = 1 N ∑ i = 1 N ( x i 2 + μ 2 − 2 x i μ )    ⟹    σ 2 = 1 N ( ∑ i = 1 N x i 2 + ∑ i = 1 N μ 2 − ∑ i = 1 N 2 x i μ )    ⟹    σ 2 = 1 N ( ∑ i = 1 N x i 2 + N μ 2 − 2 μ ∑ i = 1 N x i )    ⟹    σ 2 = 1 N ( ∑ i = 1 N x i 2 + N μ 2 − 2 N μ 2 )    ⟹    σ 2 = 1 N ( ∑ i = 1 N x i 2 − N μ 2 )    ⟹    σ 2 = 1 N ( ∑ i = 1 N x i 2 − N ( ∑ i = 1 N x i N ) 2 )    ⟹    σ 2 = 1 N ( ∑ i = 1 N x i 2 − ( ∑ i = 1 N x i ) 2 N )

σ2=1N∑i=1N(xi−μ)2⟹σ2=1N∑i=1N(x2i+μ2−2xiμ)⟹σ2=1N(∑i=1Nx2i+∑i=1Nμ2−∑i=1N2xiμ)⟹σ2=1N(∑i=1Nx2i+Nμ2−2μ∑i=1Nxi)⟹σ2=1N(∑i=1Nx2i+Nμ2−2Nμ2)⟹σ2=1N(∑i=1Nx2i−Nμ2)⟹σ2=1N(∑i=1Nx2i−N(∑Ni=1xiN)2)⟹σ2=1N(∑i=1Nx2i−(∑Ni=1xi)2N)σ2=1N∑i=1N(xi−μ)2⟹σ2=1N∑i=1N(xi2+μ2−2xiμ)⟹σ2=1N(∑i=1Nxi2+∑i=1Nμ2−∑i=1N2xiμ)⟹σ2=1N(∑i=1Nxi2+Nμ2−2μ∑i=1Nxi)⟹σ2=1N(∑i=1Nxi2+Nμ2−2Nμ2)⟹σ2=1N(∑i=1Nxi2−Nμ2)⟹σ2=1N(∑i=1Nxi2−N(∑i=1NxiN)2)⟹σ2=1N(∑i=1Nxi2−(∑i=1Nxi)2N)
​σ2=N1​i=1∑N​(xi​−μ)2⟹σ2=N1​i=1∑N​(xi2​+μ2−2xi​μ)⟹σ2=N1​(i=1∑N​xi2​+i=1∑N​μ2−i=1∑N​2xi​μ)⟹σ2=N1​(i=1∑N​xi2​+Nμ2−2μi=1∑N​xi​)⟹σ2=N1​(i=1∑N​xi2​+Nμ2−2Nμ2)⟹σ2=N1​(i=1∑N​xi2​−Nμ2)⟹σ2=N1​(i=1∑N​xi2​−N(N∑i=1N​xi​​)2)⟹σ2=N1​(i=1∑N​xi2​−N(∑i=1N​xi​)2​)​

基于此,我们就可以递推的计算数据子集的方差了,相关的计算复杂度则降低了一个数量级( O ( n 2 )    ⟹    O ( n ) O(n^2) \implies O(n) O(n2)⟹O(n)):

-- lua
function variance_list_recurrence(values)
    local ret = {}
    
    local pre_square_sum = 0
    local pre_sum = 0
    
    for i = 1, #values do
        local val = values[i]
        
        pre_square_sum = pre_square_sum + val * val
        pre_sum = pre_sum + val
        
        ret[i] = (pre_square_sum - (pre_sum * pre_sum / i)) / i
    end
    
    return ret
end

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK