The c-index and its application in survival analysis

source link: https://www.bobobk.com/592.html

December 23, 2021 | Technology

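The concordance index, or c-index, is a metric for evaluating how well a model ranks its predictions. It is defined as the proportion of comparable pairs of samples whose predicted order agrees with their observed order. It matters in biomedical work such as cancer prognosis, where it is used to judge how good a survival-time prediction is. In Python it can be computed with the concordance_index function from lifelines.utils. The examples below show what it means in practice.

Suppose a cancer study has 6 patients whose survival times are 1 month, 6 months, 12 months, 2 years, 3 years, and 5 years. If the predictions are also 1 month, 6 months, 12 months, 2 years, 3 years, and 5 years, the prediction is exactly right and the c-index takes its maximum value of 1. The code is as follows: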

# First import the required packages

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Import the concordance_index function
# (install lifelines first if needed: pip install lifelines)
from lifelines.utils import concordance_index

# Build the DataFrame

df = pd.DataFrame({"name":["张三","李四","王五","赵二","麻子","someone"],"survive":[1,6,12,24,36,60],"predicted":[1,6,12,24,36,60]})
c_index = concordance_index(df.survive,df.predicted)

print(df)

print(c_index)

#       name  survive  predicted
# 0       张三        1          1
# 1       李四        6          6
# 2       王五       12         12
# 3       赵二       24         24
# 4       麻子       36         36
# 5  someone       60         60

# In [10]: c_index
# Out[10]: 1.0
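
To make the definition concrete, the pairwise counting can be sketched in plain Python. This is only a minimal illustration assuming no censored observations; pairwise_c_index is a hypothetical helper name, not part of lifelines, and it counts pairs with tied predictions as half-concordant, which is an assumed convention here.

```python
from itertools import combinations

def pairwise_c_index(survive, predicted):
    """Fraction of comparable pairs ranked in the same order (no censoring)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(survive)), 2):
        if survive[i] == survive[j]:
            continue  # equal observed times are not comparable in this sketch
        comparable += 1
        observed_order = survive[i] < survive[j]
        if predicted[i] == predicted[j]:
            concordant += 0.5  # assumed convention: a tied prediction counts half
        elif (predicted[i] < predicted[j]) == observed_order:
            concordant += 1
    return concordant / comparable

survive = [1, 6, 12, 24, 36, 60]
print(pairwise_c_index(survive, [1, 6, 12, 24, 36, 60]))  # 1.0
print(pairwise_c_index(survive, [60, 36, 24, 12, 6, 1]))  # 0.0
```

With 6 patients there are 15 pairs, so each pair that is ranked the wrong way lowers the score by 1/15.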

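In fact, the c-index ignores the actual values: like Spearman's rank correlation, it is a nonparametric, rank-based measure. Below we change the predicted values while keeping their order the same, and the result does not change: the c-index is still 1.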

df = pd.DataFrame({"name":["张三","李四","王五","赵二","麻子","someone"],"survive":[1,6,12,24,36,60],"predicted":[1,1.1,1.2,2.4,3.6,6]})
c_index = concordance_index(df.survive,df.predicted)

print(df)

print(c_index)
#       name  survive  predicted
# 0       张三        1        1.0
# 1       李四        6        1.1
# 2       王五       12        1.2
# 3       赵二       24        2.4
# 4       麻子       36        3.6
# 5  someone       60        6.0
# 1.0

df = pd.DataFrame({"name":["张三","李四","王五","赵二","麻子","someone"],"survive":[1,6,12,24,36,60],"predicted":[1,60,120,240,360,600]})
c_index = concordance_index(df.survive,df.predicted)

print(df)

print(c_index)
#       name  survive  predicted
# 0       张三        1          1
# 1       李四        6         60
# 2       王五       12        120
# 3       赵二       24        240
# 4       麻子       36        360
# 5  someone       60        600
# 1.0

If the order changes, however, the c-index changes with it, sometimes drastically. For example:

df = pd.DataFrame({"name":["张三","李四","王五","赵二","麻子","someone"],"survive":[1,6,12,24,36,60],"predicted":[1,12,6,36,24,60]})
c_index = concordance_index(df.survive,df.predicted)

print(df)

print(c_index)
#       name  survive  predicted
# 0       张三        1          1
# 1       李四        6         12
# 2       王五       12          6
# 3       赵二       24         36
# 4       麻子       36         24
# 5  someone       60         60

# In [5]: c_index
# Out[5]: 0.8666666666666667
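
The value 0.8667 can be checked by hand: six patients give 15 pairs, and this prediction swaps exactly two of them. A short sketch that lists the discordant pairs, in plain Python and assuming no censoring and no ties (the variable names are illustrative):

```python
from itertools import combinations

survive = [1, 6, 12, 24, 36, 60]
predicted = [1, 12, 6, 36, 24, 60]

pairs = list(combinations(range(len(survive)), 2))
# A pair is discordant when the predicted order contradicts the observed order.
discordant = [
    (survive[i], survive[j])
    for i, j in pairs
    if (survive[i] < survive[j]) != (predicted[i] < predicted[j])
]

print(len(pairs))   # 15
print(discordant)   # [(6, 12), (24, 36)]
print((len(pairs) - len(discordant)) / len(pairs))  # 13/15
```

Only the two swapped neighbours are ranked wrongly, so 13 of the 15 pairs remain concordant.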

df = pd.DataFrame({"name":["张三","李四","王五","赵二","麻子","someone"],"survive":[1,6,12,24,36,60],"predicted":[60,36,24,12,6,1]})
c_index = concordance_index(df.survive,df.predicted)
print(df)
print(c_index)

# Out[3]:
#       name  survive  predicted
# 0       张三        1         60
# 1       李四        6         36
# 2       王五       12         24
# 3       赵二       24         12
# 4       麻子       36          6
# 5  someone       60          1

# In [4]: c_index
# Out[4]: 0.0

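The concordance index (c-index) can be used to evaluate algorithm performance in survival-time prediction. It is sensitive to the rank order of the predictions but not to their actual values.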

