

PyCM – the swiss-army knife of confusion matrices
source link: https://www.tuicool.com/articles/hit/bA77Bzu

Table of contents
- Overview
- Installation
- Usage
- Document
- Issues & Bug Reports
- Todo
- Outputs
- Dependencies
- Contribution
- References
- Cite
- Authors
- License
- Donate
- Changelog
Overview
PyCM is a multi-class confusion matrix library written in Python that accepts either input data vectors or a matrix directly, and serves as a post-classification model-evaluation tool covering most class and overall statistics. PyCM is the swiss-army knife of confusion matrices, aimed mainly at data scientists who need a broad array of metrics for their predictive models and an accurate evaluation of a large variety of classifiers.

Fig1. PyCM Block Diagram
Installation
Source Code
- Download Version 1.4 or Latest Source
- Run pip install -r requirements.txt or pip3 install -r requirements.txt (Need root access)
- Run python3 setup.py install or python setup.py install (Need root access)
PyPI
- Check Python Packaging User Guide
- Run pip install pycm --upgrade or pip3 install pycm --upgrade (Need root access)
Easy Install
- Run easy_install --upgrade pycm (Need root access)
Usage
From Vector
>>> from pycm import *
>>> y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]  # or y_actu = numpy.array([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
>>> y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]  # or y_pred = numpy.array([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred)  # Create CM From Data
>>> cm.classes
[0, 1, 2]
>>> cm.table
{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
>>> print(cm)
Predict   0   1   2
Actual
0         3   0   0
1         0   1   2
2         2   1   3

Overall Statistics :

95% CI                                    (0.30439,0.86228)
AUNP                                      0.66667
AUNU                                      0.69444
Bennett_S                                 0.375
CBA                                       0.47778
Chi-Squared                               6.6
Chi-Squared DF                            4
Conditional Entropy                       0.95915
Cramer_V                                  0.5244
Cross Entropy                             1.59352
Gwet_AC1                                  0.38931
Hamming Loss                              0.41667
Joint Entropy                             2.45915
KL Divergence                             0.09352
Kappa                                     0.35484
Kappa 95% CI                              (-0.07708,0.78675)
Kappa No Prevalence                       0.16667
Kappa Standard Error                      0.22036
Kappa Unbiased                            0.34426
Lambda A                                  0.16667
Lambda B                                  0.42857
Mutual Information                        0.52421
NIR                                       0.5
Overall_ACC                               0.58333
Overall_CEN                               0.46381
Overall_J                                 (1.225,0.40833)
Overall_MCC                               0.36667
Overall_MCEN                              0.51894
Overall_RACC                              0.35417
Overall_RACCU                             0.36458
P-Value                                   0.38721
PPV_Macro                                 0.56667
PPV_Micro                                 0.58333
Phi-Squared                               0.55
RR                                        4.0
Reference Entropy                         1.5
Response Entropy                          1.48336
Scott_PI                                  0.34426
Standard Error                            0.14232
Strength_Of_Agreement(Altman)             Fair
Strength_Of_Agreement(Cicchetti)          Poor
Strength_Of_Agreement(Fleiss)             Poor
Strength_Of_Agreement(Landis and Koch)    Fair
TPR_Macro                                 0.61111
TPR_Micro                                 0.58333
Zero-one Loss                             5

Class Statistics :

Classes                                                      0        1        2
ACC(Accuracy)                                                0.83333  0.75     0.58333
AUC(Area under the roc curve)                                0.88889  0.61111  0.58333
BM(Informedness or bookmaker informedness)                   0.77778  0.22222  0.16667
CEN(Confusion entropy)                                       0.25     0.49658  0.60442
DOR(Diagnostic odds ratio)                                   None     4.0      2.0
ERR(Error rate)                                              0.16667  0.25     0.41667
F0.5(F0.5 score)                                             0.65217  0.45455  0.57692
F1(F1 score - harmonic mean of precision and sensitivity)    0.75     0.4      0.54545
F2(F2 score)                                                 0.88235  0.35714  0.51724
FDR(False discovery rate)                                    0.4      0.5      0.4
FN(False negative/miss/type 2 error)                         0        2        3
FNR(Miss rate or false negative rate)                        0.0      0.66667  0.5
FOR(False omission rate)                                     0.0      0.2      0.42857
FP(False positive/type 1 error/false alarm)                  2        1        2
FPR(Fall-out or false positive rate)                         0.22222  0.11111  0.33333
G(G-measure geometric mean of precision and sensitivity)     0.7746   0.40825  0.54772
IS(Information score)                                        1.26303  1.0      0.26303
J(Jaccard index)                                             0.6      0.25     0.375
LR+(Positive likelihood ratio)                               4.5      3.0      1.5
LR-(Negative likelihood ratio)                               0.0      0.75     0.75
MCC(Matthews correlation coefficient)                        0.68313  0.2582   0.16903
MCEN(Modified confusion entropy)                             0.26439  0.5      0.6875
MK(Markedness)                                               0.6      0.3      0.17143
N(Condition negative)                                        9        9        6
NPV(Negative predictive value)                               1.0      0.8      0.57143
P(Condition positive or support)                             3        3        6
POP(Population)                                              12       12       12
PPV(Precision or positive predictive value)                  0.6      0.5      0.6
PRE(Prevalence)                                              0.25     0.25     0.5
RACC(Random accuracy)                                        0.10417  0.04167  0.20833
RACCU(Random accuracy unbiased)                              0.11111  0.0434   0.21007
TN(True negative/correct rejection)                          7        8        4
TNR(Specificity or true negative rate)                       0.77778  0.88889  0.66667
TON(Test outcome negative)                                   7        10       7
TOP(Test outcome positive)                                   5        2        5
TP(True positive/hit)                                        3        1        3
TPR(Sensitivity, recall, hit rate, or true positive rate)    1.0      0.33333  0.5
dInd(Distance index)                                         0.22222  0.67586  0.60093
sInd(Similarity index)                                       0.84287  0.52209  0.57508

>>> cm.matrix()
Predict   0   1   2
Actual
0         3   0   0
1         0   1   2
2         2   1   3

>>> cm.normalized_matrix()
Predict   0         1         2
Actual
0         1.0       0.0       0.0
1         0.0       0.33333   0.66667
2         0.33333   0.16667   0.5

>>> cm.matrix(one_vs_all=True, class_name=0)  # One-Vs-All, new in version 1.4
Predict   0   ~
Actual
0         3   0
~         2   7
Direct CM
>>> from pycm import *
>>> cm2 = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2": 2}, "Class2": {"Class1": 0, "Class2": 5}})  # Create CM Directly
>>> cm2
pycm.ConfusionMatrix(classes: ['Class1', 'Class2'])
>>> print(cm2)
Predict   Class1   Class2
Actual
Class1    1        2
Class2    0        5

Overall Statistics :

95% CI                                    (0.44994,1.05006)
AUNP                                      0.66667
AUNU                                      0.66667
Bennett_S                                 0.5
CBA                                       0.52381
Chi-Squared                               1.90476
Chi-Squared DF                            1
Conditional Entropy                       0.34436
Cramer_V                                  0.48795
Cross Entropy                             1.2454
Gwet_AC1                                  0.6
Hamming Loss                              0.25
Joint Entropy                             1.29879
KL Divergence                             0.29097
Kappa                                     0.38462
Kappa 95% CI                              (-0.354,1.12323)
Kappa No Prevalence                       0.5
Kappa Standard Error                      0.37684
Kappa Unbiased                            0.33333
Lambda A                                  0.33333
Lambda B                                  0.0
Mutual Information                        0.1992
NIR                                       0.625
Overall_ACC                               0.75
Overall_CEN                               0.44812
Overall_J                                 (1.04762,0.52381)
Overall_MCC                               0.48795
Overall_MCEN                              0.29904
Overall_RACC                              0.59375
Overall_RACCU                             0.625
P-Value                                   0.36974
PPV_Macro                                 0.85714
PPV_Micro                                 0.75
Phi-Squared                               0.2381
RR                                        4.0
Reference Entropy                         0.95443
Response Entropy                          0.54356
Scott_PI                                  0.33333
Standard Error                            0.15309
Strength_Of_Agreement(Altman)             Fair
Strength_Of_Agreement(Cicchetti)          Poor
Strength_Of_Agreement(Fleiss)             Poor
Strength_Of_Agreement(Landis and Koch)    Fair
TPR_Macro                                 0.66667
TPR_Micro                                 0.75
Zero-one Loss                             2

Class Statistics :

Classes                                                      Class1   Class2
ACC(Accuracy)                                                0.75     0.75
AUC(Area under the roc curve)                                0.66667  0.66667
BM(Informedness or bookmaker informedness)                   0.33333  0.33333
CEN(Confusion entropy)                                       0.5      0.43083
DOR(Diagnostic odds ratio)                                   None     None
ERR(Error rate)                                              0.25     0.25
F0.5(F0.5 score)                                             0.71429  0.75758
F1(F1 score - harmonic mean of precision and sensitivity)    0.5      0.83333
F2(F2 score)                                                 0.38462  0.92593
FDR(False discovery rate)                                    0.0      0.28571
FN(False negative/miss/type 2 error)                         2        0
FNR(Miss rate or false negative rate)                        0.66667  0.0
FOR(False omission rate)                                     0.28571  0.0
FP(False positive/type 1 error/false alarm)                  0        2
FPR(Fall-out or false positive rate)                         0.0      0.66667
G(G-measure geometric mean of precision and sensitivity)     0.57735  0.84515
IS(Information score)                                        1.41504  0.19265
J(Jaccard index)                                             0.33333  0.71429
LR+(Positive likelihood ratio)                               None     1.5
LR-(Negative likelihood ratio)                               0.66667  0.0
MCC(Matthews correlation coefficient)                        0.48795  0.48795
MCEN(Modified confusion entropy)                             0.38998  0.51639
MK(Markedness)                                               0.71429  0.71429
N(Condition negative)                                        5        3
NPV(Negative predictive value)                               0.71429  1.0
P(Condition positive or support)                             3        5
POP(Population)                                              8        8
PPV(Precision or positive predictive value)                  1.0      0.71429
PRE(Prevalence)                                              0.375    0.625
RACC(Random accuracy)                                        0.04688  0.54688
RACCU(Random accuracy unbiased)                              0.0625   0.5625
TN(True negative/correct rejection)                          5        1
TNR(Specificity or true negative rate)                       1.0      0.33333
TON(Test outcome negative)                                   7        1
TOP(Test outcome positive)                                   1        7
TP(True positive/hit)                                        1        5
TPR(Sensitivity, recall, hit rate, or true positive rate)    0.33333  1.0
dInd(Distance index)                                         0.66667  0.66667
sInd(Similarity index)                                       0.5286   0.5286

>>> cm3 = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2": 0}, "Class2": {"Class1": 2, "Class2": 5}}, transpose=True)  # Transpose Matrix
>>> cm3.matrix()
Predict   Class1   Class2
Actual
Class1    1        2
Class2    0        5
Activation Threshold
The threshold parameter was added in Version 0.9 for real-valued prediction.
For more information visit Example3
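A minimal sketch of this parameter, assuming the threshold callable is applied to each element of predict_vector to turn a real-valued score into a class label (the scores and the 0.5 cut-off below are made up for illustration):

>>> from pycm import ConfusionMatrix
>>> y_actu = [1, 1, 0, 0, 1, 0]                        # true labels
>>> y_score = [0.9, 0.4, 0.2, 0.6, 0.7, 0.1]           # hypothetical real-valued predictions
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_score,
...                      threshold=lambda x: 1 if x >= 0.5 else 0)   # map each score to a label
>>> cm.classes
[0, 1]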
Load From File
The file parameter was added in Version 0.9.5 in order to load a saved confusion matrix from the .obj format generated by the save_obj method.
For more information visit Example4
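A short sketch of the round trip, assuming save_obj writes a .obj file next to the script (the file name cm_backup and the vectors are illustrative):

>>> from pycm import ConfusionMatrix
>>> cm = ConfusionMatrix(actual_vector=[0, 1, 1, 0], predict_vector=[0, 1, 0, 0])
>>> save_result = cm.save_obj("cm_backup")             # writes cm_backup.obj
>>> cm_loaded = ConfusionMatrix(file=open("cm_backup.obj", "r"))   # reload from the saved file
>>> cm_loaded.classes
[0, 1]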
Sample Weights
The sample_weight parameter was added in Version 1.2.
For more information visit Example5
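A small sketch of weighted counting, assuming each (actual, predict) pair contributes its weight to the corresponding cell instead of a count of 1 (the weights are made up):

>>> from pycm import ConfusionMatrix
>>> cm = ConfusionMatrix(actual_vector=[0, 0, 1, 1], predict_vector=[0, 1, 1, 1],
...                      sample_weight=[2, 2, 1, 3])   # hypothetical per-sample weights
>>> cm.table   # each cell now holds the summed weights, e.g. actual 1 / predict 1 -> 4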
Transpose
The transpose parameter was added in Version 1.2 in order to transpose the input matrix (only in Direct CM mode).
Online Help
The online_help function was added in Version 1.1 in order to open each statistic's definition in a web browser.
>>> from pycm import online_help
>>> online_help("J")
>>> online_help("Strength_Of_Agreement(Landis and Koch)")
>>> online_help(2)
- The list of available items can be shown by calling online_help() (without argument)
Acceptable Data Types
- actual_vector: python list or numpy array of any stringable objects
- predict_vector: python list or numpy array of any stringable objects
- matrix: dict
- digit: int
- threshold: FunctionType (function or lambda)
- file: File object
- sample_weight: python list or numpy array of any stringable objects
- transpose: bool
- Run help(ConfusionMatrix) for ConfusionMatrix object details
For more information visit here
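To tie the parameter types together, a hedged sketch of one call that combines several of them; the labels, scores, weights, and the 0.5 cut-off are all made up, and digit is taken here as the precision of the reported statistics:

>>> import numpy
>>> from pycm import ConfusionMatrix
>>> y_actu = numpy.array(["spam", "ham", "spam", "ham"])      # numpy array of stringable labels
>>> y_score = [0.81, 0.12, 0.67, 0.55]                        # python list of real-valued predictions
>>> cm = ConfusionMatrix(actual_vector=y_actu,
...                      predict_vector=y_score,
...                      threshold=lambda x: "spam" if x >= 0.5 else "ham",   # FunctionType
...                      sample_weight=[1, 1, 2, 1],                          # per-sample weights
...                      digit=3)                                             # int: output precision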
Issues & Bug Reports
Just file an issue and describe it; we'll check it ASAP! Or send an email to [email protected] .
Todo
Moved here
Outputs
Dependencies
Contribution
Changes and improvements are more than welcome!
:heart:
Feel free to fork and open a pull request. Please make your changes in a specific branch and request to pull into the dev branch.
Remember to write a few tests for your code before sending pull requests.
References
1- J. R. Landis, G. G. Koch, “The measurement of observer agreement for categorical data. Biometrics,” in International Biometric Society, pp. 159–174, 1977.
2- D. M. W. Powers, “Evaluation: from precision, recall and f-measure to roc, informedness, markedness & correlation,” in Journal of Machine Learning Technologies, pp.37-63, 2011.
3- C. Sammut, G. Webb, “Encyclopedia of Machine Learning” in Springer, 2011.
4- J. L. Fleiss, “Measuring nominal scale agreement among many raters,” in Psychological Bulletin, pp. 378-382.
5- D.G. Altman, “Practical Statistics for Medical Research,” in Chapman and Hall, 1990.
6- K. L. Gwet, “Computing inter-rater reliability and its variance in the presence of high agreement,” in The British Journal of Mathematical and Statistical Psychology, pp. 29–48, 2008.
7- W. A. Scott, “Reliability of content analysis: The case of nominal scaling,” in Public Opinion Quarterly, pp. 321–325, 1955.
8- E. M. Bennett, R. Alpert, and A. C. Goldstein, “Communication through limited response questioning,” in The Public Opinion Quarterly, pp. 303–308, 1954.
9- D. V. Cicchetti, "Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology," in Psychological Assessment, pp. 284–290, 1994.
10- R.B. Davies, "Algorithm AS155: The Distributions of a Linear Combination of χ2 Random Variables," in Journal of the Royal Statistical Society, pp. 323–333, 1980.
11- S. Kullback, R. A. Leibler "On information and sufficiency," in Annals of Mathematical Statistics, pp. 79–86, 1951.
12- L. A. Goodman, W. H. Kruskal, "Measures of Association for Cross Classifications, IV: Simplification of Asymptotic Variances," in Journal of the American Statistical Association, pp. 415–421, 1972.
13- L. A. Goodman, W. H. Kruskal, "Measures of Association for Cross Classifications III: Approximate Sampling Theory," in Journal of the American Statistical Association, pp. 310–364, 1963.
14- T. Byrt, J. Bishop and J. B. Carlin, “Bias, prevalence, and kappa,” in Journal of Clinical Epidemiology pp. 423-429, 1993.
15- M. Shepperd, D. Bowes, and T. Hall, “Researcher Bias: The Use of Machine Learning in Software Defect Prediction,” in IEEE Transactions on Software Engineering, pp. 603-616, 2014.
16- X. Deng, Q. Liu, Y. Deng, and S. Mahadevan, “An improved method to construct basic probability assignment based on the confusion matrix for classification problem, ” in Information Sciences, pp.250-261, 2016.
17- Wei, J.-M., Yuan, X.-Y., Hu, Q.-H., Wang, S.-Q.: A novel measure for evaluating classifiers. Expert Systems with Applications, Vol 37, 3799–3809 (2010).
18- Kononenko I. and Bratko I. Information-based evaluation criterion for classifier’s performance. Machine Learning, 6:67–80, 1991.
19- Delgado R., Núñez-González J.D. (2019) Enhancing Confusion Entropy as Measure for Evaluating Classifiers. In: Graña M. et al. (eds) International Joint Conference SOCO’18-CISIS’18-ICEUTE’18. SOCO’18-CISIS’18-ICEUTE’18 2018. Advances in Intelligent Systems and Computing, vol 771. Springer, Cham
20- Gorodkin J (2004) Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry 28: 367–374
21- Freitas C.O.A., de Carvalho J.M., Oliveira J., Aires S.B.K., Sabourin R. (2007) Confusion Matrix Disagreement for Multiple Classifiers. In: Rueda L., Mery D., Kittler J. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2007. Lecture Notes in Computer Science, vol 4756. Springer, Berlin, Heidelberg
22- Branco P., Torgo L., Ribeiro R.P. (2017) Relevance-Based Evaluation Metrics for Multi-class Imbalanced Domains. In: Kim J., Shim K., Cao L., Lee JG., Lin X., Moon YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham
23- Ballabio, D., Grisoni, F. and Todeschini, R. (2018). Multivariate comparison of classification performance measures. Chemometrics and Intelligent Laboratory Systems, 174, pp.33-44.
24- Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational And Psychological Measurement 20:37-46
25- Siegel, Sidney and N. John Castellan, Jr. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw Hill.
26- Cramér, Harald. 1946. Mathematical Methods of Statistics. Princeton: Princeton University Press, page 282 (Chapter 21. The two-dimensional case)
27- Matthews, B. W. (1975). "Comparison of the predicted and observed secondary structure of T4 phage lysozyme". Biochimica et Biophysica Acta (BBA) - Protein Structure. 405 (2): 442–451.
28- Swets JA. (1973). "The relative operating characteristic in Psychology". Science. 182 (14116): 990–1000.
29- Jaccard, Paul (1901), "Étude comparative de la distribution florale dans une portion des Alpes et des Jura", Bulletin de la Société Vaudoise des Sciences Naturelles, 37: 547–579.
30- Thomas M. Cover and Joy A. Thomas. 2006. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, New York, NY, USA.
31- Keeping, E.S. (1962) Introduction to Statistical Inference. D. Van Nostrand, Princeton, NJ.
Cite
If you use PyCM in your research, please cite this JOSS paper:
Haghighi, S., Jasemi, M., Hessabi, S. and Zolanvari, A. (2018). PyCM: Multiclass confusion matrix library in Python. Journal of Open Source Software, 3(25), p.729.
@article{Haghighi2018,
  doi = {10.21105/joss.00729},
  url = {https://doi.org/10.21105/joss.00729},
  year = {2018},
  month = {may},
  publisher = {The Open Journal},
  volume = {3},
  number = {25},
  pages = {729},
  author = {Sepand Haghighi and Masoomeh Jasemi and Shaahin Hessabi and Alireza Zolanvari},
  title = {{PyCM}: Multiclass confusion matrix library in Python},
  journal = {Journal of Open Source Software}
}
Download PyCM.bib
JOSS | Zenodo | Researchgate
License
Donate to our project
If you like our project, and we hope that you do, please consider supporting us. This project is not, and never will be, run for profit. We need the money just so we can keep doing what we do ;-) .