Thomas Packer, Ph.D.
Dec 5, 2019


I figure you used ROC-AUC because the Kaggle competition did, too. But I just wonder: why? Why use ROC at all? I would ask the Kaggle competition the same question. I recently worked on a similar multi-label classification task. I started with ROC-AUC, which gave very high values, just like the ones you show, and it looked like there was no room for improvement. But once I switched to precision, recall, and F-measure, I could easily see there was plenty of room for improvement. ROC can be deceptive.

ROC-AUC differs from precision, recall, and F-measure in that it is sensitive to how the classifier performs on the actual negatives (the true negatives and false positives, via the false-positive rate). When actual negatives vastly outnumber actual positives, precision can be low (and therefore F-score low) even while specificity is high (and therefore ROC-AUC is high).
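To make that concrete, here is a tiny worked example with made-up confusion-matrix counts (purely illustrative, not from your experiment or mine): 100 actual positives, 10,000 actual negatives, and a classifier that misclassifies only 1% of the negatives.

```python
# Hypothetical confusion-matrix counts for an imbalanced problem:
# 100 actual positives, 10,000 actual negatives (illustrative numbers only).
tp, fn = 50, 50       # half the actual positives are found
fp, tn = 100, 9_900   # only 1% of actual negatives are misclassified

precision   = tp / (tp + fp)   # 0.33 -- low: false positives outnumber true positives
recall      = tp / (tp + fn)   # 0.50
f1          = 2 * precision * recall / (precision + recall)  # 0.40 -- low
specificity = tn / (tn + fp)   # 0.99 -- high, so the ROC side looks great

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} specificity={specificity:.2f}")
```

Even though only 1% of the negatives are wrong, that 1% is twice the number of true positives, so precision collapses while specificity barely moves.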

If you want to know how well you do on the actual and predicted positives, especially when there is class imbalance, use the area under the precision-recall curve, not ROC-AUC. If you want to know how well you do on actual positives and actual negatives equally, then use ROC-AUC. But I struggle to understand why you would really want to know how well you do on the “actual negatives”. Usually people want to know whether they found all the positives (recall) and whether the predicted positives are actually positive (precision).
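If it helps, here is a minimal scikit-learn sketch comparing the two areas on an imbalanced problem. The synthetic dataset and the logistic-regression classifier are my own assumptions for illustration, not anything from your post.

```python
# Minimal sketch: compare ROC-AUC with area under the precision-recall
# curve (average precision) on a synthetic imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Roughly 1% positives, 99% negatives.
X, y = make_classification(n_samples=20_000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

print("ROC-AUC:", roc_auc_score(y_te, scores))            # tends to look high
print("PR-AUC :", average_precision_score(y_te, scores))  # often much lower
```

On imbalanced data like this, the ROC-AUC number tends to flatter the model while the precision-recall area exposes the room for improvement, which is the same gap I saw when I switched metrics.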

https://www.kaggle.com/lct14558/imbalanced-data-why-you-should-not-use-roc-curve

https://stackoverflow.com/questions/44172162/f1-score-vs-roc-auc
