MONKEY challenge: Detection of inflammation in kidney biopsies

🔁 The leaderboard phase is now in the "open submissions" cycle. Stay tuned for the presentation at MIDL and the publication. 🐵

Evaluation and Ranking 📊 🏆

We will use Free Response Operating Characteristic (FROC) analysis as the ranking metric for both tasks. From each FROC curve we will also derive an "FROC score" by calculating the sensitivity at six pre-selected values of false positives (FP) per mm²: [10, 20, 50, 100, 200, 300].
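As a rough illustration of how such a score could be read off a curve, the sketch below interpolates the sensitivity at each listed FP/mm² value and averages the results. Linear interpolation and plain averaging are assumptions made here, as are the helper names `interpolate` and `froc_score` and the toy curve; the evaluation script on GitHub remains the authoritative reference.

```python
from bisect import bisect_left

# Target FP/mm² values listed in the challenge description.
FP_TARGETS = [10, 20, 50, 100, 200, 300]

def interpolate(x, xs, ys):
    """Linear interpolation on a curve sorted by xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # guarantees xs[i - 1] < x <= xs[i]
    x0, x1 = xs[i - 1], xs[i]
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - x0) / (x1 - x0)

def froc_score(fp_per_mm2, sensitivity, targets=FP_TARGETS):
    """Mean sensitivity at the target FP/mm² values (hypothetical helper).

    Assumption: the per-target sensitivities are combined by a plain average.
    """
    points = sorted(zip(fp_per_mm2, sensitivity))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(interpolate(t, xs, ys) for t in targets) / len(targets)

# Toy curve, made up for illustration only.
toy_fp = [0, 10, 50, 100, 300]
toy_sens = [0.0, 0.2, 0.5, 0.7, 0.9]
score = froc_score(toy_fp, toy_sens)
```

With this toy curve, the interpolated sensitivities are averaged into a single number between 0 and 1, where higher is better.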

🏅 Task 1: FROC score for the detection of inflammatory cells (mononuclear leukocytes, MNLs), i.e. overall inflammation.

🏅 Task 2: FROC score for the detection of lymphocytes and monocytes (separately). The FROC score is computed per class and the two scores are averaged for the final ranking; the per-class values remain visible on the leaderboard.

We also report precision and recall at two prediction thresholds (0.4 and 0.9). These metrics were added when the challenge was reopened for phase two and are therefore not available for all submissions.
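A minimal sketch of how precision and recall at a fixed threshold could be computed is shown below. It assumes that each prediction has already been marked as a hit or a miss against the ground truth, and that a prediction is kept when its score is greater than or equal to the threshold; both choices, the function name `precision_recall`, and the toy data are illustrative assumptions rather than the official implementation.

```python
def precision_recall(scores, is_hit, n_ground_truth, threshold):
    """Precision and recall over predictions with score >= threshold.

    scores: confidence per prediction; is_hit: True if that prediction
    matched a ground-truth cell (assumed to be decided beforehand).
    """
    kept = [hit for s, hit in zip(scores, is_hit) if s >= threshold]
    tp = sum(kept)               # kept predictions that matched a cell
    fp = len(kept) - tp          # kept predictions that matched nothing
    fn = n_ground_truth - tp     # ground-truth cells left undetected
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / (tp + fn) if n_ground_truth else 0.0
    return precision, recall

# Toy data, made up for illustration only.
toy_scores = [0.95, 0.8, 0.5, 0.3]
toy_hits = [True, False, True, True]
pr_at_04 = precision_recall(toy_scores, toy_hits, 4, 0.4)
pr_at_09 = precision_recall(toy_scores, toy_hits, 4, 0.9)
```

Raising the threshold from 0.4 to 0.9 discards the lower-confidence predictions, which in this toy case trades recall for precision.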

🔗 The evaluation script can be found on our GitHub.

True positives (TPs) are determined by comparing the predicted dot coordinates with the ground-truth dot annotations, allowing an error margin based on the cell size (5 μm for monocytes, 4 μm for lymphocytes, and 5 μm for the combined inflammatory cells).
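The matching step could look roughly like the sketch below. It assumes coordinates are given in μm, that predictions are processed in order of decreasing confidence, and that each ground-truth dot can be matched at most once (greedy one-to-one matching); the official script may resolve ties and duplicates differently. The names `MARGIN_UM` and `match_points` are hypothetical.

```python
import math

# Error margins from the challenge description, in micrometres.
MARGIN_UM = {"monocyte": 5.0, "lymphocyte": 4.0, "inflammatory-cell": 5.0}

def match_points(predictions, ground_truth, margin_um):
    """Greedy one-to-one matching of predicted dots to annotated dots.

    predictions: list of (x_um, y_um, score); ground_truth: list of (x_um, y_um).
    Returns the counts (tp, fp, fn).
    """
    unmatched = list(ground_truth)
    tp = fp = 0
    # Assumption: the most confident predictions claim annotations first.
    for x, y, _score in sorted(predictions, key=lambda p: -p[2]):
        nearest = min(
            unmatched,
            key=lambda g: math.hypot(g[0] - x, g[1] - y),
            default=None,
        )
        if nearest is not None and math.hypot(nearest[0] - x, nearest[1] - y) <= margin_um:
            unmatched.remove(nearest)  # each annotation counts only once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)

# Toy data, made up for illustration only.
toy_gt = [(0.0, 0.0), (20.0, 0.0)]
toy_preds = [(3.0, 0.0, 0.9), (1.0, 0.0, 0.8), (50.0, 50.0, 0.7)]
counts = match_points(toy_preds, toy_gt, MARGIN_UM["lymphocyte"])
```

In the toy data, the second prediction lies close to an annotation that has already been claimed, so it counts as a false positive rather than a second true positive.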

Free Response Operating Characteristic (FROC) Curve

The FROC curve plots the true positive rate (TPR, a.k.a. sensitivity or recall) on the y-axis against the average number of false positives (FP) per mm² over all slides on the x-axis. It is thus an alternative to the ROC curve, whose x-axis shows the false positive rate instead. Based on this definition, we will count the true positives (TP), false positives (FP), and false negatives (FN) and use them in the FROC analysis. The TPR is defined as TP/(TP+FN).
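To make the definition concrete, the sketch below builds curve points by lowering the confidence threshold one distinct score at a time. It assumes that every prediction is already labelled as a hit or a miss, that at least one ground-truth cell exists, and that FP/mm² is obtained by dividing the pooled FP count by the total annotated area; the last point in particular is an assumption, since averaging per slide would be another reading of the definition. The function name `froc_curve` and the toy data are hypothetical.

```python
def froc_curve(scores, is_hit, n_ground_truth, area_mm2):
    """Return (fp_per_mm2, sensitivity) lists, one point per distinct score.

    Sensitivity is TP / (TP + FN), which equals TP / n_ground_truth.
    """
    ranked = sorted(zip(scores, is_hit), key=lambda d: -d[0])
    fp_per_mm2, sensitivity = [], []
    tp = fp = 0
    for i, (score, hit) in enumerate(ranked):
        if hit:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last prediction sharing this score,
        # so tied scores fall on the same side of the threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fp_per_mm2.append(fp / area_mm2)
            sensitivity.append(tp / n_ground_truth)
    return fp_per_mm2, sensitivity

# Toy data, made up for illustration only.
curve_fp, curve_sens = froc_curve(
    scores=[0.9, 0.8, 0.8, 0.4],
    is_hit=[True, False, True, False],
    n_ground_truth=4,
    area_mm2=2.0,
)
```

Each point corresponds to one operating threshold: moving right along the curve admits more predictions, so both the FP density and the sensitivity can only stay the same or grow.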

The score computation may be fine-tuned during the challenge to better discriminate between the top-performing methods.