Evaluation and Ranking 🏆

We will use the Free Response Operating Characteristic (FROC) analysis as the ranking metric for both leaderboards.

🏅 Leaderboard 1 will show the results for the inflammation cell (or MNL) detection (overall inflammation).

🏅 Leaderboard 2 will show the lymphocyte and monocyte detection results. The FROC will be computed per class and then averaged for the final ranking. The individual per-class values will also be visible on the leaderboard.

🔗 The evaluation script can be found on our GitHub.

True positives (TPs) are determined by comparing the predicted dot coordinates with the ground-truth dot annotations, using an error margin based on the cell size (10 μm for monocytes, 4 μm for lymphocytes, and 7.5 μm for the combined inflammation cells).
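To make the matching rule concrete, here is a minimal sketch of one way to assign predictions to annotations within a margin. It rests on several assumptions that are not stated above: coordinates are already in micrometres, predictions are processed in order of decreasing confidence, and each annotation can be matched at most once. The function name `match_detections` and the dictionary keys are hypothetical; the evaluation script on GitHub remains the reference.

```python
import math

# Error margins from the text, in micrometres. The keys are illustrative names.
MARGIN_UM = {"monocyte": 10.0, "lymphocyte": 4.0, "inflammatory": 7.5}


def match_detections(preds, gts, margin_um):
    """Greedy one-to-one matching of predicted dots to annotated dots.

    preds: list of (x_um, y_um, confidence)
    gts:   list of (x_um, y_um)
    Returns (tp_confidences, fp_confidences, n_false_negatives).
    """
    unmatched = list(gts)
    tp_conf, fp_conf = [], []
    # Assumption: higher-confidence predictions get first pick of annotations.
    for x, y, conf in sorted(preds, key=lambda p: -p[2]):
        best_i, best_d = None, margin_um
        for i, (gx, gy) in enumerate(unmatched):
            d = math.hypot(x - gx, y - gy)
            if d <= best_d:
                best_i, best_d = i, d
        if best_i is None:
            fp_conf.append(conf)      # no free annotation within the margin
        else:
            unmatched.pop(best_i)     # each annotation is used at most once
            tp_conf.append(conf)
    # Annotations left unmatched are false negatives.
    return tp_conf, fp_conf, len(unmatched)
```

For example, `match_detections(preds, gts, MARGIN_UM["lymphocyte"])` would apply the 4 μm margin to a set of lymphocyte predictions.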

Free Response Operating Characteristic (FROC) Curve

The FROC curve plots the true positive rate (TPR, a.k.a. sensitivity or recall) on the y-axis against the average number of false positives (FP) per mm² over all slides on the x-axis. It is thus an alternative to the ROC curve, whose x-axis shows the false positive rate instead. To build the curve, we will count the TPs, FPs, and false negatives (FNs) and use them in the FROC analysis. The TPR is defined as TP/(TP+FN).
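As an illustration of this definition, the sketch below builds FROC points by sweeping a confidence threshold over the detections. It assumes that each detection has already been labelled TP or FP and carries a confidence value, that `n_gt` equals TP+FN (the total number of annotated cells), and that `area_mm2` is the total evaluated tissue area. The function name `froc_curve` and its parameters are hypothetical, and the official script may differ in detail.

```python
def froc_curve(tp_conf, fp_conf, n_gt, area_mm2):
    """Return FROC points as (fp_per_mm2, sensitivity), one per threshold.

    tp_conf:  confidences of detections counted as true positives
    fp_conf:  confidences of detections counted as false positives
    n_gt:     total number of annotated cells (TP + FN)
    area_mm2: total evaluated area in mm^2
    """
    # Sweep thresholds from strict to lenient so both axes are non-decreasing.
    thresholds = sorted(set(tp_conf) | set(fp_conf), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(c >= t for c in tp_conf)
        fp = sum(c >= t for c in fp_conf)
        # x: false positives per mm^2, y: TPR = TP / (TP + FN)
        points.append((fp / area_mm2, tp / n_gt))
    return points
```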

We will also derive an "FROC score" from the FROC curve by calculating sensitivity at six pre-selected values of FP/mm²: [10, 20, 50, 100, 200, 300].
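A rough sketch of how such a score could be derived from a list of curve points follows. Two choices here are assumptions rather than something the text specifies: the sensitivity at a target is taken as the highest sensitivity reached at or below that FP/mm² value (no interpolation), and the final score is the mean over the targets. The names `sensitivity_at` and `froc_score` are hypothetical.

```python
# Pre-selected FP/mm^2 values from the text.
FP_TARGETS = [10, 20, 50, 100, 200, 300]


def sensitivity_at(points, fp_target):
    """Highest sensitivity among points whose FP/mm^2 does not exceed the target.

    points: list of (fp_per_mm2, sensitivity) pairs.
    Returns 0.0 if the curve has no point at or below the target.
    """
    reachable = [sens for fp, sens in points if fp <= fp_target]
    return max(reachable, default=0.0)


def froc_score(points, targets=FP_TARGETS):
    """Assumed aggregation: mean sensitivity over the target FP/mm^2 values."""
    return sum(sensitivity_at(points, t) for t in targets) / len(targets)
```

Under the same assumptions, the Leaderboard 2 value would be the average of `froc_score` computed separately for the lymphocyte and monocyte curves.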

The score computation may be fine-tuned during the challenge to better distinguish between the top-performing methods.