Questions about Ground Truth Images

I have two questions about the annotated ground truth TIFF images:

Are those images *exhaustively* annotated? I.e., can we consider any non-annotated cell negative for mitosis?

Thank you,

Yes, they are exhaustively annotated.

One further thing: the ground truth for the ROIs is non-exhaustive, i.e. the number of mitotic figures in other regions may also be relevant. These are just some of the locations where pathologists would perform counting.

Hi Mitko,

Could you clarify whether, for the 148 cases from the auxiliary ROI dataset, the mitosis score (as provided in training_ground_truth.csv) was determined based on manual counting within the provided ROIs (as provided in ROIs.zip)? If so, was the total count a sum of the mitoses found in the 3 ROIs?

Thanks,
Raphaël.

The annotation for the mitotic score was done independently from the annotation of the ROIs. It is possible that the mitotic score was assigned based on a ROI that is not one of the annotated ones. We followed, however, the same guidelines for annotating the ROIs as during the mitosis counting for the scoring. These are the same guidelines that are used in routine clinical practice. I hope this answers your question.

Is there any xy information about the locations of the regions of interest in the training images?

We have provided annotations of rectangular ROIs for 148 cases from the training set: http://tupac.tue-image.nl/system/files/ROIs.zip

Is it true that every ROI on the whole slide images must contain at least 1 mitotic cell? Or may there be ROIs without any mitotic cells?

It is very likely that there are ROIs without mitotic cells, particularly for cases with score 1 that corresponds to <6 mitotic figures per 2 mm².

Hi Mitko, I still have the following questions:
1) I understand that the 148 provided ROI samples are meant for detecting ROIs only (not relevant to detecting mitoses), is this correct?
2) There are only positive samples (being ROIs) but no negative samples (not being ROIs, e.g. background, muscle fiber) among the 148 ROI samples, but in order to conduct the training, we need both types. So what about the negative ROI samples?
3) Just to confirm that the two auxiliary datasets are independently collected, i.e. the provided mitoses samples have no relation to the ROI dataset, and they are NOT all collected from the 148 ROI samples. Correct?
Thank you

1) This is correct. The idea is to use these ROIs to design an ROI detector (for example). 2) It is true that there are no negative ROIs. The problem is that it is difficult to annotate "negative" samples. This makes the problem more difficult, but not impossible. For example (and this is just an example, not a strong recommendation) you can try a one-class approach. 3) This is correct: the data is independently collected.

Hi Mitko, thank you for your reply. I have another question. I see that the cases of the mitosis detection set come without their WSI ID (i.e. which WSI each of the 73 mitosis cases comes from). Although I understand that the samples in the same case come from the same WSI, I still don't know which WSI that is. I think this information is useful; could it be shared with us? Otherwise we have to do some region-matching work to find it. Thank you.

The 73 cases in the mitosis auxiliary dataset come from WSIs that are different from the 500 cases in the primary training dataset (they also originate from a different medical center). Does this answer your question or did I misunderstand you?

Hi Mitko, thank you for your reply. I still have a few more questions:
1) Since the 73-case auxiliary mitosis dataset comes from different WSIs, do the 73 cases have their corresponding two types of mitosis scores (i.e. the scores for the two tasks)? Or do they come from a public dataset (e.g. MITOS-ATYPIA-14)?
2) There are 143 WSIs containing ROIs out of the 500 training WSIs. Were these 143 WSIs simply randomly selected, or does it mean that the remaining 357 WSIs contain no ROI at all?
3) The sizes of the provided ROIs in the 143 cases seem to be random. However, in order to qualify for mitosis counting, each ROI has to be at least 2 mm2 (about 5657*5657 pixels). So are the ROIs randomly marked by pathologists? These ROI-containing WSIs are not exhaustively annotated, right?

The first 23 cases come from the AMIDA13 challenge (see amida13.isi.uu.nl). Due to the way that the cases are split into individual images, it is difficult to compute a mitotic score. For the remaining 50 cases, which come from two different hospitals in the Netherlands and are previously unpublished, it is quite easy to compute the mitotic score as each region has an area of 2 mm2. Yes, these 143 cases were randomly selected. All WSIs have regions where mitosis counting can be performed. The 143 are not exhaustively annotated, yes. The boxes indicate areas where a pathologist would perform mitosis counting. It is true that they are smaller than 2 mm2, but as I've said the annotation is not exhaustive.

Can you please clarify these two queries:
Query part1: a) How does the ROI of area 2mm^2 correspond to 5657x5657 pixels (for the auxiliary mitoses dataset, cases: 24-73)? I mean how do you do these conversions?
b) What will be the size (in pixels and mm^2) of the 10 HPFs per ROI for these cases?
Query part2: a) What will be the size (in pixels) of ROI for the main training data (500 cases)?
b) What will be the size (in pixels and mm^2) of the 10 HPFs per ROI of these 500 training cases as well as the test cases?

a) It's calculated from the resolution of the images: 1 pixel = 0.25 micrometers. The resolution of the slides at different levels can be read with openslide (e.g. with the openslide-show-properties binary tool). b) We assume that 1 HPF = 0.2 mm2. Thus 2 mm2 = 10 HPF. It's just a convention that relates back to optical microscopes. c) You can compute that from the resolution for a specific level. Most of the slides have a resolution of ~0.25 micrometers/pixel. Thus, in pixels, 10 HPFs would be around 5657x5657 pixels (as with the mitoses training set).
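
For anyone who prefers to do this in code, here is a minimal sketch of deriving the resolution at a given level. It assumes the slide properties expose the standard OpenSlide key "openslide.mpp-x" and that the per-level downsample factors are known (in openslide-python these would come from slide.properties and slide.level_downsamples). The helper name mpp_at_level and the sample values are made up for illustration and are not read from a real slide.

```python
def mpp_at_level(properties, level_downsamples, level):
    """Return micrometers per pixel at `level`.

    `properties` is any mapping holding the OpenSlide key "openslide.mpp-x";
    `level_downsamples` lists the downsample factor of each pyramid level.
    """
    mpp_level0 = float(properties["openslide.mpp-x"])
    # Each pixel at a downsampled level covers `downsample` level-0 pixels per side.
    return mpp_level0 * level_downsamples[level]


# Illustrative values only (assumption), not taken from a TUPAC16 slide.
example_properties = {"openslide.mpp-x": "0.25"}
example_downsamples = (1.0, 4.0, 16.0)

for level in range(len(example_downsamples)):
    print(level, mpp_at_level(example_properties, example_downsamples, level))
```

The function takes plain mappings so that it can be tried without a slide file; with a real slide you would pass in the values that OpenSlide reports.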

Sorry Mitko, but I am still missing something. Could you please elaborate a little more how the 10 HPFs would be around 5657x5657 pixels and what will be the size of 1 HPF (in pixels)?
According to AMIDA13 site (http://amida13.isi.uu.nl/?q=node/3) if the resolution is: 1 pixel = 0.25 micrometers and 1 HPF is assumed to be 0.5x0.5 mm^2 then the size of 1 HPF is 2000×2000 pixels,
because 0.5 mm = 500 micrometers and so 500/0.25 = 2000 pixels, which will require ROI of size 6000x6000 pixels to extract 9 HPFs and if the ROI is a square then to extract 10 HPFs we require 8000x8000 pixels.
Now for the dataset of TUPAC16: 1 pixel = 0.25 micrometers and 1 HPF is assumed to be 0.2x0.2 mm^2 then the size of 1 HPF is 800x800 pixels (is this right?), because 0.2 mm = 200 micrometers and so 200/0.25 = 800 pixels, which will require ROI of size 2400x2400 pixels to extract 9 HPFs and if the ROI is a square then to extract 10 HPFs we require 3200x3200 pixels.

You have to note that the HPF size depends on the microscope. For some microscopes 1 HPF = 0.18 mm2, for some 0.2 mm2 and for some it is > 0.2 mm2. The rule for the mitosis score states that the pathologists should report the number of mitoses in 2 mm2. Most microscopes have an HPF size of 0.2 mm2, so that's why we say mitoses should be counted in 10 HPFs. If the microscope that they are using has an HPF of 0.18 mm2, then they have to count mitoses in 11 HPFs etc. For AMIDA13, for convenience reasons, we defined 1 HPF = 0.25 mm2 or 2000x2000 pixels. The mitosis score for the 500 cases is based on the number of mitoses counted in 2 mm2. This is the only thing that is important for the challenge. Note that the resolution of the slides at different levels can be read with OpenSlide. Most slides have a resolution of around 0.25 micrometers/pixel (mpp) at level 0 (for a very few this is not the case). I hope this is clearer.

To clarify further: for a slide level with a resolution of 0.25 mpp (as most of the images in the dataset have), 2 mm2 (10 HPFs) would be an area of about 5657x5657 pixels.
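
The conversion can be sketched as follows, assuming a square region and the same mpp in x and y. The function name roi_side_pixels is just an illustrative choice.

```python
import math


def roi_side_pixels(area_mm2, mpp):
    """Side length in pixels of a square covering `area_mm2` at `mpp` micrometers/pixel."""
    area_um2 = area_mm2 * 1_000_000.0  # 1 mm2 = 10^6 square micrometers
    pixel_area_um2 = mpp ** 2          # area covered by a single pixel
    return math.sqrt(area_um2 / pixel_area_um2)


print(round(roi_side_pixels(2.0, 0.25)))   # 2 mm2 at 0.25 mpp
print(round(roi_side_pixels(0.25, 0.25)))  # AMIDA13 HPF of 0.25 mm2 at 0.25 mpp
```

Note that a square with a 2 mm side has an area of 4 mm2, not 2 mm2, and corresponds to 8000 pixels per side at 0.25 mpp. A square of 2 mm2 has a side of about 1.414 mm.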

Just to confirm: the mitosis score is based on the mitoses count in one ROI of 5657x5657 pixels. For processing convenience (and to emulate standard lab procedure) one can divide this ROI into 10 HPFs (for example, each of 1131x2828 pixels).

Yes, you can make that assumption.
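
One possible way to do such a split, as a rough sketch: the 2x5 grid and the helper name split_roi are assumptions for illustration, and any other division (or none) is just as valid for the challenge.

```python
def split_roi(side, rows=2, cols=5):
    """Split a side x side ROI into rows*cols tiles given as (x, y, width, height)."""
    # Rounded edges spread the leftover pixels over the tiles, so the tiles
    # cover the ROI exactly, without gaps or overlap.
    x_edges = [round(i * side / cols) for i in range(cols + 1)]
    y_edges = [round(j * side / rows) for j in range(rows + 1)]
    tiles = []
    for j in range(rows):
        for i in range(cols):
            tiles.append((x_edges[i], y_edges[j],
                          x_edges[i + 1] - x_edges[i],
                          y_edges[j + 1] - y_edges[j]))
    return tiles


tiles = split_roi(5657)
for tile in tiles:
    print(tile)
```

Because 5657 is not divisible by 2 or 5, the tiles differ by one pixel in width or height.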

Still three queries:
1) Is there any mathematical equation we can use to find the pixel area corresponding to 2mm2, given the mpp?
I can't relate the 2mm2 (for a slide level with resolution of 0.25 mpp) to 5657x5657 pixels. I get around 8000x8000 pixels for 2mm2 at 0.25 mpp.
I checked out case TUPAC-TR-493 on http://host.simagis.com/live/anonymous-folder/jHYquxqfRhga6IROaBcRG4T4 where it shows: 1 px = 0.2505 micrometers. But if you slide the pointer to 2003x2002 micrometers (from the bottom left corner) you will get a corresponding pixel area of around 7997x7994 pixels.
Similarly for case TUPAC-TR-490: 1 px = 0.2325 micrometers, and 2003x1999 micrometers corresponds to around 8616x8600 pixels.
2) How would one divide the square area of the ROI (e.g. 5657x5657 pixels) into 10 HPFs of square areas (which is usually the case with microscopes)?
3) Some of the WSIs have more than one tumor area separated by quite a large distance (for example, case TUPAC-TR-072 has 4 areas, case128 has 3 and case54 has 2). Should all of these areas be considered for selecting the ROI?

1) This is trivial. Let's say that level 0 has a resolution of 0.25 mpp. Then the area of 1 pixel is 0.25^2 micrometers^2. You can take it from there. 2) You can divide it however you like, or not divide it at all. You don't actually have to divide it into 10 HPFs. 3) That depends. Perhaps you will get the best accuracy by reporting the score for the region with the highest density of mitotic figures. This is part of the method design and is something you need to evaluate. This topic is quite cluttered already. If you have any more questions please open a new topic with an appropriate title or send me an email.

Thanks for clarifying. A little info needed on Task 2. Opening under a new title "Task 2: score based on molecular data"
