with the aid of geoelectrical - imaging techniques
and geographical information systems 
Master’s Thesis in the Master Degree Programme, Civil Engineering  

KEVIN HINE

Department of Civil and Environmental Engineering  
Division of GeoEngineering 
Engineering Geology Research Group
CHALMERS UNIVERSITY OF TECHNOLOGY
Göteborg, Sweden 2005
Master’s Thesis  2005:22

Chromatin pattern analysis of cell nuclei

for improved cervical cancer screening

Master’s Thesis in Biomedical Engineering

0 5 10 15 20 25 30 35 400

200

400

600

800

1000

1200

1400

1600

1800

Ramin
Moshavegh

Babak
Ehteshami Bejnordi

Department of Signals & Systems
Chalmers University of Technology

Gothenburg, Sweden 2013
Thesis number: EX030/2013


Chromatin pattern analysis of cell nuclei

for improved cervical cancer screening

Master’s Thesis in Biomedical Engineering

Ramin
Moshavegh

Babak
Ehteshami Bejnordi

Thesis supervisor and examiner:

Assistant Professor
Andrew James Heinrich Mehnert

Image analysis
Chalmers University of Technology
Department of Signals and Systems

SE-412 96 Gothenburg Sweden

Thesis number: EX030/2013
ISSN: 99-2747920-4


Abstract

The Papanicolaou (Pap) test or Pap smear is the primary screening test for cervical
cancer. It involves the microscopic examination of cells sampled from the cervix. Two
major factors affect the accuracy of the Pap test. The first is sampling error wherein
no diagnostic cells make it on to the slide. The other is interpretation error for reasons
including fatigue, inexperience, and habituation. Whilst computer-assisted interpreta-
tion can potentially address the issue of interpretation error it cannot address sampling
error. However the malignancy-associated change (MAC) phenomenon may offer a solu-
tion. MACs are subtle sub-visual changes in the appearance of otherwise normal-looking
cells from an abnormal Pap smear slide.

An essential first step in the development of an automated screener, based on MACs,
is robust automatic segmentation of free-lying cell nuclei in digitized Pap smear images.
This thesis presents and evaluates a fully automated algorithm for robustly detecting
and segmenting free-lying cell nuclei in bright-field microscope images of Pap smears.
The proposed novel segmentation algorithm makes use of grey-scale annular closings
to identify free-lying nuclei-like objects together with marker-controlled watershed seg-
mentation to accurately delineate the nuclear boundaries. The method was evaluated
empirically using images digitised from Pap smears sourced from the Regional Cancer
Centre in Thiruvananthapuram in India. The results show that the sensitivity and speci-
ficity of nucleus detection is 94.71% and 85.30% respectively, and that the accuracy of
segmentation, measured using the Dice coefficient, of the detected nuclei is 97.30±1.3%.

This thesis also presents and evaluates a set of novel structural texture features for
quantifying and classifying nuclear chromatin patterns in cells on a conventional Pap
smear. The experimental results demonstrate the efficacy of the proposed structural
approach and that a combination of the structural texture features and conventional
features can be used to discriminate between normal and abnormal slides with high
accuracy (0.954±0.019 AUC±SE ). They also demonstrate that it is possible to detect
MACs in Papanicoloau stain (which is not stoichiometric). This in turn suggests the
possibility of developing a fully automated Pap smear screener based on MACs.

, Signals and Systems, Master of Science Thesis 2013 i


Acknowledgements

First and foremost we would like to express our deepest appreciation to our supervi-
sor Assistant Professor Andrew Mehnert, whose guidance, stimulating suggestions and
encouragement, helped us to coordinate the thesis and write this report.

We also extend a big thank you to our co-supervisors Professor Ewert Bengtsson and
Patrik Malm, CBA Uppsala, for their guidance and support and also for providing the
data needed.

We express our gratitude to MedTech West and its board for providing an inspiring
research environment.

Furthermore we would also like to acknowledge, with much appreciation, our fel-
low researchers at MedTech West, Yazdan Shirvany, Qaiser Mahmmod and Mohammad
Alipoor for their friendship and support.

Finally, we thank our families, and especially our parents, for their unflagging love
and support throughout our life; this thesis would have been impossible without them.

The Authors, Gothenburg, Sweden 2013/03/20

, Signals and Systems, Master of Science Thesis 2013 iii


Contents

1 Introduction 1

1.1 The Pap smear test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.1.1 Shortcomings of the Pap smear test . . . . . . . . . . . . . . . . . 3

1.1.2 Automated screening . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Malignancy associated changes (MACs) . . . . . . . . . . . . . . . . . . . 4

1.3 Aim and objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.5 Structure of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Literature review 9

2.1 Review of existing methods for segmenting cells and nuclei in Pap smear
images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2 Review of existing features devised to quantitatively characterise chro-
matin distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2.1 Review of chromatin texture features . . . . . . . . . . . . . . . . . 12

3 Novel algorithm for segmenting free-lying cell nuclei 15

3.1 High-level description of the proposed algorithm . . . . . . . . . . . . . . 15

3.2 Extraction of inner markers (step 1) . . . . . . . . . . . . . . . . . . . . . 16

3.3 Marker-controlled watershed segmentation of the detected nucleus-like ob-
jects (steps 2-3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.4 Artefact Rejection (steps 4-5) . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.4.1 Size criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.4.2 Shape criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.4.3 Texture criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.5 Empirical evaluation of the proposed segmentation algorithm . . . . . . . 28

3.5.1 Image data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.5.2 Method and Experiment . . . . . . . . . . . . . . . . . . . . . . . . 31

3.5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

v


CONTENTS

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4 Texture features characterising chromatin distribution 35

4.1 Chromatin segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.2 Proposed structural texture features . . . . . . . . . . . . . . . . . . . . . 37

4.2.1 Graph-based features . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.2.2 Margination features . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.2.3 Clustering features . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.2.4 Contextual chromatin particle features . . . . . . . . . . . . . . . . 41

4.2.5 Statistics of chromatin particle morphometric features . . . . . . . 41

4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5 Evaluation of the proposed texture features for screening Pap smear
slides 45

5.1 Pap smear images and ground truth . . . . . . . . . . . . . . . . . . . . . 45

5.2 Feature selection methods considered in this study . . . . . . . . . . . . . 46

5.2.1 The curse of dimensionality and the need for feature selection . . . 46

5.2.2 MSVM-RFE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.2.3 Guided Regularized Random Forest (GRRF) . . . . . . . . . . . . 47

5.2.4 L1-regularization procedure with generalised linear model . . . . . 48

5.3 Experiment I: Determination of the most discriminatory subset of features
for the classification of Pap smears . . . . . . . . . . . . . . . . . . . . . . 48

5.3.1 Cytology slides for experiment I . . . . . . . . . . . . . . . . . . . 49

5.3.2 Feature extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.3.3 Feature selection and classification . . . . . . . . . . . . . . . . . . 50

5.3.4 Results of experiment I . . . . . . . . . . . . . . . . . . . . . . . . 51

5.4 Experiment II: MAC-based classifiers for Pap smear screening . . . . . . . 52

5.4.1 Cytology slides for experiment II . . . . . . . . . . . . . . . . . . . 53

5.4.2 Classification performance evaluation of the best subset of features
derived in experiment I . . . . . . . . . . . . . . . . . . . . . . . . 53

5.4.3 Results of experiment II . . . . . . . . . . . . . . . . . . . . . . . 54

5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6 Summary and Conclusion 57

6.1 Thesis summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

6.2 Key contributions and findings . . . . . . . . . . . . . . . . . . . . . . . . 58

6.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

6.4 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.5 Opportunities for further research . . . . . . . . . . . . . . . . . . . . . . . 61

Appendices 63

vi , Signals and Systems, Master of Science Thesis 2013


CONTENTS

A Fundamental operations from Mathematical Morphology 64
A.1 Mathematical morphology for binary images . . . . . . . . . . . . . . . . . 64
A.2 Mathematical morphology for grey-scale images . . . . . . . . . . . . . . . 65

B Segmentation methods underlying methods in Chapters 2 and 3 66
B.1 Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
B.2 Active contours and Deformable models . . . . . . . . . . . . . . . . . . . 67
B.3 Seeded Region Growing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
B.4 Watershed Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

C Cytology features 75

D Candidate cell nuclei marker extraction methods 76
D.1 Morphological reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . 76

D.1.1 Grey-scale reconstruction: . . . . . . . . . . . . . . . . . . . . . . . 76
D.1.2 Regional minima: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
D.1.3 H-extrema: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
D.1.4 Morphological Top-hat Transform . . . . . . . . . . . . . . . . . . 79
D.1.5 Improved Morphological Top-hat Transform . . . . . . . . . . . . . 80

E Shape Criteria 82
E.1 Danielsson’s G shape factor . . . . . . . . . . . . . . . . . . . . . . . . . . 82
E.2 Dice similarity coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
E.3 Eccentricity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

F Classifiers used in the experiment II 85
F.1 SVM classifier with a linear kernel . . . . . . . . . . . . . . . . . . . . . . 85

F.1.1 Theory of linearly separable binary classification . . . . . . . . . . 85
F.2 Logistic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

G Publications 91

Bibliography 109

, Signals and Systems, Master of Science Thesis 2013 vii


List of Figures

1.1 Data acquisition and nucleus segmentation in an automated screening
system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3.1 Extracting an inner marker for a free-lying cell nucleus. . . . . . . . . . . 17

3.2 Segmentation of the detected nuclei-like objects. . . . . . . . . . . . . . . 19

3.3 Cells of the squamous epithelium . . . . . . . . . . . . . . . . . . . . . . . 21

3.4 Ellipse fitting to an arbitrary shape . . . . . . . . . . . . . . . . . . . . . . 22

3.5 Computing averages at different scales . . . . . . . . . . . . . . . . . . . . 23

3.6 Measure of Tamura coarseness for 6 different cell nuclei. . . . . . . . . . . 24

3.7 Artefacts rejected by the Tamura coarseness feature . . . . . . . . . . . . 24

3.8 Two sample FOVs from a Pap smear slide . . . . . . . . . . . . . . . . . . 25

3.9 GUI for Ground truth generation . . . . . . . . . . . . . . . . . . . . . . . 29

3.10 GUI for Ground truth generation . . . . . . . . . . . . . . . . . . . . . . . 30

3.11 DSC scores for the evaluation of the proposed automatic segmentation . . 32

3.12 Examples of false positive detection by the algorithm . . . . . . . . . . . . 33

3.13 missed nuclei due to presence of background noise and watershed failure . 34

3.14 Missed nuclei due to unsuitable choice of parameter values . . . . . . . . . 34

4.1 Representation of a nucleus as a topographic relief (grey-level represents
height) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.2 Chromatin segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.3 Illustration of how the area-Voronoi diagram and generalised Delaunay
graph are computed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.4 Graphs related to the Delaunay graph . . . . . . . . . . . . . . . . . . . . 38

4.5 Illustration of how the features M1 and M2 are computed. . . . . . . . . 39

4.6 Illustration of how the features C1 and C2 are computed. . . . . . . . . . 40

4.7 Illustration of how the features C3 and C4 are computed. . . . . . . . . . 41

5.1 Pie chart showing the contribution of the four main object feature cate-
gories in this study. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

viii


LIST OF FIGURES

5.2 Illustration of 100 times iterated 5-fold cross validation feature selection . 51
5.3 Boxplots of the features F1, F2, F3, F4, F5, F6 for experiment 1. . . . . . 52
5.4 An example of a digitalized Pap smear FOV . . . . . . . . . . . . . . . . . 54
5.5 Boxplots of the features F1, F2, F5, F6 for experiment II. . . . . . . . . . 55

6.1 Focus stacking for extended depth of field in optical microscopy. . . . . . 60

B.1 Global Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
B.2 Parametric representation of a contour . . . . . . . . . . . . . . . . . . . . 69
B.3 Active contour segmentation . . . . . . . . . . . . . . . . . . . . . . . . . 70
B.4 Cell nuclei segmentation by GVF snake. . . . . . . . . . . . . . . . . . . . 71
B.5 Simulating the flooding of a topographic surface . . . . . . . . . . . . . . 73
B.6 Computing the magnitude of gradient image: Watershed transform con-

siders the gradient image as a topographic surface. . . . . . . . . . . . . . 74

D.1 Illustration of regional minimum at level 2 in an image . . . . . . . . . . . 78
D.2 H-maxima transformation on a 1-D signal and extracting the regional

maxima by computing f −Rδf (f − h). . . . . . . . . . . . . . . . . . . . . 78
D.3 H-minima transformation on a 1-D signal and extracting the regional min-

ima by computing Rεf (f + h)− f . . . . . . . . . . . . . . . . . . . . . . . . 79
D.4 Morphological Top-hat Transform on an image . . . . . . . . . . . . . . . 80

E.1 G Shape-factor calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
E.2 Parametric description of an ellipse . . . . . . . . . . . . . . . . . . . . . . 83

F.1 SVM for two-class data classification by a hyperplane . . . . . . . . . . . 86
F.2 Plot of a logistic function . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
F.3 Two-class data classification by a linear logistic regression decision boundary 89

, Signals and Systems, Master of Science Thesis 2013 ix


List of Tables

2.1 Comparison between cell and cell nucleus segmentation approaches found
in the literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1 Proposed algorithm to detect and segment free-lying intermediate cell nuclei 16
3.2 Shape and granularity features for 6 different cell nuclei. . . . . . . . . . . 26
3.3 Shape and granularity features for 6 non-nuclei artefacts . . . . . . . . . . 27

4.1 Summary of the extracted nucleus features . . . . . . . . . . . . . . . . . 42

5.1 Frequency table for top 5 ranking features in experiment I. . . . . . . . . 51
5.2 Classification results for experiment I. . . . . . . . . . . . . . . . . . . . . 52
5.3 Classification results for experiment II. . . . . . . . . . . . . . . . . . . . . 55

x


1
Introduction

C
ervical cancer is the third most diagnosed cancer and the fourth leading
cause of cancer death among women worldwide accounting for 9% of all ma-
lignancies among females in 2008 [1]. More than 85% of the cervical cancer
cases occur in developing countries where public health infrastructure does

not support Papanicolaou testing [1]. In countries with developed healthcare systems,
widespread cervical screening programmes, aimed at detecting precancerous changes that
can then be treated to prevent invasive cancer, have significantly reduced the number of
deaths from the disease. The Papanicolaou (Pap) smear, is the primary screening test
for cervical cancer. It has been largely responsible for diagnosing cancerous and precan-
cerous lesions in many developed countries [2]. There is compelling evidence showing
the efficacy of organised screening programmes based on performing the Pap smear test
every 3-5 years. In some of the Nordic countries the incidence of invasive cervical cancer
has declined by 80% since the introduction of organised screening programmes [3]. In
Sweden the incidence of cervical cancer has decreased by 65% during the last 40 years
but the incidence of invasive cervical cancer and cervical cancer mortality figures have
been rather stable over the last decade [4].

Fortunately, invasive cervical cancer takes years to develop from slowly progressing
precancerous lesions, thus enabling early detection through screening and diagnostic
tests. Currently screening by conventional cervical cytology (Pap smear test) remains
the principal global strategy to prevent invasive cervical cancer. However, the Pap smear
test has several shortcomings including: subjective nature (dependent on individual
interpretation), low sensitivity (i.e. ability to detect abnormal changes) and the need for
frequent retesting [5].

The remainder of the present chapter is organised as follows: Section 1.1 describes
the Pap smear test and discusses its shortcomings, and then presents the rationale for
automated screening of Pap smear slides and describes the problems inherent in current
automated screening systems. Section 1.2 then discusses a phenomenon known as ma-

1


CHAPTER 1. INTRODUCTION

lignancy associated changes (MACs), which can offer a solution to the automated Pap
smear screening problem. The aim and objectives of the thesis are presented in section
1.3 and section 1.4 outlines the scope of the research. Finally, section 1.5 outlines the
structure of this thesis.

1.1 The Pap smear test

The Papanicolaou (Pap smear) test is the most cost-effective cancer prevention and
detection program ever invented [6]. It was devised by George N. Papanicolaou. In 1928
he first presented his findings that malignant cells from the cervix can be detected in a
small sample of cells collected from the cervix (vaginal smears) [7].

Over a decade passed before collaboration between Dr. Papanicolaou and Dr. Her-
bert Traut, a gynaecologist and pathologist, provided scientific evidence of the potential
of vaginal smears for the identification of cervical cancer and precancerous changes. Traut
provided Papanicolaou with a large number of clinical samples from female patients at
Cornell’s Hospital. Papanicolaou published a detailed description of pre-invasive cervical
lesions in his treatise, “Diagnosis of uterine cancer by the vaginal smear” [8] and in a
major paper [9].

The conventional Pap test is a simple procedure comprising the following steps [6]:

• A speculum is inserted into the vagina to widen the opening so that the cervix can
be viewed;

• Cells are sampled from inside and around the cervix using a swab, brush, or spatula;

• Cells are pressed on a glass slide and a fixative (preservative) is applied to preserve
the sample;

• The samples are stained to ameliorate the contrast in the specimen and highlight
the structural patterns to be analysed with a light microscope in a cytology labo-
ratory [10, 11].

Liquid-based cytology (LBC) is a new method for cellular sample preparation for
cytological tests. The main difference between LBC and the conventional Pap smear
is related to the underlying preparation technique [12]. The sample is collected in a
similar way to the Pap smear, but rather than smearing the cells onto a glass slide, the
cellular material is immediately rinsed into a preservative liquid solution. The sample
is then sent to the laboratory where special filtering techniques are used to remove non-
diagnostic materials such as mucus, pus and blood cells. In the next step, a thin layer
of cells is deposited onto a slide. Finally, the slide is examined under the microscope by
a cytologist in the same way as in the conventional smear test [12].

2 , Signals and Systems, Master of Science Thesis 2013


1.1. THE PAP SMEAR TEST

1.1.1 Shortcomings of the Pap smear test

It usually takes several days or weeks to prepare the final results of a Pap smear test.
Doctors and nurses sample the cervix and send the specimen to a pathology laboratory
for visual evaluation under a microscope [13]. The microscopic examination itself is
laborious and time-consuming involving the review of possibly hundreds of thousands of
cells for signs of cancer or precancer. It is not surprising therefore that 1 in every 10 to
20 positive cases is missed in routine screening [5].

Two major factors affect the accuracy of the Pap smear test. The first is sampling
error wherein no diagnostic cells make it onto the slide. This occurs when health care
providers fail to adequately sample the cervix (failing to sample precancerous/cancerous
cells when they are present). It also occurs when the precancerous/cancerous cells on the
collecting device do not make it onto the glass slide. The second factor affecting the ac-
curacy of the Pap smear test is interpretation error by the laboratory specialist (for any
of a number of reasons including fatigue, inexperience, and habituation). These short-
comings have motivated the research and development of automated screening systems
[5, 13].

1.1.2 Automated screening

Automated screening machines can analyse Pap smear slides in a short time without
fatigue, providing consistent and objective classification results. The rationale for auto-
mated screening is to improve the limitations of the conventional Pap smear test in the
following ways [14]:

• Increase the sensitivity1 and specificity2 of the Pap smear test;

• Decrease the workload of technicians and pathologists;

• Reduce the cost for cervical cancer screening programmes; and

• Lower the probability of incidence of cervical cancer and the mortality rate from
the disease.

At present there are two FDA-approved cervical cancer screening systems: BD Focal-
Point GS Imaging System (formerly known as TriPath AutoPap system) and HOLOGIC
ThinPrep Imaging System (formerly known as Cytyc Thin Prep Imaging System) [15].

The FocalPoint GS Imaging System works on LBC slides and any Pap stained con-
ventional smear. The machine scores and ranks the slides based on the likelihood of
abnormality and categorizes them into four groups: review, no further review (NFR),
process review and quality control (QC) review [15]. Among the analysed slides the NFR
group, which comprises up to 25% of the total qualified slides, can be archived without
human review. The ThinPrep AutoPap system, by contrast, can only work on ThinPrep

1The sensitivity measures the proportion of actual positives that are correctly classified.
2The specificity measures the proportion of negatives which are correctly classified.

, Signals and Systems, Master of Science Thesis 2013 3


CHAPTER 1. INTRODUCTION

LBC slides with a special stain which is nearly stoichiometric1 for DNA content. This
system automatically selects 22 field of view (FOV) images that are of diagnostic interest
for cytopathologists, from among a total of 120 FOVs acquired on each slide. These 22
FOVs must then be fully reviewed by a cytopathologist. The system provides no scoring
or ranking of the slides [15].

Whilst automated screening systems can reduce false negatives attributed to inter-
pretation errors, they cannot reduce false negatives due to sampling errors. Research
[16] suggests that a phenomenon known as malignancy-associated changes (MACs) may
offer a solution.

1.2 Malignancy associated changes (MACs)

The expression malignancy associated changes (MACs) was coined by Nieburgs et al.
(1959) [17]. They reported subtle changes in nuclei of apparently normal looking cells
“adjacent to or distant from malignant tumours” [18]. Research in the 1980s identified
sub-visual alterations in intermediate cells from cervical atypical smears [19].

The development of an automated screener based on detecting MACs can potentially
overcome the problem of sampling error because it is not necessary to perform an ex-
haustive review of all of the cellular material to identify diagnostic cells but rather to
look for subtle nuclear texture changes in a sub-population of cells sampled from the
slide.

Based on the available literature, the most effective MAC features seem to be tex-
tural features [20]. To date, most of the approaches for defining nucleus texture have
been based on stochastic approaches. However, the work of Mehnert [20] suggests that
an alternative structural approach to defining texture, which corresponds well to what
cytopatologists perceive from cell nuclei, is more effective to describe the chromatin2

distribution inside the nuclei.

1.3 Aim and objectives

The aim of this research was to fully explore the structural approach to chromatin pattern
description and to evaluate the efficacy of the features derived from it for discriminating
between normal and abnormal Pap slides. The research had the following objectives:

1. To develop a robust algorithm for detecting and segmenting cell nuclei in digitized
Pap smear images obtained using bright-field microscopy;

2. To develop structural texture features that quantitatively characterise the pattern
(arrangement, size, shape, etc.) of the nuclear chromatin;

1Stains for which the amount of stain uptake in the nucleus is proportional to the amount of DNA
are Stoichiometric stains.

2Chromatin is the combination of DNA and proteins that make up the contents of the nucleus of a
cell.

4 , Signals and Systems, Master of Science Thesis 2013


1.4. SCOPE

3. To determine the most discriminatory subset of features for discriminating between
normal and abnormal slides using real clinical data; and

4. To evaluate the performance of a classifier(s), based on the selected features, using
real clinical data.

1.4 Scope

As noted in section 1.3, MACs are subtle sub-visual alterations in the appearance of
normal looking cells from an abnormal Pap smear. The features which appear to have
the most discriminatory power are nuclear texture features [21]. These features reflect
chromatin structure and characterise the distribution of chromatin inside the nucleus.
The MAC approach to analysing Pap smear slides is conceptually straight forward (see
Figure 1.1). The process involves automatically capturing digital images of individual
FOVs from a Pap smear slide, identifying the location of nucleus like objects (scene
segmentation), segmenting the nuclei like objects in the image (nucleus segmentation),
extracting quantitative texture and other features for each nucleus and finally classifying
the slides as either normal or abnormal based on these features.

The proposed MAC-based cervical screening approach has the following steps:

1. Scanning the Pap-stained slide using a light microscope coupled with a CCD cam-
era with multiple objectives.

2. Capturing multiple images at different focal planes from interesting fields of view
(FOVs) on the slide.

3. Generating extended depth-of-field (EDF) images for each FOV. This involves
combining multiple focal planes for each FOV to obtain a single image where each
object is all in focus.

4. Locating and segmenting the free-lying cell nuclei in each EDF image, and per-
forming artefact rejection to make sure only nuclei-like objects are retained;

5. Segmenting the nuclear chromatin inside each nucleus into texture primitives (blobs);

6. Extracting features from this structural model to quantitatively characterise the
chromatin pattern;

7. Deriving slide-based features from the features in 6 in order to classify the slide as
normal or abnormal.

This thesis does not consider the slide scanning and FOV acquisition steps (steps 1
and 2).

, Signals and Systems, Master of Science Thesis 2013 5


CHAPTER 1. INTRODUCTION

Figure 1.1: Data acquisition and nucleus segmentation in an automated screening system.
(a) The cytometer1 scans the Pap-stained slide and captures scenes from the deposition
area on the slide in a predefined way. (b) The location of the cervical epithelial nuclei are
identified inside each microscope field of view. (c) Each nucleus-like object is then segmented
by defining its boundaries.

1The cytometer utilises a CCD camera mounted on a light microscope (fitted with a 40× objective
lens) to acquire 8-bit images of Papanicolaou-stained cells.

6 , Signals and Systems, Master of Science Thesis 2013


1.5. STRUCTURE OF THE THESIS

1.5 Structure of the thesis

This chapter has:

• Provided an overview of the Pap smear test, including its shortcomings, and dis-
cussed the rationale for automated screening of Pap smear slides.

• Described the malignancy associated changes (MACs) phenomenon and its rele-
vance to automated screening, in particular in addressing the problem of sampling
error.

• Defined the aim, objectives, and scope of this research.

The remainder of the thesis is organised as follows:

Chapter 2 This chapter presents two literature reviews pertinent to chapter 3 and
chapter 4. The first is a review of the cell and cell nucleus segmentation meth-
ods in Pap smear images published in the literature. The second is a review of
the texture features available in the literature devised to quantify chromatin tex-
ture/distribution.

Chapter 3 This chapter deals specifically with the problem of accurately and robustly
segmenting the cervical cell nuclei in digitized light microscopy images of Pap
smears. A novel algorithm is developed to detect and segment free-lying interme-
diate cell nuclei. The algorithm includes three main steps of locating the free-lying
nuclei, delineating nuclei and rejecting the artefacts. This chapter also presents
an empirical evaluation of the proposed segmentation algorithm for both detection
and delineation of free-lying nuclei in Pap smear images.

Chapter 4 This chapter deals with the problem of quantitative characterisation of chro-
matin texture and presents a set of novel structural texture features to describe
nuclear chromatin patterns in cells on a conventional Pap smear. These features
are derived from a segmentation of the chromatin into blob-like primitives. The
proposed set of features are, in particular, derived from statistics of morphometric
features and contextual features computed for these blobs.

Chapter 5 This chapter presents an evaluation of the performance of the proposed
structural chromatin texture features. In particular, it presents an investigation
of the most discriminatory subset of features, from among the proposed features
and a wide range of features drawn from the literature, for discriminating between
normal and abnormal Pap smears using the MAC approach. The chapter presents
the details of the two experiments carried out in this study. The first is a fea-
ture selection experiment performed to obtain the most discriminatory subset of
features. The second experiment is to evaluate the performance of a variety of clas-
sifiers built using the feature subset obtained in the first experiment to discriminate
between the normal and abnormal slides.

, Signals and Systems, Master of Science Thesis 2013 7


CHAPTER 1. INTRODUCTION

Chapter 6 This chapter reviews the work that is presented in this thesis and sum-
marises the major contributions and findings. In addition, it outlines the limita-
tions of the research undertaken, and proposes an avenue of future research.

The material presented in chapters 3, 4 and 5 have been published in the Proceedings of
the 2012 Annual International Conference of the IEEE Engineering in Medicine and Bi-
ology Society (EMBC 2012) [22] and the Proceedings of the 2013 SPIE Medical Imaging
Conference [23].

8 , Signals and Systems, Master of Science Thesis 2013


2
Literature review

T
his chapter presents two literature reviews pertinent to chapter 3 and chapter 4.
The first is presented in Section 2.1. It is a review of the cell and cell nucleus
segmentation methods in Pap smear images published in the literature. The

conclusions motivate the method proposed in chapter 3. The second is presented in
section 2.2. It is a review of the standard texture features available in the literature
devised to quantify chromatin texture/distribution.

2.1 Review of existing methods for segmenting cells and
nuclei in Pap smear images

This section provides a review of the existing literature on the topic of automated de-
tection and segmentation of cells and cell nuclei in Pap smear images. The following
scientific databases were searched: IEEEXplore1, Inspec2, Medline3, ScienceDirect4. The
main keywords chosen were: Segmentation, cell segmentation, nuclei segmentation, Pap
smears. Image segmentation is the process of partitioning an image into sub-images
corresponding to the objects of interest and background. In general, automated segmen-
tation is one of the most difficult tasks in image processing. Numerous algorithms have
been published in the literature for segmenting cells and cell nuclei in microscopy images.
They can be categorized according to the primary underlying segmentation methodol-
ogy used: global and adaptive thresholding [24], watershed transform [24, 25], boundary
detection algorithms and deformable models [26, 27, 28], and edge enhancement based
techniques [29, 30]. These methods are summarized in table 2.1.

1http://ieeexplore.ieee.org
2http://www.engineeringvillage.com
3http://www.ncbi.nlm.nih.gov/pubmed
4http://www.sciencedirect.com

9


CHAPTER 2. LITERATURE REVIEW

The basic methods underlying the existing cell and cell nucleus segmentation al-
gorithms mentioned in table 2.1 are presented in Appendix B. These methods employ
fundamental concepts and elements of mathematical morphology (see Appendix A).

An objective comparison between the performances of these methods is difficult,
either because no quantitative analysis of performance is provided or because widely
different evaluation methodologies (including data and methods) have been used. The
lack of information about how some parameters are derived further complicates the
assessment of these methods. Table 2.1 shows a detailed comparison of the cell and cell
nuclei segmentation methods in Pap smear images available in the literature.

The watershed approach to segmentation has proved to be a powerful and fast seg-
mentation technique for both object boundary delineation and region-based segmenta-
tion. Simplicity, speed and complete division of the image are the properties that make
the watershed transform a popular method for many different image segmentation appli-
cations. The method has been frequently applied to biological images and has produced
good results. Watershed-based methods have also been applied for segmentation of clus-
tered cell nuclei [25, 31]. However, these methods usually lead to over-segmentation.
In order to alleviate this issue, heuristic rules are defined (as a post-processing step) to
merge the over-segmented regions to produce the final segmented image. The alternative
approach is to use marker-controlled watershed, which effectively handles the problem of
over segmentation [32]. The method requires that the extracted markers represent true
cell nuclei. In the following chapter we present a novel algorithm for extracting markers
of candidate free-lying nuclei-like objects for subsequent marker-controlled watershed
segmentation to obtain the nucleus boundaries.

10 , Signals and Systems, Master of Science Thesis 2013


2.1. REVIEW OF EXISTING METHODS FOR SEGMENTING CELLS AND NUCLEI IN PAP SMEAR IMAGES

T
a
b
le

2
.1

:
C

om
p

ar
is

on
b

et
w

ee
n

ce
ll

a
n

d
ce

ll
n
u

cl
eu

s
se

g
m

en
ta

ti
o
n

a
p
p
ro

a
ch

es
fo

u
n
d

in
th

e
li
te

ra
tu

re

A
u
th

o
r

S
e
g
m

e
n
ta

ti
o
n

M
e
th

o
d

Im
a
g
e
s

Im
a
g
e

si
z
e

N
u
m

b
e
r

o
f

c
e
ll
s

Q
u
a
n
ti

ta
ti

v
e

m
e
a
su

re
o
f

p
e
rf

o
rm

a
n
c
e

B
a
m

fo
rd

et
a
l.

[2
6
]

A
ct

iv
e

co
n
to

u
rs

fo
r

n
u
cl

eu
s

se
g
m

en
ta

ti
o
n

2
0
1
3
0

1
2
8
×

1
2
8

2
0
1
3
0

(S
in

g
le

ce
ll

im
a
g
es

)
V

is
u

a
l

ev
a
lu

a
ti

o
n
:

9
9
.6

%
o
f

th
e

P
a
p

st
a
in

ed
ce

ll

im
a
g
es

w
er

e
co

rr
ec

tl
y

se
g
m

en
te

d
.

W
u

et
a
l.
[2

8
]

T
em

p
la

te
fi

tt
in

g
fo

r

n
u
cl

eu
s

se
g
m

en
ta

ti
o
n

1
8
0
×

1
0
0

1
M

is
cl

a
ss

ifi
ca

ti
o
n

ra
te

o
f

th
e

p
a
ra

m
et

ri
c

fi
tt

in
g

a
p
-

p
ro

a
ch

fo
r

th
e

sy
n
th

et
ic

ce
ll

im
a
g
e≤

5
%

G
a
rr

id
o

et
a
l.

[2
7
]

T
em

p
la

te
fi

tt
in

g
fo

r

n
u
cl

eu
s

se
g
m

en
ta

ti
o
n

3
U

n
k
n

o
w

n
U

n
k
n

o
w

n
N

o
Q

u
a
n
ti

ta
ti

v
e

m
ea

su
re

s

L
ez

o
ra

y
et

a
l.

[2
5
]

W
a
te

rs
h

ed
tr

a
n
sf

o
rm

fo
r

n
u
cl

eu
s

se
g
m

en
ta

ti
o
n

1
0

U
n

k
n

o
w

n
2
0
9

V
in

et
m

ea
su

re
*
:

2
.2

4
a
n
d

3
.4

1
in

R
G

B
a
n

d
H

S
L

(H
u

e,
S

a
tu

ra
ti

o
n

,
L

u
m

in
a
n
ce

)
co

lo
r

sp
a
ce

.

M
ea

n
d

iff
er

en
ce

fr
o
m

th
e

g
ro

u
n

d
tr

u
th

:
2
.8

%

(R
G

B
)

-
0
.4

7
%

(H
S
L

).

L
a
ss

o
u
a
o
u
i

et
a
l.

[3
3
]

G
en

et
ic

a
lg

o
ri

th
m

fo
r

ce
ll

se
g
m

en
ta

ti
o
n

2
2
5
6
×

2
5
6

U
n

k
n

o
w

n
N

o
Q

u
a
n
ti

ta
ti

v
e

m
ea

su
re

s

Y
a
n

g
-M

a
o

et
a
l.

[2
9
]

E
d

g
e

d
et

ec
to

r
fo

r

n
u
cl

eu
s

a
n

d
cy

to
p

la
sm

se
g
m

en
ta

ti
o
n

U
n

k
n

o
w

n
6
4
×

6
4

1
2
4

(s
in

g
le

ce
ll

im
a
g
es

)

T
h
e

a
v
er

a
g
e

se
g
m

en
ta

ti
o
n

er
ro

rs
(M

H
D

,
E

M
M

,

R
A

E
,

M
E

)*
*

o
f

0
.1

5
2
3

a
n

d
0
.0

7
7
5

fo
r

n
u

cl
eu

s
a
n

d

cy
to

p
la

sm
se

g
m

en
ta

ti
o
n

,
re

sp
ec

ti
v
el

y.

L
in

et
a
l.

[3
0
]

E
d

g
e

d
et

ec
to

r
fo

r

n
u
cl

eu
s

a
n

d
cy

to
p

la
sm

se
g
m

en
ta

ti
o
n

1
0

U
n

k
n

o
w

n
1
0

(s
in

g
le

ce
ll

im
a
g
es

)

A
v
er

a
g
e

se
g
m

en
ta

ti
o
n

er
ro

r
(t

h
e

a
v
er

a
g
e

o
f

M
E

,

R
A

E
,

a
n

d
M

H
D

)
o
f

0
.1

3
2
3

fo
r

n
u

cl
eu

s
se

g
m

en
ta

-

ti
o
n

.

P
li
ss

it
ie

t
a
l.

[3
4
]

E
d

g
e

d
et

ec
to

r
fo

r

n
u
cl

eu
s

se
g
m

en
ta

ti
o
n

3
8

1
5
3
6
×

2
0
4
8

5
6
1
7

S
en

si
ti

v
it

y
:

9
0
.5

7
%

fu
zz

y
C

-m
ea

n
s

(F
C

M
),

6
9
.8

6
%

su
p
p

o
rt

v
ec

to
r

m
a
ch

in
e

(S
V

M
)

S
p

ec
ifi

ci
ty

:
7
5
.2

8
%

(F
C

M
),

9
2
.0

2
%

(S
V

M
)

*
V

in
e
t

m
e
a
su

re
[3

5
]

is
u
se

d
to

q
u
a
n
ti

fy
th

e
d
is

ta
n
c
e

b
e
tw

e
e
n

tw
o

se
g
m

e
n
ta

ti
o
n
s

a
n
d

c
o
rr

e
sp

o
n
d
s

to
th

e
c
o
rr

e
c
t

c
la

ss
ifi

c
a
ti

o
n

ra
te

.

*
*

M
is

c
la

ss
ifi

c
a
ti

o
n

e
rr

o
r

(M
E

),
e
d
g
e

m
is

m
a
tc

h
(E

M
M

),
re

la
ti

v
e

fo
re

g
ro

u
n
d

a
re

a
e
rr

o
r

(R
A

E
),

a
n
d

m
o
d
ifi

e
d

H
a
u
sd

o
rff

d
is

ta
n
c
e

(M
H

D
),

a
re

o
ft

e
n

u
se

d
a
s

se
g
m

e
n
ta

ti
o
n

p
e
rf

o
rm

a
n
c
e

m
e
a
su

re
s

v
a
ry

in
g

fr
o
m

0
fo

r
a
n

a
b
so

lu
te

ly
c
o
rr

e
c
t

se
g
m

e
n
ta

ti
o
n

to
1

fo
r

a
to

ta
ll
y

e
rr

o
n
e
o
u
s

c
a
se

[3
6
].

, Signals and Systems, Master of Science Thesis 2013 11


CHAPTER 2. LITERATURE REVIEW

2.2 Review of existing features devised to quantitatively
characterise chromatin distribution

This section presents a review of methods proposed for quantitative characterisation of
the distribution of chromatin; i.e chromatin texture.

2.2.1 Review of chromatin texture features

A taxonomy of features for cytometry (cell measurement) on microscope images can be
found in Appendix C. One class of features in this taxonomy are texture features. In
principle such features can be applied to any image object including a cell nucleus.

Two main approaches exist for describing the chromatin arrangement/texture in the
cell nucleus. The first approach, called the statistical approach, assumes that texture
is a realization of a stochastic process governed by a set of parameters [37] and charac-
terises the chromatin distribution by second or higher order statistics. In the other, the
structural approach, the chromatin distribution is assumed to be composed of primitives
that are arranged according to certain placement rules.

Traditionally MAC features have been based on a statistical approach to defining
texture. However such features do not correspond well to the terms used by cytopathol-
ogists to describe chromatin texture such as heterogeneity, granularity, margination,
condensation, compaction, clumping, diffuse, blobs and particles [20]. Another difficulty
is that these features are sensitive to changes in, or non-uniformity of, illumination and
staining. This motivates interest in a structural approach to chromatin texture analysis.

Several methods based on structural texture analysis have been proposed to detect
structural alterations of the nuclear chromatin. Beil et al. [38] proposed a dual ap-
proach to structural texture analysis for microscopic cell images by region or by lines.
In particular they describe the texture in terms of:

• the properties and the arrangement of regions; and

• the properties and the arrangement of lines.

Beil et al. [39] proposed a set of region texture features that corresponds to human
vision, such as: number of regions, number of large regions (size>threshold), number
of small regions (size<=threshold), number of regions at the boundary of the analysed
area, average size of region, etc. They also proposed features for line textures including
directionality and fractal dimension of line structures. Albregtsen et al. [40] presented
a structural texture analysis approach that “for each pixel in the image uses concepts
from adaptive filtering to find the neighbouring pixels that belong to the local texel”.
Thereafter, moment based features are extracted to characterise the grey level texel as
an object. Walker and Jackway [41] used features based on the statistics of geometrical
(SGF) attributes of connected regions developed by Chen et al. (statistical geometrical
features for texture analysis) to quantitatively characterise the chromatin in the nuclei
of Papanicolaou-stained cervical cells [41]. Mehnert [20] proffered a method, called the

12 , Signals and Systems, Master of Science Thesis 2013


2.2. REVIEW OF EXISTING FEATURES DEVISED TO QUANTITATIVELY CHARACTERISE CHROMATIN

DISTRIBUTION

adjacency graph attribute co-occurrence matrix (AGACM) that combines both struc-
tural and statistical/stochastic aspects of texture for characterising both blob-like and
mosaic patterns (texture) in the plane.

The Cyto-Savant imaging system computes 116 nucleus features, which can be grouped
into 3 general categories: nuclear morphological features, chromatin texture, and DNA
content [42]. The measured texture features can be divided into statistical and struc-
tural groups. Chromatin distribution is statistically described by Markovian and non-
Markovian texture features, fractal texture and run length features. Discrete texture
features, however, reflect the structural aspect of chromatin distribution and are com-
puted by first segmenting the nucleus into regions of low, medium, and high optical
density. These regions are defined by two global thresholds. Discrete texture features
characterise the segmented regions by computing their size, shape, optical density, and
spatial distribution. Details of algorithms are described elsewhere [42].

, Signals and Systems, Master of Science Thesis 2013 13


3
Novel algorithm for segmenting

free-lying cell nuclei

T
he present chapter deals specifically with the problem of accurately and robustly
segmenting the cervical cell nuclei in digitized light microscopy images of Pap
smears. Given that the aim of the segmentation is to support MAC analysis,

the goal is not to segment every nucleus but rather to segment free-lying nuclei. These
are easier to segment and the nuclear texture is less likely to be affected by overlapping.
Based on the literature review in Section 2.1 we opted to develop our own novel algorithm
based on the marker controlled watershed transform for detecting and segmenting free-
lying intermediate cell nuclei.

The remainder of this chapter is organised as follows. Section 3.1 presents the high-
level description of the proposed algorithm. Section 3.2 presents the proposed marker
extraction algorithm corresponding to the first step of the segmentation algorithm. Sec-
tion 3.3 describes the implementation of the marker-controlled watershed algorithm for
segmenting and delineating the detected nuclei. Section 3.4 presents the artefact rejec-
tion strategy developed to discard segmented objects that are not nuclei. Section 3.5
describes the choice of features for the artefact rejection step. An empirical evaluation
of the performance of the proposed segmentation algorithm is presented in section 3.6.
Finally, section 3.7 summarises the chapter.

3.1 High-level description of the proposed algorithm

The proposed segmentation approach is conceptually a 3 step process (see Table 3.1):

1. Detecting or locating the objects of interest;

2. Delineating and labelling those objects; and

15


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

3. Artefact rejection to ensure only desired objects are retained in the segmentation
results.

Table 3.1: The proposed algorithm to detect and segment free-lying intermediate cell nuclei

• Input: Grey-scale image containing a field-of-view (FOV) from a Pap smear
slide.

• Output: Binary image containing connected components, each corresponding
to a free-lying intermediate cell nucleus.

Steps:

1. Extract inner markers for free-lying nuclei-like objects (these locate the inte-
riors of candidate objects).

2. Apply the marker-controlled watershed transform on the FOV image with
respect to the inner markers (this yields an outer marker that lies between the
candidate objects).

3. Apply the marker-controlled watershed transform on the gradient image
with respect to the inner and outer markers (this yields the object bound-
aries/masks).

4. Compute the area and quantitative measures of shape and texture granularity
for each segmented nuclei-like object.

5. Reject objects that are too small or large to be intermediate cells, that do not
have an elliptical shape, and that do not have a granular texture.

3.2 Extraction of inner markers (step 1)

In Pap-stained cervical images, the cell nucleus appears darker than the rest of the cellu-
lar material. Approaches to finding a marker (connected component) within each nucleus
exploit this fact. Several candidate methods for extracting marker for nuclei were con-
sidered including: tophat transform [43], Jackway tophat [44], h-minima [45]. Appendix
D presents the basic methods underlying the existing nuclear marker extraction algo-
rithms. However, we found that none of these methods robustly detects free-lying nuclei.
Therefore, we developed a new nuclear marker extraction algorithm for this purpose.

The algorithm is based on the tophat transform defined in terms of an annular closing.
The grey-scale annular closing operator is defined

Ψanclo(f,B) = (f 	B) ∨ f (3.1)

16 , Signals and Systems, Master of Science Thesis 2013


3.3. MARKER-CONTROLLED WATERSHED SEGMENTATION OF THE DETECTED NUCLEUS-LIKE OBJECTS

(STEPS 2-3)

where f is a grey-scale image and B is a symmetric structuring element that does not
contain its origin. When annular closing is applied to a grey-scale image, isolated dark
spots will be removed. The tophat by annular closing, given by Ψanclo(f,B)− f , yields
the removed isolated dark spots. These spots serve as candidate cell nuclei markers [46].

The procedure is illustrated in Figure 3.1 for a single cervical cell and an annu-
lar structuring element. The control over the size and relative isolation of the nuclei
is achieved by changing the inner radius and outer radius of the annular structuring
element.

(a) (b) (c) (d)

Figure 3.1: Extracting an inner marker for a free-lying cell nucleus. (a) Original image.
(b) Grey-scale erosion with an annular structuring element. (c) Pixel-wise maximum of (a)
and (b). (d) Arithmetic difference between (c) and (a).

In order to detect a nucleus, an annular structuring element with an inner radius
larger than that of the nucleus is needed. The nuclei of normal intermediate and
parabasal cells measure approximately 8µm in diameter and may enlarge up to 15µm in
the case of malignant or rare benign disorders changes [47]. Hence, to detect all nuclei
within this range, a set of independent annular closings with structuring elements with
a range of inner diameters is needed. This is then the basis for the more sophisticated
inner marker extraction algorithm presented in Algorithm 1.

3.3 Marker-controlled watershed segmentation of the de-
tected nucleus-like objects (steps 2-3)

Marker-controlled watershed segmentation is used to delineate the boundaries of the cell
nuclei detected by the inner marker extraction algorithm. Rather than flooding from
the regional minima, as is the case for the traditional watershed transform, flooding
is initiated from the markers. The procedure for delineating the detected nuclei-like
objects is as follows. First, a watershed segmentation of the original image (f) with
respect to the inner markers is performed to obtain the outer marker. Next a watershed
segmentation of the gradient magnitude image with respect to the union of the inner

, Signals and Systems, Master of Science Thesis 2013 17


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

markers and the outer marker is performed. The resulting watershed lines then delineate

Algorithm 1 Proposed nuclei inner marker extraction algorithm

Input: Grey-scale image (f), and parameters λ0, α,r1 and r2

Output: Inner markers binary mask (Xm) for free-lying nuclei-like objects.

1: Let B0 be a disk structuring element of radiusλ0

2: for λ = r1 → r2 do
3: Let Ban be an annular structuring element with inner and outer radii of λ and
λ+ α respectively.

4: g = Ψanclo(f,B)− f
5: X1 = g > 0
6: X2 = (X1 	B0)⊕B0

7: X = X ∪X2

8: end for
9: Xm = set of centroids of the connected components in X.

the nuclei-like objects. This idea is illustrated in Figure 3.2. The reason behind using
the gradient image is that the nuclei boundaries are located on high gradient points.
Therefore the watershed transform results in regions with boundaries corresponding to
those of the nuclei.

The gradient magnitude image can be quite noisy. For this reason, the original image
is median filtered to remove impulse noise (size of the kernel is selected to be 3× 3), its
magnitude of gradient is computed, and the result is Gaussian filtered.

There exist many different approximations for the computation of the magnitude
of the gradient of an image. Several approaches with similar results have been tested.
Among all these methods, the very simple and straight forward Gaussian derivative oper-
ator is used. The standard deviation of Gaussian filter is selected to be 0.9. The gradient
magnitude image is approximated by sum of the absolute values of the convolution of
the image with the vertical and horizontal Gaussian derivative operators.

18 , Signals and Systems, Master of Science Thesis 2013


3.3. MARKER-CONTROLLED WATERSHED SEGMENTATION OF THE DETECTED NUCLEUS-LIKE OBJECTS

(STEPS 2-3)

(a) Watershed segmentation of the original image with respect to the inner markers (shown as disks)
yielding the outer marker.

(b) Gaussian filtered gradient magnitude of the median-filtered original image in (a).

(c) Watershed segmentation of (b) with respect to the union of the inner markers and the outer
marker.

Figure 3.2: Segmentation of the detected nuclei-like objects.

, Signals and Systems, Master of Science Thesis 2013 19


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

3.4 Artefact Rejection (steps 4-5)

The presence of overlapping cells, overlapping and/or folded cytoplasm, blood and cellu-
lar debris presents a formidable challenge for nuclei segmentation [21]. Ensuring that the
segmented objects are nuclei of cells relevant for MAC analysis is essential. The artefact
rejection strategy is based on the size, shape and granularity of the objects. Quantitative
measures of the area, elliptical shape, and the texture/granularity are computed for each
segmented nucleus-like object.

3.4.1 Size criterion

The squamous epithelium of the female genital tract is composed of three principal layers
[47]: the basal cell layer (immature), the intermediate cell layers, and the superficial
cell layers (most mature) (see Figure 3.3). As the cell maturation progresses toward
the surface, the amount of cytoplasm per cell increases [47]. The nuclei of superficial
cells are pyknotic and considerably smaller than intermediate and parabasal cells with
a nuclear diameter of about 4µm [47]. In a normal Pap smear usually only the upper
few layers of the squamous epithelium are removed and so the immature cells near the
base of the epithelium are not sampled [16]. Given that the aim of the algorithm is to
detect intermediate cell nuclei (presenting a fine network of chromatin and chromocenters
suitable for further MAC analysis), a threshold value can be defined to remove the
superficial cell nuclei from the segmentation result.

The nuclei of normal intermediate cells ranges approximately between 8µm to 15µm
in diameter and are oval-shaped [47], accordingly the minimum area can be defined.
The minimum area is deemed to be the area of a circle with radius rmin. This artefact
rejection step removes not only the superficial squamous cell nuclei but also many of the
small objects belonging to image artefacts.

20 , Signals and Systems, Master of Science Thesis 2013


3.4. ARTEFACT REJECTION (STEPS 4-5)

Figure 3.3: Cells of the squamous epithelium (freehand adaptation of Koss (2006, Figure
5-4)).

3.4.2 Shape criterion

Nuclei of the intermediate cervical cells are elliptical in shape [47]. To assess whether
a candidate nucleus object is elliptical, the elliptic variance feature is computed. Some
other features we looked at to fulfill the shape criterion are Danielsson’s G shape fac-
tor, ellipticity, elliptic variance and Dice-area (see Appendix E). The elliptic variance
descriptor (Evar) [48] measures how closely the borders of the fitted ellipse agree with
those of the segmented object (see Figure 3.4).

The simplest way to generate a signature for coarseness of the boundary of an object
is to compute the radial distances of the object boundary from the object centroid.
Suppose g = [gx, gy]

T is the centroid of the object and object boundary has N data
points pi = (xi, yi)

T . The covariance matrix of the data points is calculated using:

C =
1

N

N∑
i=1

(pi − g)(pi − g)T

Then the radial distances (d′i) of boundary points from the centroid are calculated as:

d′i =
√

(pi − g)T ·C · (pi − g)

, Signals and Systems, Master of Science Thesis 2013 21


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

Figure 3.4: Ellipse fitting to an arbitrary shape

The mean (µ′R) and standard deviation (σ′R) of the radial distances over all boundary
points are:

µ′R =
1

N

N∑
i=1

(d′i) and σ′R =

√√√√ 1

N

N∑
i=1

(di − µ′R)2

and finally Evar is defined:

Evar =
σ′R
µ′R

(3.2)

It should be noted that for elliptical objects Evar is close to 0.

3.4.3 Texture criterion

The feature chosen to measure the degree of the granularity of the texture in a nucleus
is the Tamura coarseness feature. Tamura [49] devised a texture model corresponding to
visual perception. It is useful and robust in the sense that it does not depend directly on
the exact grey-levels in the object and so has robustness to non-uniformity of illumination
and staining variations (provided that these do not greatly affect the size and number
of texture primitives). The Tamura coarseness aims to pick the biggest size where the
texture is present. The primitive elements (textures) are larger in size but smaller
in number for a coarse texture, while a fine texture contains a large number of small
primitives. The computational procedure can be summarized in the following steps.

1. Take averages at each pixel (x,y) over neighbourhoods whose sizes are powers of

22 , Signals and Systems, Master of Science Thesis 2013


3.4. ARTEFACT REJECTION (STEPS 4-5)

two 2k × 2k.

A (x,y) =

x+2k−1−1∑
i=x−2k−1

y+xk−1−1∑
j=y−2k−1

f(i,j)

22k
(3.3)

where f(x,y) is the grey-level at (x,y). The averaging procedure at different levels
is depicted in figure 3.5.

Figure 3.5: Computing averages at different scales

2. At each pixel, take the absolute difference between pairs of non-overlapping aver-
ages on opposite sides of the point in both horizontal and vertical directions. The
difference in the horizontal case is:

Ek,h(x,y) =
∣∣∣Ak(x+ 2k−1,y)−Ax(x− 2k−1,y)

∣∣∣ (3.4)

3. At each pixel, pick the best size of k which maximizes the difference Ek(x,y) in
either direction and set the best size to Sopt(x,y) = 2k.

4. Calculate the coarseness measure by taking the average of Sopt over the entire
image:

Fcrs =
1

m× n

m∑
i

n∑
j

Sopt(i,j) (3.5)

Where, m and n are the width and height of an image, respectively.

Figure 3.6 shows 6 images of cell nuclei with their corresponding Tamura coarseness
measures.

, Signals and Systems, Master of Science Thesis 2013 23


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

a b c

d e f

Figure 3.6: Measure of Tamura coarseness for 6 different cell nuclei.

Cell nuclei a b c d e f

Coarseness measures 9.10 9.52 9.56 10.41 10.72 10.9

(a) (b)

Figure 3.7: Two examples of artefacts rejected by the Tamura coarseness feature.

24 , Signals and Systems, Master of Science Thesis 2013


3.4. ARTEFACT REJECTION (STEPS 4-5)

Figure 3.8: Two sample FOVs from a Pap smear slide. Twelve samples of the segmented
objects before the artefact rejection step are highlighted. The quantitative measures of shape
and granularity for these objects are in Table 3.2

, Signals and Systems, Master of Science Thesis 2013 25


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

Table 3.2: Shape and granularity features for 6 different cell nuclei.

No Cell Nuclei Area Evar fcourse

1 480 0.031 9.48

2 331 0.69 8.79

3 924 0.07 11.95

4 740 0.043 12.23

5 1325 0.033 10.13

6 390 0.104 10.13

26 , Signals and Systems, Master of Science Thesis 2013


3.4. ARTEFACT REJECTION (STEPS 4-5)

Table 3.3: Shape and granularity features for 6 non-nuclei artefacts.

No Artefacts Area Evar fcourse

7 311 0.1138 10.78

8 460 0.129 11.10

9 392 0.072 9.80

10 336 0.137 9.31

11 288 0.109 12.80

12 464 0.209 11.81

, Signals and Systems, Master of Science Thesis 2013 27


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

3.5 Empirical evaluation of the proposed segmentation al-
gorithm

It was noted in the review in section 2.1 that many of the proposed segmentation al-
gorithms have not been formally empirically evaluated. Herein we present an empirical
evaluation of our proposed method relative to expert manual segmentation.

3.5.1 Image data

The data used in this study is a subset of 889 fields of view (FOVs) captured by a
cytopathologist from 68 Pap smear slides. Each FOV was acquired using a CCD camera
mounted on a light microscope. The images were captured with a 40× objective lens.
Each FOV image is of size 1024 × 1344 pixels with square pixels of size 0.25µm. The
grey-scale resolution is 8 bits per pixel.

3.5.1.1 Ground truth generation

Eleven slides, each containing a minimum of 100 non-superficial cervical cell nuclei, were
randomly selected from among the 68 slides. For each slide three FOVs were randomly
selected to yield a total of 33 FOVs.

Two graphical user interfaces (GUI) were developed to permit a user to review each
FOV and to place a marker on individual nuclei and also trace the boundary of nu-
clei. These two GUIs were designed specifically for the two purposes of marking and
manual border delineation of cell nuclei (see Figures 3.9 and 3.10). They provide the
following functions: cell nuclei marking function (by clicking within the nucleus area),
border delineation function (using the mouse to trace the cell nuclei boundaries) and a
measurement tool for checking the desired particles within the image. Figures 3.9 and
3.9 show two GUIs designed and used for ground truth generation.

Three untrained subjects were recruited to independently review the FOVs using
the GUI and to mark each free-lying nucleus. Prior to performing this task each was
shown examples of intermediate cell nuclei in another FOV (not one of the 33 FOVs they
had to review). Each subject was specifically instructed to mark elliptical objects, of
approximately the right size, with a well-defined boundary, and with a granular texture.
The set of all objects selected by at least two of the three subjects were taken to be the
ground truth for free-lying intermediate cell nuclei. Two image analysis experts (authors)
then used the GUI to independently trace the boundary of each ground truth nucleus.
These manual segmentations were taken to be the ground truth for the boundaries of
the free-lying intermediate cell nuclei.

28 , Signals and Systems, Master of Science Thesis 2013


3.5. EMPIRICAL EVALUATION OF THE PROPOSED SEGMENTATION ALGORITHM

F
ig

u
re

3
.9

:
G

U
I

fo
r

G
ro

u
n
d

tr
u
th

g
en

er
a
ti

o
n

, Signals and Systems, Master of Science Thesis 2013 29


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

F
ig

u
re

3
.1

0
:

G
U

I
fo

r
G

ro
u

n
d

tru
th

g
en

era
tio

n

30 , Signals and Systems, Master of Science Thesis 2013


3.5. EMPIRICAL EVALUATION OF THE PROPOSED SEGMENTATION ALGORITHM

3.5.2 Method and Experiment

The proposed segmentation algorithm was applied to the 33 FOV images. The param-
eters for different steps of the algorithm were selected after several experiments on a
small subset of images independent of the 33 selected FOV images. The minimum and
maximum values of the inner radius (r1 and r2) of the annular structuring elements (see
algorithm 1) were set to 22 and 33 pixels. The values were selected based on the nucleus
diameter range for the intermediate cell nuclei defined by Koss [47]. The outer radius
of each annulus was set to be two pixels more than the inner radius (i.e. α = 2) to
guarantee the extraction of inner markers of the adjacent free-lying nuclei. Finally the
size of the disk-structuring element (λ0) was set to 3 pixels (for noise removal).

The parameters for the artefact rejection step were also tuned on an independent data
set. Based on the defined size range of intermediate cell nuclei in [47], the threshold value
of 400 for segmented regions was selected for artefact rejection (The minimum area is
deemed to be the area of a circle with radius rmin). The threshold value of 0.095 and
13.2 were selected, respectively, for the elliptic variance and Tamura coarseness features.

Objects selected by the algorithm were compared to the ground truth nuclei obtained
manually and used to compute the sensitivity and specificity of the algorithm for the
detection of free-lying intermediate cell nuclei.

The accuracy of segmentation by the proposed algorithm for each detected mask was
assessed through comparison of the boundaries of the segmented object to the two cor-
responding boundaries (boundaries traced by two experts) in the ground truth datasets.
The similarity between pairs of masks and the similarity is computed. More specifi-
cally the similarity between pairs of masks was computed in terms of the Dice similarity
coefficient (DSC) scores defined by equation E.3.

3.5.3 Results

The sensitivity and specificity of the algorithm for the detection of free-lying interme-
diate cell nuclei is 94.71% and 85.3% respectively. Box-plots of the DSC scores for the
comparison of the proposed automatic segmentation to the two manual segmentations,
and for the comparison between the two manual segmentations are shown in Figure 3.11.

, Signals and Systems, Master of Science Thesis 2013 31


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

Figure 3.11: DSC scores for the comparison of the proposed automatic segmentation to the
two manual segmentations, and for the comparison between the two expert segmentations
(GT1 and GT2).

The agreement between the algorithm and the two manual segmentations is 97.30±
1.35% and 96.96 ± 1.7% respectively (mean DSC±standard deviation). The overall
agreement between the two expert segmentations is 97.26± 1.2% .

3.5.4 Discussion

The sensitivity of the algorithm to the detection of free-lying intermediate cell nuclei is
remarkably high. However its specificity, whilst still quite high, could be improved. A
review of false positives indicates that some of them are due to segmentation failures
as the result of severe background noise and artefacts. However other apparent failures
in fact represent genuine free-lying nuclei overlooked by the three recruited subjects.
Understanding the reasons for an algorithm’s failure modes can be used to improve its
robustness. The main failure modes responsible for detection of false positive cases are
discussed bellow.

The DSC scores for boundary delineation evaluation (see Figure 3.11) show that
nuclei boundaries obtained using the marker-controlled watershed transform are highly
accurate and consistent with the two experts’ visual perception of the intermediate cell
nuclei boundaries.

3.5.4.1 Failure modes

In the experiment 111 false positive cases were detected. In many of the cases the false
positives are visually similar in appearance to a real nucleus. Therefore the marker

32 , Signals and Systems, Master of Science Thesis 2013


3.5. EMPIRICAL EVALUATION OF THE PROPOSED SEGMENTATION ALGORITHM

detection algorithm extracts them as a nucleus-like object. The aim of the artefact re-
jection step is to discard a segmented artefact. The ellipticity and size features effectively
eliminate any object with an irregular size or shape. After reviewing the false positive
we discovered we concluded that the Tamura coarseness feature, whilst quite effective
in most cases, did not reject some of the artefacts as expected. Figure 3.12 shows two
such cases. The results suggest that other features be considered in future to improve
the specificity.

(a) (b)

Figure 3.12: (a) and (b) show two examples of false positive detection by the algorithm.

There were very few cases where there was no marker for a nucleus. Among the total
755 nuclei present in our database, the algorithm did not produce any marker for 14
nuclei. This basically happens when the nucleus is surrounded by many other objects,
especially when the adjacent objects have considerably lower intensity compared to the
the nucleus. An example of this type of failure is depicted in Figure 3.13a. All the other
nuclei markers were successfully extracted and no multiple seeds for a single object were
detected. The other mode of failure was due to inaccurate border delineation by the
watershed algorithm. This happens when there is very weak gradient information on
the boundary of the nucleus (see Figures 3.13b and 3.13c). A total of 6 nuclei were
missed due to watershed failure. Finally the particular thresholds defined for artifact
rejection features were responsible for eliminating some of the true segmented free-lying
nuclei. First of all the size feature has caused the rejection of the some of the segmented
nuclei that have slightly lower size than the predefined threshold. The ellipticity and the
Tamura coarseness features were responsible for discarding the rest of the true segmented
nuclei. Overall, 20 segmented regions that were truly segmented were rejected by the
artefact rejection features. Figure 3.14 shows some examples of these failure modes.

, Signals and Systems, Master of Science Thesis 2013 33


CHAPTER 3. NOVEL ALGORITHM FOR SEGMENTING FREE-LYING CELL NUCLEI

(a) (b) (c)

Figure 3.13: Examples of missed nuclei due to presence of background noise surrounding
the nucleus and watershed transform failure: (a) The marker extraction algorithm has failed
to provide a marker for the nuclei. (b) Example of failure of the morphological watershed
technique because of sharp edge information inside the object, as seen in the gradient image
(c)

(a) (b) (c) (d)

Figure 3.14: Examples of missed nuclei due to unsuitable choice of parameter values: (a)
and (b) The two objects have approximately the same size, however, the segmented nucleus
has a size slightly above the threshold value while the size of the other nucleus is slightly
below, hence discarded as an artefact. (c) and (d) Example of an accurately segmented
object which was discarded as an artefact in the artefact rejection step. This is because the
value for Tamura coarseness feature of this nucleus is lower than the defined threshold value.

3.6 Summary

This chapter presented a new algorithm for segmentation of free-lying cell nuclei in Pap
smears. The novelty of the algorithm stems from a robust marker selection method for
selecting candidate free-lying nuclei-like objects for subsequent marker-controlled water-
shed segmentation to obtain the nuclear boundaries. The algorithm also implements
artefact rejection based on size, shape, and nuclear granularity to ensure only the nuclei
of intermediate squamous epithelial cells are retained. In addition, this chapter presented
an empirical evaluation of the performance of the algorithm and discussed the results.
The sensitivity and specificity of the algorithm to the detection of free-lying intermediate
cell nuclei together with the accuracy of the segmentation of each nucleus detected by
the algorithm were presented.

34 , Signals and Systems, Master of Science Thesis 2013


4
Texture features characterising

chromatin distribution

T
his chapter is concerned with the quantitative characterisation of the texture of
chromatin. A review of chromatin texture features proposed to date was presented
in chapter 2. A novel set of structural chromatin texture features are presented

in this chapter. These are derived from a segmentation of the chromatin into blob-
like primitives. By considering each of the chromatin particles as individual regions
(objects), many features can be extracted that describe the distribution and arrangement
of those particles. This is the structural way of describing texture where the texture is
characterised by a description of its primitives and placement rules [20].

Using the segmented chromatin particles, two sets of features can be measured: Fea-
tures that solely characterise the blob-like particles in terms of measurements such as
area, perimeter, dynamics, etc. and features that characterise the spatial relationships
existing between the chromatin particles.

The remainder of this chapter is organised as follows. Section 4.1 presents the utilised
chromatin segmentation algorithm. Section 4.2 presents the proposed structural texture
features to characterise chromatin texture/distribution. Finally, Section 4.3 summarises
the chapter.

4.1 Chromatin segmentation

Chromatin is the material in the cell nucleus consisting of DNA and associated proteins.
It is so-named because of its ability to take on stain. In a Pap smear nuclear chro-
matin is visualised under light microscope as a mosaic of interchanging light and dark
regions within the nucleus (dark and light regions with high and low optical density1,

1The optical density also called extinction of a material is the logarithmic ratio of the incident
radiation to the transmitted radiation through that material.

35


CHAPTER 4. TEXTURE FEATURES CHARACTERISING CHROMATIN DISTRIBUTION

respectively). Regions of high optical density are well defined as particles [50].
The method chosen to segment the chromatin is proposed by Mehnert [20]. It can

be used to segment the dark and light particles (blobs) in a grey-level image of nucleus.
The grey levels in the image can represent either the intensity or optical density. Figure
4.1 shows a representation of a nucleus as a topographic relief. The dark particles or
chromatin blobs correspond to valleys and light particles correspond to mountains. In
a typical topographic relief, the valleys are associated with minima and the mountains
are associated with maxima.

Regional Maxima

Regional Minima

Figure 4.1: Representation of a cervical cell nucleus as a topographic relief (grey-level
represents height).

The chromatin segmentation algorithm proposed by Mehnert [20] has the following
steps:

1. Locating the regional minima;

• Applying a 3x3 median filter (to attenuate impulse noise);

• Up-sampling by a factor of 3 (to facilitate the rendering of watershed lines in
a subsequent step);

• Locating the regional minima (inner markers).

2. Computing the watershed transform of the filtered image with respect to the inner
markers (to produce an outer marker that delineates a zone of influence around
each regional minimum); and

3. Computing the magnitude of gradient image for the filtered image; and

4. Applying the watershed transform to the gradient image with respect to both the
inner markers and the outer marker (to delineate each dark chromatin blob).

Figure 4.2 shows examples of the segmentations produced by this algorithm. Light
particles can also be obtained by applying the algorithm to the negative of the image.

36 , Signals and Systems, Master of Science Thesis 2013


4.2. PROPOSED STRUCTURAL TEXTURE FEATURES

(a) (b) (c) (d)

Figure 4.2: Chromatin segmentation. (a) Original image of cell nucleus. (b) Watershed
transform of (a) with respect to regional minima (inner markers) yielding the outer marker.
(c) The morphological gradient image of (a). (d) Watershed segmentation of (c) with respect
to the union of the inner markers and the outer marker.

4.2 Proposed structural texture features

In this section we present a novel set of structural chromatin texture features derived
from a segmentation of the chromatin. By considering each of the chromatin particles as
individual regions (objects), many features can be extracted that describe the distribu-
tion and arrangement of those particles. Using the segmented chromatin particles, two
sets of features can be measured: features that characterise the spatial relations exist-
ing between the chromatin particles and features that solely characterise the blob-like
particles through measurements such as area, perimeter, dynamics, etc.

4.2.1 Graph-based features

The graph-based features are derived from ordinary delaunay graph (the dual of the or-
dinary Voronoi diagram), several graphs related to the area-Voronoi diagram–generalised
Delaunay graph, generalised Gabriel graph, generalised relative neighbourhood graph,
and the generalised minimum spanning tree–and from the area-Voronoi diagram itself.
These graphs describe different adjacency relationships between the chromatin particles.
Formally, a graph is represented by G = (V,E) where V is called the vertex set and E is
the edge set of the graph. Vertices correspond to particles and edges denote adjacency
(see [51] for details concerning the computation of the graphs).

Our proposed graph-based features can be categorised into 6 groups: Delaunay graph
based features, features based on the area-Voronoi diagram, features based on the gen-
eralised Delaunay graph drived from the area-Voronoi diagram, generalised Gabriel and
relative neighbourhood graph based features, and finally features based on the gen-
eralised minimum spanning tree. Definitions and basic geometric properties of these
graphs can be found in [51]. Figure 4.3 shows the area-Voronoi diagram constructed
by computing the watershed transform of the Euclidian distance transform of the seg-
mented chromatin particle masks and the generalised Delaunay graph derived from the
area-Voronoi diagram.

Figure 4.4 shows the graphs drived from the area-Voronoi diagram. The generalised
Delaunay graph (DG) is obtained directly from the area-Voronoi diagram. Generalised

, Signals and Systems, Master of Science Thesis 2013 37


CHAPTER 4. TEXTURE FEATURES CHARACTERISING CHROMATIN DISTRIBUTION

(a) (b) (c) (d) (e)

Figure 4.3: Illustration of how the area-Voronoi diagram and generalised Delaunay graph
are computed. (a) Segmented chromatin particles superimposed on the original nucleus
image. (b) Euclidian distance transform of the set complement of the union of the chromatin
particle masks in (a). (c) Area-Voronoi diagram. (d) The area-Voronoi diagram clipped to
the nuclear mask. (e) The region adjacency graph defined on the Voronoi regions then
defines a generalised Delaunay graph.

Gabriel graph (GG), generalised relative neighbourhood graph (RNG), and generalised
minimum spanning tree (MST ) are all obtained as sub-graphs of the generalised Delau-
nay graph. In particular the generalised Delaunay graph is treated as an ordinary De-
launay graph, with vertices now corresponding to particle centroids, and the sub-graphs
are obtained by removing select edges. This means that they each have the same vertices
and the edges satisfy MST ⊆ RNG ⊆ GG ⊆ DG. Several features can be computed
from the area-Voronoi diagram and associated graphs to quantify the arrangement of the
chromatin particles inside the nucleus. Features extracted from the Delaunay, Gabriel
and relative neighbourhood graphs measure the distances between chromatin particles
and describe variations in the connectivity of the nodes. Voronoi-based features describe
the shape and size of the Voronoi polygons/regions. The minimum spanning tree con-
nects all the vertices in a given graph in a way that the sum of the edge lengths are the
minimum possible.

(d )(c)(b )(a)

Figure 4.4: Graphs derived from the area-Voronoi diagram. (a) the generalised Delaunay
graph (b) the generalised Gabriel graph (c) the generalised relative neighbourhood graph
(d) the generalised minimum spanning tree. See the text for details.

38 , Signals and Systems, Master of Science Thesis 2013


4.2. PROPOSED STRUCTURAL TEXTURE FEATURES

4.2.2 Margination features

Our margination features characterise the distances of the segmented chromatin blobs
to the nucleus boundary. Six of these features are computed from the cookie-cutting-
distance [20] illustrated in Figure 4.5. The slopes of the cumulative frequency curve of
the cookie-cutting-distance at the 25th and 37th percentiles are calculated (M1 and M2)
as two measures of margination for each nucleus1.

Two other features for characterising margination are derived by computing the mean
and standard deviation of the distances in the cookie-cutting-distance image (M3 and
M4). These two features also include information about the size of the nucleus, which is
an indicator of the state of cell in certain disease processes [52]. In addition, features M3
and M4 are both normalized to the maximum value of the nucleus boundary distance
transform to yield pure measures of margination (M5 and M6). One additional feature
is the mean of the minimum distances between the geometrical centres of chromatin
particles and the nucleus boundary (M7).

0 5 10 15 20 25 30 35
0

500

1000

1500

C
u

m
m

u
la

ti
ve

 H
is

to
g

ra
m

C
u

m
m

u
la

ti
ve

 H
is

to
g

ra
m

Distance from boundary

Distance from boundary

(a) (c)

(g)

(b)

(f)

Slope = 35.05

(d)

5 10 15 20 25 30 35 40
0

200

400

600

800

1000

1200

1400

(e) (h)

Slope = 37.74

Figure 4.5: Illustration of how the margination features M1 and M2 are computed.
(a),(e) Segmented chromatin particles. (b),(f) Euclidean distance transform of the nucleus
boundary. (c),(g) Portions of (b),(f) cut out by the segmented chromatin particle masks.
This is called cookie-cutting-distance. (d),(h) Slope of the cumulative frequency curve of
the cookie-cutting-distance at the 37th percentile.

4.2.3 Clustering features

Four quantitative measures for chromatin particle clustering are proposed. For the first
measure, the distance transform of the area-Voronoi diagram circumscribed by nucleus

1These percentiles were chosen empirically using the cell images that were not used in the empirical
evaluation described in the next chapter.

, Signals and Systems, Master of Science Thesis 2013 39


CHAPTER 4. TEXTURE FEATURES CHARACTERISING CHROMATIN DISTRIBUTION

boundary is calculated. Portions of the resultant image are cut out by the segmented blob
masks. Afterwards, the cumulative frequency histogram of this image is constructed and
then the slopes of the cumulative frequency curve at the 25th and 37th percentile2 are
calculated to give the first two measures C1 and C2. Features C3 and C4 are computed
from the area-Voronoi diagram and the distance transform image used in its construction
(Figure 4.7a and 4.7b) as follows. Firstly for every pair of Voronoi regions that share
a common edge, the convex hull is computed for the union of the corresponding pairs
of particles. Secondly the union of these convex hulls is filled to form a binary mask
(Figure 4.7d). Thirdly this mask is intersected with the divide lines of the area-Voronoi
diagram (Figure 4.7e). Finally features C3 and C4 are the mean and standard deviation,
respectively, of the distance transform values corresponding to the residual divide line
pixels in this intersection (Figure 4.7f).

1 2 3 4 5 6 7 8 9 10 11 12 13 14
0

100

200

300

400

500

600

700

800

900

1000

Slope = 131.90

C
u

m
m

u
la

ti
ve

 H
is

to
g

ra
m

C
u

m
m

u
la

ti
ve

 H
is

to
g

ra
m

Distance from boundary

Distance from boundary

(g)(f)(e) (h)

1 2 3 4 5 6 7 8 9 10 11 12
0

200

400

600

800

1000

1200

Slope = 191.39

(a) (c)(b) (d)

Figure 4.6: Illustration of how the features C1 and C2 are computed. (a),(e) Area-
Voronoi diagram and segmented chromatin particles. (b),(f) Distance transform of the
Voronoi regions. (c),(g) Portions of (b),(f) cut out by the segmented chromatin particle
masks. (d),(h) Slope of the cumulative frequency curve of the distances in (c),(g) at the
37th percentile.

2See footnote 1.

40 , Signals and Systems, Master of Science Thesis 2013


4.2. PROPOSED STRUCTURAL TEXTURE FEATURES

(a) (b) (c)

(f)(e)(d)
−2 0 2 4 6 8 10 12 14 16

0

5

10

15

20

25

30

35

40

45

50

Figure 4.7: Illustration of how the features C3 and C4 are computed. (a) Segmented
chromatin particles. (b) Distance transform of the set complement of the union of the
chromatin particle masks. (c) Area-Voronoi diagram. (d) Binary mask (with particles
superimposed) formed from the union of the convex hulls (see the text for details). (e)
Voronoi region divide lines clipped according to the mask in (d). (f) Frequency histogram
of the distance transform values corresponding to the residual divide line pixels in (e).

4.2.4 Contextual chromatin particle features

Contextual features provide information based on object-object or object-scene interac-
tions. The information related to the segmented chromatin blobs can be integrated with
the higher level contextual information. For example, the area ratio of each chromatin
particle to its watershed ZOI captures information about object area as well as back-
ground contextual information. Table 4.1 lists the contextual chromatin particle features
measured in this study.

4.2.5 Statistics of chromatin particle morphometric features

Morphometric features describe the geometry (shape, size, position and boundary) of
the chromatin blob and are computed from its binary mask. By computing the mean
and standard deviation of the extracted features for all the blobs inside a nucleus, a
structural texture measure for the nucleus is obtained.

, Signals and Systems, Master of Science Thesis 2013 41


CHAPTER 4. TEXTURE FEATURES CHARACTERISING CHROMATIN DISTRIBUTION

Table 4.1: Summary of the extracted nucleus features

Morphometric Features

Area, perimeter, mean-radius, elliptic variance [53]

Densitometric Features

Photometric features [54]: Integrated optical density (IOD), mean optical density (MOD), variance

of optical density, skewness of optical density, kurtosis of optical density

Texture Features

S
ta

ti
st

ic
a
l

Fractal texture features [42]: Fractal area 1, fractal area 2, fractal dimension

Run length features [55]: SRE, LRE, GLN, RLN, RP, LGRE, HGRE, SRLGE, SRHGE, LRLGE, LRHGE

Histogram based features [42]: Mean, standard deviation, and skewness of the grey-level histogram

GLCM features [54]: Contrast, correlation, energy, entropy, local homogeneity, maximum probability, cluster

shade, cluster prominence and information measure of correlation (H-IMC1, H-IMC2)

Complex Daubechies wavelet features of nuclei [56]: Mean, standard deviation, and skewness of the grey-

level histogram. GLCM features – contrast, correlation, energy, entropy, local homogeneity, maximum probability,

cluster shade, cluster prominence and information measure of correlation (W -IMC1, W -IMC2)

Statistics of the chromatin particle morphometric features

Mean of chromatin particle areas and perimeters, mean of chromatin particle areas normalized to the nucleus area,
mean of chromatin particle perimeters normalized to the nucleus perimeter, chromatin particle compactness (P2A)

Contextual chromatin particle features

S
tr

u
c
tu

ra
l

Margination: M1, M2, M3, M4, M5, M6, M7 (See section 4.2.2) Clustering: C1, C2, C3, C4 (See section 4.2.3)

Blob features: Heterogeneity (the ratio between the area of the segmented dark and light regions, and the nucleus

area) [52], the ratio between the area of the segmented dark regions and the nucleus area, number of segmented dark

particles, number of segmented light particles, area ratio of each chromatin particle to its watershed ZOI, average

distance between the geometrical center of the nucleus and pixels of all chromatin particles, distance between the

geometrical center of the nucleus and center of mass of the chromatin particles

Discrete texture features [42, 57]: Medium and high DNA amount, medium and high DNA area, medium and

high DNA compactness, medium-high DNA compactness, center of gravity (the distance from the geometrical center

of the blob to the center of gravity of the optical density function)

Area-Voronoi diagram: Mean and standard deviation (SD) of the areas of the Voronoi regions, mean and SD of

the areas of the Voronoi regions normalized to the nucleus area, area disorder [55], Voronoi regions roundness [55],

mean of area ratio of each chromatin particle to its Voronoi region [55]

Delaunay graph and generalised Delaunay graph: Mean of the Delaunay triangle areas, mean and SD of

the Delaunay edge lengths, average of the mean and SD of the edge lengths connected to each vertex (chromatin

particle), maximum Delaunay edge length, mean of the number of connections per chromatin particle, number of

vertices (chromatin particles) on the graph boundary

Gabriel graph and relative neighbourhood graph: Mean and SD of the graph edge lengths, maximum edge

length, average of the mean and SD of the graph edge lengths connected to each vertex (chromatin blob), mean of

the number of connections per vertex (chromatin blob)

Minimum spanning tree: Mean and SD of the edge lengths, total edge length, edge disorder [55], minimum to

maximum edge ratio, mean of the number of connections per vertex, percentage of vertices connected to one vertex

(MST1), percentage of vertices connected to two vertices (MST2), percentage of vertices connected to more than

two vertices (MST3)

42 , Signals and Systems, Master of Science Thesis 2013


4.3. SUMMARY

4.3 Summary

The chapter presented a set of novel structural texture features for quantifying nuclear
chromatin patterns in cells on a conventional Pap smear. The features are derived from
an initial segmentation of the chromatin into blob-like texture primitives. Several new
features were introduced to quantify the qualitative description of chromatin used by
cytoprofessionals, such as margination and clustering, using an structural approach to
texture analysis requiring an initial segmentation of chromatin.

, Signals and Systems, Master of Science Thesis 2013 43


5
Evaluation of the proposed

texture features for screening Pap
smear slides

T
his chapter presents an empirical evaluation of the performance of the structural
chromatin texture features proposed in the previous chapter. In particular two
experiments, using clinical Pap smears sourced from the department of pathol-

ogy, Regional Cancer Centre (RCC), Thiruvananthapuram in India, are detailed. The
first is a comprehensive feature selection experiment that sought to determine the most
discriminatory subset of features, from among the proposed features and features drawn
from the literature, for discriminating between normal and abnormal smears. The sec-
ond is a classification experiment that sought to evaluate the performance of a variety of
classifiers, built using the feature sets obtained in the first experiment, for discriminating
between normal and abnormal slides on the basis of only normal-appearing cells.

The remainder of this chapter is organized as follows. Section 5.1 presents the image
data used in the experiments. Section 5.2 details the feature selection methods considered
in this study. Section 5.3 details the first experiment carried out to obtain the most
discriminatory subset of features. Section 5.4 then details the second experiment.

5.1 Pap smear images and ground truth

The image data used in this study originate from a set of 68 conventional Pap smear slides
sourced from the Regional Cancer Centre (RCC), Thiruvananthapuram in India. Each
slide was reviewed by a cytopathologist and assigned a cytological diagnosis according
to the Bethesda system [58]. The cytopathologist subsequently acquired representative
FOVs from each slide. In the case of abnormal smears this included FOVs with and
without diagnostic cells. The cytopathologist also labelled individual cells in each FOV

45


CHAPTER 5. EVALUATION OF THE PROPOSED TEXTURE FEATURES FOR SCREENING PAP SMEAR SLIDES

accordingly. The FOV images were acquired using a monochrome CCD camera mounted
on a light microscope with a 40× objective lens with a numerical aperture of 0.95. Each
image has a grey-scale resolution of 8 bits per pixel and is of size 1024×1344 pixels with
square pixels of size 0.25µm.

5.2 Feature selection methods considered in this study

5.2.1 The curse of dimensionality and the need for feature selection

The first step in designing a classifier is to have a dataset for training and evaluating
the performance of the classifier. It is unequivocal that this data must contain represen-
tative observations from each class, and that the sample size determines the number of
features required to build a classifier to discriminate between different classes [59]. As
illustrated earlier in the chapter 4, a wide range of features can be computed and those
can be used for building the classifier. However, increasing the number of features to
build a classifier tends to increase the misclassification error. In addition, the prediction
variability increases and the classifier becomes very sensitive to the outliers. Hence, there
would be no guarantee for the designed classifier to have roughly the same classification
performance on a new data set [59].

It is important to point out that building a classifier with too many features on a too
small training dataset can lead to a “perfect” classification performance on the training
dataset, but very poor performance on unseen test data.

Feature selection is a dimensionality reduction process, which aims to select an opti-
mum subset of features from the original potentially discriminating set of features. There
are two main reasons for using feature selection. First of all, irrelevant and redundant
features are detrimental for machine learning algorithms. For a given fixed number of
training samples, the predictive power of the classifier decreases as the dimensionality
increases. Feature selection techniques are intended to avoid the curse of dimensionality
by removing irrelevant and redundant features. Secondly, feature selection techniques
greatly reduce the measurement and computational cost of classifying high-dimensional
data [59].

The appropriate selection of relevant feature subset makes it possible to achieve
excellent performance on classification of malignancy. In this study, the following three
feature selection methods were investigated: the state-of-the-art multiple support vector
machines with recursive feature elimination (MSVM-RFE) [60], L1-regularization path
for generalised linear models [61], and the guided regularized random forest (GRRF)
[62].

5.2.2 MSVM-RFE

The feature extraction technique utilising Support Vector Machine based on recursive
feature elimination (RFE) was first proposed by Guyon [63]. It returns a ranking of the
features of a classification problem by training a SVM with a linear kernel and removing
the feature with smallest ranking score.

46 , Signals and Systems, Master of Science Thesis 2013


5.2. FEATURE SELECTION METHODS CONSIDERED IN THIS STUDY

The criterion used by the SVM-RFE is the weight magnitude and it is related to
each feature’s support to the discrimination function. The recursive feature elimination
algorithm, recursively identifies the features with the smallest weights in magnitude and
removes them.

Multiple SVM-RFE [60] is another feature selection method that uses backward
feature elimination procedure similar to SVM-RFE. However, instead of using a single
linear SVM at each recursive step, multiple linear SVMs are trained on sub-samples of
the training data obtained from multiple runs of k-fold cross validation. The feature
ranking score is then computed from statistical analysis of the weight vectors of the
multiple linear SVMs. Similar to the SVM-RFE method, the feature with the smallest
ranking score is omitted at each step.

5.2.3 Guided Regularized Random Forest (GRRF)

Decision trees are increasingly used for the purpose of the feature selection. Bagged
random trees known as Random forest (RF) have also been widely utilised for measuring
the feature importance. Those feature importance scores can be used in the feature
selection process to shortlist the most discriminating features and taking into account
only the features with high importance scores. The feature shortlisting procedure consists
of multiple iterations. The feature with the lowest importance score will be eliminated
from feature set per iteration. The number of features to be eliminated per iteration can
be increased to make the algorithm more time and cost efficient (the authors implemented
the algorithm to eliminate 20% of the features per iteration) [64].

The main drawback of the ordinary random forest is that at every node, a feature that
optimizes an information-theoric criterion will be selected to split the data. However, the
redundancy of that feature to the features selected in the previous splits is not considered
in this procedure.

A regularization framework can address the issue and can avoid selecting a new
feature for data splitting at each node when that feature produces a similar gain (im-
portance scores) to the feature set that has been selected in the previous splits. The
RF gives an importance score for each feature, though it does not end up in a feature
subset. Hence, the regularization framework can be easily added to the RF to ease the
feature subset selection. This implementation of the regularization framework on ran-
dom forest known as regularized random forest (RRF) [62] has been recently proposed
for the feature selection purposes. Nevertheless, evaluations of the features are usually
done by a portion of the training data at each tree node, which might cause in-stability
in feature selection procedure. To overcome this problem, an enhanced version of the
RRF, referred to as guided RRF has been proposed by Deng [62]. In this method, first
the importance scores for each variable from an ordinary random forest is computed.
Then those scores are used to guide the feature selection (in this case RRF). Given that
the importance scores in ordinary RF are computed on all of the trees and also on all of
the training data, the GRRF is likely to perform more efficiently in comparison to RRF
in terms of feature selection performance.

, Signals and Systems, Master of Science Thesis 2013 47


CHAPTER 5. EVALUATION OF THE PROPOSED TEXTURE FEATURES FOR SCREENING PAP SMEAR SLIDES

5.2.4 L1-regularization procedure with generalised linear model

Generalised linear models (GLM) generalise the classical linear models based on the
normal distribution. In an ordinary linear model the relation between the response
variable Yi and the independent predictor variables Xij is given by:

ηi = Yi = β0 + β1Xi1 + β2Xi2 + . . .+ βkXij (5.1)

where η is the linear predictor and βs are the coefficients of the linear combination whose
values are unknown and have to be estimated. The coefficient βj is the measure of the
impact of the predictor Xij on the target Yi. In an ordinary linear model, for each set of
values for the predictors, the response variable is continuous and normally distributed
[65].

In a generalised linear model, by contrast, response variables belong to the exponen-
tial family of distributions, such as the Gaussian, Binomial, Poisson, Gamma, or inverse
Gaussian. In addition, the relation between the response and predictor variables does
not necessarily have the simple linear form as in equation 5.1. In fact, the independent
predictive variables can take non-linear transformations of the other predictor variables.
However, the linear predictor is still linear in the coefficients (parameters) that need to
be estimated.

GLMs model the random variable Y by providing a relationship between its expected
value and the linear predictor η.

ηi = g (µ) = β0 + β1Xi1 + β2Xi2 + . . .+ βkXij (5.2)

where g is called the link function. The link function relates the linear predictor η
to the expected value of the response variable. The link function for a classical linear
model is the identity function g (µ) = µ = E (Y ). Examples of link fun