Source Code: Patch-based carcinoma detection in CLE images

Our processing pipeline consists of three steps:

1. Patch extraction (and randomization) (extractPatches.py)

The original MKT (Cellvizio) files are read, and patches without annotated artifacts are extracted according to the frame entries in the CLE database.

40 batches of randomized patches (with the corresponding information about the originating sequence, patient, etc.) are generated.
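A minimal sketch of the extraction and randomization idea, assuming square patches on a regular grid. The patch size, stride, seed and all function names are hypothetical and not taken from extractPatches.py. Regions follow the CLEregions convention described below, passed as tuples (regionType, x1, x2, y1, y2) that index the frame as [x1:x2, y1:y2]; only region types 0 to 2 (artifacts) exclude a patch.

```python
import numpy as np

ARTIFACT_TYPES = {0, 1, 2}  # regionType codes: motion, noise and other artifacts


def overlaps(px1, px2, py1, py2, x1, x2, y1, y2):
    """True if the half-open boxes [px1:px2, py1:py2] and [x1:x2, y1:y2] intersect."""
    return px1 < x2 and x1 < px2 and py1 < y2 and y1 < py2


def extract_patches(frame, regions, patch_size=80, stride=80):
    """Tile a 2-D frame into patches, skipping every patch that touches an artifact.

    regions: iterable of (regionType, x1, x2, y1, y2); the first array axis is x.
    patch_size and stride are illustrative defaults, not the values from the paper.
    """
    artifacts = [r[1:] for r in regions if r[0] in ARTIFACT_TYPES]
    patches, positions = [], []
    for px in range(0, frame.shape[0] - patch_size + 1, stride):
        for py in range(0, frame.shape[1] - patch_size + 1, stride):
            px2, py2 = px + patch_size, py + patch_size
            if any(overlaps(px, px2, py, py2, *box) for box in artifacts):
                continue  # patch intersects an annotated artifact region
            patches.append(frame[px:px2, py:py2])
            positions.append((px, py))
    return patches, positions


def randomized_batches(items, n_batches=40, seed=0):
    """Shuffle items and split them into n_batches batches of nearly equal size."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    return [[items[i] for i in chunk] for chunk in np.array_split(order, n_batches)]
```

In practice, each item would be a record holding the patch together with its frame, sequence and patient IDs, so that this metadata stays aligned with the patch after shuffling.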

2. Training the classifier and evaluating it on the test set (train.py)

A convolutional neural network, as described in the paper, is trained using TensorFlow. This step generates an SQLite database with the classification results for the individual frames.
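The name of the results file (CLEresults_TF_ppf_db0_publication_LOPO.db) suggests a leave-one-patient-out (LOPO) evaluation, in which the frames of each patient serve once as the test set. Assuming that protocol, the folds can be derived from the patientID field, and per-frame results could be written to SQLite as sketched below. The CNN itself is omitted; the function names and the results table layout are hypothetical and are not the schema of the provided results database.

```python
import sqlite3

import numpy as np


def lopo_folds(patient_ids):
    """Yield (held_out_patient, train_idx, test_idx), one fold per patient."""
    patient_ids = np.asarray(patient_ids)
    for patient in np.unique(patient_ids):
        test_mask = patient_ids == patient
        yield patient, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def store_results(conn, frame_ids, probabilities):
    """Write per-frame carcinoma probabilities into a results table.

    Hypothetical table layout; the real results database may differ.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (frameId INTEGER, probability REAL)"
    )
    conn.executemany(
        "INSERT INTO results (frameId, probability) VALUES (?, ?)",
        zip(map(int, frame_ids), map(float, probabilities)),
    )
    conn.commit()
```

Splitting by patient rather than by frame keeps frames of the same patient from appearing in both the training and the test set of a fold.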

3. Fusion of patch probabilities into image probabilities (CNN_ppf.py)

In this step, a NumPy array of image probabilities is generated.
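The fusion rule itself is implemented in CNN_ppf.py and described in the paper. The sketch below simply assumes that an image probability is the mean of its patch probabilities, which may differ from the published method; the function and argument names are hypothetical.

```python
import numpy as np


def fuse_patch_probabilities(frame_ids, patch_probs):
    """Average patch probabilities per frame (mean fusion, assumed for illustration).

    Returns (unique_frame_ids, image_probs), where image_probs[i] belongs to
    unique_frame_ids[i].
    """
    frame_ids = np.asarray(frame_ids)
    patch_probs = np.asarray(patch_probs, dtype=float)
    unique_ids, inverse = np.unique(frame_ids, return_inverse=True)
    sums = np.bincount(inverse, weights=patch_probs)  # summed probability per frame
    counts = np.bincount(inverse)                      # number of patches per frame
    return unique_ids, sums / counts
```

Other fusion rules (e.g. a median or a vote over thresholded patches) would only change the last two lines.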

Unfortunately, reproducing the exact results of our paper requires access to the CLE image database, which we do not currently have permission to make publicly available.

However, to be transparent about the methodology, the code for each of the three steps described above is listed in the Files section below. Further, I provide all derived data, i.e. the classification results for the patches and the CLE database without the images. With these, it is possible to re-run the patch probability fusion step (step 3).
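Because the table layout of the results file is not documented here, a reasonable first step when re-running step 3 is to list the tables and columns of the provided databases. A minimal sketch using Python's built-in sqlite3 module; the function name is hypothetical.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column names]} for an open SQLite connection."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
    return {
        table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
        for table in tables
    }
```

For example, sqlite3.connect("file:CLEdB.db?mode=ro", uri=True) opens the file read-only and, unlike a plain sqlite3.connect("CLEdB.db"), fails instead of silently creating an empty database when the path is wrong.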

CLE Database structure

We base our CLE image database on a SQLite relational database with the following structure:

CLEdatabases: Each database consists of a number of sequences (movies) and may originate from a different hospital or anatomical region.

It has the following fields:

id: Unique ID

path: Path to the sequence files (MKT files)

description: Description of the database

authors: Medical authors of the dataset

CLEsequences: Each sequence consists of a number of images (frames).

It has the following main fields:

id: Unique ID

patientID: Unique identifier for patient (numerical)

patient: Patient identifier (string)

fileId: Unique identifier for a file (a file may contain multiple CLE sequences)

file: MKT file name

subfolder: Subfolder containing MKT files

database: Link to CLEdatabases.id

CLEframes: Single images of the CLE database.

It has the following main fields:

id: Unique ID

sequenceID: Link to CLEsequences.id

frameIdx: Frame index in the MKT file (0: first image in file)

cellStructure: Cell classification (-1: unknown, 0: normal epithelium, 1: carcinoma, 2: dysplasia)

anatomicalLocation: Location where the sample was taken (0: upper alveolar ridge, 1: lower inner labium, 2: palatal region, 3: lesion region, 4: vocal folds)

gaussianNoiseClass: Noisiness of the image (0: no noise, 10: completely noisy)

motionArtifactClass: Motion artifacts in image (0: none, 10: only artifacts)

illuminationArtifactClass: Illumination artifacts in the image (deprecated)

imageQuality: Subjective quality of image (0: bad, 1: neutral, 2: good)
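The integer codes of CLEframes can be decoded with plain lookup tables, as sketched below. The quality filter and its thresholds (max_noise, max_motion) are hypothetical illustrations, not the selection criteria used in the paper; frame is assumed to be a mapping from the field names above to their values, such as a dict or a sqlite3.Row.

```python
CELL_STRUCTURE = {-1: "unknown", 0: "normal epithelium", 1: "carcinoma", 2: "dysplasia"}
ANATOMICAL_LOCATION = {
    0: "upper alveolar ridge",
    1: "lower inner labium",
    2: "palatal region",
    3: "lesion region",
    4: "vocal folds",
}
IMAGE_QUALITY = {0: "bad", 1: "neutral", 2: "good"}


def decode_frame(frame):
    """Translate the integer codes of one CLEframes row into readable labels."""
    return {
        "cellStructure": CELL_STRUCTURE.get(frame["cellStructure"], "invalid code"),
        "anatomicalLocation": ANATOMICAL_LOCATION.get(
            frame["anatomicalLocation"], "invalid code"
        ),
        "imageQuality": IMAGE_QUALITY.get(frame["imageQuality"], "invalid code"),
    }


def is_usable(frame, max_noise=3, max_motion=3):
    """Hypothetical filter: known cell label and artifact scores (0-10) at or below thresholds."""
    return (
        frame["cellStructure"] != -1
        and frame["gaussianNoiseClass"] <= max_noise
        and frame["motionArtifactClass"] <= max_motion
    )
```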

CLEregions: Regions of interest / region annotations within a single frame.

Fields:

id: Unique ID

frameId: Link to CLEframes.id

regionType: Type of annotation (0: motion artifact, 1: noise artifact, 2: other artifact, 3: other ROI/no artifact, e.g. anatomical structures)

x1,x2,y1,y2: Coordinates within image [x1:x2,y1:y2]
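The tables are linked through their ID fields, so the patient of a frame or the artifact annotations of a frame can be obtained with ordinary joins. A minimal sketch using Python's built-in sqlite3 module, assuming conn is an open connection to CLEdB.db and that the fields listed above exist under exactly these names; the function names are hypothetical.

```python
import sqlite3

FRAMES_BY_CLASS = """
SELECT f.id, s.patientID, s.file, f.frameIdx
FROM CLEframes AS f
JOIN CLEsequences AS s ON f.sequenceID = s.id
WHERE f.cellStructure = ?
ORDER BY f.id
"""

# regionType 0-2 are artifacts; 3 marks other ROIs and is therefore excluded
ARTIFACT_REGIONS = """
SELECT x1, x2, y1, y2 FROM CLEregions
WHERE frameId = ? AND regionType IN (0, 1, 2)
ORDER BY id
"""


def frames_by_class(conn: sqlite3.Connection, cell_structure: int):
    """Return (frame id, patientID, MKT file, frameIdx) for one cellStructure code."""
    return conn.execute(FRAMES_BY_CLASS, (cell_structure,)).fetchall()


def artifact_regions(conn: sqlite3.Connection, frame_id: int):
    """Return the (x1, x2, y1, y2) rectangles of all artifacts annotated in a frame."""
    return conn.execute(ARTIFACT_REGIONS, (frame_id,)).fetchall()
```

For example, frames_by_class(conn, 1) would list all carcinoma frames together with their patient and MKT file, and the rectangles returned by artifact_regions can be used to skip patches that intersect an artifact.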

Files

Filename	Size	Description
train.py	16 kB	Train the CNN using TensorFlow (step 2)
CNN_ppf.py	9 kB	Patch probability fusion of the classification results (step 3)
extractPatches.py	12 kB	Patch extraction from the original images (step 1)
CLEdB.db	568 kB	Original SQLite 3 database with classification information (without images)
CLEresults_TF_ppf_db0_publication_LOPO.db	7.7 MB	Results of the CNN patch classification
helperfiles.tgz	7 kB	Auxiliary files for file input/output. Contains the database management backend (CLEdB.py) and the MKT file reader (MKTreader.py)

Contact

Address

Dipl.-Ing. Marc Aubreville

Researcher in the Computer Vision (CV) group at the Pattern Recognition Lab of the Friedrich-Alexander-Universität Erlangen-Nürnberg
