


I. INTRODUCTION

Advances in imaging, visualization, and virtual environments technology are allowing the clinician not only to visualize, but also to interact with, a virtual patient [25][26]. In many cases, anatomical information contained within sampled image datasets is essential to clinical tasks. One typical example is in surgery, where pre-surgical planning and post-operative evaluations are not only enhanced by, but in many cases depend upon, sampled image data for successful results [9][21]. Many sampling devices are now available to image a variety of materials. Two commonly used modalities, which image bony structures and soft tissue respectively, are CT (Computerized Tomography) and MRI (Magnetic Resonance Imaging). Datasets are realized as a sequence of 2D cross-sectional slices that together represent the 3D sample space. The entire image stack may be viewed as a 3D array of scalar (or vector) values, called a volume, where each voxel (volume pixel) represents a measured physical quantity at a location in space. Advances in the fields of surface and volume graphics now make it possible to render a volume dataset with high image quality using lighting and shading models. Graphics workstations equipped with specialized hardware and texture mapping capabilities show promise for real-time rendering [25]. Currently, clinicians view images on photographic sheets containing adjacent image slices, and must mentally reconstruct the 3D anatomical structures. Even though clinicians have developed the necessary skills to make use of this presentation, computer-based tools will allow for a higher level of interaction with the data.
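
To make this representation concrete, the following sketch treats a volume as a 3D NumPy array of scalar voxel values; the array shape and the 12-bit intensity range are illustrative assumptions, not values taken from any particular scanner.

    import numpy as np

    # A hypothetical CT volume: 125 axial slices, each a 256x256 image of
    # scalar intensities (random values stand in for real measurements).
    volume = np.random.randint(0, 4096, size=(125, 256, 256), dtype=np.uint16)

    # One 2D cross-sectional slice, as a clinician would view it on film.
    slice_40 = volume[40]                 # shape (256, 256)

    # A single voxel: the measured physical quantity at one spatial location.
    intensity = volume[40, 128, 128]
    print(slice_40.shape, int(intensity))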

A major hurdle in the effective use of this technology is the accurate identification of anatomical structures within the volume. The computer vision literature typically identifies three processing stages before object recognition: image enhancement, feature extraction, and grouping of similar features. In this paper, we address the last step, called image segmentation, where pixels are grouped into regions based on image features. The goal is to partition an image into pixel regions that together represent objects in the scene. Segmentation is a very difficult problem for general images, which may contain effects such as highlights, shadows, transparency, and object occlusion. Sampled image datasets largely lack these effects, with a few exceptions; ultrasound datasets, for instance, may still contain occluded objects. Instead, the main challenges in segmenting sampled image datasets are handling the noise artifacts introduced during the acquisition process and coping with dataset size. Noise artifacts appear as "salt and pepper" noise or as noisy areas of arbitrary size, which arise where objects are complex or small when compared with the device sampling rate. With new imaging technology, the increasing size of volume datasets is an issue for most applications. For example, a "medium" size dataset with dimensions 256x256x125 contains over 8 million voxels. The MRI, CT, and color image datasets for an entire human male cadaver from the National Library of Medicine's Visible Human Project [1] require approximately 15 gigabytes of storage. An additional challenge is that objects may be arbitrarily complex in terms of size and shape.
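
A quick back-of-the-envelope computation makes these size figures concrete; the 2 bytes per voxel is an assumption (e.g., 12-bit intensities stored in 16 bits), not a figure taken from the text.

    # Voxel count of the "medium" 256x256x125 dataset mentioned above.
    voxels = 256 * 256 * 125
    print(voxels)                          # 8,192,000 -- over 8 million

    # Assumed storage at 2 bytes per voxel: roughly 15.6 MiB per volume.
    print(voxels * 2 / 2**20, "MiB")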

Many segmentation methods proposed for medical image data are either direct applications or extensions of approaches from computer vision. Image segmentation algorithms can be classified in many ways [11][15][27]. We identify three broad classes of algorithms for segmenting sampled image data: manual, semi-automatic, and automatic. Reviews of algorithms from each class can be found in [7][18]. The manual method requires a human segmenter with knowledge of anatomy to use a graphical software tool to outline regions of interest. This method obviously produces high quality results, but it is time consuming and tedious. The image segmentation algorithm presented in this paper is a semi-automatic approach.

Semi-automatic methods require user interaction to set algorithm parameters or to enhance the algorithm results. They can be classified according to the space in which features are grouped together [11]. Measurement space methods map and cluster pixels in a feature space. A commonly used method is global thresholding [18][11], where pixel intensities from the image are mapped into a feature space called a histogram. Thresholds are chosen at valleys between pixel clusters that each represent a region of similar-valued pixels in the image. This works well if the target object has distinct and homogeneous pixel values, which is usually the case with bony structures in CT datasets. On the other hand, spatial information is lost in the transformation to feature space, which may produce disjoint regions. Segmentation may be difficult because pixels within the same object may be mapped to different clusters and pixels from different objects may be mapped into the same cluster.
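
As an illustration of measurement-space grouping, the following sketch thresholds a synthetic image; the image, the bin count, and the hard-coded threshold are all hypothetical, and a real system would place the threshold at a valley found in the histogram.

    import numpy as np

    # Synthetic image: dark "soft tissue" background plus a bright "bony" block.
    image = np.full((64, 64), 100, dtype=np.uint16)
    image[20:40, 20:40] = 1000                      # bright, homogeneous object
    image += np.random.randint(0, 50, image.shape).astype(np.uint16)  # noise

    # Map pixel intensities into the feature space: a histogram.
    hist, bin_edges = np.histogram(image, bins=64)

    # Choose a threshold at the valley between the two clusters; here the
    # value separating the two synthetic modes is simply hard-coded.
    threshold = 500

    # Group pixels purely by intensity; spatial information is lost, so the
    # resulting "region" may be disjoint in the image plane.
    mask = image > threshold
    print(mask.sum(), "pixels labeled as the bright object")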

Spatial domain methods use spatial proximity in the image to group pixels. Edge detection methods use local gradient information to define edge elements, which are then combined into contours to form region boundaries [7][11]. For example, a 3D version of the Marr-Hildreth operator was used to segment the brain from MRI data [5]. Other edge operators, such as the Canny operator [6], have been used to some extent. However, edge operators are generally sensitive to noise and produce spurious edge elements that make it difficult to construct a reasonable region boundary. Region growing methods [2][3][11][27], on the other hand, construct regions by grouping spatially proximate pixels so that some homogeneity criterion is satisfied over the region. These algorithms may be classified into four categories: merge, split, hybrid, and seeded region growing. Merge algorithms initially partition the entire image into many homogeneous regions, which can be single pixels or groups of pixels. At each iteration, two neighboring regions are merged into one if the resulting region is "homogeneous." Most metrics for computing homogeneity are statistical; for example, two regions are merged if their pixel means are close. Split algorithms define the entire image as a single region, and then successively subdivide regions into smaller ones. Subdivision stops when subdivided regions are no longer homogeneous. Hybrid approaches use both split and merge operations, and usually impose additional region constraints; for example, a small region with size below a specified minimum may be merged with a bordering region. These approaches have drawbacks. Regions produced by merging usually have blocky shapes because the initial regions are defined with geometric primitives, such as rectangles. Split approaches subdivide regions in a regular manner, such as subdivision into quadrants, which makes it difficult to segment arbitrarily shaped objects. Different segmentation results may be produced for the same image if merge and split operations are performed in different orders; for example, a top-down image traversal may generate different results than a bottom-up traversal. Also, noise in the image may erroneously contribute to the homogeneity metric, and hinder merging or cause premature splitting.
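
To make the statistical merge criterion concrete, the sketch below tests whether two neighboring regions should be merged; the tolerance and the flattened-array region representation are simplifying assumptions.

    import numpy as np

    def should_merge(region_a, region_b, tol=10.0):
        # Statistical homogeneity test: merge two neighboring regions
        # if their mean intensities differ by less than a tolerance.
        return abs(region_a.mean() - region_b.mean()) < tol

    # Hypothetical neighboring regions, given as flattened pixel intensities.
    a = np.array([102.0, 98.0, 101.0, 99.0])
    b = np.array([105.0, 103.0, 107.0, 104.0])
    c = np.array([210.0, 205.0, 215.0, 208.0])

    print(should_merge(a, b))   # True: means are close, merge into one region
    print(should_merge(a, c))   # False: means far apart, keep separate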

Seeded region growing algorithms [2][11][27] grow a region from a seed, which can be a single pixel or cluster of pixels. Seeds may be chosen by the user, which can be difficult because the user must predict the growth behavior of the region based on the homogeneity metric. Since the number, locations, and sizes of seeds may be arbitrary, segmentation results are difficult to reproduce. Alternatively, seeds may be defined automatically, for example, the min/max pixel intensities in an image may be chosen as seeds if the region mean is used as a homogeneity metric [2]. A region is constructed by iteratively incorporating pixels on the region boundary. Most methods use two criteria to add pixels. One adds a boundary pixel if its intensity satisfies a homogeneity metric defined over the entire region, for example, the mean or variance of the region. The other criterion adds the pixel if its intensity is close enough to a pixel in its local neighborhood.
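
A minimal sketch of the first criterion follows: growing from a single seed and adding a 4-connected boundary pixel whenever its intensity is close to the current region mean. The seed location, tolerance, and test image are all hypothetical.

    import numpy as np
    from collections import deque

    def grow_region(image, seed, tol=20.0):
        # Seeded region growing: starting from one seed pixel, iteratively
        # absorb 4-connected boundary pixels whose intensity is within
        # `tol` of the current region mean.
        h, w = image.shape
        visited = np.zeros((h, w), dtype=bool)
        region = np.zeros((h, w), dtype=bool)
        total, count = float(image[seed]), 1
        region[seed] = visited[seed] = True
        frontier = deque([seed])
        while frontier:
            r, c = frontier.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and not visited[nr, nc]:
                    visited[nr, nc] = True
                    if abs(float(image[nr, nc]) - total / count) < tol:
                        region[nr, nc] = True
                        total += float(image[nr, nc])
                        count += 1
                        frontier.append((nr, nc))
        return region

    # Hypothetical image: a bright 16x16 object on a dark background.
    img = np.full((32, 32), 50.0)
    img[8:24, 8:24] = 200.0
    mask = grow_region(img, seed=(16, 16))
    print(mask.sum(), "pixels grown from the seed")   # 256, the whole object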

In contrast to these approaches, we utilize a new biologically inspired oscillator network, called the Locally Excitatory Globally Inhibitory Oscillator Network (LEGION) [20][23], to perform segmentation on sampled medical image datasets. The network was proposed on the basis of theoretical and experimental considerations that point to the plausibility of oscillatory correlation as a representational scheme. The oscillatory correlation theory assumes that the brain groups and segregates visual features on the basis of correlation between neural oscillations [23][14]. It has been found that neurons in the visual cortex respond to visual features with oscillatory activity (see [16] for a review). Oscillations from neurons detecting features of the same object tend to synchronize with zero phase shift, whereas oscillations from different objects tend to desynchronize from each other. Thus, objects seem to be segregated in time.

After briefly introducing some background on neural oscillations, we describe LEGION and its four key computational components in Sections II and III. In Section IV, a graph formulation is used to show that LEGION is able to label connected components. In Section V, we present results of segmenting CT and MRI image datasets using a graph algorithm derived from LEGION's computational components. Our algorithm groups pixels based on a local neighborhood and chooses seeds automatically. We then compare our results with manual segmentation in Section VI. Finally, we conclude with a discussion in Section VII.


