Inside the GSC II Classifier

Last Updated Jan 2001

Features

A set of 30 features describing shape, uniformity, and brightness is calculated for each object:

Maximum density and integrated density
Weighted and unweighted ellipticity, semi-major and semi-minor axes
Texture parameters (entropy, contrast, and angular second moment) computed from the co-occurrence matrix
Spike parameters, computed by correlating the object with a plus-sign-shaped template that resembles the diffraction spikes on bright stars
Areas at the 16 detection thresholds used by the object detection and deblending algorithms

This feature set was selected based on good star/galaxy separation in feature plots.
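As an illustration of the texture parameters, the sketch below computes entropy, contrast, and the angular moment term from a co-occurrence matrix using the usual textbook definitions. It is not the GSC II code: the function names, the single pixel offset, and the assumption that the image has already been quantized to a few integer grey levels are all choices made for this example.

```python
import math
from collections import Counter

def cooccurrence(image, dx=1, dy=0):
    """Normalized frequencies of grey-level pairs separated by (dx, dy).

    `image` is a list of rows of integer grey levels (hypothetical input
    format; the quantization used by the real pipeline is not stated).
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                counts[(image[y][x], image[y2][x2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def texture_features(image, dx=1, dy=0):
    """Entropy, contrast, and angular moment of the co-occurrence matrix."""
    p = cooccurrence(image, dx, dy)
    entropy = -sum(v * math.log(v) for v in p.values())
    contrast = sum((i - j) ** 2 * v for (i, j), v in p.items())
    angular_moment = sum(v * v for v in p.values())
    return {"entropy": entropy, "contrast": contrast,
            "angular_moment": angular_moment}

flat = [[1, 1], [1, 1]]      # perfectly uniform patch
checker = [[0, 1], [1, 0]]   # alternating patch
print(texture_features(flat))
print(texture_features(checker))
```

A uniform patch gives zero entropy and contrast and an angular moment of 1, while the alternating patch gives higher entropy and contrast and a lower angular moment.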

Ranking

These "raw" features are then transformed by the statistical process of "ranking": a population of objects is ordered by the value of the feature, and the full range of the feature is mapped onto the range 0 to 1. The ranking is performed in zones on each plate. Typically the plate is divided into a 7x7 grid, but some more crowded plates were gridded more finely because of memory limitations. Ranked values for objects lying near the grid boundaries are interpolated. The ranked features are then used in the remainder of the classification process. Ranking removes most of the plate-to-plate variation in the features, so a specific training set does not have to be created for each plate. Ellipticity, a common classification feature, illustrates this: the roundest objects on any plate are likely to be stars, even if one plate is poorly guided compared to another.
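A minimal sketch of the rank transform for a single feature within a single zone might look like the following. The function name and the tie handling (tied values share their mean rank) are assumptions made for this example, and the zone gridding and boundary interpolation are left out.

```python
from bisect import bisect_left, bisect_right

def rank_feature(values):
    """Map raw feature values onto [0, 1] according to their rank.

    Tied values share the mean of their rank positions (an assumption;
    the tie handling in the real pipeline is not described).
    """
    n = len(values)
    if n == 1:
        return [0.5]
    ordered = sorted(values)
    ranked = []
    for v in values:
        lo = bisect_left(ordered, v)        # first position holding v
        hi = bisect_right(ordered, v) - 1   # last position holding v
        ranked.append(((lo + hi) / 2) / (n - 1))
    return ranked

# Ellipticities of the same three objects on a well-guided plate and on
# a poorly guided one (made-up numbers for illustration).
well_guided = [0.05, 0.02, 0.30]
poorly_guided = [0.25, 0.20, 0.55]
print(rank_feature(well_guided))
print(rank_feature(poorly_guided))
```

Both plates yield the same ranked values even though the raw ellipticities differ, which is the property that lets one training set serve many plates.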

Decision Tree

An oblique decision tree (OC1, Murthy et al., 1994) is used to perform the classifications. Unlike classical, or axis-parallel, decision trees, which can split on only one variable at a time, OC1 can split on a linear combination of all the features. This permits OC1 to find and exploit relationships among the image features.
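The difference between the two kinds of split can be sketched as follows. This is only an illustration of the node tests, not the OC1 search procedure, and the function names, weights, and sign convention are assumptions made for the example.

```python
def axis_parallel_split(features, index, threshold):
    """Classical node test: compare one feature with a threshold."""
    return features[index] <= threshold

def oblique_split(features, weights, bias):
    """Oblique node test: compare a linear combination of all the
    features with zero, i.e. test which side of a hyperplane the
    object lies on."""
    return sum(w * x for w, x in zip(weights, features)) + bias <= 0

# Two ranked features whose classes are separated by the diagonal
# x0 = x1 (made-up example). One oblique test captures this boundary.
a = [0.2, 0.8]
b = [0.8, 0.2]
print(oblique_split(a, weights=[1.0, -1.0], bias=0.0))  # a lies on x0 <= x1 side
print(oblique_split(b, weights=[1.0, -1.0], bias=0.0))  # b lies on the other side
```

An axis-parallel tree would need a staircase of single-feature tests to approximate the same diagonal boundary.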

Voting

To reduce the variance contribution to the classification error, a set of five decision trees is created from a training set of approximately 5000 hand-classified objects, using different randomizations in the tree-building process to produce five pruned trees from the same data. During the plate classification task, each object is classified by all five trees, and the results are voted to produce a final single-plate classification.
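The voting step can be sketched as a simple majority over the five trees. In the example below the "trees" are toy threshold tests standing in for the pruned OC1 trees, and the label strings and function names are hypothetical.

```python
from collections import Counter

def make_stump(threshold):
    """Toy stand-in for one pruned tree: thresholds one ranked feature."""
    def tree(features):
        return "star" if features[0] < threshold else "non-star"
    return tree

def classify_on_plate(trees, features):
    """Classify one object with every tree and return the commonest label."""
    labels = [tree(features) for tree in trees]
    return Counter(labels).most_common(1)[0][0]

# Five slightly different trees, mimicking different randomizations
# of the tree-building process on the same training data.
trees = [make_stump(t) for t in (0.30, 0.35, 0.45, 0.50, 0.60)]
print(classify_on_plate(trees, [0.40]))  # three of five trees say "star"
```

With five trees and two labels a tie cannot occur. If a third label such as a defect class takes part, ties become possible, and this sketch then returns whichever tied label was encountered first, which is an arbitrary choice made here.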

Final Classification

The final published classification of an object that has been matched on multiple plates (typically one for each bandpass near plate centers, and more in overlap regions) is the mode of the individual classifications. Defect classifications are ignored, on the assumption that any object matched on two plates is real. Unmatched defects are excluded from the catalog. In the event of a tie, e.g. as many stellar as non-stellar votes, we decide in favor of non-stellar, based on our experience that it is easier to misclassify a galaxy as a star than vice versa. (Individual classifications based on older 25-micron data, where the classifier was poorly tuned, are excluded from the voting.)
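The combination rule described above can be sketched as follows. The label strings and the function name are hypothetical, and the input is assumed to already exclude classifications from the older 25 micron data.

```python
from collections import Counter

def final_classification(plate_labels):
    """Combine single-plate classifications of one matched object.

    Defect votes are ignored, the mode of the remaining votes wins,
    and ties go to "non-star". Returns None when no usable vote is
    left (for example, an object classified only as a defect).
    """
    votes = [label for label in plate_labels if label != "defect"]
    if not votes:
        return None
    counts = Counter(votes)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    if len(winners) > 1:
        return "non-star"   # tie-break in favor of non-stellar
    return winners[0]

print(final_classification(["star", "star", "non-star"]))  # clear majority
print(final_classification(["star", "non-star"]))          # tie
print(final_classification(["defect", "star"]))            # defect ignored
```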