Features
A set of 30 features describing shape,
uniformity, and brightness is calculated for each object:
- Maximum density and integrated density
- Weighted and unweighted ellipticity, semi-major
and semi-minor axes
- Texture parameters (entropy, contrast, and
angular second moment) computed from the co-occurrence matrix
- Spike parameters, computed from correlation
of the object with a template in the shape of a plus sign, similar
to diffraction spikes on bright stars
- Areas at the 16 detection thresholds used
by the object detection and deblending algorithms
This feature set was selected based on good
star/galaxy separation in feature plots.
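As an illustration of the texture parameters listed above, the following is a minimal sketch that computes entropy, contrast, and angular second moment from a gray-level co-occurrence matrix using their standard textbook definitions. The function names, the pixel offset (dx, dy), and the number of gray levels are assumptions made for this example; the source does not say which quantization or offsets were used for the catalog.

```python
import math

def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for one non-negative pixel offset.

    `image` is a list of rows of integer gray levels in range(levels).
    The offset (dx, dy) is an assumption made for this sketch.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows - dy):
        for c in range(cols - dx):
            counts[image[r][c]][image[r + dy][c + dx]] += 1
            total += 1
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[n / total for n in row] for row in counts]

def texture_features(p):
    """Entropy, contrast, and angular second moment of a normalized matrix p."""
    entropy = contrast = asm = 0.0
    for i, row in enumerate(p):
        for j, pij in enumerate(row):
            if pij > 0:
                entropy -= pij * math.log(pij)   # -sum p log p
            contrast += (i - j) ** 2 * pij       # sum (i - j)^2 p
            asm += pij ** 2                      # sum p^2
    return {"entropy": entropy, "contrast": contrast,
            "angular_second_moment": asm}

# A perfectly uniform patch versus a checkerboard patch (made-up data).
flat = texture_features(cooccurrence_matrix([[0, 0], [0, 0]], levels=2))
checker = texture_features(cooccurrence_matrix([[0, 1], [1, 0]], levels=2))
```

In this toy case the uniform patch has zero entropy and zero contrast, while the checkerboard has higher values of both and a lower angular second moment.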
Ranking
These "raw" features are then transformed by
the statistical process of "ranking": ordering a population
of objects by the value of a feature and mapping the full range
of that feature onto the interval 0 to 1. The ranking is performed in
zones on each plate. Typically the plate is divided into a 7x7 grid,
but some more crowded plates were gridded more finely because of memory
limitations. The ranks of objects lying near grid boundaries are interpolated.
The ranked features are then used in the remainder of the classification
process. Ranking removes most of the
plate-to-plate variation in the features, so a specific
training set need not be created for each plate. Ellipticity, a common
classification feature, illustrates this:
the roundest objects on any plate are likely to be stars,
even if one plate is poorly guided compared to another.
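The ranking idea can be sketched in a few lines. This is a simplified illustration, not the catalog's implementation: it ranks a single feature over a single population and omits the per-zone gridding and the boundary interpolation described above. The function name and the ellipticity values are hypothetical.

```python
def rank_feature(values):
    """Map raw feature values onto [0, 1] by their rank in the population.

    Ties are broken by input order, which is an assumption of this sketch.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    for position, k in enumerate(order):
        ranks[k] = position / (n - 1)
    return ranks

# Hypothetical ellipticities for the same four objects on two plates.
# Plate B is poorly guided, so every object appears more elongated.
plate_a = [0.02, 0.05, 0.30, 0.10]
plate_b = [v + 0.15 for v in plate_a]

ranks_a = rank_feature(plate_a)
ranks_b = rank_feature(plate_b)
```

Although the raw ellipticities differ between the two plates, the ranked values are identical, so the roundest object receives rank 0 on both.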
Decision Tree
An oblique decision tree (OC1, Murthy
et al., 1994) is used to perform the classifications. Unlike
classical, or axis-parallel, decision trees, which can split on only
one variable at a time, OC1 can split on a linear combination
of all the features. This permits OC1 to find and exploit
relationships among the image features.
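The difference between the two kinds of split can be shown with a single tree node. The sketch below is not OC1 itself and does not include its search for good split coefficients; the function names, weights, and thresholds are hypothetical values chosen only to contrast the two tests.

```python
def axis_parallel_split(x, feature_index, threshold):
    """Classical test: compare one feature against a threshold."""
    return x[feature_index] > threshold

def oblique_split(x, weights, bias):
    """Oblique test: compare a linear combination of all features with zero."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0

# Two ranked features per object (made-up values).
objects = [(0.2, 0.9), (0.9, 0.2), (0.3, 0.3), (0.8, 0.8)]

# Axis-parallel: looks only at feature 0.
by_axis = [axis_parallel_split(x, 0, 0.5) for x in objects]

# Oblique: tests whether feature 0 + feature 1 exceeds 1.
by_oblique = [oblique_split(x, (1.0, 1.0), -1.0) for x in objects]
```

The first object falls on different sides of the two tests: the oblique split responds to the sum of both features, which no single-feature threshold can express.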
Voting
To reduce the variance contribution to the
classification error, a set of five decision trees is created from
a training set of approximately 5000 hand-classified objects, using
different randomizations in the tree-building process to produce
five pruned trees from the same data. During the plate classification
task, each object is classified by all five trees, and the results
are voted to produce a final, single-plate classification.
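A minimal sketch of the voting step follows. The class labels and the function name are hypothetical. The text does not say how a split vote among three or more classes is resolved for a single plate, so the fallback here (the label encountered first among the tied ones) is purely an assumption of this example.

```python
from collections import Counter

def vote(tree_predictions):
    """Return the most common label among the per-tree predictions.

    If several labels share the top count, Counter.most_common returns
    the one encountered first; that tie rule is an assumption.
    """
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label

# Predictions of five trees for one object (made-up labels).
single_plate_class = vote(["star", "non-star", "star", "star", "defect"])
```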
Final Classification
The final published classification of an object
that has been matched on multiple plates (typically one for each bandpass
near plate centers, and more in overlap regions) is the mode of
the individual classifications. Defect classifications are ignored,
on the assumption that any object matched on two plates is real.
Unmatched defects are excluded from the catalog. In the event of
a tie, e.g., as many stellar as non-stellar votes, we decide in favor
of non-stellar, based on our experience that it is easier to misclassify
a galaxy as a star than vice versa. (Individual classifications
based on older 25-micron data, where the classifier was poorly tuned,
are excluded from the voting.)
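The combination rule above can be sketched as follows. The label strings and the function name are hypothetical, and returning None for an object with only defect classifications is an assumption standing in for exclusion from the catalog. The sketch also assumes that classifications from the older 25-micron data have already been filtered out of the input.

```python
from collections import Counter

def final_classification(plate_classes):
    """Mode of the per-plate classes, ignoring defects; ties go to non-star."""
    votes = [c for c in plate_classes if c != "defect"]
    if not votes:
        return None  # assumption: nothing but defects, so no catalog entry
    counts = Counter(votes)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    if len(winners) > 1:
        return "non-star"  # tie rule stated in the text
    return winners[0]

# Made-up examples of per-plate classifications for single objects.
tied = final_classification(["star", "non-star"])
majority = final_classification(["star", "star", "non-star"])
with_defect = final_classification(["star", "defect"])
```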