The CATH Protein Structure Classification is a semi-automatic, hierarchical classification of protein domains published in 1997 by Christine Orengo, Janet Thornton and their colleagues.
CATH shares many broad features with its principal rival, SCOP; however, there are also many areas in which the detailed classification differs greatly.
The name CATH is an acronym for the four main levels of the classification:
1. Class: the overall secondary-structure content of the domain.
2. Architecture: a large-scale grouping of topologies which share particular structural features.
3. Topology: high structural similarity but no evidence of homology. Equivalent to a fold in SCOP.
4. Homologous superfamily: indicative of a demonstrable evolutionary relationship. Equivalent to the superfamily level of SCOP.
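Because the levels nest, two domains can agree down to some level and then diverge. A minimal Python sketch of that idea follows, assuming each domain's classification is written as a dotted four-number code (class.architecture.topology.homologous superfamily, as in identifiers such as 3.40.50.300). The names `CathCode` and `deepest_shared_level` are hypothetical, and the codes used below are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Level names in order, from the coarsest (C) to the finest (H).
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


@dataclass(frozen=True)
class CathCode:
    """Hypothetical container for a four-level code such as '3.40.50.300'."""
    c: int
    a: int
    t: int
    h: int

    @classmethod
    def parse(cls, code: str) -> CathCode:
        parts = code.split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated levels, got {code!r}")
        return cls(*(int(p) for p in parts))

    def as_tuple(self) -> tuple[int, int, int, int]:
        return (self.c, self.a, self.t, self.h)


def deepest_shared_level(x: CathCode, y: CathCode) -> str | None:
    """Return the deepest level at which two domains agree, or None if the class differs.

    Numbers are only meaningful within their parent level, so the comparison
    stops at the first mismatch instead of matching levels independently.
    """
    shared = None
    for name, a, b in zip(LEVELS, x.as_tuple(), y.as_tuple()):
        if a != b:
            break
        shared = name
    return shared
```

For example, `deepest_shared_level(CathCode.parse("3.40.50.300"), CathCode.parse("3.40.50.720"))` returns `"topology"`: the two codes share class, architecture and topology but sit in different homologous superfamilies.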
To understand the CATH classification system, it helps to know how it is constructed: much of the work is done by automatic methods, but there are important manual elements to the classification.
The very first step is to separate the proteins into domains. It is difficult to produce an unequivocal definition of a domain, and this is one area in which CATH and SCOP differ.
The domains are automatically sorted into classes and then clustered on the basis of sequence similarity; these groups form the H level of the classification. The topology level is formed by structural comparisons of the homologous groups. Finally, the Architecture level is assigned manually.
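The two automatic steps at the start of this pipeline (sorting into classes, then clustering by sequence similarity) can be sketched in Python as below. This is a toy illustration, not CATH's actual procedure: the class labels follow CATH's four classes (mainly alpha, mainly beta, alpha beta, few secondary structures), but the secondary-structure cutoffs and the 35% identity threshold are hypothetical, sequences are assumed to be pre-aligned to equal length, and single-linkage clustering with union-find is simply one straightforward choice. The helper names are hypothetical too. The structural comparison for the T level and the manual A-level assignment are not modelled.

```python
from __future__ import annotations


def assign_class(helix_frac: float, strand_frac: float) -> str:
    """Sort a domain into a class from its secondary-structure content (hypothetical cutoffs)."""
    if helix_frac + strand_frac < 0.2:
        return "few secondary structures"
    if strand_frac < 0.1:
        return "mainly alpha"
    if helix_frac < 0.1:
        return "mainly beta"
    return "alpha beta"


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cluster_by_identity(seqs: dict[str, str], threshold: float = 35.0) -> list[set[str]]:
    """Single-linkage clustering: link any pair at or above the identity threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = list(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def build_h_groups(domains: dict[str, tuple[float, float, str]]) -> dict[str, list[set[str]]]:
    """domains maps name -> (helix fraction, strand fraction, aligned sequence).

    Domains are first sorted into classes, then clustered within each class.
    """
    by_class: dict[str, dict[str, str]] = {}
    for name, (helix, strand, seq) in domains.items():
        by_class.setdefault(assign_class(helix, strand), {})[name] = seq
    return {cls: cluster_by_identity(seqs) for cls, seqs in by_class.items()}
```

Clustering within each class, rather than across all domains, mirrors the order described above: class assignment comes first, and sequence clustering only groups domains that already share a class.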
More detail on this process, and on the comparison between SCOP, CATH and FSSP, can be found in Hadley & Jones, 1999 (PMID 10508779) and Day et al., 2003 (PMID 14500873).