SWIFT Description Parser: a Software Tool for Rapid Species Description through Natural Language Parsing
Project leaders: Shaun L. Winterton, Mathew Taylor
Senior Programmer: Damian Barnier
Earth’s undescribed biodiversity is immense, but our ability to document these undescribed species is threatened by a taxonomic impediment caused by a lack of taxonomic resources and by inherent aspects of the descriptive process. We are developing an easy-to-use software application (SWIFT Description Parser) to alleviate some of the tedious components of describing species. This software uses natural language parsing to extract character states from taxonomic descriptions and generate character matrices in Structured Descriptive Data (SDD) format for use in interactive keys and for generating taxonomic monographs automatically. Adoption of SWIFT Description Parser as part of routine taxonomic studies will dramatically increase the productivity of taxonomists by significantly increasing the rate at which the world’s undescribed biota can be described.
Estimates of Earth’s biodiversity range widely, from three to 100 million species, of which only 1.8 million are described. With this tremendous amount of undescribed biodiversity, the societal need for taxonomy is greater now than ever, and yet resources supporting taxonomy are becoming scarcer (Wheeler et al. 2004). This is the taxonomic impediment: despite having identified the problem, we still lack the taxonomic expertise and resources to describe the remaining biodiversity on Earth (Evenhuis 2007).
Describing species is a time-consuming, careful process requiring specialised expertise and knowledge about a specific group of organisms. Monographs compound this by treating all the species in a group (usually large numbers, both previously described and new) in a single revision. The tedious process of traditional species description may take years from recognition that a species is new to actual publication and availability of a taxonomic name (i.e., a Latin binomial). Few taxonomists produce more than 100 species descriptions throughout their career, so with fewer taxonomists and resources available, our hopes of documenting the world’s species are diminishing. This calls for a radical change in the way we think about species description and biodiversity exploration. We need to move away from tediously composing species descriptions in word processors and towards large sets of digitised character data harvested from the published literature, where describing a species involves simply checking the appropriate character states, which are then transformed into a species description.
What is needed is a paradigm shift from traditional to digital taxonomy to describe the world’s biodiversity in a timeframe appropriate for realistic conservation management.
SWIFT Description Parser is a software application currently under development by CBIT as an innovative method for describing the world’s undocumented biological diversity. It uses character matrices harvested from published descriptions, which can then be used to describe new species, with the data exportable either in interactive key format or as natural language species descriptions for monographs. The net result will be a greater output of species descriptions, thus reducing the impact of the taxonomic impediment.
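To make the idea of harvesting a character matrix from published descriptions more concrete, the following is a minimal sketch in Python. It is not SWIFT's actual parser, whose design is not described here. It assumes, purely for illustration, telegraphic descriptions in which clauses are separated by semicolons and the first word of each clause names the structure being described. The function names `parse_description` and `build_matrix`, and the two example taxa, are hypothetical.

```python
import re


def parse_description(text):
    """Split a telegraphic description into {character: state} pairs.

    Illustrative assumption: each clause has the form
    "<structure> <state words...>", e.g. "Head black".
    """
    states = {}
    # Clauses are assumed to be separated by ';' or '.'
    for clause in re.split(r"[;.]", text):
        words = clause.strip().lower().split()
        if len(words) < 2:
            continue  # skip empty or single-word fragments
        states[words[0]] = " ".join(words[1:])
    return states


def build_matrix(descriptions):
    """Build a taxon-by-character matrix from {taxon: description text}."""
    per_taxon = {taxon: parse_description(text)
                 for taxon, text in descriptions.items()}
    # Column order: every character seen in any description, sorted by name
    characters = sorted({c for s in per_taxon.values() for c in s})
    # Characters not mentioned for a taxon are scored as "?" (unknown)
    matrix = {taxon: [s.get(c, "?") for c in characters]
              for taxon, s in per_taxon.items()}
    return characters, matrix


# Hypothetical example descriptions, not taken from any publication
descriptions = {
    "Species A": "Head black; antennae yellow; wings hyaline.",
    "Species B": "Head brown; wings infuscate.",
}
characters, matrix = build_matrix(descriptions)
```

In this sketch a character that a description does not mention is scored as "?", so every taxon has one entry per character. Real descriptions use far more varied syntax than this clause pattern, which is why a natural language parsing approach is needed, and exporting to SDD would require a further serialisation step that is not shown.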