SWIFT Description Parser: a Software Tool for
Rapid Species Description through Natural Language Parsing


Developers: CBIT
Project leaders: Shaun L. Winterton, Mathew Taylor
Senior Programmer: Damian Barnier

Summary

Earth’s undescribed biodiversity is immense, but our ability to document these undescribed species is threatened by a taxonomic impediment caused by a lack of taxonomic resources and by inherent aspects of the descriptive process. We are developing an easy-to-use software application (SWIFT Description Parser) to alleviate some of the tedious components of describing species. This software uses natural language parsing to extract character states from taxonomic descriptions and generate character matrices in Structured Descriptive Data (SDD) format, for use in interactive keys and for generating taxonomic monographs automatically. Adoption of SWIFT Description Parser as part of routine taxonomic studies will dramatically increase the productivity of taxonomists describing the world’s undescribed biota by significantly increasing the rate of description.

Project Description:

Estimates of the biodiversity of Earth range widely, from three to 100 million species, of which only 1.8 million are described. With this tremendous amount of undescribed biodiversity on Earth, the societal need for taxonomy is greater now than ever, and yet resources supporting taxonomy are becoming scarcer (Wheeler et al. 2004). This is the taxonomic impediment: despite having identified the problem, we still lack the taxonomic expertise and resources to describe the remaining biodiversity on Earth (Evenhuis 2007).

Describing species is a time-consuming, careful process requiring specialised expertise and knowledge about a specific group of organisms. Monographs compound this by treating all the species (usually large numbers, both previously described and new) in a single revision. The tedious process of traditional species description may take years from recognition that a species is new to actual publication and availability of a taxonomic name (i.e. Latin binomial). Few taxonomists produce more than 100 species descriptions throughout their career, so with fewer taxonomists and resources available, our hopes of documenting the world’s species are diminishing. What is needed is a radical change in the way we think about species description and biodiversity exploration. We need to move away from species descriptions tediously composed in word processors, towards large, digitised character data sets harvested from the published literature, where describing a species involves simply checking the appropriate character states and then transforming them into a species description.

What is needed is a paradigm shift from traditional to digital taxonomy to describe the world’s biodiversity in a timeframe appropriate for realistic conservation management.

SWIFT Description Parser is a software application currently under development by CBIT as an innovative method for describing the world’s undocumented biological diversity. It uses character matrices harvested from published descriptions, which can then be used to describe new species. The data can be exported in interactive key format or as natural language species descriptions for monographs. The net result will be greater output of species descriptions, thus reducing the impact of the taxonomic impediment.

Technical aspects:

Automating Species Descriptions. Simply described, structured descriptive data can be generated rapidly using SWIFT Description Parser in one of two ways: by harvesting character state information from existing publications against a character state list, or by checking appropriate boxes in a list of character states. With a new species in hand, the taxonomist checks the appropriate boxes to describe the species and, if necessary, adds new characters and states that may be unique to that species. When this data is transformed into a description, characters not appropriate to a species are excluded, as they would not have been scored in the character list. The character list is usually predefined, but it could also be compiled from character information gleaned from already published descriptions of related species.

A summary of the process is as follows (Figure 1). A description will be associated with an entity (taxon) either manually, or automatically if an XML dataset is available. The description will then be broken down (parsed) in two or more passes based on a set of delimiters and rules. From the description text a data model will be created that can be parsed against a standard feature list (characters and states). To help in parsing the description data model against the feature list, the parser will draw on standard syntax rules and user-defined rule sets. It will also employ a dictionary set to relate variant word forms; for example, “arrangement” can also mean “arranged”. Once the description data has been parsed against the feature list, matches will be scored and presented. Users can then make additional changes, if desired. Multiple descriptions for a given entity can also be used to further fill the data matrix or to show divergence between descriptions.

After parsing has been completed on an entity set, the data can be exported to Structured Descriptive Data (SDD) format (Figure 2).
This format can then be used in a number of SDD-compliant applications such as Lucid3 or IdentifyLife. These tools in turn can produce, for example, identification keys or new species descriptions. We plan to provide a mechanism for importing harvested character state data from multiple sources, and to enable centralisation of character data in SDD format in a web-based environment for large biodiversity projects.
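The parsing steps summarised above can be illustrated with a minimal sketch. This is not the SWIFT Description Parser code; it only assumes a two-pass split on delimiters, a small dictionary set for variant word forms, and a simple match of clauses against a feature list. The names FEATURE_LIST, SYNONYMS, parse_description and score, and the example characters and states, are all hypothetical.

```python
import re

# Hypothetical feature list: each character maps to its allowed states.
FEATURE_LIST = {
    "wing colour": ["hyaline", "infuscate"],
    "antenna shape": ["filiform", "clavate"],
    "leg colour": ["yellow", "black"],
}

# Hypothetical dictionary set: variant word forms mapped to feature-list terms.
SYNONYMS = {
    "wings": "wing",
    "antennae": "antenna",
    "legs": "leg",
    "arranged": "arrangement",
}


def normalise(word):
    """Replace a variant word form with its feature-list term, if known."""
    return SYNONYMS.get(word, word)


def parse_description(text):
    """Pass 1: split the text into clauses on ';' and '.'.
    Pass 2: split each clause into normalised alphabetic tokens."""
    clauses = [c.strip() for c in re.split(r"[;.]", text.lower()) if c.strip()]
    return [[normalise(w) for w in re.findall(r"[a-z]+", c)] for c in clauses]


def score(clauses, feature_list):
    """Match parsed clauses against the feature list.
    A character is scored only when a clause mentions its subject
    (the first word of the character name) and one of its states."""
    matrix = {}
    for character, states in feature_list.items():
        subject = character.split()[0]
        for tokens in clauses:
            if subject in tokens:
                hits = [s for s in states if s in tokens]
                if hits:
                    matrix.setdefault(character, []).extend(hits)
    return matrix


description = "Wings hyaline; antennae filiform. Body length 5 mm."
matrix = score(parse_description(description), FEATURE_LIST)
for character, states in matrix.items():
    print(character, "->", ", ".join(states))
```

In this sketch, "leg colour" is never scored for the example description, so it would simply be absent from any description generated from the matrix, which mirrors the exclusion of unscored characters described above. Real descriptions would need much richer syntax rules than a first-word match, and this sketch does not attempt SDD export.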

SWIFT Description Parser is undergoing development and testing, and a release date is planned for mid-2009.

If you are interested in using SWIFT Description Parser for taxonomic research or as a beta tester, then please contact us.

 

 