In the digital age, our ability to catalog Earth's breathtaking biodiversity depends not just on scientific expertise but on making that knowledge readable, not only for humans but also for machines.
Imagine you're trying to find all information about a particular animal, say the African elephant, across every scientific database in the world. Seems straightforward, right? Unfortunately, this simple task becomes a monumental challenge for computers. The scientific name Loxodonta africana might refer to slightly different groupings of organisms in different databases, may have been reclassified over time, or might be known by alternative names in older literature.
This isn't just an elephant-sized problem; it affects all species and hampers our ability to understand and protect life on Earth. As one research team puts it, "The scientific names of plants and animals play a major role in Life Sciences as information is indexed, integrated, and searched using scientific names. The main problem with names is their ambiguous nature" [1].
The core issue lies in what scientists call "taxonomic concepts": the specific meaning attached to a scientific name based on a particular classification view. Consider these real-world complications:
- A single species might be known by different names in different classification systems.
- The same scientific name might refer to different groupings of organisms in different databases.
- As research advances, species get reclassified, but historical records still use outdated names.

"A taxon is a group of one or more organisms whose members are considered evolutionarily related to one another," researchers explain, noting that "contrary to popular belief, a generally agreed-upon, single taxonomy of organisms does not exist" [1].
This ambiguity creates significant barriers when trying to integrate data from different sources, a crucial capability for addressing pressing challenges like biodiversity loss, climate change impacts on ecosystems, and emerging infectious diseases.
Traditionally, species information has been stored in relational databases: structured systems, similar to sophisticated spreadsheets, in which data is organized into tables with predefined relationships. In these systems, species are typically identified using Life Science Identifiers (LSIDs), which function like unique catalog numbers in a massive library [1].
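To make the idea of a taxonomic concept more concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which a concept is a name plus the source that defines it plus the groups that source includes. The class `TaxonConcept`, the two checklists, and the member labels are all hypothetical and are not taken from any real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """A scientific name as used by one particular source (hypothetical model)."""
    name: str           # the scientific name string
    according_to: str   # the checklist or publication that defines this usage
    members: frozenset  # groups the source includes (illustrative labels only)


# Two invented checklists that use the same name with different boundaries.
broad = TaxonConcept(
    "Loxodonta africana", "Checklist A",
    frozenset({"savanna elephants", "forest elephants"}),
)
narrow = TaxonConcept(
    "Loxodonta africana", "Checklist B",
    frozenset({"savanna elephants"}),
)


def same_name(a: TaxonConcept, b: TaxonConcept) -> bool:
    """Compare only the name strings, as a naive text search would."""
    return a.name == b.name


def same_concept(a: TaxonConcept, b: TaxonConcept) -> bool:
    """Compare the name and the set of organisms it is meant to cover."""
    return a.name == b.name and a.members == b.members


print(same_name(broad, narrow))     # True
print(same_concept(broad, narrow))  # False
```

A search that matches on the name alone would treat the two records as identical, even though they describe different groupings of organisms.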
While familiar and widely used, this approach has limitations. Each database essentially becomes an information silo, making it difficult to connect related information across different systems. Computers can process the data within each database efficiently but struggle to understand the meaningful relationships between information in different databases.
The emerging alternative, ontologies, represents a fundamentally different approach. Rather than simply storing records in tables, ontologies create a web of relationships that both humans and machines can navigate. They're built using Semantic Web technologies, including HTTP URIs (the same web addresses we use for websites) and RDF (Resource Description Framework), which organizes information in subject-predicate-object statements [1, 5].
Think of the difference this way: a relational database is like a meticulously organized filing cabinet, while an ontology is like a Wikipedia page full of hyperlinks, with each link connecting to related concepts in ways that both humans and machines can follow.
"Ontologies are a form of knowledge representation for a given domain that uses formal semantics and can be used to arrange and define a concept hierarchy, taxonomy and topology," computer science researchers note [5].
| Feature | Relational Databases | Ontologies |
|---|---|---|
| Identifier Type | Life Science Identifiers (LSIDs) | HTTP URIs |
| Data Structure | Tables with predefined relationships | Web of interconnected statements |
| Machine Understanding | Limited to predefined queries | Can infer new relationships |
| Data Integration | Challenging across different systems | Built for connecting distributed data |
| Flexibility | Rigid schema | Adaptable to new relationships |
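The subject-predicate-object structure of RDF can be imitated with plain Python tuples. The sketch below is only an illustration of the data model, not a real RDF library: the `example.org` URIs, the relation names, and the `match` helper are all assumptions made for this example.

```python
# Each statement is a (subject, predicate, object) triple.
# All URIs are hypothetical placeholders under example.org.
EX = "http://example.org/taxon/"
REL = "http://example.org/relation/"

triples = {
    (EX + "Loxodonta_africana", REL + "hasRank", "species"),
    (EX + "Loxodonta_africana", REL + "isPartOf", EX + "Loxodonta"),
    (EX + "Loxodonta", REL + "hasRank", "genus"),
    (EX + "Loxodonta", REL + "isPartOf", EX + "Elephantidae"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything stated about the species.
for s, p, o in match(triples, subject=EX + "Loxodonta_africana"):
    print(s, p, o)

# Which taxa are stated to be part of the family?
print(match(triples, predicate=REL + "isPartOf", obj=EX + "Elephantidae"))
```

Because every statement has the same three-part shape, new kinds of relationships can be added without redesigning a table schema. Pattern matching of this kind is, roughly, what a SPARQL query does over real RDF data.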
In 2014, a significant research project introduced TaxMeOn, a meta-ontology designed specifically to model taxonomic information in a machine-understandable way [1]. This work provided one of the first direct comparisons between traditional databases and ontological approaches for handling species checklists.
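The researchers modeled the same taxonomic information using two different approaches: a relational database in which taxa are identified by LSIDs, and an ontology in which taxa are identified by HTTP URIs.
The test case involved complex taxonomic scenarios that commonly cause problems in biological databases.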
Figure: Comparison of identifier types used in the experiment.
The results demonstrated clear advantages to the ontological approach:
"The use of HTTP URIs is preferable for presenting the taxonomic information of species checklists," the researchers concluded. "An HTTP URI identifies a taxon and operates as a web address from which additional information about the taxon can be located, unlike LSID" [1].
This capability enables what's known as Linked Data: interconnecting related information across the web rather than leaving it isolated in separate databases. The approach allows systems to "understand" the relationships between different taxonomic concepts and navigate between connected pieces of information automatically.
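The practical difference between the two identifier types can be sketched in a few lines of Python. The example assumes the general LSID layout `urn:lsid:authority:namespace:object`, with an optional revision at the end. Both identifiers below are made up, and `describe_identifier` is a hypothetical helper that performs no network access.

```python
from urllib.parse import urlparse


def describe_identifier(identifier: str) -> dict:
    """Classify an identifier and say whether a web client can fetch it directly."""
    parts = identifier.split(":")
    # Assumed LSID layout: urn:lsid:authority:namespace:object[:revision]
    if len(parts) in (5, 6) and parts[0].lower() == "urn" and parts[1].lower() == "lsid":
        return {
            "scheme": "lsid",
            "authority": parts[2],
            "namespace": parts[3],
            "object": parts[4],
            "web_resolvable": False,  # needs a separate LSID resolution service
        }
    parsed = urlparse(identifier)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return {
            "scheme": parsed.scheme,
            "authority": parsed.netloc,
            "path": parsed.path,
            "web_resolvable": True,  # the identifier is itself a web address
        }
    raise ValueError(f"Unrecognized identifier: {identifier!r}")


# Hypothetical identifiers for the same taxon.
print(describe_identifier("urn:lsid:example.org:taxon:12345"))
print(describe_identifier("http://example.org/taxon/12345"))
```

The LSID names the taxon but still has to be handed to a resolver before anything can be retrieved, whereas the HTTP URI already says where to look.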
Figure: Ontology approach advantages based on TaxMeOn experiment results.
Creating these machine-understandable species checklists requires a specialized set of tools and technologies. Here are the key components researchers use in this field:
| Component | Function | Real-World Example |
|---|---|---|
| HTTP URIs | Provide unique, web-accessible identifiers for taxa | A permanent web address for Canis lupus |
| RDF (Resource Description Framework) | Structures information in subject-predicate-object statements | "Arctic fox - lives in - Arctic region" |
| OWL (Web Ontology Language) | Defines relationships and constraints between concepts | Specifying that a genus can contain multiple species |
| SPARQL | Query language for retrieving ontological information | Finding all predator species in a food web |
| TaxMeOn | Meta-ontology for modeling taxonomic information | Framework for representing species checklists |
These tools enable what researchers call "formal semantics": a way of representing meaning that's precise enough for computers to process logically. This doesn't just help with finding information; it enables computers to make logical inferences based on the relationships in the data [5].
For example, once a genus is assigned to a family, a computer can automatically infer that every species in that genus also belongs to the family, without being explicitly told about each species.
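As a rough illustration of such inference, the following Python sketch follows "is part of" links upward through a small, simplified hierarchy. Only direct parent links are stored; membership in higher groups is derived. The dictionary and the function names are assumptions made for this example, and a real ontology reasoner works on OWL axioms rather than a Python dictionary.

```python
# Simplified "is part of" statements: child taxon -> parent taxon.
parent_of = {
    "Vulpes lagopus": "Vulpes",
    "Vulpes vulpes": "Vulpes",
    "Canis lupus": "Canis",
    "Vulpes": "Canidae",
    "Canis": "Canidae",
    "Canidae": "Carnivora",
}


def ancestors(taxon, parents):
    """Follow parent links upward and collect every higher taxon."""
    result = []
    seen = {taxon}
    while taxon in parents:
        taxon = parents[taxon]
        if taxon in seen:  # guard against accidental cycles in the data
            break
        seen.add(taxon)
        result.append(taxon)
    return result


def members(group, parents):
    """All taxa that belong to `group`, directly or through intermediate ranks."""
    return sorted(t for t in parents if group in ancestors(t, parents))


print(ancestors("Vulpes lagopus", parent_of))  # ['Vulpes', 'Canidae', 'Carnivora']
print(members("Canidae", parent_of))
```

Nothing in the data states that the Arctic fox (*Vulpes lagopus*) belongs to Carnivora; that conclusion is derived by chaining the stored links.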
The shift from databases to ontologies isn't just a technical improvement; it's transforming how we study and protect life on Earth.
Machine-understandable species data enables researchers to connect information across traditional boundaries. A wildlife ecologist studying animal distributions can seamlessly link to genetic information in biomedical databases, conservation status in environmental databases, and fossil records in paleontological collections.
"This enables the integration of biological data from different sources on the web using Linked Data principles and prevents the formation of information silos," researchers emphasize [1].
This interoperability is particularly crucial for addressing complex scientific questions that span multiple disciplines, such as understanding how climate change affects species distributions or tracking the emergence of zoonotic diseases that jump from animals to humans.
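A shared identifier is what makes this kind of integration mechanical. The Python sketch below assumes three independent sources that happen to use the same HTTP URI for a taxon. The URIs, source names, and field values are placeholders invented for illustration, and `merge_by_uri` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical taxon URIs shared across sources.
ELEPHANT = "http://example.org/taxon/Loxodonta_africana"
WOLF = "http://example.org/taxon/Canis_lupus"

# Placeholder records; the values stand in for real data.
ecology = {ELEPHANT: {"distribution": "distribution-record-1"},
           WOLF: {"distribution": "distribution-record-2"}}
genetics = {ELEPHANT: {"genome": "sequence-record-1"}}
conservation = {ELEPHANT: {"status": "status-record-1"}}


def merge_by_uri(*sources):
    """Combine per-source records into one view per taxon URI."""
    merged = defaultdict(dict)
    for source_name, records in sources:
        for uri, fields in records.items():
            for key, value in fields.items():
                # Prefix each field with its source so provenance is kept.
                merged[uri][f"{source_name}.{key}"] = value
    return dict(merged)


combined = merge_by_uri(
    ("ecology", ecology),
    ("genetics", genetics),
    ("conservation", conservation),
)
for uri, fields in combined.items():
    print(uri, fields)
```

If each source used its own local key instead, the join would first require a mapping table between identifiers, which is where much of the integration effort goes in practice.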
The impact of these technological advances extends throughout ecology and conservation biology. Recent research shows that deep neural networks are increasingly being combined with species distribution models to classify animals from images and predict where they're likely to be found [2].
These integrated approaches depend on having clean, well-structured taxonomic data that machines can understand and process. For instance, a camera trap system using artificial intelligence to identify species from photos needs unambiguous references to know what characteristics define each species, which is exactly what ontological approaches provide.
"The use of deep learning already covers several levels of classification of living beings, from bacteria to plants," ecological researchers note, "passing through the classification of insects and vertebrates, and on scales that vary from local, with specific regions, to works that include the entire planet" [2].
Perhaps most importantly, these technologies are becoming essential tools in global conservation efforts. Understanding which species exist where, how their distributions are changing, and how they're responding to environmental pressures requires integrating massive amounts of data from countless sources.
| Field | Application | Impact |
|---|---|---|
| Conservation Biology | Tracking endangered species distributions | Identifying critical habitats for protection |
| Climate Change Research | Modeling species range shifts | Predicting ecosystem responses to warming |
| Public Health | Monitoring disease vector distributions | Predicting outbreaks of vector-borne diseases |
| Agriculture | Tracking crop pest distributions | Implementing targeted pest control strategies |
Despite significant progress, challenges remain in the widespread adoption of ontological approaches. The taxonomic community continues to maintain various systems and standards, and integrating historical data presents particular difficulties. Furthermore, as researchers acknowledge, "There will never be a commonly agreed upon single taxonomy and there will always be multiple competing current taxonomic views" [1].
Yet the direction is clear. Recent initiatives like the EUdaphobase Taxonomy Ontology for soil biology demonstrate how ontologies are being adopted across biological subdisciplines [6]. These efforts align with the FAIR principles (Findable, Accessible, Interoperable, and Reusable) that are becoming the standard for scientific data management.
As these technologies mature, we're moving toward a future where our digital understanding of biodiversity matches the complexity and interconnectedness of the natural world, transforming how we document, study, and protect the rich tapestry of life on our planet.