Monday, July 11, 2011

It's all just semantics.

The problem with computers is that they are, as we used to say back in the UK, "all face and no trousers". Computers can hold huge amounts of data and can quickly search for target character strings or numeric values, but ultimately they have no idea what any of the data actually mean. This is problematic when dealing with a data management program that centralizes information pouring in from many scientists. If I want to find something that I know I at one time contributed to an ever-increasing mountain of data, I can search for a word or value that I know I included when I created it. However, if I want to find something someone else created, then I have a problem, because I don't necessarily know any of the words or numeric values that they included, and they may no longer be available to ask. Additionally, I won't necessarily recognize search results as useful based on a file name, image thumbnail or some other preview that results from a full-text search. When one scientist asks another an ambiguous question, we, as humans, can respond in a uniquely human way by saying something like "what do you mean?" Poor, dumb computers, on the other hand, can never know "what you mean", because they don't understand meaning.

A good ELN system finds ways to associate rich meaning with data to make it easier to find information and build upon past laboratory research. One way to do this is to associate metadata with raw data files, which helps to identify the data's origin, meaning and relationships. Some types of metadata are common and well known – keywords, for example, or the "tags" that so many of us use to identify our friends in Facebook pictures. Savvy organizations understand the importance of metadata and may insist that information such as sample numbers, project IDs and grant numbers be associated with all data to make it easier to gather and find later, but this kind of metadata still assumes that users know and follow established conventions and naming schemes. A good ELN can go one step further. Using technologies such as OWL and RDF, a good ELN can associate semantic metadata with objects using pre-defined ontologies that anticipate how humans make meaning of data. Think of an ontology as a set of related, hierarchical terms with increasingly specific meanings. For example, the term "Autoimmune Disease" might be subdivided into a full list of 100 different examples (Addison's disease, Alopecia, Arthritis, Allergies, and so on), and some of these second-level terms might be subdivided into third-level terms (Allergies includes Hay Fever, Penicillin Allergy, etc.). If you perform an experiment related to Hay Fever, and you associate the semantic metadata label "Hay Fever" with that experiment, and later a colleague searches for "Autoimmune Disease", a good ELN is smart enough to include the Hay Fever experiment in the search results because it "knows" that Hay Fever is a type of Autoimmune Disease, even though the experiment does not anywhere contain that exact phrase.
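The subsumption search described above can be sketched in a few lines of code. This is a minimal illustration, not how any particular ELN implements it: the toy ontology is stored as a simple parent-to-children mapping, and the terms and experiment names are invented for the example.

```python
# A toy ontology as a parent -> children mapping. Real systems would use
# a standard vocabulary expressed in OWL/RDF; these terms are illustrative.
ONTOLOGY = {
    "Autoimmune Disease": ["Addison's Disease", "Alopecia", "Arthritis", "Allergies"],
    "Allergies": ["Hay Fever", "Penicillin Allergy"],
}

def expand_term(term):
    """Return a term plus all of its descendants in the hierarchy."""
    terms = {term}
    for child in ONTOLOGY.get(term, []):
        terms |= expand_term(child)
    return terms

def semantic_search(query, experiments):
    """Find experiments whose tags fall anywhere under the query term."""
    query_terms = expand_term(query)
    return [name for name, tags in experiments.items()
            if query_terms & set(tags)]

# Hypothetical experiment records, each tagged with ontology terms.
experiments = {
    "Exp-042": ["Hay Fever"],              # tagged with a third-level term
    "Exp-043": ["Protein Purification"],   # unrelated experiment
}

# Searching the broad term still finds the specifically tagged experiment.
print(semantic_search("Autoimmune Disease", experiments))  # ['Exp-042']
```

The key point is that the query term is expanded through the hierarchy before matching, so a search never depends on the experiment containing the broader phrase itself.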

By using industry-standard or carefully constructed custom ontologies that make sense for a particular organization, downstream searching and gathering of knowledge assets can be greatly facilitated, because a good ELN understands what kinds of resources you are looking for even if you do not know anything about the specific text or content of those resources. A good ELN can also automate the initial association of metadata with certain objects, so that scientists can spend less time manually categorizing their data and more time performing research. In a sense, a good ELN can be trained to understand what a particular file "means", bringing the ELN one step closer to the goal of behaving as though it were a real human lab assistant, albeit one that never requests a vacation or asks for a pay raise. There's really only one ELN on the market that leverages modern semantic technologies, and that's CERF. Learn more at
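To give a flavor of what automated metadata association might look like, here is a deliberately simple sketch: tags are suggested by matching keywords found in a file's text against ontology terms. The keyword table is invented for illustration; a real system would use far richer classification than substring matching.

```python
# Hypothetical keyword -> ontology-term table for automatic tagging.
# A production system would use trained classifiers, not substring checks.
KEYWORD_TAGS = {
    "pollen": "Hay Fever",
    "penicillin": "Penicillin Allergy",
}

def auto_tag(file_text):
    """Suggest semantic tags for a file based on keywords it contains."""
    text = file_text.lower()
    return sorted({tag for keyword, tag in KEYWORD_TAGS.items()
                   if keyword in text})

print(auto_tag("Patient exposed to birch pollen during the spring trial"))
# ['Hay Fever']
```

Even a crude first pass like this means the scientist starts from suggested tags rather than a blank field, which is where the time savings come from.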
