Thursday, December 2, 2010

A Romantic's View of Expert Systems

Who will identify all the fossils that are just now starting to weather from the rocks?
~ Roger L. Kaesler

I am a romantic about many aspects of paleontology, including the sages that grace the science.  I suspect it helps that I am also an amateur at all of this and have never been behind the scenes.  My fleeting encounters with professional paleontologists have left me with the impression that each has a profound grasp of his or her domain, an ability to see the micro and the macro at once.  They define the term expert.  Yes, it’s a romantic view.

I rarely have the opportunity to turn to one of those paleontological sages when I’m faced with the challenge of identifying precisely what genus or species of shark gave up the fossil tooth that lies before me.  My next best approach involves identification guides, articles, a few key websites, and the like.

One resource that I’ve always thought held promise was sort of an “expert in a box” or, more accurately, an expert in a “knowledge base” (to use a term from computer-based expert systems, another field in which I have amateur status or less).  A knowledge base for fossil identification could take one of several forms and need not be delivered through technology.  It might be a series of “rules” (e.g., IF/THEN statements), beginning with an initial rule that draws a basic distinction among specimens of the type being studied.

An example of such a set of rules was prepared several years ago by Robert Purdy of the Smithsonian Institution.  A Key to the Common Genera of Neogene Shark Teeth (revised March 2006) is a set of 50 rules, each of which has 2 possible responses.  The response to a rule dictates which rule the user moves to next or whether a possible identification of the shark genus is ready to be offered.  Purdy’s Rule 1 requires the user to decide whether the fossil tooth has (a) one cusp or (b) several cusps.  That clearly is a fundamental difference that separates fossil shark teeth.  In Purdy’s key, if the tooth has a single cusp, then the user moves on to Rule 2; if multiple cusps, Rule 25 is the destination.  Though there are 50 such rules, some identifications come quickly, after invoking only a few rules.  The genus of the cow shark tooth pictured below (image on left is of the lingual side, image on right is of the labial) can be identified as Notorynchus in three steps – Rules 1, 25, and 26.
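For readers curious how a key like Purdy’s maps onto the rule-based structure of a knowledge base, here is a minimal Python sketch of my own, not anything Purdy published.  Each rule number points to two destinations, either another rule or a genus, and the user supplies the (a) or (b) answer at each step.  Only Rules 1 and 26 follow the descriptions above.  I haven’t said which option of Rule 25 leads to Rule 26, so option (a) is an assumption, and every destination not described here is a placeholder (`NOT_MODELED`, a hypothetical name).

```python
from typing import Dict, List, Tuple, Union

# A destination is either the next rule number (int) or a final answer (str).
Destination = Union[int, str]
NOT_MODELED = "(not modeled in this sketch)"

# HYPOTHETICAL fragment of a two-option key. Only Rules 1 and 26 follow the
# descriptions in the post; which option of Rule 25 leads to Rule 26 is ASSUMED.
KEY: Dict[int, Dict[str, Destination]] = {
    1: {"a": 2, "b": 25},                        # (a) one cusp, (b) several cusps
    25: {"a": 26, "b": NOT_MODELED},             # ASSUMED: option (a) leads to Rule 26
    26: {"a": "Notorynchus", "b": NOT_MODELED},  # (a) 3-4 cusplets, (b) 7-10 cusplets
}


def run_key(key: Dict[int, Dict[str, Destination]],
            answers: Dict[int, str],
            start: int = 1) -> Tuple[Destination, List[int]]:
    """Follow the key using the user's (a)/(b) answer at each rule.

    Returns the final destination and the list of rules invoked.
    Reaching a rule missing from this fragment ends the walk with NOT_MODELED.
    """
    path: List[int] = []
    current: Destination = start
    while isinstance(current, int):
        if current not in key:
            return NOT_MODELED, path
        path.append(current)
        current = key[current][answers[current]]
    return current, path


result, path = run_key(KEY, {1: "b", 25: "a", 26: "a"})
print(result, path)
```

Walking the cow shark tooth through this fragment reproduces the three-step path through Rules 1, 25, and 26 to Notorynchus.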

I have to admit that when I applied the key to the pictured tooth I made a judgment call on the last rule, choosing between the (a) and (b) options of Rule 26.  Rule 26(a) applies to a tooth with 3 to 4 cusplets while Rule 26(b) cites 7 to 10 cusplets.  This tooth appears not to fall into either category, having instead about five cusplets.  I decided to go with the option that came closest to the specimen, Rule 26(a), which immediately generated the Notorynchus identification.  Of course, with this particular tooth, I’d already consulted my other resources, though not a living expert, and “knew” where the process should be heading – Notorynchus.
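That judgment call amounts to a simple distance comparison: about five cusplets sits one away from the 3-to-4 range and two away from the 7-to-10 range.  A small Python sketch of the reasoning follows.  The function names are hypothetical, and it assumes the cusplet count is a single number, which glosses over the “about” in “about five.”

```python
from typing import Dict, Tuple


def distance_to_range(value: float, low: float, high: float) -> float:
    """Zero if value lies inside [low, high]; otherwise the gap to the nearer end."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0


def closest_option(value: float, options: Dict[str, Tuple[float, float]]) -> str:
    """Pick the option whose (low, high) range lies nearest to value.

    Ties go to the option listed first, because min() keeps the first minimum.
    """
    return min(options, key=lambda name: distance_to_range(value, *options[name]))


# Cusplet ranges for Rule 26 as described in the post.
RULE_26 = {"a": (3, 4), "b": (7, 10)}
print(closest_option(5, RULE_26))  # 5 is 1 cusplet from 3-4 and 2 from 7-10
```

Ties go to option (a) here only because it is listed first; in practice a tie is exactly where informed judgment has to take over.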

I have spent some time working on a knowledge base (in Excel) to help in the identification at the species level of fossil teeth from Carcharhinus sharks, the so-called gray or requiem sharks.  Bretton Kent in his seminal Fossil Sharks of the Chesapeake Bay Region (1994) captures the essence of why, absent a human expert at my beck and call, I’ve invested time in trying to build this knowledge base.  “The identification of individual Carcharhinus species based solely on teeth can be difficult given the degree of convergence in tooth form among different lineages.”  (p. 80)

It also shouldn’t be surprising that distinguishing among species through a set of rules may be even more problematic than using rules to distinguish among genera, as in Purdy’s key.  Differences among fossil specimens from different species are frequently very subtle.  For instance, serrations can be either present or absent.  Pretty obvious, except that sometimes serrations are tiny, requiring a hand lens to see.  At what point do serrations stop being tiny and difficult to see and become simply regular?  When are they no longer regular but coarse?  At the extremes the differences are obvious, but there is, to use Kent’s word, convergence that requires distinguishing among shades of gray.

So, in my limited experience, the application of a knowledge base isn’t always, or even usually, akin to following a single, obvious thread directly from specimen to a conclusion about identity.  Rather, there are knots to contend with, and these stem largely from the fact that we’re dealing with once-living organisms that were inherently variable, with a fossilization process that introduces further variability, and with rules that are more or less useful depending upon how carefully they’ve been worded.  Ultimately, one needs to use some informed judgment.  A knowledge base can be useful in the identification of a specimen, but, from my perspective, not sufficient.

Several decades ago, expert systems emerged from work in computer-based artificial intelligence.  An expert system was defined as “a computer program designed to model the problem-solving ability of a human expert.”  (John Durkin, Application of Expert Systems in the Sciences, Ohio Journal of Science, vol. 90, no. 5, 1990, p. 171.)  Among the components of such a system that Durkin identified were (1) a knowledge base and (2) something called an “inference engine,” which integrated the data input by the user with the information residing in the knowledge base to craft a solution to the problem under analysis.  Apparently, the inference engine would be programmed to deal with the uncertainty introduced by incomplete information, yielding probabilities for different solutions.  Fossil identification was one of the problems that some researchers programmed expert systems to address.
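To make the idea of an inference engine coping with incomplete information a bit more concrete, here is a deliberately crude Python sketch.  It is not Durkin’s method or any published system: it counts how many of the user’s observed characters each candidate matches, skips characters the user couldn’t determine, and normalizes the counts so they sum to 1.  The result is a rough stand-in for the probabilities described above, not true probabilities.  The taxa, characters, and states in the knowledge base are hypothetical placeholders, and the `smoothing` parameter is an arbitrary assumption.

```python
from typing import Dict, Optional

# HYPOTHETICAL knowledge base: expected character states for each candidate.
# Taxa and states are placeholders, not real identification criteria.
KNOWLEDGE_BASE: Dict[str, Dict[str, str]] = {
    "taxon A": {"serrations": "fine", "root": "flat", "cusp": "erect"},
    "taxon B": {"serrations": "coarse", "root": "flat", "cusp": "oblique"},
    "taxon C": {"serrations": "absent", "root": "arched", "cusp": "erect"},
}


def rank_candidates(observed: Dict[str, Optional[str]],
                    kb: Dict[str, Dict[str, str]] = KNOWLEDGE_BASE,
                    smoothing: float = 0.1) -> Dict[str, float]:
    """Score each taxon by the number of observed characters it matches,
    ignoring characters the user could not determine (None), then normalize
    the scores so they sum to 1.

    `smoothing` (an arbitrary choice for this sketch) keeps a taxon that
    matches nothing from dropping to exactly zero.
    """
    known = {char: state for char, state in observed.items() if state is not None}
    scores = {}
    for taxon, states in kb.items():
        matches = sum(1 for char, state in known.items() if states.get(char) == state)
        scores[taxon] = matches + smoothing
    total = sum(scores.values())
    return {taxon: score / total for taxon, score in scores.items()}


# Serrations too worn to judge, so that character is left as None.
ranking = rank_candidates({"serrations": None, "root": "flat", "cusp": "erect"})
for taxon, share in sorted(ranking.items(), key=lambda item: -item[1]):
    print(f"{taxon}: {share:.2f}")
```

A real system would weight characters by how diagnostic they are; treating every match equally is the simplification that keeps this sketch short.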

I recently stumbled across the presidential address to The Paleontological Society delivered by paleontologist and geologist Roger L. Kaesler in 1992, nearly 20 years ago.  In it, Kaesler stated that paleontology faced a future with far too few paleontologists who were immersed and expert in the taxonomy, evolutionary history, and geography of major groups of fossils.  He warned that “an acute shortage of systematic paleontologists” threatened the science.  (A Window of Opportunity:  Peering into a New Century of Paleontology, Journal of Paleontology, vol. 67, no. 3, 1993.)  He saw a 15-year window of opportunity in which to prepare for this future, and suggested that one promising way to preserve the systematists’ knowledge and bring it to bear on future fossil finds was the building and application of expert systems.

The window of opportunity Kaesler identified closed in 2007 (sadly, that was also the year he died).  I wonder: has the dearth of systematic paleontologists that he apparently thought inevitable come to pass?  Is the science increasingly relying on an array of expert systems to identify and classify fossils, and to define their relationships to one another?

As for the first question, in a recent (2008) piece, Norman MacLeod, the Keeper of Paleontology at the Natural History Museum in London, writes, "This expertise deficiency, which has come to be called the 'taxonomic impediment', is with us now and will only become more serious as time goes by unless some means is found to address its effects."  (Introduction, Automated Taxon Identification in Systematics:  Theory, Approaches and Applications, edited by MacLeod, 2008, p. 3).  He is writing not just about a shortage of individuals with systematic knowledge in paleontology but about the problem more broadly, in biology and zoology as well.

As for the second, MacLeod posits that the dream of automated taxon identification in general, not just of paleontological remains but also of extant organisms, is still alive but clearly has yet to be realized.  He concedes that "most practicing taxonomists still believe such systems are the stuff of science fiction."

I’m too much of a romantic and a bit soured by my brief encounters with efforts to capture knowledge in a box to think we can ever replace that human expert, that systematic paleontologist who seemingly knows it all in his or her domain.

In Trilobite:  Eyewitness to Evolution (2000), Richard Fortey describes his early years at the University of Cambridge’s Sedgwick Museum of Earth Sciences, where he labored to extract trilobites from material he had collected at Spitsbergen, north of the Arctic Circle.  The renowned paleontologist Harry Whittington was mentoring him, his “guru.”  For Fortey, Whittington does what an irreplaceable human expert system does.  Fortey writes,

From time to time Harry Whittington would appear and make encouraging remarks, or put me right when I placed the wrong head and tail together. (p. 38)

Whittington died earlier this year (2010) at the age of 94.
