Posted by Richard B. Hoppe on September 23, 2004 08:21 PM
I. Introduction: Designer Discrimination Algorithms
A significant problem in developing a revolutionary new theory is the parallel development of methods and technologies appropriate to testing the theory. As I have said a number of times in response to criticisms of Multiple Designers Theory, the absolutely necessary first step in the MDT research program must be the development and validation of designer discrimination methodologies. In the Introduction to Multiple Designers Theory above I said that in developing a design discrimination methodology, MDT has the same task as mainstream ID. First the methodology must be systematized and formalized. Then it must be empirically validated on test materials whose histories we already know. The first task of MDT is therefore to develop a formalized, researcher-independent methodology that, when eventually applied to phenomena whose provenance and history we do not know, can legitimately be expected to reliably tell us something of interest about those phenomena. Mainstream intelligent design has so far avoided that task: there are no validation data at all on its principal design detection methods. MDT, however, has begun the task of validating and calibrating its methodologies.
When proposing a new and potentially revolutionary theoretical structure and associating that theory with a new and untested methodology for evaluating it, it is absolutely critical to validate the associated methodology. That is logically and practically prior to anything else. Absent systematic and careful empirical validation of the methodology, ‘tests’ of the theory using the methodology are at best suspect. At worst they can be interpreted as merely self-serving. Careful and systematic empirical validation of the methodology must be the foundation of, and precede (or at least parallel), tests of the theory using that methodology.
Whether one is trying to discriminate design from no design, as is the case for mainstream ID, or to distinguish the work of one designer from another, the task of developing, systematizing, formalizing, and empirically validating the methodology is critical. Absent that, claims about one’s theory are not plausible. A theory may have all the promise in the world, but without empirically validated methodologies for making observations and systematically gathering data to test hypotheses, it will remain merely a conjecture with no empirical content or explanatory utility.
Developing and validating a new methodology requires attention to two issues: the validity and the reliability of the methodology. “Validity” refers to the question of whether the method measures what it purports to measure: does it do what it claims to do? Does a spectrometer in fact provide information concerning the frequency composition of light? “Reliability” refers to the question of consistency: does the methodology provide consistent results when repeatedly applied? Can one apply the method mechanically, so that subjective human judgments are not implicated in the results it yields?
Another significant problem in developing a general methodology for discriminating among the products of different designers is how the products will be represented as inputs to the discrimination algorithm. The representation problem is at the heart of pattern recognition, which is what a Designer Discrimination Algorithm must do: recognize the different patterns that are generated by different designers. Mainstream ID proponents have argued at various times that “irreducibly complex” structures, “complex specified information,” or “specified complexity” are hallmarks of intelligent design, and those notions are incorporated in various ways in the arguments those proponents make. Unfortunately, there is no consistency in the way those various notions are defined, measured, or applied to real-world phenomena. For example, the latter two require that a human make a subjective judgment about whether an appropriate “specification” exists, but there is no developed body of methods governing those judgments. Mainstream ID has no formal or mechanical way of representing objects or processes to serve as inputs to its methodologies. Mainstream ID is in N-Ray territory here. A primary goal of MDT’s designer discrimination methodology development program is to develop objective representations of phenomena that avoid the requirement for subjective human judgments, making the designer discrimination process completely mechanical. As the pilot data below suggest, achievement of that goal appears to be in sight.
Complexity is important in discriminating among designers, and the Designer Discrimination Algorithm (DDA) I am developing and evaluating takes account of that importance. There are many methods that try to measure complexity. One author identifies dozens of complexity measures. (Interestingly, William A. Dembski’s Complex Specified Information did not make the cut in that paper. See also this critique.) One component of the DDA is a rough analog of Solomonoff/Kolmogorov/Chaitin Complexity, or Algorithmic Information Complexity (AIC). There are other components in the DDA, but the construction of the DDA is such that the scores it assigns to stimuli should be (at least) ordinally associated with AIC. (See also this paper and this one — both by Cosma Shalizi and his colleagues — for related ideas about measuring the complexity and organization of complicated systems.)
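Since the DDA's components are not disclosed, here is a minimal sketch of the general idea only: compressed length per input byte is a standard, purely mechanical proxy that is (roughly) ordinally associated with Algorithmic Information Complexity for practical inputs. The zlib-based measure below is my own illustrative stand-in, not an actual DDA component.

```python
import zlib

def complexity_proxy(text: str) -> float:
    """Compressed length per input byte as a rough AIC stand-in.

    Higher values mean less compressible input (closer to random);
    lower values mean more internal structure and repetition.
    This is an illustrative proxy, not the DDA's actual measure.
    """
    data = text.encode("utf-8")
    return len(zlib.compress(data, 9)) / len(data)

# A highly repetitive string scores far lower than varied prose:
repetitive = complexity_proxy("ab" * 500)
varied = complexity_proxy("The quick brown fox jumps over the lazy dog. " * 20)
```

Because compression ratios are only ordinal stand-ins, such a measure supports rankings and group comparisons, not absolute complexity values.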
Here I report progress on the effort to develop a designer discrimination methodology that is valid and reliable. This preliminary report will be slightly disappointing to readers because at this stage I do not intend to disclose the specific designer discrimination methodologies I’m testing. As I have noted elsewhere, there is a commercial niche for such technologies. I am doing some of this work on company time with company machines, and I am obliged to reserve the technology until it is developed far enough to be appropriately protected or until my company decides that it can be released into the public/academic domain.
Nevertheless, I can now say with confidence that at least one methodology under evaluation has the capability to discriminate among the complex objects created by several different designers. Pilot studies described below show that it provides statistically significantly different scores for samples of the products of different designers when it is mechanically applied, and there is reason to believe that the technology may be amenable to being incorporated into an automated classifier system that allows reliable assignments of products to designers with the process untouched by human hands (or judgments).
The principal task in discriminating among the products of multiple designers is to devise an analysis technology that can be mechanically implemented so subjective researcher-idiosyncratic variables cannot influence the encoding and scoring of the products to be analyzed. The technology undergoing evaluation does that. It is researcher independent. No subjective judgments are exercised in the selection of instances, in the application of the encoding method, or in the application of the discrimination method. All those components can in principle be fully automated, though I have not yet done so; this is, after all, still pilot research. In pilot research it is useful to have the stages of analysis clearly separated for process tracking and troubleshooting purposes. The designer discrimination methodology makes no use of background knowledge about the design processes employed by the designers and makes no assumptions about how the designs were manufactured (instantiated in matter and energy). It requires no assumptions about the nature of the designers, and uses no extra background information about the designed objects. It is a purely objective methodology that analyzes just the designed structures.
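The separation of stages just described (instance selection, encoding, scoring) can be sketched as a pipeline of deterministic functions, which is what makes it researcher independent: given the same inputs and the same stage functions, anyone obtains the same scores. The encoding and scoring functions below are toy placeholders of my own, standing in for the undisclosed methods.

```python
from typing import Callable, List

def run_pipeline(instances: List[str],
                 encode: Callable[[str], bytes],
                 score: Callable[[bytes], float]) -> List[float]:
    """Apply encoding then scoring to each instance, with no human
    judgment entering between stages. Stages are kept separate so each
    can be tracked and troubleshot independently during pilot work."""
    return [score(encode(inst)) for inst in instances]

# Placeholder stages, for illustration only:
encode = lambda s: s.lower().encode("utf-8")      # mechanical normalization
score = lambda b: len(set(b)) / max(len(b), 1)    # toy byte-diversity score

scores = run_pipeline(["Sample one.", "Another sample."], encode, score)
```

The point of the sketch is architectural: because every stage is a pure function, the whole chain could in principle be automated end to end.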
II. Pilot Research on Designer Discrimination Methodologies
To give the flavor of the pilot research, here are the results of two small pilot studies of the designer discrimination technology under development.
In the first pilot study, the materials were samples of text from two human authors. The texts were mechanically represented so as to be appropriate for input to the discrimination algorithm. Representation is a significant problem, and solving it in a general way so that non-textual materials can also be mechanically represented for presentation to the algorithm is no trivial task. That problem may now have been solved, too — see the second pilot study below.
Scores were assigned by the Designer Discrimination Algorithm to each of ten instances generated by each of the two human writers, D1 and D2. The instances were generated by the designers in ignorance of the testing to be done on those products and with no particular instructions about what the designs should embody by way of structure. The first number indicates the designer, the second designates the instance. For example, “D1,3” is the third instance of the first designer. The scores are in dimensionless units, rescaled with identical scale factors. As mentioned above, the scores are (at least) ordinal in Algorithmic Information Complexity.
TABLE 1. DDA Scores for Textual Stimuli
| Instance | DDA Score | Instance | DDA Score |
|----------|-----------|----------|-----------|
The means of the two distributions of scores are statistically significantly different (two-tailed t(18) = 2.10, p < .05). The scores in the aggregate reliably distinguish between the two samples. As a reference point, the mean score assigned by the DDA to a sample of random structureless texts with comparable numbers of “words” is on the order of 7.5 standard deviations away from the mean scores of the two distributions above. As well as detecting designer differences, then, the technology very reliably distinguishes human-generated text in general from random text. (That suggests the possibility that the DDA may be useful to mainstream ID to salvage its forlorn effort to detect design at all. It’s a shame that the DDA is — so far — proprietary technology.)
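For readers who want to check the arithmetic behind such a comparison: the reported statistic is an independent two-sample t test with n = 10 per designer, hence 18 degrees of freedom and a two-tailed critical value of about 2.101 at p = .05. A minimal sketch follows; the score values are hypothetical placeholders, since the actual DDA scores are not reproduced in this post.

```python
import math
from statistics import mean, variance

def two_sample_t(a, b):
    """Independent two-sample t statistic with pooled variance."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical scores, ten per designer (NOT the actual DDA data):
d1 = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 4.7, 5.1, 5.0]
d2 = [4.6, 4.9, 4.5, 4.8, 4.4, 4.7, 5.0, 4.6, 4.8, 4.5]

t = two_sample_t(d1, d2)
# Compare |t| against t_crit(df=18) ~ 2.101 for two-tailed p < .05.
```

With ten instances per group, df = (10 - 1) + (10 - 1) = 18, matching the t(18) reported in both pilot studies.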
It may be of interest to readers to know the authors of the texts used in the first pilot study. The D1 samples are from Charles Darwin’s The Origin of Species (6th Edition), while the D2 samples are from William A. Dembski’s No Free Lunch. As I noted above, the scores assigned by the DDA are at least ordinal in Algorithmic Information Complexity. I am not at all surprised to find that Dembski’s text is on average less complex than Darwin’s.
As noted above, the problem of devising a mechanical process for representing non-textual stimulus objects so they are suitable for input to the DDA is a difficult one. Just recently I devised an approach to the representation problem that has some promise. Again using ten instances of non-verbal designed products from each of two human designers, neither of whom was aware of the analyses to be performed, the DDA yielded these scores (same notation as in the first pilot data):
TABLE 2. DDA Scores for Non-Textual Stimuli
| Instance | DDA Score | Instance | DDA Score |
|----------|-----------|----------|-----------|
As with the data on textual stimuli in Table 1, the Table 2 means are significantly different (two-tailed t(18) = 2.22, p<.05).
There are a number of methodological and conceptual issues yet to be worked out, of course. Most notably, while in both samples the two designers’ products differ significantly on mean discrimination scores for samples of 10 instances, there is still non-trivial overlap between the distributions. That makes reliable classifications of single instances difficult. There are mathematical methods that may be useful for “sharpening” encoded representations of instances presented to the DDA (analogous to the operation of lateral inhibition in retinal processing) that may reduce the overlap of the distributions and thereby increase the reliability of classifications. In addition, by analogy with the separate representations of visual edges, colors, and so on at higher cortical levels in the visual system, it may be possible to independently encode major features of the designed objects so as to generate vectors of discrimination scores that may provide the basis for more reliable classifications of single instances.
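To illustrate the lateral-inhibition analogy in the paragraph above: each element of an encoded representation is reduced by a fraction of its neighbors' values, which produces overshoot at the top of a boundary and undershoot at the bottom, exaggerating edges the way retinal lateral inhibition does. The inhibition weight below is an arbitrary illustrative choice, not a value from the DDA research.

```python
def sharpen(signal, inhibition=0.5):
    """Subtract a fraction of the neighbors' mean from each element,
    a one-dimensional analog of retinal lateral inhibition."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[i - 1] if i > 0 else 0.0
        right = signal[i + 1] if i < n - 1 else 0.0
        out.append(signal[i] - inhibition * (left + right) / 2)
    return out

# A soft step becomes more abrupt: values just above the step are pushed
# up relative to the plateau, values just below are pushed further down
# (the Mach-band overshoot/undershoot effect).
edge = [1.0, 1.0, 1.0, 0.6, 0.2, 0.2, 0.2]
sharpened = sharpen(edge)
```

Applied to encoded instance representations, that kind of local contrast enhancement could reduce the overlap between the two score distributions and so make single-instance classification more reliable.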
As I did two years ago when I introduced MDT, I will once again note with no particular modesty that as of this moment, MDT has a substantially broader and firmer empirical base than does mainstream ID. Though mainstream ID’s method for (purportedly) detecting design has been public for at least six years, since Dembski’s book The Design Inference was published in 1998, Dembski’s Explanatory Filter has been formally applied to only one or two isolated phenomena in the domain of interest, biology, and no systematic validation or reliability research has been published to support its claim of being able to reliably detect design of any sort, human or non-human. Similarly, there are no systematic validation data available for Dembski’s complex specified information or specified complexity, nor are there any for Behe’s irreducible complexity. In stark contrast, MDT’s pilot research has already shown that it can do what it claims: make discriminations among the products of different designers.
I said in the Introduction to Multiple Designers Theory above that MDT offers a potentially fruitful research program, and that program is under way. The first step is working through the tedious process of actually developing and validating designer discrimination methods on appropriate test stimuli. Where is mainstream ID’s research program six years after the publication of The Design Inference with its alleged method for detecting intelligent design? Completely absent.
The Moderator of ISCID’s Brainstorms took exception to that last paragraph, writing:
If the purpose of your post was to make the following point:
Where is SUDID’s research program five years after the publication of TDI?
Then it really has no place at Brainstorms. Your post has a touch of legitimacy to it, yet it also rings of polemical sarcasm. Please don’t waste our time if your only intention is to point out that somebody else’s research program isn’t moving along as fast as you’d like it to. On the other hand, if your interest is in pursuing your MDT concept for its own sake (which I’ve come to increasingly doubt) then do it, and stop throwing in these hints which indicate otherwise.
That beautifully illustrates the (justified!) defensiveness of the ID movement in the face of its complete lack of a coherent research program. I responded:
One of the main criticisms of MDT has been that I made (added in late edit: unsupported and unsupportable) claims about what might be possible using it as an orienting theory. For four months, since I first outlined it here, I’ve been working on developing methodologies consistent with its assumptions and goals. My recent posting is to report progress on that project. Genuine progress, progress that answers the initial question: Can designs generated by different designers be [mechanically] discriminated one from another? That’s not a trivial project, and it’s not a trivial result.
I took heat for “designer-centric extremism” on the grounds that to do research on the nature, identity, and properties of the designer(s) requires pre-knowledge of, or presuppositions about, those very properties. As I report above, that criticism is unfounded: It is possible to statistically distinguish between samples of the products of different designers with no assumptions at all about the nature, properties, or identities of the designers, and without any knowledge of the design process itself. All that is required is the samples of products themselves. That is not a trivial result.
I have not done this work (and work it is!) merely to tweak someone’s tail. I suggest you consult some of the really polemical writing in science for a reference point, particularly the kinds of polemics that occur in informal contexts. My remarks about SUDID are comparative, intended to make what I (who have been a working scientist for decades) consider to be an important point: the research program based on MDT is potentially richer and already more fruitful than the SUDID research program. I argued in the MDT thread that [mainstream ID] is a subset of MDT, and that opening consideration up to multiple-designers hypotheses could inform an active research program. And it has. Working alone and in my spare time, in four months I have begun to produce systematic validation data on the methodologies inspired by MDT.
One of my principal criticisms of [mainstream ID] for longer than MDT has existed is that [mainstream ID] has not done so. It has not provided systematic validation data on its methods, even though those methods have been publicly available for years. That critique is independent of the MDT research except insofar as the latter demonstrates, not merely asserts, that it is concerned with the kinds of detailed and foundational research that a real novel scientific research program must take into account. If anything, the MDT research is a model for [mainstream ID] research. If the purpose of my posting had been merely to ask that question, note that the question has been asked multiple times, with no response except the implicit one contained in Dembski’s keynote speech at the RAPID conference, where he called for the compilation of some sort of Catalog of facts about phenomena thought to be inconsistent with evolutionary theory. Even in that speech, though, the purpose offered for such a catalog was not scientific (e.g., to provide foundational data to inform a systematic research program) but rhetorical: to convince the unconvinced that problems with evolutionary theory exist.
Your suspicions notwithstanding, MDT has already spawned real research generating real data that speak to real issues that the theory defines as important. Moreover, those methods appear to have applications in other non-trivial contexts, not just in MDT. That is part of what is meant by a fruitful research program. It is not inappropriate for proponents of one theory to criticize a competing theory if that competitor is seen to fall short of its promise. That happens all the time in science. Many of the articles in PCID provide examples.
“PCID” is Progress in Complexity, Information, and Design, the electronic journal of the International Society for Complexity, Information, and Design. I recommend scanning through it to get an idea of the empirical impoverishment of ID’s publications. Though it was to be a quarterly journal, the most recent issue is October 2003, nearly a year ago. It is evident that there has been precious little progress in complexity, information, or design made by the mainstream ID camp. Two years after my MDT postings on ISCID there is still no coherent mainstream ID research program, no validation or reliability studies of mainstream ID’s methodology, and most damning, there is still no mainstream ID theory to test.
Like its Scientific Creationist predecessor, mainstream single-designer Paleyist ID is a dry hole, a scientifically sterile attempt to salvage metaphysical preconceptions. Multiple Designers Theory offers a way out of that intellectual sterility. MDT demonstrably can generate a more fecund research program than mainstream ID. However, no mainstream ID “theorist” has taken up MDT even though it was originally introduced on two flagship ID Web sites, ARN and ISCID. Perhaps publishing MDT on The Panda’s Thumb, a premier ID critics’ site, will yield better results. One can only hope.
Once again, feel free to distribute these essays complete with identifying marks and scars (i.e., with appropriate attribution) to legislators, to local and state school board members, and to other parties interested in teaching alternatives to evolution in public schools. If they claim that mainstream ID is their preference, press for an actual description of mainstream intelligent design theory, and press them for even as much data as are reported here. Multiple Designers Theory is theoretically more general and empirically better corroborated than mainstream ID, and has at least as much chance of being correct as mainstream ID.
Copyright (c) 2002, 2003, 2004 by Richard B. Hoppe