Richard B. Hoppe posted Entry 504 on September 23, 2004 08:21 PM.
Trackback URL: http://www.pandasthumb.org/cgi-bin/mt/mt-tb.fcgi/503

I.  Introduction: Designer Discrimination Algorithms

A significant problem in developing a revolutionary new theory is the parallel development of methods and technologies appropriate to testing the theory.  As I have said a number of times in response to criticisms of Multiple Designers Theory, the absolutely necessary first step in the MDT research program must be the development and validation of designer discrimination methodologies.  In the Introduction to Multiple Designers Theory above I said that in developing a design discrimination methodology, MDT has the same task as mainstream ID. First the methodology must be systematized and formalized. Then it must be empirically validated on test materials for which we already know the histories.  The first task of MDT is to develop a formalized, researcher-independent methodology that, when it is eventually applied to phenomena whose provenance and history we do not know, can legitimately be expected to reliably tell us something of interest about those phenomena.  Mainstream intelligent design has so far avoided that task: there are no validation data at all on its principal design detection methods.  MDT, however, has begun the task of validating and calibrating its methodologies.

When proposing a new and potentially revolutionary theoretical structure and associating that theory with a new and untested methodology for evaluating it, it is absolutely critical to validate the associated methodology. That is logically and practically prior to anything else. Absent systematic and careful empirical validation of the methodology, ‘tests’ of the theory using the methodology are at best suspect; at worst they can be interpreted as merely self-serving. Careful and systematic empirical validation of the methodology must be the foundation of, and precede (or at least parallel), tests of a theory using that methodology.

Whether one is trying to discriminate design from no design, as is the case for mainstream ID, or to distinguish the work of one designer from another, the task of developing, systematizing, formalizing, and empirically validating the methodology is critical. Absent that, claims about one’s theory are not plausible. A theory may have all the promise in the world, but without empirically validated methodologies for making observations and systematically gathering data to test hypotheses, it will remain merely a conjecture with no empirical content or explanatory utility.

Developing and validating a new methodology requires attention to two issues: the validity and the reliability of the methodology.  “Validity” refers to the question of whether the method measures what it purports to measure: does it do what it claims to do?  Does a spectrometer in fact provide information concerning the frequency composition of light?  “Reliability” refers to the question of consistency: does the methodology provide consistent results when repeatedly applied?  Can one apply the method mechanically, so that subjective human judgments are not implicated in the results it yields?

Another significant problem in developing a general methodology for discriminating among the products of different designers is how the products will be represented as inputs to the discrimination algorithm.  The representation problem is at the heart of pattern recognition, which is what a Designer Discrimination Algorithm must do: recognize the different patterns that are generated by different designers.  Mainstream ID proponents have argued at various times that “irreducibly complex” structures, “complex specified information,” or “specified complexity” are hallmarks of intelligent design, and those notions are incorporated in various ways in the arguments those proponents make.  Unfortunately, there is no consistency in the way those various notions are defined, measured, or applied to real-world phenomena.  For example, the latter two require that a human make a subjective judgment about whether an appropriate “specification” exists, but there is no developed body of methods that governs those judgments.  Mainstream ID has no formal or mechanical way of representing objects or processes to serve as inputs to its methodologies.  Mainstream ID is in N-Ray territory here.  A primary goal of MDT’s designer discrimination methodology development program is to develop objective representations of phenomena that avoid the requirement for subjective human judgments, making the designer discrimination process completely mechanical.  As the pilot data below suggest, achievement of that goal appears to be in sight.

Complexity is important in discriminating among designers, and the Designer Discrimination Algorithm (DDA) I am developing and evaluating takes account of that importance.  There are many methods that try to measure complexity.  One author identifies dozens of complexity measures.  (Interestingly, William A. Dembski’s Complex Specified Information did not make the cut in that paper.  See also this critique.)  One component of the DDA is a rough analog of Solomonoff/Kolmogorov/Chaitin Complexity, or Algorithmic Information Complexity (AIC).  There are other components in the DDA, but the construction of the DDA is such that the scores it assigns to stimuli should be (at least) ordinally associated with AIC.  (See also this paper and this one — both by Cosma Shalizi and his colleagues — for related ideas about measuring the complexity and organization of complicated systems.)
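To make the AIC analogy concrete, here is a minimal sketch of one commonly used proxy for algorithmic complexity: the compressed length of an object’s encoding relative to its raw length.  To be clear, this is only an illustration of the general idea; it is not the DDA or any of its components, which remain undisclosed.

    # A rough, commonly used proxy for Algorithmic Information Complexity:
    # the ratio of compressed size to raw size of an object's byte encoding.
    # Illustrative stand-in only; it is not the (proprietary) DDA.
    import os
    import zlib

    def compression_complexity(data: bytes) -> float:
        """Compressed size divided by original size (lower = more regular)."""
        return len(zlib.compress(data, 9)) / len(data)

    english = b"the quick brown fox jumps over the lazy dog " * 20
    noise = os.urandom(len(english))
    print(compression_complexity(english))  # repetitive text: well below 1.0
    print(compression_complexity(noise))    # random bytes: near (or slightly above) 1.0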

Here I report progress on the effort to develop a designer discrimination methodology that is valid and reliable. This preliminary report will be slightly disappointing to readers because at this stage I do not intend to disclose the specific designer discrimination methodologies I’m testing. As I have noted elsewhere, there is a commercial niche for such technologies. I am doing some of this work on company time with company machines, and I am obliged to reserve the technology until it is developed far enough to be appropriately protected or until my company decides that it can be released into the public/academic domain.

Nevertheless, I can now say with confidence that at least one methodology under evaluation has the capability to discriminate among the complex objects created by several different designers. Pilot studies described below show that it provides statistically significantly different scores for samples of the products of different designers when it is mechanically applied, and there is reason to believe that the technology may be amenable to being incorporated into an automated classifier system that allows reliable assignments of products to designers with the process untouched by human hands (or judgments).

The principal task in discriminating among the products of multiple designers is to devise an analysis technology that can be mechanically implemented, so that subjective, researcher-idiosyncratic variables cannot influence the encoding and scoring of the products to be analyzed. The technology undergoing evaluation does that. It is researcher-independent. No subjective judgments are exercised in the selection of instances, in the application of the encoding method, or in the application of the discrimination method. All those components can in principle be fully automated, though I have not yet done so; this is, after all, still pilot research, and in pilot research it is useful to keep the stages of analysis clearly separated for process tracking and troubleshooting purposes. The designer discrimination methodology makes no use of background knowledge about the design processes employed by the designers and makes no assumptions about how the designs were manufactured (instantiated in matter and energy).  It requires no assumptions about the nature of the designers and uses no extra background information about the designed objects. It is a purely objective methodology that analyzes just the designed structures.

II. Pilot Research on Designer Discrimination Methodologies

To give the flavor of the pilot research, here are the results of two small pilot studies of the designer discrimination technology under development.

In the first pilot study, the materials were samples of text from two human authors.  The texts were mechanically represented so as to be appropriate for input to the discrimination algorithm.  Representation is a significant problem, and solving it in a general way so that non-textual materials can also be mechanically represented for presentation to the algorithm is no trivial task.  That problem may now have been solved, too — see the second pilot study below.

Scores were assigned by the Designer Discrimination Algorithm to each of ten instances generated by each of the two human writers, D1 and D2. The instances were generated by the designers in ignorance of the testing to be done on those products and with no particular instructions about what the designs should embody by way of structure. The first number indicates the designer, the second designates the instance; for example, “D1,3” is the third instance of the first designer. The scores are in dimensionless units, rescaled with identical scale factors.  As mentioned above, the scores are (at least) ordinal in Algorithmic Information Complexity.

TABLE 1.  DDA Scores for Textual Stimuli

 
  Instance   DDA Score      Instance   DDA Score
  D1,1       2.897101       D2,1       2.851835
  D1,2       2.887752       D2,2       2.832242
  D1,3       2.758767       D2,3       2.351285
  D1,4       2.958867       D2,4       2.967218
  D1,5       3.015303       D2,5       2.789246
  D1,6       2.954063       D2,6       3.010922
  D1,7       2.910997       D2,7       2.559147
  D1,8       3.200433       D2,8       2.825446
  D1,9       2.929112       D2,9       2.797831
  D1,10      3.085128       D2,10      2.706459
  Mean       2.959752       Mean       2.769163

The means of the two distributions of scores are statistically significantly different (two-tailed t(18) = 2.10, p<.05). The scores in the aggregate reliably distinguish between the two samples. As a reference point, the mean score assigned by the DDA to a sample of random, structureless texts with comparable numbers of “words” differs from the mean scores of the two distributions above by on the order of 7.5 standard deviations. So as well as detecting differences among designers, the technology very reliably distinguishes human-generated text in general from random text.  (That suggests the possibility that the DDA may be useful to mainstream ID to salvage its forlorn effort to detect design at all.  It’s a shame that the DDA is — so far — proprietary technology.)
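For readers who want to check this kind of comparison for themselves, here is a minimal sketch of an independent-samples t test applied to the Table 1 scores.  The equal-variance (Student) pooling is my assumption about the convention used, so the value it prints need not match the reported statistic exactly; it illustrates the form of the test rather than reproducing the original analysis.

    # Sketch of the kind of two-sample t test reported above, applied to the
    # Table 1 DDA scores. The pooling convention (equal_var=True) is an assumption.
    from scipy import stats

    d1 = [2.897101, 2.887752, 2.758767, 2.958867, 3.015303,
          2.954063, 2.910997, 3.200433, 2.929112, 3.085128]
    d2 = [2.851835, 2.832242, 2.351285, 2.967218, 2.789246,
          3.010922, 2.559147, 2.825446, 2.797831, 2.706459]

    t, p = stats.ttest_ind(d1, d2, equal_var=True)  # two-tailed by default
    print(f"t({len(d1) + len(d2) - 2}) = {t:.2f}, p = {p:.3f}")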

It may be of interest to readers to know the authors of the texts used in the first pilot study.  The D1 samples are from Charles Darwin’s The Origin of Species (6th Edition), while the D2 samples are from William A. Dembski’s No Free Lunch.  As I noted above, the scores assigned by the DDA are at least ordinal in Algorithmic Information Complexity.  I am not at all surprised to find that Dembski’s text is on average less complex than Darwin’s.

As noted above, the problem of devising a mechanical process for representing non-textual stimulus objects so they are suitable for input to the DDA is a difficult one.  Just recently I devised an approach to the representation problem that has some promise.  Again using 10 instances of non-verbal designed products from each of two human designers, neither of whom was aware of the analyses to be performed, the DDA yielded these scores (same notation as in the first pilot study, except that these designers are labeled M1 and M2):

TABLE 2.  DDA Scores for Non-Textual Stimuli

 
  Instance   DDA Score      Instance   DDA Score
  M1,1       2.677709       M2,1       2.481005
  M1,2       3.149289       M2,2       2.325280
  M1,3       2.870003       M2,3       2.764049
  M1,4       2.654073       M2,4       2.327951
  M1,5       2.513959       M2,5       2.791067
  M1,6       2.698993       M2,6       2.428283
  M1,7       2.899635       M2,7       2.528682
  M1,8       2.484545       M2,8       2.678652
  M1,9       2.636921       M2,9       2.387657
  M1,10      2.745839       M2,10      2.730819
  Mean       2.733097       Mean       2.544344

As with the data on textual stimuli in Table 1, the Table 2 means are significantly different (two-tailed t(18) = 2.22, p<.05). 

There are a number of methodological and conceptual issues yet to be worked out, of course. Most notably, while in both samples the two designers’ products differ significantly on mean discrimination scores for samples of 10 instances, there is still non-trivial overlap between the distributions. That makes reliable classifications of single instances difficult.  There are mathematical methods that may be useful for “sharpening” encoded representations of instances presented to the DDA (analogous to the operation of lateral inhibition in retinal processing) that may reduce the overlap of the distributions and thereby increase the reliability of classifications. In addition, by analogy with the separate representations of visual edges, colors, and so on at higher cortical levels in the visual system, it may be possible to independently encode major features of the designed objects so as to generate vectors of discrimination scores that may provide the basis for more reliable classifications of single instances.
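Purely to illustrate that last point, here is a minimal sketch of the kind of single-instance classifier such score vectors could feed.  Everything in it is hypothetical: the feature vectors are placeholders rather than DDA output, and the nearest-class-mean rule simply stands in for whatever classifier an eventual automated system might actually use.

    # Illustrative only: nearest-class-mean classification over hypothetical
    # per-instance score vectors (placeholders, not actual DDA output).
    import numpy as np

    def fit_class_means(samples: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
        """samples maps designer label -> (n_instances, n_features) score matrix."""
        return {label: mat.mean(axis=0) for label, mat in samples.items()}

    def classify(instance: np.ndarray, means: dict[str, np.ndarray]) -> str:
        """Assign a single instance to the designer with the nearest mean vector."""
        return min(means, key=lambda label: float(np.linalg.norm(instance - means[label])))

    # Hypothetical training data: 10 instances x 3 feature scores per designer.
    rng = np.random.default_rng(0)
    training = {
        "D1": rng.normal(loc=[2.96, 1.2, 0.8], scale=0.12, size=(10, 3)),
        "D2": rng.normal(loc=[2.77, 1.0, 0.6], scale=0.19, size=(10, 3)),
    }
    class_means = fit_class_means(training)
    print(classify(np.array([2.95, 1.15, 0.75]), class_means))  # expected: D1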

III.  Conclusions

As I did two years ago when I introduced MDT, I will once again note with no particular modesty that as of this moment, MDT has a substantially broader and firmer empirical base than does mainstream ID. Though mainstream ID’s method for (purportedly) detecting design has been public for at least six years, since Dembski’s book The Design Inference was published in 1998, Dembski’s Explanatory Filter has been formally applied to only one or two isolated phenomena in the domain of interest, biology, and no systematic validation or reliability research has been published to support its claim of being able to reliably detect design of any sort, human or non-human.  Similarly, there are no systematic validation data available for Dembski’s complex specified information or specified complexity, nor are there any for Behe’s irreducible complexity.  In stark contrast, MDT’s pilot research has already shown that it can do what it claims: make discriminations among the products of different designers.

I said in the Introduction to Multiple Designers Theory above that MDT offers a potentially fruitful research program, and that program is under way.  The first step is working through the tedious process of actually developing and validating designer discrimination methods on appropriate test stimuli. Where is mainstream ID’s research program six years after the publication of The Design Inference with its alleged method for detecting intelligent design?  Completely absent.

The Moderator of ISCID’s Brainstorms took exception to that last paragraph, writing

RBH,
If the purpose of your post was to make the following point:

  Where is SUDID’s research program five years after the publication of TDI?

Then it really has no place at Brainstorms. Your post has a touch of legitimacy to it, yet it also rings of polemical sarcasm. Please don’t waste our time if your only intention is to point out that somebody else’s research program isn’t moving along as fast as you’d like it to. On the other hand, if your interest is in pursuing your MDT concept for its own sake (which I’ve come to increasingly doubt) then do it, and stop throwing in these hints which indicate otherwise.

That beautifully illustrates the (justified!) defensiveness of the ID movement in the face of its complete lack of a coherent research program.  I responded

Moderator,

One of the main criticisms of MDT has been that I made (added in late edit: unsupported and unsupportable) claims about what might be possible using it as an orienting theory. For four months, since I first outlined it here, I’ve been working on developing methodologies consistent with its assumptions and goals. My recent posting is to report progress on that project. Genuine progress, progress that answers the initial question: Can designs generated by different designers be [mechanically] discriminated one from another? That’s not a trivial project, and it’s not a trivial result.

I took heat for “designer-centric extremism” on the grounds that to do research on the nature, identity, and properties of the designer(s) requires pre-knowledge of, or presuppositions about, those very properties. As I report above, that criticism is unfounded: It is possible to statistically distinguish between samples of the products of different designers with no assumptions at all about the nature, properties, or identities of the designers, and without any knowledge of the design process itself. All that is required is the samples of products themselves. That is not a trivial result.

I have not done this work (and work it is!) merely to tweak someone’s tail. I suggest you consult some of the really polemical writing in science to provide a reference, particularly the kinds of polemics that occur in informal contexts. My remarks about SUDID are comparative, intended to make what I (who have been a working scientist for decades) consider to be an important point: the research program based on MDT is potentially richer and already more fruitful than the SUDID research program. I argued in the MDT thread that [mainstream ID] is a subset of MDT, and that opening consideration up to multiple designers hypotheses could inform an active research program. And it has. Working alone and in my spare time, in four months I have begun to produce systematic validation data on the methodologies inspired by MDT.

One of my principal criticisms of [mainstream ID] for longer than MDT has existed is that [mainstream ID] has not done so. It has not provided systematic validation data on its methods, even though those methods have been publicly available for years. That critique is independent of the MDT research except insofar as the latter demonstrates, not merely asserts, that it is concerned with the kinds of detailed and foundational research that a real novel scientific research program must take into account. If anything, the MDT research is a model for [mainstream ID] research. If the purpose of my posting was merely to ask that question, it’s been asked multiple times to no response, except the implicit response contained in Dembski’s keynote speech at the RAPID conference, where he called for compilation of some sort of Catalog of facts about phenomena thought to be inconsistent with evolutionary theory. Even in that speech, though, there was no mention of a scientific purpose for such a catalog (e.g., to provide foundational data to inform a systematic research program); the purpose was rather rhetorical: to convince the unconvinced that problems with evolutionary theory exist.

Your suspicions notwithstanding, MDT has already spawned real research generating real data that speak to real issues that the theory defines as important. Moreover, those methods appear to have applications in other non-trivial contexts, not just in MDT. That is part of what is meant by a fruitful research program. It is not inappropriate for proponents of one theory to criticize a competing theory if that competitor is seen to fall short of its promise. That happens all the time in science. Many of the articles in PCID provide examples.

“PCID” is Progress in Complexity, Information, and Design, the electronic journal of the International Society for Complexity, Information, and Design.  I recommend scanning through it to get an idea of the empirical impoverishment of ID’s publications.  Though it was to be a quarterly journal, the most recent issue is October 2003, nearly a year ago.  It is evident that there has been precious little progress in complexity, information, or design made by the mainstream ID camp.  Two years after my MDT postings on ISCID there is still no coherent mainstream ID research program, no validation or reliability studies of mainstream ID’s methodology, and most damning, there is still no mainstream ID theory to test.

Like its Scientific Creationist predecessor, mainstream single-designer Paleyist ID is a dry hole, a scientifically sterile attempt to salvage metaphysical preconceptions.  Multiple Designers Theory offers a way out of that intellectual sterility.  MDT demonstrably can generate a more fecund research program than mainstream ID.  However, no mainstream ID “theorist” has taken up MDT even though it was originally introduced on two flagship ID Web sites, ARN and ISCID.  Perhaps publishing MDT on The Panda’s Thumb, a premier ID critics’ site, will yield better results.  One can only hope.

Once again, feel free to distribute these essays complete with identifying marks and scars (i.e., with appropriate attribution) to legislators, to local and state school board members, and to other parties interested in teaching alternatives to evolution in public schools.  If they claim that mainstream ID is their preference, press them for an actual description of mainstream intelligent design theory, and press them for even as much data as are reported here.  Multiple Designers Theory is theoretically more general and empirically better corroborated than mainstream ID, and has at least as much chance of being correct as mainstream ID.

Copyright (c) 2002, 2003, 2004 by Richard B. Hoppe


Comment #7909

Posted by mithras on September 24, 2004 7:55 AM

Bravo! Bravo! A most impressive performance.

Comment #7916

Posted by Rilke's Grand-daughter on September 24, 2004 10:46 AM

Tease; no information on the really interesting question: what are the characteristics or features of the candidate objects that are selected for analysis by the DDA. Darn it.

Based on your ‘runs’ so far, have you identified a minimum sample size, or an algorithm to determine that size, that appears to ‘confirm’ a particular designer?

Comment #7924

Posted by Great White Wonder on September 24, 2004 12:42 PM

Richard,

I would advise seeking patent protection for your software methods lest the “mainstream” IDers start stealing your ideas. Of course, you’ll be prevented from obtaining protection in most non-US countries, because of their strict publication requirements. In the US, you have one year from public disclosure to file.

Not that any of those characters would ever do anything which violated one of the Ten Commandments …

Comment #7925

Posted by Erik 12345 on September 24, 2004 12:44 PM

Two questions enter my mind:

Question #1. Exactly which statistical problem have you set out to solve? I’m looking for an answer formulated in the same terms as descriptions of learning, classification, generalization, etc. tend to be written in. A few examples of this form:

Example A: Given N samples, each sample shall be classified as belonging to exactly one of C categories (with category #17 meaning “designer #17 generated this sample”).

Example B: Given M training samples, with known classifications, and N other samples of unknown classifications, all the unknown classifications should be inferred.

Example C: Given N samples, determine the number of categories they should be classified into. (Or, in plain terms, determine the total number of designers that were involved in generating the samples.)

Question #2: What constraints must be placed on the input for it to be meaningful to apply your algorithm? And how much a priori knowledge about the kind of data is exploited? For example, you reported an application to samples of texts by Darwin and by Dembski. Suppose some of these texts had been typeset using only upper-case letters. Or suppose that some of the texts were adapted to pages with very large margins, so that the ASCII representations contained an excessive number of newline characters. Would you then have to preprocess the texts to get meaningful results?

Or, to formulate essentially the same question differently, suppose not all of the texts were written in English, but that some of the texts were in German and Chinese. Would you then appeal to your a priori knowledge that the data you’re studying are written human languages, and stipulate that your applications of your algorithms to sample texts in different languages are invalid?

Comment #7928

Posted by Rilke's Grand-daughter on September 24, 2004 2:37 PM

The other point that arises: how can the criteria be ‘generalized’ across slightly variant artifacts?

Comment #7979

Posted by RBH on September 25, 2004 7:47 PM

Rilke’s Grand-daughter asked whether minimum sample sizes had been established. That awaits better estimates of the effect sizes one can expect and of the variability associated with the scores assigned by the DDA as it is refined.

Erik asked about the specific statistical question MDT attempts to answer, giving three examples. Initially, the task is most similar to Example A, “Given N samples, each sample shall be classified as belonging to exactly one of C categories (with category #17 meaning ‘designer #17 generated this sample’).” That is the methodology validation stage of affairs – validating and then testing the reliability of the discrimination methods. Later, when some clear hypothesis about the level of analysis appropriate to biology is at least tentatively identified, the questions illustrated by Erik’s Examples B and C come into play. I do not plan to rush into biological unknowns before the methodology is under good control.

Erik’s second question, about constraints on the inputs, is still in a very early stage of research planning. I plain don’t know yet. I will say that the representational method at least in principle should make appeals to prior knowledge about the language of texts unnecessary. The goal of the methodology development program is to render the role of prior knowledge (at least to the extent that it can’t be mechanically applied) as irrelevant as possible.

Rilke’s Grand-daughter also asked about generalizing the criteria (for classification?) to variant artifacts – apparently meaning variant designs from the same designer. That’s a question yet to be addressed under the ‘reliability’ heading. Validity is the first question, reliability is the second.

RBH

Comment #12378

Posted by Calzaer on December 30, 2004 6:31 AM

Have you used your method to test whether or not multiple instances from a single designer score differently from the way multiple instances from a pair of designers score? For instance, if you had run D1,1, D1,2, and D1,3 against D1,5, D1,8, and D1,10 (the three smallest numbers in D1 against the three largest numbers in D1), would you get a statistically insignificant difference between the means? Or would it come up as statistically significant, and therefore be indicated as the result of different designers?

(It’s been entirely too long since I’ve had a statistical methods course, which is why I haven’t just used your numbers to do it myself. *sheepish*)

Also, what were the products you compared in Table 2?

Comment #12398

Posted by RBH on December 30, 2004 5:04 PM

Calzaer,

Nope. Those are good questions, as are the questions you and RGD raised in the MDT thread, but due to the press of other business-related issues I’ve been on hiatus from MDT stuff for a while. I hope to get back to it in February, when my fond hope is that the pressure on my machinery (both silicon and wetware) will ease a bit.

RBH