Sources And Methods: Experiment


Friday, August 13, 2010

Does Analysis Of Competing Hypotheses Really Work? (Thesis Months)

The recent announcement that collaborative software based on Richards Heuer's famous methodology, Analysis of Competing Hypotheses, would soon be open-sourced was met with much joy in most quarters but some skepticism in others.

The basis for the skepticism seems to be the lack of hard evidence that ACH actually improves forecasting accuracy. While this was not the only (and may not have been the most important) reason why Heuer created ACH, it is certainly a question that bears asking.

No matter how good a methodology is at organizing information or creating an analytic audit trail or easing the production burden, etc., the most important element of any intelligence methodology would seem to be its ability to increase the accuracy of the forecasts generated by the method (over what is achievable through raw intuition).

With a documented increase in forecasting accuracy, analysts should be willing to put up with almost any tedium associated with the method. A methodology that actually decreases forecasting accuracy, on the other hand, is almost certainly not worth considering, much less implementing. Methods which match raw intuition in forecasting accuracy really have to demonstrate that the ancillary benefits derived from the method are worth the costs associated with achieving them.

It is with this in mind that Drew Brasfield set out to test ACH in his thesis work while here at Mercyhurst. His research into ACH and the results of his experiments are captured in his thesis, Forecasting Accuracy And Cognitive Bias In The Analysis Of Competing Hypotheses (full text below or you can download a copy here).

To test ACH, Drew used 70 students divided between a control and an experimental group who were all familiar with ACH. The groups were asked to research and estimate the results of the 2008 Washington State gubernatorial election between Democrat Christine Gregoire and Republican Dino Rossi (Gregoire won the election by about 6 percentage points). The students were given a week in September 2008 to independently work on their estimate of who would win the election in November.

The results were in favor of ACH in terms of both forecasting accuracy and bias. In Drew's words, "The findings of the experiment suggest ACH can improve estimative accuracy, is highly effective at mitigating some cognitive phenomena such as confirmation bias, and is almost certain to encourage analysts to use more information and apply it more appropriately."

The results of the experiment are displayed in the graphs below:

Statistical purists will argue that the results did not meet the traditional 95% confidence threshold, which suggests that the accuracy difference may be due to chance. True enough. What is clear, though, is that ACH doesn't hurt forecasting accuracy. When combined with the other results from the experiment (see below), this strongly suggests that Drew's characterization of ACH is correct.
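For readers who want to see what "not meeting the 95% test" looks like in practice, here is a minimal sketch of a two-proportion z-test. The counts below are hypothetical, chosen only for illustration; they are not Drew's data, and the thesis may well have used a different test. The function name is also my own.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL counts (not from the thesis): 35 students per group,
# 21 correct forecasts in the control group, 25 correct with ACH.
z, p_value = two_proportion_z_test(21, 35, 25, 35)
print(f"z = {z:.2f}, p = {p_value:.2f}")
print("significant at the 95% level" if p_value < 0.05 else "could be chance")
```

With groups this small, even a gap of more than ten percentage points in accuracy falls short of the 95% threshold, which is why a result can look encouraging and still be statistically fuzzy.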

Because Drew captured the political affiliation of his test subjects before he conducted his experiment, he was able to sort those subjects more or less evenly into the control and experimental groups. Here again, ACH comes away looking pretty good:

The chart may be a bit confusing at first but the bottom line is that Republicans were far more likely to accurately forecast the eventual victory of the Democratic candidate if they used ACH. Here again the statistics suggest that chance might play a larger role than normal (an effect exacerbated by the even smaller sample sizes for this test). At the least, however, these results are consistent with the first set of results and, again, do nothing to suggest that ACH does not work.

Drew's final test is the one that helps clarify any fuzziness in the results so far. Here he was looking for evidence of confirmation bias -- that is, analysts searching for facts that tend to confirm their hypotheses instead of looking at all facts objectively. He was able to find statistically significant amounts of such bias in the control group and almost none in the experimental group:

It is difficult for me to imagine a method which worked so well at removing biases that would also not improve forecasting accuracy. In short, based on the results of this experiment, concluding that ACH doesn't improve forecasting accuracy (due to the statistical fuzziness) would also require one to conclude that biases don't matter when it comes to forecasting accuracy. This is an arguable hypothesis, I suppose, but not where I would put my money...

The most interesting part of the thesis, in my opinion, though, is the conclusion. Here Drew makes the case that the statistical fuzziness was a result of the kind of problem tested, not the methodology. He suggests that "ACH may be less effective for an analytical problem where the objective probabilities of each hypothesis are nearly equal."

In short, when the objective probability of an event approaches 50%, ACH may no longer have the resolution necessary to generate an accurate forecast. Likewise, as the objective probability approaches either 0% or 100%, ACH becomes increasingly less necessary as the correct estimative conclusion is more or less obvious to the "naked eye". Close elections, like the one in Washington State in 2008, may therefore be beyond the resolving power of ACH.

Like much good science, Drew's thesis has generated a new testable hypothesis (one we are, in fact, in the process of testing!). It is definitely worth the time it takes to read.

Forecasting Accuracy and Cognitive Bias in the Analysis of Competing Hypotheses

Monday, August 2, 2010

Multi-Criteria Intelligence Matrices: A Promising New Method (Thesis Months)

One of the more interesting theses I have supervised over the last several years was Lindsey Jakubchak's The Effectiveness Of Multi-Criteria Intelligence Matrices In Intelligence Analysis.

Lindsey's thought was to take a version of the well-tested school of operational methodologies often referred to as multi-criteria decisionmaking methods (MCDM) and flip it on its head to turn it into an intelligence method. The results of her experiment show the method as promising in a number of different respects, though, clearly, there is still work to be done.

For those of you unfamiliar with MCDMs in general, there are many, many variants of the process and each is accompanied by all of the arguments and counterarguments typically associated with academe. Lindsey just wanted to see if there was any value in her proposition at all, so she chose one of the simplest and most common forms of MCDM to "flip" -- a streamlined version of the US Army's Staff Study Method.

What do I mean by "flip"? Well, MCDMs are typically used to help select the most logical course of action based on a given set of criteria. Say, for example, you were looking to buy a car. You had narrowed the field to three particular SUVs but you couldn't make up your mind which one was best for your family. An MCDM would ask you to select the criteria you thought were important to you and your family (seating, reliability, gas mileage, storage, etc.) and then rate each car, using a matrix to sort the results. Arguably, the car that best meets your criteria is the one you should select. (Anyone familiar with Consumer Reports, for example, knows that this is the way they come to their conclusions about various products.)

What if it is not you buying the car, though? What if you are trying to figure out what kind of car a friend might buy? Your friend might prefer sports cars to SUVs and have an entirely different set of criteria for choosing one. That's what I mean by "flip". What if you could use an MCDM not as a tool to help you make better decisions but as an intelligence analysis method to help you figure out what an enemy, a criminal or a competitor is likely to do? That was what Lindsey set out to test and she gave her method a name -- the Multi-Criteria Intelligence Matrix.
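To make the car example and the "flip" concrete, here is a minimal sketch of a weighted-criteria matrix. Everything in it is an assumption for illustration: the option names, the 1-5 ratings, the weights and the function name `rank_options` are all hypothetical, and this simple weighted sum is not necessarily the scoring scheme Lindsey used in her thesis.

```python
def rank_options(weights, ratings):
    """Score each option as a weighted sum of its ratings, best first."""
    scores = {
        option: sum(weights[criterion] * value
                    for criterion, value in option_ratings.items())
        for option, option_ratings in ratings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# HYPOTHETICAL ratings on a 1-5 scale (higher is better)
ratings = {
    "SUV A":      {"seating": 5, "reliability": 4, "gas_mileage": 3, "performance": 2},
    "SUV B":      {"seating": 4, "reliability": 5, "gas_mileage": 4, "performance": 2},
    "Sports car": {"seating": 1, "reliability": 3, "gas_mileage": 2, "performance": 5},
}

# Decision-making use: weights reflect what matters to YOU.
my_weights = {"seating": 0.4, "reliability": 0.3, "gas_mileage": 0.2, "performance": 0.1}

# The "flip": weights are your ESTIMATE of what matters to someone else.
friend_weights = {"seating": 0.05, "reliability": 0.2, "gas_mileage": 0.05, "performance": 0.7}

print("My best choice:", rank_options(my_weights, ratings)[0][0])
print("Friend's likely choice:", rank_options(friend_weights, ratings)[0][0])
```

The matrix is identical in both cases; only the owner of the criteria changes. In the intelligence version, the hard part is estimating the other party's criteria and weights rather than doing the arithmetic.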

While, due to the topic she decided to explore -- Russia's relationship with OPEC -- she was not able to evaluate forecasting accuracy (though I give her full points for trying), she was able to compare her experimental group to the control group in a number of other interesting ways. Using the standards in ICD 203 as a guideline, she was able to say a couple of interesting things like:

"Although the experimental group indicated a lower level of knowledge in regards to the topic (Russia’s relationship to OPEC) and expressed a lower level of interest with the topic, both of which were found to be statistically significant, the experimental group was able to arrive at a broader range of possible Courses Of Action (COAs)."

"The average completion time for the control group was 70 minutes and the average time for the experimental group was 58.6 minutes. Therefore, when looking at the big picture, although the experimental group seemed less knowledgeable and less interested, they were able to arrive at a more complete list of relevant possible COAs, and they completed their analysis in less time."

"While a few students in the control group provided one or two alternative COAs, the majority of the student-analysts merely provided one COA with few comparisons to any alternatives, thus not providing any insight to whether or not alternative solutions were considered. In the experimental group, the student-analysts, who used MCIM, provided a list of all possible COAs, and identified the importance of specific criterion or various factors to those COAs."

In the end, the study suggests the method has promise and, with Lindsey's results in hand, it has more evidence to back it than many other, more widely taught, methods. I have embedded the full text below or you can download it here.

Related Posts:
Top 5 Intelligence Analysis Methods

The Effectiveness of Multi-Criteria Intelligence Matrices In Intelligence Analysis

Wednesday, May 12, 2010

A Brilliant Failure (Thesis Months)


(Note: Due to circumstances entirely within my control, I have been pretty lax about getting these theses -- particularly this one, which is very cool -- out the door in a timely manner. No worries, though. "Thesis Month" is now "Thesis Months").

Researchers rarely like to publish their failures. Want some proof? Next time you pick up a journal check to see how many of the authors are reporting experimental results that do not tend to confirm their hypotheses.

Sometimes, however, failures are so unexpected and so complete that they force you to re-think your fundamental understanding of a topic.

Think about it: It is not unreasonable to assume that a 50 lb cannonball and a 5 lb cannonball dropped from the Leaning Tower of Pisa will hit the earth at different times. For more than 1000 years, this Aristotelian view of the way the world worked dominated.

The first time someone tested this idea (and, apparently, it wasn't Galileo, though he typically gets the credit) and the objects hit the ground at the same time, people were forced to reconsider how gravity works.

Shannon Ferrucci's thesis, "Explicit Conceptual Models: Synthesizing Divergent And Convergent Thinking", is precisely this type of brilliant failure.

Shannon starts with a constructivist vision of how the mind works. She suggests that when an intelligence analyst receives a requirement, it activates a mental model of what is known about the target and what the analyst needs to know in order to properly answer the question. Such a model obviously grows and changes as new information comes in and is never really complete but it is equally obvious that such a model informs the analytic process.

For example, consider the question that was undoubtedly asked of a number of intel analysts last week: What is the likely outcome of the elections in the UK?

Now, imagine an analyst that was rather new to the problem. The model in that person's head might have included a general notion about the parliamentary system in the UK, some information on the major parties, perhaps, and little more. This analyst would (or should) know that he or she needs to have a better grasp of the issues, personalities and electoral system in the UK before hazarding anything more than a personal opinion.

Imagine a second, similar, analyst but imagine that person with a significantly different model with respect to a crucial aspect of the election (For example, the first analyst believes that the elections can end in a hung parliament and the second analyst does not believe this to be the case).

Shannon argues that making these models explicit, that is getting them out of the analyst's head and onto paper, should improve intelligence analysis in a number of ways.

In the first place, making the models explicit highlights where different analysts disagree about how to think about a problem. At this early stage in the process, though, the disagreement simply becomes a collection requirement rather than the knock-down, drag-out fight it might evolve into in the later stages of a project.

Second, comparing these conceptual models among analysts allows all analysts to benefit from the good ideas and knowledge of others. I may be an expert in the parliamentary process and you may be an expert in the personalities prominent in the elections. Our joint mental model of the election should be more complete than either of us will produce on our own.

Third, making the model explicit should help analysts better assess the appropriate level of confidence they should have in their analysis. If you thought you needed to know five things in order to make a good analysis and you know all five and your sources are reliable, etc, you should arguably be more confident in your analysis than if you only knew two of those things and the sources were poor. Making the model explicit and updating it throughout the analytic process should allow this sort of assessment as well.

Finally, after the fact, these explicit models provide a unique sort of audit trail. Examining how the analysts on a project thought about the requirement may go a long way towards identifying the root causes of intelligence success or failure.

Of course, the ultimate test of an improvement to the analytic process is forecasting accuracy. While determining accuracy is fraught with difficulty, if this approach doesn't actually improve the analyst's ability to forecast more accurately, conducting these explicit modeling exercises might not be worth the time or resources.

So, it is a question worth asking: Does making the mental model explicit improve forecasting accuracy or not? Shannon clearly expected that it would.

She designed a clever experiment that asked a control group to forecast the winner of the elections in Zambia in October 2008. With the experimental group, however, she took them through an exercise that required students to create, at both the individual and group levels, robust concept maps of the issue. Crunched for time, her experiment focused primarily on capturing as many good ideas and the relationships between them as possible in the conceptual models the students designed (Remember this -- it turns out to be important).

Her results? Not what she expected...

In case you are missing it, the guys who explicitly modeled their problem did statistically significantly worse -- way worse -- than those that did not.

It took several weeks of picking through her results and examining her experimental design before she came up with an extremely important conclusion: Convergent thinking is as important as divergent thinking in intelligence analysis.

If that doesn't seem that dramatic to you, think about it for a minute. When was the last time you attended a "critical thinking" course which spent as much time on convergent methods as divergent ones? How many times have you heard that, in order to fix intelligence, "We need to connect more dots" or "We have to think outside the box" -- i.e. we need more divergent thinking? Off the top of your head, how many convergent thinking techniques can you even name?

Shannon's experiment, due to her time restrictions, focused almost exclusively on divergent thinking but, as Shannon wrote in her conclusion, "The generation of a multitude of ideas seemed to do little more than confuse and overwhelm experimental group participants."

Once she knew what to look for, additional supporting evidence was easy to find. Iyengar and Lepper's famous "jam experiment" and Tetlock's work refuting the value of scenario generating exercises both track closely to Shannon's results. There have even been anecdotal references to this phenomenon within the intelligence literature.

But never has there been experimental evidence using a realistic intelligence problem to suggest that, as Shannon puts it, "Divergent thinking on its own appears to be a handicap, without some form of convergent thinking to counterbalance it."

Interesting reading; I recommend it.

Explicit Conceptual Models: Synthesizing Divergent and Convergent Thinking
