machinespace = the networked information space of ever-increasing complexity that humans have to interact with.

October 03, 2004

The case for Context Engines - Human Factors and Massive Data

or, Human interaction with Massive Data (HiMD)...

Massive Data analysis has been around for a long time, but since the data sets were formerly available only to government entities, there wasn't much popular interest. The '90s brought with them the tools for Data Mining, and large companies suddenly discovered that they were sitting on a treasure trove of information, if only they could interpret and understand what their data tools were spewing out. Since the mid-nineties, however, the internet has made massive data (or more correctly, Massive Information sets) available to all and sundry.

The problem of navigating large information/data sets to find a desired piece of information has been beaten to death over the last 10 years under the label of Search technologies, with their arcane algorithms to help users "find" information. However, "search" technologies are just beginning to scratch the surface - they haven't got to the real problems yet.

The real problem is not finding specific pieces of information, but placing them in a framework from which an analyst can derive courses of action. That is, providing the analyst with a set of "background" tools that will not only help them find the information they need in the huge mass of data, but also convert that information into real knowledge by relating it contextually to the problems the analyst needs to solve.

There are essentially 4 components to typical HiMD problems - a massive data set, the analyst/interpreter, the underlying analytical tools and the line of engagement.

Massive Data set = the information space in question (for example, US Elections, Iraq War, Terrorism, Greenhouse Effect, Hurricane Ivan, Human Genome)
Analyst/Interpreter = Subject Matter Expert, source of contextual knowledge
Analytical Tools = support tools for the analyst (currently mostly search algorithms and pattern engines, with some prediction capabilities)
Line of Engagement (LoE) = the application area or space.

Note: The 'line of engagement' or 'LoE' is the phrase I am using to describe the application space for the results of the massive data analysis. In most cases, the LoE is confined to a specific area or locale, or is bound by time parameters. An LoE can be localized, regional or global, and can be time-limited or ongoing.

As can be seen from the examples, the US Election, although it may have global implications, is confined to the United States and is held on a specific date. Hence its LoE is limited by region and time.

Hurricane Ivan is localized in both time and space. Its LoE begins with the identification of a Tropical Depression and ends with the dissipation of the system over the mainland - hence it too is limited by region and time. Add to its devastation the rebuilding efforts, and we have a much larger data set over a longer period of time, but one still limited to the area impacted by its track.

The Greenhouse effect, although global, is gradual, and can be measured and analyzed at almost any place in the world with the proper calibrations for the area or locale. Its LoE is global, ongoing and dynamic, but the immediate threat level is low, and response time need not be immediate.

The Iraq war is localized to a region, although it is a very complex and dynamic data set. Since there is no formal war with definite rules of engagement and disengagement, there is no mutually agreed ceasefire or any of the other conventions of warfighting. The Iraq war data set is dynamic, with new insurgencies erupting in areas previously under control, and new areas, formerly insurgent, coming under control. Thus, the LoE for the Iraq war is regional, ongoing and dynamic, with a high immediate threat and a need for quick response times.

Terrorism is ongoing and extremely complex, and its LoE reflects that complexity: global, ongoing, dynamic, with a high immediate threat and a requirement for quick response.

As can be seen from the above examples, the scope of each Massive Data set varies widely, and in cases where the LoE is ongoing and global with high immediate threat levels, or where quick response is required, the need for tools to aid the analyst becomes paramount.
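The LoE dimensions used in the examples above can be made concrete in a small sketch. The field names, and the threat/response values I have assigned to each example, are my own reading of the descriptions above, not an established schema:

```python
from dataclasses import dataclass

@dataclass
class LineOfEngagement:
    """One Line of Engagement, using the dimensions from the examples above."""
    topic: str
    scope: str           # "localized", "regional" or "global"
    ongoing: bool        # False if bounded by time parameters
    high_threat: bool    # high immediate threat level
    quick_response: bool # quick response required

# Values below are my interpretation of the prose descriptions.
EXAMPLES = [
    LineOfEngagement("US Elections", "regional", False, False, False),
    LineOfEngagement("Hurricane Ivan", "regional", False, True, True),
    LineOfEngagement("Greenhouse Effect", "global", True, False, False),
    LineOfEngagement("Iraq War", "regional", True, True, True),
    LineOfEngagement("Terrorism", "global", True, True, True),
]

def needs_context_tools(loe):
    """Ongoing LoEs with high immediate threat, or a quick-response
    requirement, are where analyst-support tooling becomes paramount."""
    return loe.ongoing and (loe.high_threat or loe.quick_response)

urgent = [loe.topic for loe in EXAMPLES if needs_context_tools(loe)]
print(urgent)  # → ['Iraq War', 'Terrorism']
```

Note that Hurricane Ivan, despite its high threat, does not qualify: its LoE is closed in time, so existing analyst capacity can cover it.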

However, it is no longer sufficient to provide analysts with knowledge management, filtration or pattern recognition tools. These may suffice in situations where the nature of the Massive Data is benign, and where there is no urgency of threat or response. What is needed are tools to build context.

At present, contextual knowledge is held by the analysts themselves: by virtue of being subject matter experts in a specific area, they are able to take the pieces of information and make some sense of them. For a long time this was sufficient, since in every field that used massive data on a regular basis, the line of engagement (LoE) was sufficiently localized or small that a few analysts (or at most a few dozen) could provide the necessary contextual reference and interpret the data accordingly.

However, in the area of Terrorism, or for that matter in any field where the LoE is ongoing and dynamic with the requirement for quick response and immediacy of action, there simply aren't enough analysts with the contextual knowledge to be able to convert the information provided by current level tools into actionable knowledge.

What can be done? The tools I am proposing are essentially "context engines" that, once seeded with a topic of interest - historical, political, cultural, psychological, economic, environmental, biological, religious, whatever the case may be - can build a reference framework on an ongoing basis. The context engines would "ingest" the incoming data in the background, continually building up the contextual fabric for that particular data set. Of course, such a contextual fabric would need to be periodically assessed by running queries against known facts and outcomes; this could be done by acknowledged subject matter experts.

The outputs of several context engines could be combined or cross-linked to provide richer information sets for the analyst community. Having context would allow more analysts to operate effectively, even though they may not have the depth of experience required to classify them as subject matter experts. This would be especially valuable in situations where an LoE is expected to widen rapidly in a very short period of time.
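The cross-linking step can be sketched as a simple merge. Here each engine's output is assumed to be a dict mapping a term to the set of terms seen alongside it (a hypothetical format, chosen for illustration):

```python
def cross_link(fabrics):
    """Combine several context engines' outputs by unioning the per-term
    neighbour sets, so an analyst querying one topic also sees context
    contributed by the other engines."""
    combined = {}
    for fabric in fabrics:
        for term, neighbours in fabric.items():
            combined.setdefault(term, set()).update(neighbours)
    return combined

# Two hypothetical engine outputs, one per topic:
iraq = {"insurgency": {"fallujah", "mosul"}}
terrorism = {"insurgency": {"financing"}, "cells": {"recruitment"}}
merged = cross_link([iraq, terrorism])
print(merged["insurgency"])  # context from both engines
```

An analyst querying "insurgency" now sees neighbours contributed by both the Iraq and the Terrorism engines, without needing the depth of a subject matter expert in either.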

copyright 2004 ajoy muralidhar. all names, websites and brands referenced are the copyright or trademark of their respective owners.


