New Approach to Data Mining
Triada, a small company in Ann Arbor, MI, and Foster City, CA, has developed a remarkable and entirely different approach to analyzing large amounts of data, producing knowledge and insights that can be acted on.
The company has already attracted the interest of some very large manufacturing companies and appears poised for a dramatic lift-off. It is seeking demonstration opportunities with individual companies, and with partners and vendors who can take the product into various markets.
During a recent visit to Triada’s facility, I had the system explained and demonstrated. The implications appear staggering once you see how simple the system is on one level, and how powerful on another.
The key is a unique “transform” process that looks at the data in an entirely different way. Engineers will appreciate an analogy with the Fourier Transform (FT), used for more than a century to analyze continuous waveform data. A transform can dramatically reduce the amount of data, provide a different way of “seeing” it, or both, making it possible to understand the data in ways the original form never allowed. (Imagine trying to make sense of frequency spectra without the FT.)
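To make the analogy concrete, the sketch below applies a naive discrete Fourier transform (plain Python, no FFT library) to 64 samples of a pure sine wave. All 64 time-domain samples are non-zero, but only two frequency bins are, which is the kind of “different way of seeing” the paragraph describes. It illustrates the Fourier Transform only and makes no claim about how Triada’s transform works.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a list of real samples."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# 64 samples of a pure sine wave that completes 3 cycles over the window.
N = 64
signal = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]

spectrum = [abs(c) for c in dft(signal)]
# In the time domain nearly every sample is non-zero; in the frequency domain
# the energy sits in bin 3 and its mirror image, bin N - 3.
peaks = [k for k, mag in enumerate(spectrum) if mag > 1e-6]
print(peaks)  # [3, 61]
```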
The following material is taken directly from Triada’s company literature and website, at http://www.triada.com. See especially the white paper under “Technology” for the complete presentation of this concept, along with graphics.
Contact: Bruce Borden, 650-378-7506, bborden@triada.com
—————————————-
Triada’s NGram Transform
The software uses a unique and patented method for identifying relationships between disparate data items and presenting those relationships in an extraordinarily compact and useful manner.
Triada’s revolutionary technology transforms information into associations. Modeled on human memory, the NGram Transform learns raw data, remembering it in Associative Memory Structures (AMSs). These structures contain all associations that exist in the original data and present them in an interactive and intuitive form. Unlike conventional data mining, which relies primarily on guesswork and brute-force querying, the AMS reveals the knowledge in the data quickly and visually.
Today’s enterprises are overwhelmed by a glut of data. The challenge is no longer how to gather and store information. Instead, it is how to turn very large amounts of data into useful knowledge.
The NGram Transform mathematically transforms raw data from the data domain into powerful knowledge in the association domain. In the data domain, knowledge is difficult to identify; it is lost in the myriad details collected in the database. By capturing all associations within the data, NGram™ moves the data into the association domain, revealing the knowledge it contains and making it readily accessible to the analyst.
Associative Memory Structures: The Power of NGram Transform
Because an AMS stores information as associations, redundant information shrinks as it is transformed. AMSs are typically much smaller than the original data. And as more information is transformed into an AMS, its growth rate slows, because the AMS has already learned much of the information it is encountering.
In a corporate database, information is highly redundant. Think of the sales records for an automotive company. Each record contains fields such as date, model, options, color, price, warranty, dealership, and salesperson. A given model is sold thousands of times, with a few hundred combinations of options, color, and warranty. Prices vary from sale to sale, but for one model they will all fall within a rather narrow band, such as from the list price of $14,995 down to $11,999.
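The redundancy argument can be illustrated with a toy simulation. The colors, option packages, and warranty terms below are hypothetical, and storing each distinct combination once with a count is a generic technique, not Triada’s AMS format. It only shows why such a structure stops growing once the combinations have all been seen, even as records keep arriving.

```python
import random
from collections import Counter

# Hypothetical attribute values for one model line; not real product data.
COLORS = ["red", "white", "black", "silver", "blue"]
OPTIONS = ["base", "sport", "towing", "luxury"]
WARRANTIES = ["3yr", "5yr", "7yr"]

def make_sale(rng):
    """Return one synthetic sales record as a tuple of categorical fields."""
    return (rng.choice(COLORS), rng.choice(OPTIONS), rng.choice(WARRANTIES))

rng = random.Random(42)  # fixed seed so runs are repeatable
combos = Counter()       # distinct combination -> number of sales
for n in range(1, 100_001):
    combos[make_sale(rng)] += 1
    if n in (10, 100, 1_000, 10_000, 100_000):
        # Distinct combinations stop growing once every combination has been seen.
        print(f"{n:>7} records -> {len(combos):>3} distinct combinations")
```

However many records are added, the count table here can never exceed 5 × 4 × 3 = 60 entries.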
NGram Transform can associate all prices with all sales by all salespeople, and so on; this web of associations is the knowledge. By simply looking at the AMSs, you can see many useful facts, such as how many trucks a particular dealership sold, how many red Fords were sold on a particular date, and which dealerships discounted more heavily than others. You do not need to write and run complex database queries to obtain the equivalent information.
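One simple way to answer such questions without writing queries is to precompute counts for every combination of field values that co-occurs in a record. The sketch below does this for a few hypothetical sales records, counting combinations of up to three items; the field names, the `build_counts` and `lookup` helpers, and the `max_size` limit are illustrative assumptions, not Triada’s method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales records.
sales = [
    {"dealer": "Ann Arbor", "make": "Ford", "body": "truck", "color": "red", "date": "1999-03-01"},
    {"dealer": "Ann Arbor", "make": "Ford", "body": "truck", "color": "white", "date": "1999-03-01"},
    {"dealer": "Foster City", "make": "Ford", "body": "minivan", "color": "red", "date": "1999-03-01"},
    {"dealer": "Foster City", "make": "Ford", "body": "truck", "color": "red", "date": "1999-03-02"},
]

def build_counts(records, max_size=3):
    """Count every combination of 1..max_size (field, value) items per record."""
    counts = Counter()
    for rec in records:
        items = sorted(rec.items())  # canonical order so each combo has one key
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return counts

def lookup(counts, *items):
    """Number of records containing all the given items, in any argument order."""
    return counts[tuple(sorted(items))]

counts = build_counts(sales)
print(lookup(counts, ("body", "truck")))                                    # 3
print(lookup(counts, ("dealer", "Ann Arbor"), ("body", "truck")))           # 2
print(lookup(counts, ("color", "red"), ("make", "Ford"), ("date", "1999-03-01")))  # 2
```

The number of stored combinations grows quickly with the number of fields and the combination size, so this naive version is only practical for small examples.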
Statistically Unusual Events Can Be Significant
Things that happen infrequently can be just as interesting and potentially important as those that occur often. While low-frequency associations are often merely data entry errors, they can also be important events that data-scrubbing applications almost always miss.
Some failures occur very infrequently but always in a particular combination of circumstances: whenever a certain make of car is nine months old, lacks the extra undercoat option, and was purchased in a northern city, the radiator always springs a leak. This association suggests that the radiator is rusting from salt corrosion. NGram Transform captures the relationship as soon as it occurs, whereas summary or statistical analysis would most likely miss it. Months later, when it has become a sizable problem, the manufacturer’s engineers would begin searching for an explanation of why the defect occurs. Quite often, because of the enormous volume of data involved, an association that would be immediately evident with NGram Transform is never spotted by traditional methods, and the true root cause of the problem remains a mystery.
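A brute-force way to surface the radiator example is to tally, for every combination of attributes, how many records contain it and how many of those ended in a failure, then flag combinations that failed every time. The records, the `min_support` threshold, and the `always_failing_combos` helper below are hypothetical; this is a generic counting sketch, not a description of NGram Transform.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical warranty records: (attributes, radiator_leak) pairs.
records = [
    ({"age_months": 9, "undercoat": False, "region": "north"}, True),
    ({"age_months": 9, "undercoat": False, "region": "north"}, True),
    ({"age_months": 9, "undercoat": True, "region": "north"}, False),
    ({"age_months": 9, "undercoat": False, "region": "south"}, False),
    ({"age_months": 4, "undercoat": False, "region": "north"}, False),
] + [({"age_months": 12, "undercoat": True, "region": "south"}, False)] * 95

def always_failing_combos(records, min_support=2, max_size=3):
    """Return attribute combinations seen at least min_support times, always with failure."""
    stats = defaultdict(lambda: [0, 0])  # combo -> [occurrences, failures]
    for attrs, failed in records:
        items = sorted(attrs.items())  # canonical order so combos match across records
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                stats[combo][0] += 1
                stats[combo][1] += int(failed)
    return [combo for combo, (n, fails) in stats.items()
            if n >= min_support and fails == n]

overall = sum(failed for _, failed in records) / len(records)
print(f"overall leak rate: {overall:.0%}")  # 2%
for combo in always_failing_combos(records):
    print(combo)  # (('age_months', 9), ('region', 'north'), ('undercoat', False))
```

With a 2% overall leak rate, the pattern is easy to lose in aggregate statistics, yet the three-way combination fails in every one of its occurrences, while none of its one- or two-item subsets does.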
The Power of Association
Associations have many other powerful characteristics that simplify knowledge discovery:
• Associations act like memories of events. The human brain processes information in a non-linear way. NGram Transform makes similar associations, finding patterns in seemingly random fact sets. For example, NGram Transform can associate the warranty claims brought into a particular dealership with the months in which they arrived, revealing that most claims occur in January.
• Associations are facts. If a dealership sells 25 red pickups, NGram Transform can make that association.
• Associations act like intersections of events. If Sales Rep Jane sells 12 red Ford pickups on the same day Sales Rep Joe sells five minivans, those two events will be associated by NGram Transform.
• Associations act like Boolean operations. When we build up associations, we are performing Boolean operations (mostly ANDs) across multiple records. With NGram Transform, the “and” of all of the pickup sales records will be available in a node that combines color and make. The node counting pickups will be the “or” of all of the colors, models, sales reps, and so forth.
• Associations are answers to questions yet to be asked. Because queries of a large database often require significant processor time and disk space, users usually restrict a query so that it produces only a fraction of the information they know is available in the database. Using NGram Transform, the answer to the same query appears as an association in an AMS node, and all the related associations are revealed in the same node.
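The Boolean view of associations can be sketched with an inverted index that maps each field value to the set of record IDs containing it: an “and” is a set intersection and an “or” is a set union. The records and the `all_of`/`any_of` helpers below are hypothetical illustrations, not Triada’s node structure.

```python
from collections import defaultdict

# Hypothetical sales records, identified by their position in the list.
sales = [
    {"rep": "Jane", "body": "pickup", "color": "red", "make": "Ford"},
    {"rep": "Jane", "body": "pickup", "color": "red", "make": "Ford"},
    {"rep": "Joe", "body": "minivan", "color": "white", "make": "Ford"},
    {"rep": "Joe", "body": "pickup", "color": "blue", "make": "Ford"},
]

# Inverted index: each (field, value) item maps to the IDs of records containing it.
index = defaultdict(set)
for rid, rec in enumerate(sales):
    for item in rec.items():
        index[item].add(rid)

def all_of(*items):
    """Records containing every item ("and" = set intersection)."""
    return set.intersection(*(index[i] for i in items))

def any_of(*items):
    """Records containing at least one item ("or" = set union)."""
    return set.union(*(index[i] for i in items))

red_ford_pickups = all_of(("color", "red"), ("make", "Ford"), ("body", "pickup"))
any_color = any_of(("color", "red"), ("color", "white"), ("color", "blue"))
print(sorted(red_ford_pickups))  # [0, 1]
print(sorted(any_color))         # [0, 1, 2, 3]

# The pickup total is the "or" of the pickup-and-color combinations.
colors = sorted(v for field, v in index if field == "color")
by_color = [all_of(("body", "pickup"), ("color", c)) for c in colors]
print(len(set().union(*by_color)))  # 3
```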
Vital Statistics
NGram Transform and Athena run under Windows NT on an Intel-compatible PC. Depending on the amount of original data and its redundancy, the transformed AMS will be smaller than the original data by a sizable factor. In several recent cases, 50 GBytes of original data required only 5 GBytes of storage as an AMS. This is just the opposite of storage in an RDBMS, which typically adds 200% overhead, requiring roughly three times the original data size. This compression allows AMSs to be efficiently backed up, transmitted, or replicated.
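The storage figures in this paragraph work out as follows, taking the quoted 200% RDBMS overhead at face value as a rule of thumb rather than a measured value.

```python
# Figures quoted in the text; the RDBMS overhead is the article's rough
# "200% overhead" rule of thumb, not a measured value.
raw_gb = 50.0
ams_gb = 5.0
rdbms_overhead = 2.00  # 200% overhead on top of the raw data

rdbms_gb = raw_gb * (1 + rdbms_overhead)  # raw data plus overhead
ams_ratio = raw_gb / ams_gb               # compression factor of the AMS
ams_vs_rdbms = rdbms_gb / ams_gb          # RDBMS footprint relative to the AMS

print(f"RDBMS storage: {rdbms_gb:.0f} GB")         # 150 GB
print(f"AMS compression: {ams_ratio:.0f}:1")       # 10:1
print(f"RDBMS needs {ams_vs_rdbms:.0f}x the AMS")  # 30x
```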
NGram Transform is very fast. It learns information at around 5 GBytes per hour per processor. Once built, you can copy an AMS from the system where it was built to each user’s system, or the AMS can be accessed through NT’s distributed file system. Multiple processors in one system can display the same AMS or independent AMSs. You can incrementally update an AMS. In general, an AMS never forgets, but you can remove records from it.
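Incremental updates and record removal are straightforward for any count-based structure: adding a record increments a tally and removing one decrements it. The `CountStore` class below is a hypothetical, minimal sketch of that idea, not Triada’s AMS implementation.

```python
from collections import Counter

class CountStore:
    """Hypothetical count-based store: keeps each distinct record with a tally.

    Illustrative stand-in only; not Triada's AMS format.
    """

    def __init__(self):
        self.counts = Counter()

    def add(self, records):
        """Incrementally fold new records into the existing counts."""
        for rec in records:
            self.counts[rec] += 1

    def remove(self, rec):
        """Remove one occurrence of a record; refuse if it was never stored."""
        if self.counts[rec] == 0:  # Counter returns 0 for missing keys
            raise KeyError(f"record not present: {rec!r}")
        self.counts[rec] -= 1
        if self.counts[rec] == 0:
            del self.counts[rec]  # drop empty entries so the store stays compact

    def total(self):
        """Total number of records currently represented."""
        return sum(self.counts.values())

store = CountStore()
store.add([("Ford", "truck", "red")] * 3 + [("Ford", "minivan", "white")])
store.add([("Ford", "truck", "red")])  # a later batch, added incrementally
store.remove(("Ford", "minivan", "white"))
print(store.total())                           # 4
print(store.counts[("Ford", "truck", "red")])  # 4
```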
Summary
You will gain new business insights whenever you view an AMS transformation of your data. Every Triada customer is finding new and exciting ways to leverage the knowledge NGram Transform reveals. A financial credit company is using NGram Transform to find errors in the data it receives from its information sources, and plans to use it to identify target customers for mailing lists. An auto manufacturer is using Athena to pinpoint root causes of failures before they become bigger problems, and intends to use Athena to identify option-combination trends for target marketing.
** NGram Transform converts data from an information domain into an association domain. Once transformed, all associations in the data become directly viewable.
** NGram Transform is lossless and reversible; the original records may be retrieved.
** NGram Transform converts raw data into Associative Memory Structures. AMSs typically require much less storage space than the original data, and they grow more slowly as more information is taught to them.
** The associative memory portions of an AMS are even smaller. These will fit in the main memory of most computer systems, requiring little or no I/O during the examination process.
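The lossless claim can be illustrated with the simplest count-based representation: keep each distinct record once with its number of occurrences, then expand the counts back out. This toy example (hypothetical records, Python’s `collections.Counter`) recovers exactly the same multiset of records; it does not recover their original order unless an ordering field such as a record ID is stored with each record. It is not Triada’s AMS format.

```python
from collections import Counter

# Hypothetical records; each is a tuple of field values.
original = [
    ("Ford", "pickup", "red", 14995),
    ("Ford", "pickup", "red", 14995),
    ("Ford", "minivan", "white", 11999),
    ("Ford", "pickup", "red", 14995),
]

# "Transform": keep each distinct record once, with how often it occurred.
compact = Counter(original)

# "Inverse": expand the counts back into individual records.
restored = list(compact.elements())

print(len(compact), "distinct entries for", len(original), "records")  # 2 distinct entries for 4 records
print(Counter(restored) == Counter(original))  # True
```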
Data mining applications require guesswork and extensive querying. Relying only on these tools is a risky proposition; you never know if you’ve discovered everything of interest. With NGram Transform, knowledge is instantly at your fingertips, revealed to you in an intuitive and visual form. Your analysts will save time and money.
Discover the power of NGram Transform and transform your data into valuable knowledge.