Finding the needle in a haystack
FAIRFIELD — Maharishi University of Management is becoming a key player on the national stage for its research into data mining.
Data mining refers to techniques for finding useful knowledge in a vast sea of information. Due to a recent partnership between IBM and MUM, university students have free access to IBM software to help them crunch huge data sets. They’re using those tools from IBM in a course titled “Business Intelligence and Data Mining,” taught by MUM professor Anil Maheshwari.
Through the IBM Academic Initiative, IBM offers course materials, training and curriculum development to 6,000 universities and 30,000 faculty members around the world.
Maheshwari said modern computers have taken number crunching to new heights. They allow programmers to find correlations between variables that sometimes seem unrelated. This kind of computing power is valuable for businesses because it allows them to fine-tune their advertisements.
Businesses collect reams of information about the demographics of their customers. Data mining allows them to sort through this information to find out who buys a product: whether the customers are mostly male or female, young or old, single or married, and so on. Learning which variables matter and which do not is key to a successful marketing campaign.
“Data mining is just like mining for diamonds,” he said. “You need a lot of skill and tools but also an artistic edge in identifying the diamond.”
Maheshwari said data mining holds the promise of answering questions the way contestants do on a game show such as Jeopardy! He even mentioned IBM’s “Watson,” a computer that has competed on the show. The computer is fed the question and then generates an answer based partly on how the words in the question correspond to encyclopedia articles in its database. Data mining has advanced to the point where Watson’s sophisticated algorithms can arrive at the correct answer even when the question contains puns.
A computer that can answer questions after searching through a database would be useful to doctors trying to determine whether a patient’s symptoms point to a malignant or a benign tumor. Data mining software could search through thousands of past cases to find which variables predicted malignant tumors and which predicted benign ones.
Such technology could be applied in other realms, too, such as predicting which students are likely to drop out of school based on data about students who dropped out in the past.
Maheshwari said collecting large amounts of data is easier than most people think, since so much of it is publicly available on the Internet. He said the government gathers massive amounts of data on everything under the sun. Accessing the data is not the tricky part; knowing how to separate the wheat from the chaff is. Maheshwari said he actually prefers a different analogy, finding a needle in a haystack, because the vast majority of data in a database is useless for answering the researcher’s question.
In response to the growing need for experts in information technology fields such as data mining, MUM has introduced an online graduate certificate program in Management Information Systems. The program takes one to two years to complete.