Business Intelligence Acquired by Data Warehouse Testing
I have written a program that can look at specific company details of publicly traded stocks and predict with 83% accuracy (independently proven) the direction they will move in the stock market over the next year, and can offer it to you for a price. You see, the same data-crunching skills behind something like Moneyball might as well be put to use beyond just helping your fantasy sports team perform well. Got your attention? Good. Because the idea of statistics or analysis working to predict future outcomes, based on a business intelligence approach to Big Data, is sound, even though the first sentence of this paragraph is a total fabrication.
The important thing to know about Big Data is not how big it is, or how many data fields each record has. The important thing to know is that useful information can be squeezed from the raw data, making the Big Data investment worthwhile. In other words, if the raw data is a bucket of lemons, you can turn it into lemonade. The data can be sold to the public, or used to improve and optimize the way you do business. The loss of a few records is no great tragedy (well, unless this is really a database where each record strongly impacts individual people’s lives or needs to be auditable for compliance reasons). The business intelligence behind data warehousing concerns itself with trends.
We read about many of these trends and know they have great impact on specific industry sectors. If I mention names like A.C. Nielsen or Billboard or BoxOfficeMojo, you immediately think about lists of what’s popular and what isn’t, compiled from individual sales, airplay, or viewer watching records. YouTube, Facebook, Twitter, Pinterest and Instagram track a combo of likes, followers, shares, views, favorites, tweets, retweets, rePins, and subscribers. The actual individuals in the list are ignored, while the important aggregate statistic, in this case the total count, is pulled from the raw data.
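To make the idea of pulling an aggregate count out of raw records concrete, here is a minimal Python sketch. The record layout (user, item, action) and the sample rows are made up for illustration and do not reflect how any of the services named above actually store their data.

```python
from collections import Counter

# Hypothetical raw records: one row per individual action (user, item, action).
raw_events = [
    ("alice", "video_a", "view"),
    ("bob", "video_a", "view"),
    ("alice", "video_a", "like"),
    ("carol", "video_b", "view"),
    ("bob", "video_b", "share"),
    ("dave", "video_a", "view"),
]

def aggregate_counts(events):
    """Count actions per (item, action) pair, discarding who performed them."""
    return Counter((item, action) for _user, item, action in events)

totals = aggregate_counts(raw_events)

# Print the aggregate statistics, most frequent first.
for (item, action), count in totals.most_common():
    print(f"{item} {action}: {count}")
```

Note that the user names never make it into the result: the individuals are dropped and only the totals survive, which is exactly the trade a popularity list makes.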
The business world is full of headline articles boasting of trends based on sales figures, or surveys of attitudes. These are all based on aggregating data and then pulling a few critical numbers from the assemblage, and presenting those bits with a giant “Ta-da!” Also, the news now finds “things going viral” to be newsworthy, artificially inflating the trending power of anything from Chewbacca Mom to whatever a Kardashian has done.
But the exciting part involves expanding beyond just counting. The exciting part is detecting a change, perhaps a tipping point, and being able to determine what it means, and how to respond to it. If you are wondering about the rise and fall of what people are Googling, you can go straight to Google Trends. If you want to market your wares based on how people are Googling, you can go to Google AdWords to see statistics on specific keywords, then place ads based on the keywords you want. Yahoo has also joined in, tracking multiple petabytes of data on user habits. The news has gotten excited lately looking at tweet counts and tweet content and treating this as data warehouse input waiting to be analyzed, as a way of predicting human sentiment.
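As a rough illustration of what “detecting a change” can mean once you have counts over time, here is a small Python sketch. It is a deliberately simple heuristic (compare the average of one window of values against the window just before it), not the method used by Google Trends or any other service mentioned here. The function name, the window size, the threshold ratio, and the weekly numbers are all assumptions chosen for the example.

```python
from statistics import mean

def find_shift(counts, window=3, ratio=1.5):
    """Return the first index where the mean of the next `window` values
    is at least `ratio` times the mean of the previous `window` values,
    or None if no such point exists."""
    for i in range(window, len(counts) - window + 1):
        before = mean(counts[i - window:i])
        after = mean(counts[i:i + window])
        if before > 0 and after >= ratio * before:
            return i
    return None

# Hypothetical weekly search counts for one keyword.
weekly_searches = [100, 104, 98, 101, 99, 150, 180, 210]
shift_at = find_shift(weekly_searches)
print("possible tipping point at week index:", shift_at)
```

A real analysis would have to deal with seasonality, noise, and false alarms, but even this toy version shows the step from counting to noticing that the counts have started to behave differently.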
If you want to shop in what feels like a warehouse that tracks its inventory with a data warehouse and implements all kinds of complex strategies to optimize sales based on data warehouse analytics, shop at Walmart.
Of course, if you want to get extreme, you need to talk about technology limits. The Guinness Book lists the largest data warehouse as being in Santa Clara, California, with what should at press time be a little over 140 petabytes (a petabyte is a million gigabytes), which is about the same amount as at CERN, home of the Large Hadron Collider. This, however, is potentially dwarfed by the NSA’s Utah Data Center, which may exceed that limit by at least 2000%, representing the world’s largest data mining effort, in the name of surveillance of a wide variety of personal digital data trails. As much as I’d like to speculate on the business intelligence use of what is probably the world’s largest data warehouse as the logical extreme of this article, I’m not sure I want to do anything to increase the likelihood of my being looked up in it by my government. So let’s just say that I bet it can do a lot of really neat stuff using a lot of smart and powerful analytics on what I’ll call a data ocean of information, and that it is a worthwhile use of money.