Wednesday, March 12, 2014



What exactly is big data capable of? The following is a well-thought-out article outlining the basics of big data, along with the reasons it's more tangible than you may think.

Cheers!


~The TPUServices, LLC Team



10 big myths about Big Data
By Maria Korolov, Network World

Network World - Big Data has dominated tech news of late. It has been touted as a possible solution for everything from intrusion detection to fraud prevention to curing cancer and setting optimal product prices.

But Big Data, which we’re defining as data collected in large volumes, at high velocity and in a variety of formats, isn't a cure-all for every problem. In fact, companies that believe some of the myths surrounding Big Data could head off in the wrong direction, waste a lot of time and money, lose their competitive position in the market, or damage their reputation.

Here are some of the biggest myths surrounding Big Data.

MYTH 1: Only data scientists can deal with Big Data

In fact, data scientists by themselves are not enough.

“Data scientists by themselves aren't going to be able to pull off getting the information out of Big Data if you don't know what you're looking for in the first place,” says Pat Farrell, senior director of data analytics at Penn Medicine. “You need people who are familiar with the industry, the domain of knowledge, understand what kind of questions are out there, what insight would be valuable to your particular industry.”

Penn Medicine, for example, includes both a health system and a school of medicine. For a long time, the health system has been collecting clinical data in a data warehouse. Meanwhile, in the school of medicine, new technology is allowing for the sequencing of human genomes, which entails a huge amount of data.
  
“We know there's value in there somewhere, and we finally have the computing power to access it,” says Farrell. Combining data analytics with expertise in medicine opens up a brand new field of predictive healthcare, he says.

MYTH 2: The bigger the data, the bigger the value

It takes time and resources to collect data, house it, and catalog it, says Farrell. Indiscriminately collecting large masses of data can divert those resources from more worthy projects.

Farrell recommends that companies have a clear idea of the specific metric or key performance indicator that they're looking for before they start collecting data.

“You want to get to the point where you have a handful of nuggets of wisdom that are valuable to you,” he says. “The data by itself, sitting there, is not enough.”

MYTH 3: Big Data is for big companies

Large companies may have more internal sources of data, but even small firms can take advantage of data coming in from social media platforms, government agencies, and data vendors.

“Regardless of the size of your organization, it’s better to make decisions based on data than to simply rely on intuition or gut feelings,” says Darin Bartik, executive director of product management for Dell Software’s Information Management Solutions.

Smaller companies may make data-driven decisions less often than their bigger counterparts, he says, but, when they do, they can make course corrections faster.

“Smaller companies can use best practices to be more data-driven and actually outpace or outmaneuver bigger, slower competitors,” he says.

MYTH 4: Collect it now, sort it out later

Storage is getting cheaper all the time, but it's not free. However, for many companies, the appetite for data is expanding faster than storage costs are decreasing, says Brad Peters, CEO of San Francisco-based Birst, a cloud-based business intelligence vendor.

Companies think that if they just collect the data, they'll figure out what to do with it later, he says. “I see a number of large corporations collecting boatloads of stuff, their expense on it goes up, and they don't get any value out of it.”

In fact, with some data sets, the law of diminishing returns starts to apply. Say, for example, you're polling people to predict an election. You need a certain number of people to get a representative sample. But after a point, adding more people won't significantly affect the margin of error.

“Do you store a bunch of data you may need, that might give you a couple more digits of precision?” he asks. “Or do you buy more people power? Do you secure your networks better? We're not going too fast as an economy, and budgets aren't increasing.”
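The diminishing returns in the polling example can be made concrete with the standard margin-of-error approximation for a simple random sample. This is a minimal sketch, not something from the article: the function name, the 95 percent confidence level (z = 1.96), the worst-case proportion of 0.5 and the sample sizes are all assumptions chosen for illustration.

```python
import math


def margin_of_error(sample_size, z=1.96, proportion=0.5):
    """Approximate margin of error for a simple random sample.

    z=1.96 corresponds to roughly 95% confidence, and proportion=0.5
    is the worst case (it gives the widest margin). Both are assumptions.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Each tenfold increase in respondents shrinks the margin by only ~3.2x.
for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: +/- {margin_of_error(n) * 100:.2f} percentage points")
```

Because the margin shrinks with the square root of the sample size, going from 1,000 to 100,000 respondents costs 100 times as much data for only about a tenfold gain in precision.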

And it's not just storage costs, says Dean Gonsowski, global head of information governance and big data management at San Francisco-based Recommind, which specializes in unstructured data analytics.

For example, it may cost the company if the data gets out, he says. And having data sitting around in warehouses means that it's subject to e-discovery arising from court cases.

Finally, the more data, the longer it takes to sort through it. “When the repositories get into the billions of records, searches take hours or weeks,” he says. “The volume of information really starts clogging systems that were never built to handle those volumes.”

MYTH 5: All data is created equal

The state of Virginia has been collecting data on student enrollments, financial aid, and degree awards for the past 20 years. But that doesn't mean that the data collected 20 years ago and stored in the same data field is necessarily the same data.

“The biggest problem I deal with is that just because it's in the data dictionary, researchers think it's fair game,” says Tod Massa, the policy research and data warehousing director for Virginia's State Council of Higher Education. “For example, data on student test scores on the ACT and SAT were initially only collected on in-state students, then there was a gap, then it was collected on both in-state and out-of-state students.” Similarly, race and ethnicity are tracked differently at the K-12 level and in higher education.

In fact, any particular data point might be reported differently by different institutions, or at different points in time, or by different people at those institutions. “If you're in an isolated shop or enterprise that is solely responsible for the data it collects, then you might have a different situation,” he says. “But then even, I suspect that the meanings of data change over time.”

As a result, analysts need to have not just statistical skills, but also local knowledge of the data and knowledge of trends in the industry as a whole, such as SAT and ACT scores being re-calibrated.

“You can't program all those things into a data repository,” he says.

The same applies to external data sources, he adds. “Data collections at the federal level have changed dramatically over the past 50 years,” he says. “Understanding the culture and context of data collection is really a necessity for using the data well.”

MYTH 6: The more specific the prediction, the better

It's human nature to think that something that is more specific is more accurate. That ‘3:12 p.m.’ is more accurate than ‘sometime in the afternoon.’ That the meteorologist who predicts that it will definitely rain on Sunday morning is more accurate than the one who predicts a “fifty percent chance of showers this weekend.”

In fact, the opposite is true. In many situations, the more exact prediction is less likely to be accurate.

Say, for example, a customer buys a very specific laptop, in a very particular configuration. And the only other customer to have bought that same product in the past also bought a pair of hot pink stilettos.

“A recommendation for hot pink stilettos may be very specific, but may be too specific – and have a high margin of error,” says Jerry Jao, CEO of Retention Science, a marketing firm in Santa Monica, Calif.

“This is actually something we see pretty commonly among business and marketing managers,” he says.
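One way to see why a recommendation built on a single matching customer carries a high margin of error is to put a confidence interval around the observed rate. The sketch below uses the Wilson score interval, which is our choice of technique and not one the article names. The function name and the comparison figure of 80 purchases out of 100 customers are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if trials == 0:
        return (0.0, 1.0)  # no evidence at all: anything is possible
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return (max(0.0, center - half_width), min(1.0, center + half_width))


# One prior buyer of this exact laptop also bought the stilettos (1 of 1).
narrow_match = wilson_interval(1, 1)

# Hypothetical broader segment: 80 of 100 laptop buyers also bought a laptop bag.
broad_match = wilson_interval(80, 100)

print("very specific match:", narrow_match)
print("broader match:      ", broad_match)
```

The single-customer match looks like a 100 percent hit rate, but its interval stretches from roughly 20 percent to 100 percent, while the broader and less specific segment is pinned down far more tightly.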

MYTH 7: Big Data equals Hadoop

Hadoop, a popular open-source framework for storing and processing large data sets across clusters of computers, has been getting a lot of attention lately.

But there are other options.

“There is a whole NoSQL movement,” says Irfan Khan, general manager and senior vice president at SAP Big Data. “There is MongoDB, Cassandra – a whole rack of other technologies.”

Some of those technologies may be a better fit for a particular Big Data project than others.

In particular, Hadoop works by dividing data into chunks, and working on multiple chunks simultaneously. This approach works on many Big Data problems, but not all of them.

“While YARN and Hadoop 2 address some of this, sometimes you need to deal with things in ways that Hadoop isn't ideal for,” says Grant Ingersoll, CTO at Redwood City-based LucidWorks, a Big Data consulting firm. “People need to keep a level head and decide what is best for them, not just what is the shiny object that all the cool kids are using.”
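The divide-into-chunks approach described above can be sketched on a single machine. This is only an illustration of the idea and does not use Hadoop or its API: the function names, the chunk size, the thread pool and the sample lines are all assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Divide the input into consecutive chunks of at most chunk_size items."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Process one chunk independently of all the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, chunk_size=2, workers=4):
    """Count words per chunk concurrently, then merge the partial results."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["big data big myths", "data about data", "small data", "big problems"]
print(parallel_word_count(lines).most_common(3))
```

This works because each chunk's word counts can be computed without looking at any other chunk and then simply added together. Problems in which every step depends on the result of the previous one do not split up this way, which is one reason the approach does not fit every project.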

MYTH 8: End users don't need direct access to Big Data

With Big Data moving in at a high speed, from a wide variety of sources, and in large volumes, it might seem that it is just too complicated for regular employees to deal with.

But that's not necessarily the case.

Take, for example, all the data generated by the devices in an intensive care unit. Heart rates, respiration data, EKG readings. Too often, though, the doctors and nurses can only see a patient's current readings.

“I can't look and see what it was 10 minutes ago, or draw a trend line for what it's going to be an hour from now,” says Anthony Jones, chief marketing officer of Philips Healthcare’s Patient Care & Clinical Informatics.

But being able to see the historical data for a patient can be very valuable for a medical practitioner making a decision. “The guys sticking with a core data science team, they're missing a big opportunity,” says Jones.

The problem today is getting all the different devices that generate data to talk to each other, even though they weren't designed to do that and use different platforms, operating systems and programming languages. Once that is done, the next challenge is getting the data to doctors and nurses in a useful form right when they need it.

MYTH 9: Big Data is for big problems

The CIO of a major bank recently gave a talk about Big Data, and was asked about end user self-service.

“And the CIO says, 'I don't believe in that,'” recalls Peters, CEO of Birst.

That's a common attitude, he says, with some executives thinking that Big Data only answers certain types of questions. The attitude can be summed up this way: “The goal of Big Data for us is to solve very few, very high-value problems with a core set of data scientists. We don't want data chaos where normal people have access to this information, because I don't think they need it.”

Peters disagrees with this approach, but says it's common in many industries. “It's a rampant myth inside large insurance companies that business users aren't smart enough to handle it.”

MYTH 10: The Big Data bubble will eventually burst

Hype cycles may come and go, but transformative technological changes stick around. The dot-com crash did not signal the end of the Internet.

Even when the hype dies down, companies will still have Big Data to deal with. In fact, they will have more Big Data to deal with than they ever expected, due to exponential growth – IDC projects that the total amount of data collected will double every two years through 2020.

And it's not just that companies are simply collecting more of the stuff that they currently collect. Instead, new types of data are likely to appear, requiring massive amounts of storage.

“We will get to the point where everyone who gets admitted to a hospital, the hospital maps their genome,” says Anthony Jones, chief marketing officer of Philips Healthcare’s Patient Care & Clinical Informatics. “This allows treatment to be customized to the patient. And when you talk about Big Data, that's a massive amount of data. I don't think a lot of CIOs really appreciate how much harder things are going to get.”

By thinking of “Big Data” as just a phase, companies can miss opportunities to capture data elements that could have an impact on their business down the line, says Bryan Hill, CTO of Cadient Group, an interactive marketing agency in King of Prussia, Pa.

“The term 'Big Data' is likely to change, just like cloud computing came up, which is no different than the Web was, or the Internet,” he says. “The term may change, but the spirit of Big Data is here to stay.”