What exactly is big data capable of? The following is a well-thought-out article outlining the basics of big data, along with the reasoning as to why it's more tangible than you may think.
Cheers!
~The TPUServices™, LLC Team
10 big myths about Big Data
By Maria Korolov, Network World
Network World - Big Data has dominated tech news of late. It
has been touted as a possible solution for everything from intrusion detection
to fraud prevention to curing cancer and setting optimal product prices.
But Big Data, which we’re defining as data collected in
large volumes, at high velocity and in a variety of formats, isn't a cure-all
for every problem. In fact, companies that believe in some of the myths
surrounding Big Data could head off in the wrong direction, waste a lot of
time and money, lose their competitive position in the market, or
damage their reputation.
Here are some of the biggest myths surrounding Big Data.
MYTH 1: Only data scientists can deal with Big Data
In fact, data scientists by themselves are not enough.
Penn Medicine, for example, includes both a health system
and a school of medicine. For a long time, the health system has been
collecting clinical data in a data warehouse. Meanwhile, in the school of
medicine, new technology is allowing for the sequencing of human genomes, which
entails a huge amount of data.
“We know there's value in there somewhere, and we finally
have the computing power to access it,” says Farrell. Combining data analytics
with expertise in medicine opens up a brand new field of predictive healthcare,
he says.
MYTH 2: The bigger the data, the bigger the value
It takes time and resources to collect data, house it, and
catalog it, says Farrell. Indiscriminately collecting large masses of data can
divert those resources from more worthy projects.
Farrell recommends that companies have a clear idea of the
specific metric or key performance indicator that they're looking for before
they start collecting data.
“You want to get to the point where you have a handful of
nuggets of wisdom that are valuable to you,” he says. “The data by itself,
sitting there, is not enough.”
MYTH 3: Big Data is for big companies
Large companies may have more internal sources of data, but
even small firms can take advantage of data coming in from social media
platforms, government agencies, and data vendors.
“Regardless of the size of your organization, it’s better to
make decisions based on data than to simply rely on intuition or gut feelings,”
says Darin Bartik, executive director of product management for Dell Software’s
Information Management Solutions.
Smaller companies may make data-driven decisions less often
than their bigger counterparts, he says, but, when they do, they can make
course corrections faster.
“Smaller companies can use best practices to be more
data-driven and actually outpace or outmaneuver bigger, slower competitors,” he
says.
MYTH 4: Collect it now, sort it out later
Storage is getting cheaper all the time, but it's not free.
And for many companies, the appetite for data is expanding faster than
storage costs are decreasing, says Brad Peters, CEO of San Francisco-based
Birst, a cloud-based business intelligence vendor.
Companies think that if they just collect the data, they'll
figure out what to do with it later, he says. “I see a number of large
corporations collecting boatloads of stuff, their expense on it goes up, and
they don't get any value out of it.”
In fact, with some data sets, the law of diminishing returns
starts to apply. Say, for example, you're polling people to predict an
election. You need a certain number of people to get a representative sample.
But after a point, adding more people won't significantly affect the margin of
error.
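The polling example can be sketched with the standard margin-of-error
approximation for a proportion. This is a minimal illustration, not
something from the article: it assumes a simple random sample, a 95 percent
confidence level (z = 1.96), and the worst case of an evenly split
electorate (p = 0.5). The function name and sample sizes are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample of size n; z=1.96 corresponds to
    a 95% confidence level and p=0.5 is the worst-case split.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Each tenfold increase in respondents buys a smaller improvement.
for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} respondents: +/- {margin_of_error(n) * 100:.2f} points")
```

Under these assumptions, going from 100 to 1,000 respondents cuts the margin
from roughly 9.8 points to roughly 3.1, while going from 10,000 to 100,000
only moves it from about 1.0 to about 0.3.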
“Do you store a bunch of data you may need, that might give
you a couple more digits of precision?” he asks. “Or do you buy more people
power? Do you secure your networks better? We're not going too fast as an
economy, and budgets aren't increasing.”
And it's not just storage costs, says Dean Gonsowski, global
head of information governance and big data management at San Francisco-based
Recommind, which specializes in unstructured data analytics.
For example, it may cost the company if the data gets out,
he says. And having data sitting around in warehouses means that it's subject
to e-discovery arising from court cases.
Finally, the more data, the longer it takes to sort through
it. “When the repositories get into the billions of records, searches take
hours or weeks,” he says. “The volume of information really starts clogging
systems that were never built to handle those volumes.”
MYTH 5: All data is created equal
The state of Virginia has been collecting data on student
enrollments, financial aid, and degree awards for the past 20 years. But that
doesn't mean that the data collected 20 years ago and stored in the same data
field is necessarily the same data.
“The biggest problem I deal with, is that just because it's
in the data dictionary, researchers think it's fair game,” says Tod Massa, the
policy research and data warehousing director for Virginia's State Council of
Higher Education. “For example, data on student test scores on the ACT and SAT
were initially only collected on in-state students, then there was a gap, then
it was collected on both in-state and out-of-state students.” Similarly, race
and ethnicity are tracked differently at the K-12 level and in higher education.
In fact, any particular data point might be reported
differently by different institutions, or at different points in time, or by
different people at those institutions. “If you're in an isolated shop or
enterprise that is solely responsible for the data it collects, then you might
have a different situation,” he says. “But then even, I suspect that the
meanings of data change over time.”
As a result, analysts need to have not just statistical
skills, but also local knowledge of the data and knowledge of trends in the
industry as a whole, such as SAT and ACT scores being re-calibrated.
“You can't program all those things into a data repository,”
he says.
The same applies to external data sources, he adds. “Data
collections at the federal level have changed dramatically over the past 50
years,” he says. “Understanding the culture and context of data collection is
really a necessity for using the data well.”
MYTH 6: The more specific the prediction, the better
It's human nature to think that something that is more
specific is more accurate. That ‘3:12 p.m.’ is more accurate than ‘sometime in
the afternoon.’ That the meteorologist who predicts that it will definitely
rain on Sunday morning is more accurate than the one who predicts a “fifty
percent chance of showers this weekend.”
In fact, the opposite is true. In many situations, the more
exact prediction is less likely to be accurate.
Say, for example, a customer buys a very specific laptop, in
a very particular configuration. And the only other customer to have bought
that same product in the past also bought a pair of hot pink stilettos.
“A recommendation for hot pink stilettos may be very
specific, but may be too specific – and have a high margin of error,” says
Jerry Jao, CEO of Retention Science, a marketing firm in Santa Monica, Calif.
“This is actually something we see pretty commonly among
business and marketing managers,” he says.
MYTH 7: Big Data equals Hadoop
Hadoop, a popular open-source framework for storing and processing large
volumes of data, including unstructured data, has been getting a lot of
attention lately.
But there are other options.
“There is a whole NoSQL movement,” says Irfan Khan, general
manager and senior vice president at SAP Big Data. “There is MongoDB, Cassandra
– a whole rack of other technologies.”
Some of those technologies may be a better fit for a
particular Big Data project than others.
In particular, Hadoop works by dividing data into chunks,
and working on multiple chunks simultaneously. This approach works on many Big
Data problems, but not all of them.
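The divide-and-combine idea can be sketched in a few lines. This is not
Hadoop code and does not use any Hadoop API; it is a single-machine
illustration in which the function names, chunk size, and sample records
are all hypothetical. Threads stand in for the separate workers, so it
shows the shape of the approach rather than real parallel speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Divide the records into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Process one chunk on its own: count the words it contains."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def chunked_word_count(records, chunk_size=2, workers=4):
    """Count words by working on several chunks at once, then merging."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)  # combine the per-chunk results
    return total


# Hypothetical sample records for illustration only.
records = ["big data is big", "data is data", "hadoop splits data"]
print(chunked_word_count(records).most_common(2))
```

The sketch works because each chunk's word count does not depend on any
other chunk. A problem whose steps need to see all of the data at once
would not split this cleanly.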
“While YARN and Hadoop 2 address some of this, sometimes you
need to deal with things in ways that Hadoop isn't ideal for,” says Grant
Ingersoll, CTO at Redwood City-based LucidWorks, a Big Data consulting firm.
“People need to keep a level head and decide what is best for them, not just
what is the shiny object that all the cool kids are using.”
MYTH 8: End users don't need direct access to Big Data
With Big Data moving in at a high speed, from a wide variety
of sources, and in large volumes, it might seem that it is just too complicated
for regular employees to deal with.
But that's not necessarily the case.
Take, for example, all the data generated by the devices in
an intensive care unit. Heart rates, respiration data, EKG readings. Too often,
though, the doctors and nurses can only see a patient's current readings.
“I can't look and see what it was 10 minutes ago, or draw a
trend line for what it's going to be an hour from now,” says Anthony Jones,
chief marketing officer of Philips Healthcare’s Patient Care & Clinical
Informatics.
But being able to see the historical data for a patient can
be very valuable for a medical practitioner making a decision. “The guys
sticking with a core data science team, they're missing a big opportunity,” says
Jones.
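As a rough illustration of what a trend line over stored readings involves,
the sketch below fits a straight line to a short history by ordinary least
squares and projects it forward. The readings are invented for the example,
the function names are hypothetical, and a straight-line projection is an
assumption made for simplicity, not a clinical method or anything Philips
describes.

```python
def fit_trend(minutes, values):
    """Fit value = slope * minute + intercept by ordinary least squares."""
    n = len(minutes)
    mean_x = sum(minutes) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in minutes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(minutes, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(slope, intercept, minute):
    """Read the fitted line at a given minute."""
    return slope * minute + intercept


# Invented heart-rate history: minutes since monitoring began, beats per minute.
minutes = [0, 5, 10, 15, 20]
heart_rate = [72, 74, 77, 79, 83]

slope, intercept = fit_trend(minutes, heart_rate)
print(f"10 minutes ago (minute 10): {heart_rate[minutes.index(10)]} bpm")
print(f"trend: {slope:+.2f} bpm per minute")
print(f"straight-line projection at minute 80: {project(slope, intercept, 80):.0f} bpm")
```

None of this is possible if only the current reading is kept, which is the
point Jones makes about access to historical data.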
The problem today is getting all the different devices that
generate data to talk to each other even though they weren't designed to do
that, and use different platforms, operating systems and programming languages.
And then once you do, to get the data in a useful form to doctors and nurses
right when they need it.
MYTH 9: Big Data is for big problems
The CIO of a major bank recently gave a talk about Big Data,
and was asked about end user self-service.
“And the CIO says, 'I don't believe in that,'” recalls
Peters, CEO of Birst.
That's a common attitude, he says, with some executives
thinking that Big Data only answers certain types of questions. The attitude
can be summed up this way: “The goal of Big Data for us is to solve very few,
very high-value problems with a core set of data scientists. We don't want data
chaos where normal people have access to this information, because I don't
think they need it.”
Peters disagrees with this approach, but says it's common in
many industries. “It's a rampant myth inside large insurance companies that
business users aren't smart enough to handle it.”
MYTH 10: The Big Data bubble will eventually burst
Hype cycles may come and go, but transformative
technological changes stick around. The dot-com crash did not signal the end of
the Internet.
Even when the hype dies down, companies will still have Big
Data to deal with. In fact, they will have more Big Data to deal with than they
ever expected, due to exponential growth: IDC projects that the total amount of
data collected will double every two years through 2020.
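To put the doubling rate in yearly terms, here is a small arithmetic
sketch. It takes the two-year doubling period from the IDC projection
quoted above; the function name and the eight-year horizon are illustrative
choices, not figures from IDC.

```python
def growth_factor(years, doubling_period=2.0):
    """How many times larger the data volume is after `years`,
    assuming it doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


annual_rate = growth_factor(1) - 1
print(f"implied annual growth: {annual_rate:.1%}")
print(f"growth over 8 years: {growth_factor(8):.0f}x")
```

Under that assumption, doubling every two years works out to roughly 41
percent growth per year, or a 16-fold increase over eight years.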
And it's not just that companies are simply collecting more
of the stuff that they currently collect. Instead, new types of data are likely
to appear, requiring massive amounts of storage.
“We will get to the point where everyone who gets admitted
to a hospital, the hospital maps their genome,” says Anthony Jones, chief
marketing officer of Philips Healthcare’s Patient Care & Clinical
Informatics. “This allows treatment to be customized to the patient. And when
you talk about Big Data, that's a massive amount of data. I don't think a lot
of CIOs really appreciate how much harder things are going to get.”
By thinking of “Big Data” as just a phase, companies can
miss opportunities to capture data elements that could have an impact on their
business down the line, says Bryan Hill, CTO of Cadient Group, an interactive
marketing agency in King of Prussia, Pa.
“The term 'Big Data' is likely to change, just like cloud
computing came up, which is no different than the Web was, or the Internet,” he
says. “The term may change, but the spirit of Big Data is here to stay.”