What Exactly is Big Data, Anyway?

By: Ryan Wilkinson, Account Coordinator

There’s no denying that Big Data is one of the pivotal forces of the modern age; you can’t throw a rock in San Francisco without hitting a company that claims to be the next big thing in collecting, analyzing or organizing it, and no enterprise wants to make a decision unless it’s “data-driven.” But between all the visualization software, actionable business intelligence insights, and other techno-babbly buzzwords, some of us are left wondering: what is big data, anyway?

Unfortunately, not everyone agrees on the definition. Some people will tell you there is a threshold amount of data a company must hold to be considered “big” (although most of those people can’t agree on exactly where that threshold is); others will say that it’s anything that requires advanced analytics software to use.

SAS, one of the biggest players in the analytics software game, says it isn’t about how much data there is, but what a company does with it. They also provide one of the most straightforward definitions around, even if it doesn’t include any hard numbers: “Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.” They go on to explain that big data “can be analyzed for insights that lead to better decisions and strategic business moves.”

That helps some, but a little context and a few statistics might give us a more solid outline. According to research published by IDC in 2014, the total amount of data in the “digital universe” doubles roughly every two years. At the time of publication the sum of it all came to about 4.4 zettabytes, or 4.4 trillion gigabytes (that’s 4,400,000,000,000 if you want to see all the zeros). The researchers predicted that by 2020, that amount would multiply tenfold to reach 44 zettabytes. That’s a lot of data; you’d need almost 700 billion base-model iPhone Xs, at 64 gigabytes each, to hold it all. It’s definitely big. But the big data used by enterprises doesn’t include all of it, so we’re going to need to get more specific.
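For anyone who wants to check the math, here is a minimal sketch of the conversion in Python. It assumes decimal units (one zettabyte is a trillion gigabytes) and a 64 GB base-model iPhone X; the function names are made up for illustration and don't come from the IDC report.

```python
# Decimal (SI) units: 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
GB_PER_ZB = 10**12

# Assumption: the "basic model" iPhone X is the 64 GB version.
IPHONE_X_BASE_GB = 64


def zettabytes_to_gigabytes(zettabytes):
    """Convert a whole number of zettabytes to gigabytes (decimal units)."""
    return zettabytes * GB_PER_ZB


def phones_needed(zettabytes, phone_capacity_gb=IPHONE_X_BASE_GB):
    """Return how many phones of the given capacity would hold the data."""
    total_gb = zettabytes_to_gigabytes(zettabytes)
    # Ceiling division, so a partially filled phone still counts as one.
    return -(-total_gb // phone_capacity_gb)


if __name__ == "__main__":
    print(f"{zettabytes_to_gigabytes(44):,} GB")  # 44,000,000,000,000 GB
    print(f"{phones_needed(44):,} phones")        # 687,500,000,000 phones
```

Under those assumptions, 44 zettabytes works out to 687.5 billion phones, which is where "almost 700 billion" comes from.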

According to the same IDC report, by 2020 approximately 37% of that data would be useful if analyzed. Much of the data floating around in the world is uncharacterized and untagged, which means that we don’t really know what it is, making it effectively useless. That 37% slice, or just over 16 trillion gigabytes, is where analytics companies find their value. That’s certainly still a big number, but a little easier to work with.
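The "just over 16 trillion gigabytes" figure is simply 37% of the 44-zettabyte forecast. A quick sketch of that arithmetic in Python, again assuming decimal units and using a hypothetical helper name:

```python
# Decimal (SI) units: 1 zettabyte = 10**12 gigabytes.
GB_PER_ZB = 10**12


def useful_gigabytes(total_zettabytes, useful_percent):
    """Return the share of the data, in gigabytes, that would be worth analyzing.

    Both arguments are whole numbers, so the result stays an exact integer.
    """
    total_gb = total_zettabytes * GB_PER_ZB
    return total_gb * useful_percent // 100


if __name__ == "__main__":
    # The article's figures: 37% of a 44 ZB digital universe.
    print(f"{useful_gigabytes(44, 37):,} GB")  # 16,280,000,000,000 GB
```

That comes to 16.28 trillion gigabytes, or a little over 16 zettabytes.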

To further break that amount down, experts in the field use what are called the Four Vs of Big Data: Volume, or the sheer amount of data in a set; Variety, or the different types of data in the set; Velocity, or the rate at which data is generated and processed; and Veracity, or how certain and complete the data is. These four factors come together to determine how useful a data set is to the business that owns it. In layman’s terms, it needs to be a lot of data of different kinds, available as soon as possible after it’s made, with enough information and context to be sorted and used to inform decisions. We still have some buzzwords, but much like data after the fourth V, they’re starting to make a little sense now with context!

That’s all well and good, but it leaves us with one big question: when a business has all this data gathered, analyzed and sorted, what do they do with it? Sure, informed insight and data-driven decisions sound nice, but what do they actually mean?

It’s actually fairly straightforward. Data scientists, with the help of powerful computers, pore over the information to help people make decisions. For governments, that can mean looking at traffic or crime data to find patterns and improve the systems in place. For healthcare providers, it can mean determining which treatments are effective or fixing problems with payment systems. There are a million ways to use it in just about any field you can think of, and that’s just with today’s technology, in a world where tech advances so quickly that our phones are outdated in a matter of months!

In short, calling this data “big” is a pretty major understatement, and it has practically unlimited applications. Every industry has decisions to make at every level, and those decisions tend to be better when the person making them has access to as much relevant information as possible. When you put it that way, it’s a lot easier to see why the concept has taken tech by storm over the last couple of decades!