Big Data Problems and Solution, Why Big Data is Gaining Hype in the Market?

Rohit Raut
9 min read · Sep 17, 2020

Huge amounts of data are generated every day, in multi-terabyte quantities. This data changes fast and comes in a variety of forms that are difficult to manage and process using an RDBMS or other traditional technologies. Big Data solutions provide the tools and technologies used to capture, store, search, and analyze this data in seconds, revealing relationships and insights for innovation and competitive gain that were previously unavailable. 80% of the data generated today is unstructured and cannot be handled by traditional technologies.

What is Big Data?

Big Data is not a technology or a product. It is a set of problems that many industries face every day, such as storage, velocity, and cost.

The 5 V's of Big Data

  1. Volume: Data is generated from various sources in huge volumes, on the scale of exabytes and zettabytes. A couple of decades ago, big firms collected and stored data related mostly to their employees. Now, apart from employee data, these firms also collect details of their clients, partners, and the products and services they deal in, which leads to more and more data. The amount of data generated from the beginning of time until 2003 is said to be equivalent to the amount now generated every 2 days. So, that's volume.
  2. Variety: We mainly consider three types of data: structured, unstructured, and semi-structured. We are most familiar with structured data, such as text (a person's name) or numeric values (their age) stored in databases. The other two types are newer and characteristic of big data. Unstructured data takes the form of PDF files, video files, audio files, images, tweets, likes, comments, etc. Semi-structured data takes the form of XML files, JSON files, emails, JavaScript files, server log files, sensor data, etc. These varieties of data are generated from various sources such as mobile devices, satellites, social media networks, IT and non-IT organizations, etc.
  3. Velocity: When a huge volume of different types of data is generated from various sources, it has to be processed fast, which is known as analysis of streaming data. In other words, velocity is the speed at which data arrives from sources like machines, business processes, networks, mobile devices, social media sites, etc. The flow of data from these sources is massive and constant, and it needs to be stored and processed quickly, which is not possible with traditional data processing applications.
  4. Veracity: Data collected and stored from various sources, in different forms, is often inaccurate. Here we have to deal with poor-quality data, again in huge volumes (for example, Twitter posts with hashtags, typos, abbreviations, and colloquial speech), which is imprecise and uncertain. Big data and analytics technology allows us to work with this kind of data.
  5. Value: Whether the data is big or small, and wherever and in whatever format it is generated, it should have some value, meaning we can put it to proper use. The significance, worth, or usefulness of the data to those consuming it is arguably the most pertinent aspect for firms and organizations. Raw data in itself has no importance or utility; we need valuable data to extract information.
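
To make the "Variety" point concrete, here is a minimal Python sketch of the three data types. The sample record, the tweet text, and the variable names are all made up for illustration; they are assumptions, not data from any real system.

```python
import csv
import io
import json

# Structured: fixed columns, so every record fits a table row directly.
structured_csv = "name,age\nAsha,31\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing keys, but fields can vary per record.
semi_structured = '{"name": "Asha", "age": 31, "tags": ["bigdata", "hadoop"]}'
record = json.loads(semi_structured)

# Unstructured: free text with no schema; anything we want
# (here, hashtags) has to be extracted with custom logic.
unstructured = "Loving the #bigdata meetup today, learned sooo much abt #hadoop!!"
hashtags = [w.strip("!,.") for w in unstructured.split() if w.startswith("#")]

print(rows[0]["name"], rows[0]["age"])
print(record["tags"])
print(hashtags)
```

The structured and semi-structured records can be read with standard parsers, while the unstructured text needs ad hoc extraction, which is part of why it is hard for traditional tools.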

How Much Data Do We Create Every Day as of 2020?

Source: Internet

With so much information at our fingertips, we’re adding to the data stockpile every time we turn to our search engines for answers.

  • 1.7MB of data will be created every second for every person on earth.
  • More than 4 billion people use the internet.
  • On average, Google now processes more than 40,000 searches every second (3.5 billion searches per day)!

Social Media

  • Snapchat: There are 229 million daily active Snapchat users worldwide.
  • YouTube: 300 hours of video are uploaded to YouTube every minute! Almost 5 billion videos are watched on YouTube every single day.
  • Twitter: Every second, on average, around 6,000 tweets are sent on Twitter, which corresponds to over 350,000 tweets per minute, 500 million tweets per day, and around 200 billion tweets per year.
  • Instagram: 95 million photos and videos are shared on Instagram per day.
  • Facebook: Its systems process 2.5 billion pieces of content and 500+ terabytes of data each day. They pull in 2.7 billion Like actions and 300 million photos per day, and scan roughly 105 terabytes of data each half-hour.

Communication

  • We send 16 million text messages
  • There are 990,000 Tinder swipes
  • 156 million emails are sent; worldwide, it was expected that there would be 2.9 billion email users by 2019.
  • 15,000 GIFs are sent via Facebook Messenger
  • Every minute there are 103,447,520 spam emails sent
  • There are 154,200 calls on Skype

The Amount of Data Created Each Day on the Internet in 2019

In 2014, there were 2.4 billion internet users. That number grew to 3.4 billion by 2016, and in 2017, 300 million internet users were added. As of June 2019, there are over 4.4 billion internet users. This is an 83% increase in the number of people using the internet in just five years!

Each minute of every day the following happens on the internet:

  • Social media is huge: reports show almost 300 million new social media users each year. That is about 550 new social media users each minute.
  • Since 2013, the number of tweets each minute has increased by 58%, to more than 474,000 tweets per minute in 2019.
  • YouTube usage more than tripled from 2014 to 2016, with users uploading 400 hours of new video each minute of every day! In 2019, users watch 4,333,560 videos every minute.
  • Instagram users upload over 100 million photos and videos every day, which is about 69,444 posts every minute.
  • Since 2013, the number of Facebook posts shared each minute has increased by 22%, from 2.5 million to 3 million posts per minute in 2016. That is an increase of more than 300 percent from around 650,000 posts per minute in 2011!
  • Every minute on Facebook: 510,000 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded.
  • There are over 38,000 status updates on Facebook every minute.
  • Facebook users also click the like button on more than 4 million posts every minute, and the like button has been pressed 13 trillion times.
  • Over 3.5 billion Google searches are conducted worldwide every day. That is over 40,000 search queries per second! Over the year, about 2 trillion searches are made worldwide.
  • Worldwide, over 100 million messages are sent per minute via SMS and in-app messaging.
  • In 2017, 26 billion texts were sent each day by about 277 million people in the US. That is 94 texts per day per person.

If we do some quick calculations, we can see the amount of data created on the internet each day. There are 1,440 minutes per day, so that means there are approximately:

  • 792,000 new data-producing social media users each day.
  • 682 million tweets per day!
  • About 576,000 hours of content uploaded to YouTube every day, with users watching about 6.24 billion YouTube videos each day.
  • About 100 million Instagram posts uploaded each day.
  • There are over 2 billion monthly active Facebook users, compared to 1.44 billion at the start of 2015 and 1.65 billion at the start of 2016.
  • Facebook has 1.58 billion daily active users on average as of Q2 2019
  • 4.3 BILLION Facebook messages posted daily!
  • 5.76 BILLION Facebook likes every day.
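
The per-day figures above come from scaling a per-minute (or per-second) rate up to a full day. A minimal Python sketch of that arithmetic, using three rates quoted in this article (3 million Facebook posts per minute, 4 million Facebook likes per minute, 40,000 Google searches per second); the function and variable names are my own:

```python
MINUTES_PER_DAY = 24 * 60                # 1,440
SECONDS_PER_DAY = MINUTES_PER_DAY * 60   # 86,400


def per_day_from_minute(rate_per_minute):
    """Scale a per-minute rate up to a full day."""
    return rate_per_minute * MINUTES_PER_DAY


def per_day_from_second(rate_per_second):
    """Scale a per-second rate up to a full day."""
    return rate_per_second * SECONDS_PER_DAY


facebook_posts_per_day = per_day_from_minute(3_000_000)
facebook_likes_per_day = per_day_from_minute(4_000_000)
google_searches_per_day = per_day_from_second(40_000)

print(f"Facebook posts per day:  {facebook_posts_per_day:,}")   # 4,320,000,000
print(f"Facebook likes per day:  {facebook_likes_per_day:,}")   # 5,760,000,000
print(f"Google searches per day: {google_searches_per_day:,}")  # 3,456,000,000
```

These results line up with the roughly 4.3 billion posts, 5.76 billion likes, and 3.5 billion searches per day quoted in this article.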

Email Use Continues to Rise

With this increase in social media use, it doesn't seem like email is going away anytime soon! Email use continues to grow. The Email Statistics Report 2019–2023 by the Radicati Group confirms this: 293 billion emails are sent daily in 2019, and this is expected to grow by 4.2% yearly to 347 billion in 2023. According to the same report, there are 3.9 billion email users in 2019, a number that will increase to 4.4 billion by the end of 2023.
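
As a rough check on that forecast, here is a small Python sketch of compound growth. It assumes the 4.2% rate is constant and compounds once a year for four years (2019 to 2023), which is my simplification and not necessarily how the report derives its number; the `project` function is a name chosen for illustration.

```python
def project(value, annual_rate, years):
    """Compound a starting value at a fixed annual growth rate."""
    return value * (1 + annual_rate) ** years


emails_2019 = 293e9  # daily emails in 2019, as quoted above
emails_2023 = project(emails_2019, 0.042, 4)

print(f"Projected daily emails in 2023: {emails_2023 / 1e9:.1f} billion")
```

Under these assumptions the projection comes out at roughly 345 billion emails per day, in the same neighborhood as the report's 347 billion.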

Mobile Device Data

  • The amount of mobile data is also blowing up. At the start of 2014, mobile phones and tablets uploaded and downloaded around 2 exabytes (1 exabyte = 1 billion gigabytes) of data. By the start of 2017, data created on mobile devices had quadrupled to over 8 exabytes.
  • At the start of 2017, there were 394 billion mobile internet users. There are now over 5 billion mobile device users in 2019. A 67% penetration of the entire global population.
  • Approximately 21.9 billion text messages were sent each day in 2017, compared to 18.7 billion in 2016, a 17% increase in just one year.

Data created by the Internet of Things (IoT)

Devices are a huge source of the 2.5 quintillion bytes of data we create every day. Not just mobile devices, but smart TVs, cars, airplanes, you name it: the Internet of Things is producing an increasing amount of data.

  • IDC forecasts a 3 percent growth in wearable devices from 2018 to 2019. There were 28.3 million wearable devices sold in 2016, and IDC estimates that 198 million will be sold in 2019, a 600% increase in just 3 years.
  • Smartwatches were 44.2% of the wearables market in 2019, and that is anticipated to increase to 47% by 2023.
  • Between 2016 and 2022, IoT devices are expected to increase at a rate of 21 percent, driven by new use cases. In 2018, mobile phones were expected to be surpassed in number by IoT devices, which include connected cars, machines, meters, wearables, and other consumer electronics.
  • Pratt & Whitney’s Geared Turbo Fan (GTF) engine is fitted with 5,000 sensors and can generate up to 10 GB of data each second
  • Uber is releasing 6 years of transportation data to cities to help them plan public transit
  • Business Insider predicts that by 2020, 75 percent of cars will come with built-in IoT connectivity.

Growth in Data Generating Services

  • Amazon is dominating the marketplace: Amazon processed $373 million in sales every day in 2017, compared to about $120 million in 2014
  • Each month more than 206 million people around the world get on their devices and visit Amazon.com.
  • By the end of 2016, Uber had 40 million monthly active users. In 2019 there were 75 million Uber passengers, who are served by a total of 3.9 million drivers
  • 14 million Uber trips are completed each day
  • Venmo has 40 million users and processes $68,493 in transactions EVERY minute.

One of the solutions to this problem is a technology called Distributed Storage.

Distributed Storage

Distributed storage is commonly organized in a master-slave model. In this model, the master node splits (stripes) data into multiple blocks, and the blocks are stored on different servers in parallel. In Big Data, we often deal with clusters made up of many computers (nodes). One of the main advantages of this approach is that it goes beyond the capabilities of a single, extremely powerful server. The whole idea is to distribute data across the nodes of a cluster and to use the computing power of each node to process information.
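
The block-splitting idea can be sketched in a few lines of Python. This is a toy, in-memory model only: the tiny block size, the round-robin placement, and the node names are assumptions chosen for illustration, and real systems add replication, metadata management, and failure handling on top.

```python
def split_into_blocks(data, block_size):
    """Master side: cut the data into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign blocks to storage nodes round-robin, remembering each block's index."""
    placement = {node: [] for node in nodes}
    for index, block in enumerate(blocks):
        node = nodes[index % len(nodes)]
        placement[node].append((index, block))
    return placement


def read_back(placement):
    """Collect blocks from every node and reassemble them in their original order."""
    indexed = [pair for stored in placement.values() for pair in stored]
    indexed.sort(key=lambda pair: pair[0])
    return b"".join(block for _, block in indexed)


data = b"abcdefghij"
nodes = ["node-1", "node-2"]          # hypothetical node names
blocks = split_into_blocks(data, 4)   # toy block size of 4 bytes
placement = place_blocks(blocks, nodes)

print(blocks)
print({node: [i for i, _ in stored] for node, stored in placement.items()})
print(read_back(placement) == data)
```

Because each node holds only some of the blocks, the nodes can store and later process their share in parallel, which is the point of distributing the data.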

A framework that helps to build such a cluster is Hadoop.

What is Hadoop?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
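
One of the "simple programming models" Hadoop is known for is MapReduce. The sketch below imitates its map, shuffle, and reduce steps for a word count in plain Python on a single machine. It does not use any Hadoop API; the function names and the sample lines are my own, chosen for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


lines = ["big data is big", "data is everywhere"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

In a real cluster, the map step would run on the nodes that hold each block of input, and the framework would handle the shuffle across machines; here everything runs in one process just to show the shape of the model.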

Thanks for reading!!
