How MNCs Are Dealing with Big Data
Have you ever seen one of the videos on Facebook that shows a “flashback” of posts, likes, or images—like the ones you might see on your birthday or on the anniversary of becoming friends with someone? If so, you have seen examples of how Facebook uses Big Data.
A report from McKinsey & Co. stated that by 2009, companies with more than 1,000 employees had already stored more than 200 terabytes of data about their customers' lives. Add to that startling amount the rapid growth of data provided to social media platforms since then: there are trillions of tweets and billions of Facebook likes, and other social media sites like Snapchat, Instagram, and Pinterest are only adding to the deluge.
Here we take Facebook as a case study for big data.
Facebook user and demographics statistics
- There are 2.375 billion monthly active users (as of Q3 2018).
- Over 1bn of those are mobile-only users.
- There are 1.49 billion daily active users.
- 47% of Facebook users only access the platform through mobile.
- 83% of parents on Facebook are friends with their children.
- Facebook adds 500,000 new users every day; 6 new profiles every second.
- 68% of US adults use Facebook. 51% of them use it several times a day.
- Worldwide, 26.3% of the online population use Facebook.
- The average (mean) number of friends is 338, and the median (midpoint) number of friends is 200.
- Half of internet users who do not use Facebook themselves live with someone who does.
- Of those, 24% say that they look at posts or photos on that person’s account.
Facebook usage statistics
- 30% of internet users use Facebook more than once a day.
- 45% of people get news from Facebook.
- 40% of people said they would share their health data with Facebook.
- There are an estimated 81 million fake Facebook profiles.
- The most popular page is Facebook's own main page, with 213m likes. Samsung is second with 159m, while Cristiano Ronaldo is third with 122m.
- Facebook accounts for 62% of social logins made by consumers to sign into the apps and websites of publishers and brands.
- 200 million people use Facebook Lite – the app for the developing world’s slow connections.
- Facebook takes up 22% of the internet time Americans spend on mobile devices, compared with 11% on Google search and YouTube combined.
- Users spend an average of 20 minutes per day on the site.
- In a month, the average user likes 10 posts, makes 4 comments, and clicks on 8 ads.
- Hive is Facebook’s data warehouse, with 300 petabytes of data.
- Facebook generates 4 new petabytes of data per day.
- Facebook now sees 100 million hours of daily video watch time.
- Users generate 4 million likes every minute.
- More than 250 billion photos have been uploaded to Facebook.
- Roughly 350 million photos are uploaded per day.
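To put a few of the figures above on a common scale, here is a rough back-of-envelope calculation. It is only a sketch: the numbers are the ones quoted in the list, decimal units (1 PB = 1,000,000 GB) are assumed, and the variable names are made up for illustration. The statistics were also reported at different times, so treat the results as orders of magnitude.

```python
# Figures quoted in the list above.
WAREHOUSE_PB = 300            # size of the Hive data warehouse
NEW_DATA_PB_PER_DAY = 4       # new data generated per day
LIKES_PER_MINUTE = 4_000_000  # likes generated per minute

SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_PB = 1_000_000         # assumption: decimal units

# Days of new data needed to equal the current warehouse size
# (assumes, purely for scale, that all new data were kept).
days_to_match_warehouse = WAREHOUSE_PB / NEW_DATA_PB_PER_DAY

# Likes per day at the quoted per-minute rate.
likes_per_day = LIKES_PER_MINUTE * 60 * 24

# Average ingest rate in gigabytes per second.
new_data_gb_per_second = NEW_DATA_PB_PER_DAY * GB_PER_PB / SECONDS_PER_DAY

print(f"Days of new data to match the warehouse: {days_to_match_warehouse:.0f}")
print(f"Likes per day: {likes_per_day:,}")
print(f"New data per second: {new_data_gb_per_second:.1f} GB")
```

At these rates, the daily volume of new data matches the size of the whole warehouse in about two and a half months, which is why a single machine or a traditional database is not an option.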
Social media accelerates innovation, drives cost savings, and strengthens brands through mass collaboration. Across every industry, companies are using social media platforms to market and hype up their services and products, along with monitoring what the audience is saying about their brand.
The convergence of social media and big data gives birth to a whole new level of technology.
What is big data?
Big data refers to data that would typically be too expensive to store, manage, and analyze using traditional (relational and/or monolithic) database systems. Usually, such systems are cost-inefficient because of their inflexibility for storing unstructured data (such as images, text, and video), accommodating “high-velocity” (real-time) data, or scaling to support very large (petabyte-scale) data volumes.
For this reason, the past few years have seen the mainstream adoption of new approaches to managing and processing big data, including Apache Hadoop and NoSQL database systems. However, those options often prove complex to deploy, manage, and use on-premises.
Apache Hadoop is an open-source framework developed to store and process big data.
Now let's see how Facebook manages this data:
“Facebook runs the world’s largest Hadoop cluster,” says Jay Parikh, Vice President of Infrastructure Engineering at Facebook.
This cluster spans more than 4,000 machines and stores hundreds of millions of gigabytes of data.
🔺Hadoop gives Facebook a common infrastructure that is both efficient and reliable. From search, log processing, recommendation systems, and data warehousing to video and image analysis, Hadoop powers nearly every part of this social networking platform. Facebook Messenger, the company's first user-facing application built on the Hadoop stack, runs on Apache HBase, the Hadoop database.
🔺When Facebook first adopted Hadoop, it was not designed to run across multiple data centers, and that is what led the Facebook team to develop Prism. Prism is a platform that exposes many namespaces instead of the single one that Hadoop governs, which in turn makes it possible to create many logical clusters.
The system can now expand to as many servers as needed without worrying about adding more data centers.
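To make the idea of "many namespaces" more concrete, here is a minimal sketch of routing file paths to different logical clusters. This is not Prism's actual design, which is not described here. The `NamespaceRouter` class, the path prefixes, and the namespace names are all hypothetical and only illustrate the general idea of a mount table with longest-prefix matching.

```python
class NamespaceRouter:
    """Hypothetical mount table: maps path prefixes to namespace names."""

    def __init__(self) -> None:
        self._mounts: dict[str, str] = {}

    def mount(self, prefix: str, namespace: str) -> None:
        # Normalise "/logs/" to "/logs"; keep the root prefix as "/".
        self._mounts[prefix.rstrip("/") or "/"] = namespace

    def resolve(self, path: str) -> str:
        """Return the namespace whose prefix is the longest match for path."""
        best = None
        for prefix in self._mounts:
            is_match = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if is_match and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no namespace mounted for {path}")
        return self._mounts[best]


# Example with made-up prefixes and namespace names.
router = NamespaceRouter()
router.mount("/logs", "ns-logs")
router.mount("/warehouse", "ns-warehouse")
router.mount("/warehouse/ads", "ns-ads")

print(router.resolve("/warehouse/ads/clicks"))
print(router.resolve("/logs/2018/10"))
```

Because each prefix can point at a different namespace, and each namespace can live on its own set of machines, clients see one file tree while the data is spread over several logical clusters.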
🔺The core concept behind all of these technologies is distributed storage. Let's see how it works.
🔺 A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
Distributed storage writes data in parallel by striping (splitting) many gigabytes of data into smaller pieces, so that even a large file can be stored within seconds. The master node (NameNode) decides where each piece goes, and the pieces are transferred to the respective DataNodes (slave nodes) in parallel.
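The splitting and placement described above can be sketched in a few lines of Python. This is a toy, in-memory model rather than how HDFS is implemented. The function names, the round-robin placement rule, the tiny block size, and the node names are all assumptions made for illustration. Real HDFS uses much larger blocks (128 MB by default in recent versions), a default replication factor of 3, and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, data_nodes: list[str],
                 replication: int) -> dict[int, list[str]]:
    """Master-node role: decide which nodes hold each block (round-robin)."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    return {
        block_id: [data_nodes[(block_id + r) % len(data_nodes)]
                   for r in range(replication)]
        for block_id in range(num_blocks)
    }


def store(data: bytes, data_nodes: list[str], block_size: int, replication: int):
    """Split the data, then copy every block to each of its assigned nodes."""
    blocks = split_into_blocks(data, block_size)
    placement = place_blocks(len(blocks), data_nodes, replication)
    storage = {node: {} for node in data_nodes}  # node -> {block_id: bytes}
    for block_id, nodes in placement.items():
        for node in nodes:
            storage[node][block_id] = blocks[block_id]
    return placement, storage


def read(placement, storage, failed=frozenset()) -> bytes:
    """Reassemble the data, skipping any replica on a failed node."""
    parts = []
    for block_id in sorted(placement):
        node = next(n for n in placement[block_id] if n not in failed)
        parts.append(storage[node][block_id])
    return b"".join(parts)


nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical DataNode names
placement, storage = store(b"abcdefghij", nodes, block_size=4, replication=2)
print(placement)
print(read(placement, storage, failed={"dn2"}))
```

Since every block lives on more than one node, the data can still be read back in full when a node is down, and different blocks can be written to or read from different machines at the same time.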
Thank you!



