Big Data Unit 1 Notes

CS702(D) Big Data | RGPV CSE 7th Semester | Introduction, Characteristics, Types, Challenges & Analytics

Unit 1

📘 Introduction to Big Data

Big Data refers to extremely large and complex data sets that cannot be easily stored, processed or analysed using traditional database systems. Big Data is generated from social media, sensors, mobile devices, online transactions, websites, machines and many other digital sources.

Simple Meaning: When data is very large in size, arrives at very high speed and comes in different formats, it is called Big Data.

⭐ Characteristics of Big Data

1. Volume

Volume refers to the huge amount of data involved. Big Data deals with data at the terabyte (TB), petabyte (PB) or even exabyte (EB) scale.
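To get a feel for these units, the sketch below converts a raw byte count into a readable unit. It assumes decimal (SI) prefixes, where each unit is 1000 times the previous one; the function name `human_readable` is only illustrative.

```python
# Decimal (SI) units: each unit is 1000 times the previous one.
# This is an assumption; binary units such as TiB would use 1024 instead.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

print(human_readable(5 * 10**12))  # 5.0 TB
print(human_readable(2 * 10**15))  # 2.0 PB
print(human_readable(3 * 10**18))  # 3.0 EB
```

One petabyte is therefore 1000 terabytes, which is why a single machine's disk is usually not enough at this scale.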

2. Velocity

Velocity means the speed at which data is generated and processed.

3. Variety

Variety means different types of data such as text, images, videos, audio, logs and sensor data.

4. Veracity

Veracity means the accuracy and reliability of data. Big Data may contain noisy or incomplete data.
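As a small illustration of handling noisy or incomplete data, the sketch below filters a list of sensor readings. The readings, the valid range and the helper name `is_valid` are all assumptions made up for this example.

```python
# Hypothetical sensor readings: None marks a missing value and 999.0 is an
# obviously invalid reading (both are assumptions for this illustration).
readings = [21.5, None, 22.1, 999.0, 21.9]

def is_valid(value, low=-50.0, high=60.0):
    """Keep only readings that are present and within a plausible range."""
    return value is not None and low <= value <= high

clean = [r for r in readings if is_valid(r)]

print(clean)
print(f"{len(readings) - len(clean)} of {len(readings)} readings dropped")
```

Real systems use more careful rules than a fixed range, but the idea is the same: check the data before trusting any result computed from it.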

5. Value

Value means useful information extracted from Big Data for decision making.

📂 Types of Big Data

Type | Meaning | Example
Structured Data | Data stored in a fixed format such as rows and columns. | Database tables, Excel sheets
Unstructured Data | Data without a fixed format. | Images, videos, audio, social media posts
Semi-Structured Data | Data that has some structure, but not that of relational tables. | XML, JSON, log files
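The three types can be seen side by side in a short sketch. The sample records below are made up for illustration; only Python's standard `csv`, `io` and `json` modules are used.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = io.StringIO("id,name,marks\n1,Asha,82\n2,Ravi,74\n")
rows = list(csv.DictReader(structured))

# Semi-structured: keys describe the data, but records may differ in shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["cse"]}, {"id": 2, "name": "Ravi"}]'
)

# Unstructured: free text with no schema; meaning must be extracted by processing.
unstructured = "Great lecture today, the Hadoop demo was really helpful!"

print(rows[0]["marks"])                    # column access by name
print(semi_structured[1].get("tags", []))  # this field is missing in record 2
print(len(unstructured.split()))           # only generic operations apply directly
```

Note that the second JSON record has no `tags` field, which a relational table with fixed columns would not allow without a NULL value.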

⚖️ Traditional Data vs Big Data

Traditional Data | Big Data
Small or medium size data | Very large size data
Mostly structured data | Structured, semi-structured and unstructured data
Processed using RDBMS | Processed using Hadoop, Spark, NoSQL etc.
Limited scalability | Highly scalable
Slower for huge data | Fast processing using distributed systems

📈 Evolution of Big Data

  1. Earlier, data was stored in files and simple databases.
  2. Relational Database Management Systems were introduced for structured data.
  3. Internet, social media and mobile devices increased data generation rapidly.
  4. Traditional systems became insufficient for large and complex data.
  5. Big Data technologies like Hadoop, Spark and NoSQL were developed.

⚠️ Challenges with Big Data

🛠️ Technologies Available for Big Data

Technology | Use
Hadoop | Distributed storage and processing
HDFS | Stores large data across multiple machines
MapReduce | Processes Big Data in parallel
Spark | Fast data processing engine
Hive | SQL-like query processing on Hadoop
Pig | Data flow language for large data processing
NoSQL | Stores unstructured and semi-structured data
MongoDB | Document-based NoSQL database
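The MapReduce idea from the table can be sketched with the classic word count example. This is a single-process illustration in plain Python, not the Hadoop API; the function names `map_phase`, `shuffle_phase` and `reduce_phase` are illustrative. In a real cluster the map and reduce steps would run in parallel on different machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())

print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can be split across many machines, which is what makes the model suitable for very large data.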

🏗️ Infrastructure for Big Data

Big Data infrastructure includes hardware, software, storage systems, processing tools and networking resources required to store and analyse huge data.

📊 Use of Data Analytics

Data Analytics is the process of examining data to find useful patterns, trends and insights. It helps organizations make better decisions.
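A very small example of finding a pattern and a trend is sketched below. The monthly sales figures are made up for illustration, and only the standard `statistics` module is used.

```python
from statistics import mean

# Hypothetical monthly sales figures (made up for illustration).
sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 149}

# Pattern: the typical level of sales and the best month.
average = mean(sales.values())
best_month = max(sales, key=sales.get)

# Trend: change from each month to the next.
months = list(sales)
changes = {
    months[i]: sales[months[i]] - sales[months[i - 1]]
    for i in range(1, len(months))
}

print(average)
print(best_month)
print(changes)
```

Here the changes show steady growth until March and a slight dip in April, which is the kind of insight a decision maker would look at next.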

Applications

✅ Desired Properties of Big Data System

⭐ Important Questions

  1. Define Big Data. Explain its characteristics.
  2. Explain the 5 V's of Big Data.
  3. Differentiate between Traditional Data and Big Data.
  4. Explain types of Big Data with examples.
  5. Write challenges of Big Data.
  6. Explain technologies available for Big Data.
  7. Explain infrastructure required for Big Data.
  8. Write applications of Data Analytics.
  9. Explain desired properties of Big Data system.

📝 Short Revision

Big Data = Huge + Fast + Different format data. Important points: 5 V's, Types of data, Traditional vs Big Data, Challenges, Big Data tools, Infrastructure and Data Analytics.
