CS702(D) Big Data | RGPV CSE 7th Semester | Introduction, Characteristics, Types, Challenges & Analytics
Big Data refers to extremely large and complex data sets that cannot be easily stored, processed or analysed using traditional database systems. Big Data is generated from social media, sensors, mobile devices, online transactions, websites, machines and many other digital sources.
Volume means the huge amount of data involved. Big Data deals with data at the terabyte (TB), petabyte (PB) or even exabyte (EB) scale.
Velocity means the speed at which data is generated and processed.
Variety means the different types of data involved, such as text, images, videos, audio, logs and sensor data.
Veracity means the accuracy and reliability of data. Big Data may contain noisy or incomplete data.
Value means useful information extracted from Big Data for decision making.
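To make the Volume scale concrete, here is a minimal sketch that formats a byte count in TB, PB or EB. It assumes decimal (SI) units, where each step is a factor of 1000; the function name `human_size` is only an illustrative choice.

```python
# Decimal (SI) units: each step up is a factor of 1000 (an assumption;
# binary units based on 1024 are also in common use).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_size(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000

print(human_size(5 * 10**12))  # 5.0 TB
print(human_size(2 * 10**15))  # 2.0 PB
print(human_size(3 * 10**18))  # 3.0 EB
```

One exabyte is a million terabytes, which is why data at this scale usually has to be spread across many machines.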
| Type | Meaning | Example |
|---|---|---|
| Structured Data | Data stored in a fixed format, such as rows and columns. | Database tables, Excel sheets |
| Unstructured Data | Data without a fixed format. | Images, videos, audio, social media posts |
| Semi-Structured Data | Data with some structure, but not organized as relational tables. | XML, JSON, log files |
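
The three types in the table above can be illustrated with a small sketch using only the Python standard library. The sample records (names, cities, the social media post) are invented purely for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "id,name,city\n1,Asha,Bhopal\n2,Ravi,Indore\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys, but records may differ in shape.
json_text = '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "address": {"city": "Indore"}}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; any structure must be inferred.
post = "Loved the new phone, battery lasts two days!"
word_count = len(post.split())

print(rows[0]["city"])            # Bhopal
print(sorted(records[1].keys()))  # ['address', 'id']
print(word_count)                 # 8
```

Note that the two JSON records carry different fields, which a relational table with a fixed schema would not allow without nullable columns or extra tables.
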

| Traditional Data | Big Data |
|---|---|
| Small or medium-sized data | Very large data |
| Mostly structured data | Structured, semi-structured and unstructured data |
| Processed using an RDBMS | Processed using Hadoop, Spark, NoSQL databases, etc. |
| Limited scalability | Highly scalable |
| Slow on very large data | Fast processing using distributed systems |

| Technology | Use |
|---|---|
| Hadoop | Framework for distributed storage and processing |
| HDFS | Stores large data across multiple machines |
| MapReduce | Programming model that processes large data sets in parallel |
| Spark | Fast, in-memory data processing engine |
| Hive | SQL-like query processing on Hadoop |
| Pig | Data flow language for large data processing |
| NoSQL | Family of non-relational databases for unstructured and semi-structured data |
| MongoDB | Document-based NoSQL database |
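
The MapReduce idea from the table can be sketched as a word count in plain Python. This is only a single-process illustration of the map, shuffle and reduce steps, not the Hadoop API; the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real cluster, each input split would be mapped on a different machine and the grouped keys would be distributed among several reducers; here everything runs in one process to keep the steps visible.
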
Big Data infrastructure includes the hardware, software, storage systems, processing tools and networking resources required to store and analyse very large data sets.
Data Analytics is the process of examining data to find useful patterns, trends and insights. It helps organizations make better decisions.
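As a toy illustration of finding patterns and trends, the sketch below computes an average, the best month and the month-over-month change. The sales figures are hypothetical, and a real analytics workload would run over far larger data with the tools listed above.

```python
from statistics import mean

# Hypothetical monthly sales figures (units sold).
sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110}

average = mean(sales.values())
best_month = max(sales, key=sales.get)

# Trend: difference between each month and the previous one.
values = list(sales.values())
changes = [later - earlier for earlier, later in zip(values, values[1:])]

print(average)     # 128.75
print(best_month)  # Mar
print(changes)     # [15, 15, -40]
```

The drop of 40 units in April is the kind of insight that could prompt a decision, such as investigating what changed that month.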