3.2.4. Database Management System
Real-world datasets are rarely represented as a single flat table. Instead, they are stored in schemas designed to support specific application requirements such as efficient query execution, optimized read and write performance, indexing, concurrency, and scalability through techniques like partitioning and sharding.
From an analysis perspective, the first step is to identify and retrieve all relevant data components. This typically involves exploring multiple tables in SQL systems or collections in NoSQL systems, understanding how the data is organized, and identifying how entities relate to one another (one-to-one, one-to-many, or many-to-many).
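As a minimal sketch of this exploration step, the snippet below uses Python's built-in `sqlite3` module to list the tables in a database and read the foreign keys that link them. The `customers` and `orders` tables and the helper names `list_tables` and `foreign_keys` are hypothetical, chosen only for illustration; other DBMSs expose similar metadata through their own catalogs.

```python
import sqlite3

# Hypothetical example schema: one customer can have many orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")


def list_tables(conn):
    """Return the names of all tables recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def foreign_keys(conn, table):
    """Return (column, referenced_table, referenced_column) per foreign key."""
    # PRAGMA foreign_key_list rows are laid out as:
    # (id, seq, table, from, to, on_update, on_delete, match)
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(row[3], row[2], row[4]) for row in rows]


tables = list_tables(conn)
relationships = {t: foreign_keys(conn, t) for t in tables}
print(tables)
print(relationships)
```

Here the foreign key on `orders.customer_id` is what reveals the one-to-many relationship between customers and orders.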
Once the schema and relationships are understood, we can write queries to extract the required data. This often involves:
Joins (see SQL Joins)
Aggregate functions such as COUNT, SUM, and AVG
Subqueries and common table expressions (CTEs)
Query optimization techniques to reduce execution time and resource usage
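The techniques listed above can be sketched on a small in-memory SQLite database. The schema, the sample rows, the index name `idx_orders_customer`, and the helper `query_plan` are all assumptions made for illustration; the exact wording of the plan text depends on the SQLite version, so the sketch only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 40.0), (3, 2, 15.0);
""")

# Join + aggregates: order count, total, and average amount per customer.
# LEFT JOIN keeps customers that have no orders at all.
per_customer = conn.execute("""
    SELECT c.name,
           COUNT(o.id)                AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total,
           AVG(o.amount)              AS average
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# CTE + subquery: customers whose total exceeds the average customer total.
big_spenders = conn.execute("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, t.total
    FROM totals AS t
    JOIN customers AS c ON c.id = t.customer_id
    WHERE t.total > (SELECT AVG(total) FROM totals)
    ORDER BY t.total DESC
""").fetchall()


def query_plan(conn, sql):
    """Return the planner's description of how it would execute `sql`."""
    # The last column of each EXPLAIN QUERY PLAN row is a readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Optimization: compare the plan for a lookup before and after adding an index.
lookup = "SELECT * FROM orders WHERE customer_id = 1"
plan_before = query_plan(conn, lookup)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup)

print(per_customer)
print(big_spenders)
print(plan_before)
print(plan_after)
```

Inspecting the plan before and after a schema change is a cheap way to confirm that an index is actually being used rather than assuming it.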
3.2.4.1. Need for DBMS
As data size and complexity grow, simple storage techniques such as CSV files quickly become impractical. Large files require significant memory and compute resources just to open, and performing even basic operations becomes slow and error-prone. Additionally, flat files provide no built-in support for indexing, concurrency control, or data integrity.
Likewise, storing thousands of attributes in a single dataframe or file is neither intuitive nor efficient. This approach often leads to high data redundancy, inconsistencies, and difficulties in maintaining and updating the data.
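To make the redundancy problem concrete, here is a small sketch that splits flat records into two related tables. The records, column names, and email addresses are hypothetical; the point is only that customer details repeated on every row of the flat data end up stored once.

```python
import sqlite3

# Hypothetical flat records: customer details are repeated on every order row.
flat_rows = [
    # (order_id, customer_name, customer_email, amount)
    (1, "Ada", "ada@example.com", 20.0),
    (2, "Ada", "ada@example.com", 40.0),
    (3, "Grace", "grace@example.com", 15.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")

for order_id, name, email, amount in flat_rows:
    # Store each customer once; rows with an already-seen email are skipped.
    conn.execute(
        "INSERT OR IGNORE INTO customers (name, email) VALUES (?, ?)",
        (name, email),
    )
    (customer_id,) = conn.execute(
        "SELECT id FROM customers WHERE email = ?", (email,)
    ).fetchone()
    conn.execute(
        "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
        (order_id, customer_id, amount),
    )
conn.commit()

n_customers = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
n_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Changing an email now touches one customer row, not every order row.
updated = conn.execute(
    "UPDATE customers SET email = ? WHERE name = ?",
    ("ada@newmail.example", "Ada"),
).rowcount
print(n_customers, n_orders, updated)
```

In the flat layout, the same email change would have to be applied consistently to every order row for that customer, which is where inconsistencies tend to creep in.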
A Database Management System (DBMS) addresses these challenges by providing:
Structured data organization through schemas, tables, and relationships
Data normalization to reduce redundancy and improve consistency
Efficient querying using optimized query planners and indexes
Concurrency control to support multiple users accessing and modifying data simultaneously
Transaction management with ACID guarantees to ensure data correctness
Security and access control through roles, permissions, and authentication mechanisms
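Two of the features above, integrity constraints and transactional atomicity, can be illustrated with a short sketch using Python's `sqlite3` module. The `accounts` table and the `transfer` and `balances` helpers are hypothetical and not taken from any particular system; the sketch assumes the module's default transaction handling, in which a transaction is opened implicitly before the first `UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts VALUES (1, 100), (2, 50);
""")


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, 1, 2, 30)       # both updates are committed together
failed = transfer(conn, 1, 2, 500)  # second update fails, so the first is undone
final = balances(conn)
print(ok, failed, final)
```

The failed transfer leaves no partial credit on the destination account: either both updates take effect or neither does, which is the atomicity part of ACID.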
Modern DBMSs also provide rich ecosystems of tools and extensions for analytics, monitoring, backups, and visualization. These features make DBMSs a foundational component for building reliable, scalable, and maintainable data-driven systems, especially when working with large and complex datasets.