3.2.4. Database Management System

Real-world datasets are rarely stored as a single flat table. Instead, their storage layout is designed around specific application requirements such as efficient query execution, balanced read and write performance, indexing, concurrency, and scalability through techniques like partitioning and sharding.

From an analysis perspective, the first step is to identify and retrieve all relevant data components. This typically involves exploring multiple tables in SQL databases or collections in NoSQL systems, understanding how the data is organized, and analyzing how entities relate to one another (one-to-one, one-to-many, or many-to-many).
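As a rough illustration of this exploration step, the sketch below lists the tables in a database and the foreign keys that link them. It assumes SQLite through Python's built-in `sqlite3` module; the `customers` and `orders` tables and the helper names `list_tables` and `foreign_keys` are hypothetical and exist only for this example. Other database systems expose the same information through their own catalog views.

```python
import sqlite3

# Hypothetical in-memory database with two related tables (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")


def list_tables(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def foreign_keys(conn, table):
    """Return (column, referenced_table, referenced_column) triples for a table."""
    # PRAGMA foreign_key_list yields: id, seq, table, from, to, on_update, ...
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(row[3], row[2], row[4]) for row in rows]


tables = list_tables(conn)
relationships = {name: foreign_keys(conn, name) for name in tables}

print(tables)
print(relationships)
```

Here each order points at exactly one customer, while a customer may appear in many orders, so the foreign key on `orders.customer_id` encodes a one-to-many relationship.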

Once the schema and relationships are understood, we can write queries to extract the required data. This often involves:

  • Joins (see SQL Joins)

  • Aggregate functions such as COUNT, SUM, and AVG

  • Subqueries and common table expressions (CTEs)

  • Query optimization techniques to reduce execution time and resource usage
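The first three items in the list above can be sketched together in one small example. This is a minimal sketch, assuming SQLite through Python's `sqlite3` module; the tables, rows, and column names are made up for illustration and do not come from any real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    -- Hypothetical sample rows.
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders (customer_id, amount) VALUES (1, 20.0), (1, 40.0), (2, 100.0);
""")

# Join + aggregates: LEFT JOIN keeps customers that have no orders at all.
per_customer = conn.execute("""
    SELECT c.name,
           COUNT(o.id)                AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total,
           AVG(o.amount)              AS avg_amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# CTE + subquery: customers whose total spend is above the average total.
top_spenders = conn.execute("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name, t.total
    FROM totals AS t
    JOIN customers AS c ON c.id = t.customer_id
    WHERE t.total > (SELECT AVG(total) FROM totals)
    ORDER BY t.total DESC
""").fetchall()

for row in per_customer:
    print(row)
print(top_spenders)
```

Note that `AVG` returns `NULL` (`None` in Python) for a customer with no orders, whereas `COUNT(o.id)` returns 0, which is why the total is wrapped in `COALESCE`.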

3.2.4.1. Need for DBMS

As data size and complexity grow, simple storage techniques such as CSV files quickly become impractical. Large files require significant memory and compute resources just to open, and performing even basic operations becomes slow and error-prone. Additionally, flat files provide no built-in support for indexing, concurrency control, or data integrity.

Likewise, storing thousands of attributes in a single dataframe or file is neither intuitive nor efficient. A single wide table tends to repeat the same values across many rows, which leads to high data redundancy, inconsistencies when only some copies are updated, and difficulty maintaining the data.
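The redundancy problem can be made concrete with a small sketch. It assumes SQLite through Python's `sqlite3` module, and the tables and rows are hypothetical. A flat layout that repeats a customer's city on every order row is compared with a layout that stores the city once in a separate, related table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flat layout: the customer's city is repeated on every order row.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        city     TEXT,
        amount   REAL
    );
    INSERT INTO orders_flat VALUES
        (1, 'Ada', 'London', 20.0),
        (2, 'Ada', 'London', 40.0);

    -- Split layout: the city is stored once and orders refer to it by key.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount      REAL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'London');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 40.0);
""")

# A partial update to the flat table leaves two conflicting cities for Ada.
conn.execute("UPDATE orders_flat SET city = 'Paris' WHERE order_id = 1")
flat_cities = conn.execute(
    "SELECT DISTINCT city FROM orders_flat WHERE customer = 'Ada' ORDER BY city"
).fetchall()

# In the split layout, one single-row update is visible from every order.
conn.execute("UPDATE customers SET city = 'Paris' WHERE id = 1")
split_cities = conn.execute("""
    SELECT DISTINCT c.city
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
""").fetchall()

print(flat_cities)   # two different cities for the same customer
print(split_cities)  # a single, consistent city
```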

A Database Management System (DBMS) addresses these challenges by providing:

  • Structured data organization through schemas, tables, and relationships

  • Data normalization to reduce redundancy and improve consistency

  • Efficient querying using optimized query planners and indexes

  • Concurrency control to support multiple users accessing and modifying data simultaneously

  • Transaction management with ACID guarantees to ensure data correctness

  • Security and access control through roles, permissions, and authentication mechanisms
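Two of the features above, index-backed querying and transactions, can be sketched in a few lines. This is a minimal sketch assuming SQLite through Python's `sqlite3` module; the `accounts` table and the helpers `plan`, `transfer`, and `balances` are hypothetical. It shows only atomicity and a consistency check, not the full set of ACID guarantees, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance REAL NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO accounts (id, owner, balance) VALUES
        (1, 'Ada', 100.0),
        (2, 'Grace', 50.0);
""")


def plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


lookup = "SELECT balance FROM accounts WHERE owner = ?"
plan_before = plan(conn, lookup, ("Ada",))   # expected to be a full table scan
conn.execute("CREATE INDEX idx_accounts_owner ON accounts(owner)")
plan_after = plan(conn, lookup, ("Ada",))    # expected to search via the index


def transfer(conn, src, dst, amount):
    """Move money atomically; return True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, 1, 2, 30.0)       # fits within Ada's balance
failed = transfer(conn, 1, 2, 500.0)  # would overdraw, so it is rolled back

print(plan_before)
print(plan_after)
print(ok, failed, balances(conn))
```

In the failed transfer, the credit to the destination account is applied first and then undone when the debit violates the `CHECK` constraint, so the balances are left exactly as they were after the first transfer.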

Modern DBMSs also provide rich ecosystems of tools and extensions for analytics, monitoring, backups, and visualization. These features make DBMSs a foundational component for building reliable, scalable, and maintainable data-driven systems, especially when working with large and complex datasets.