3.2.2. Structured Data
Most datasets used in data science have a well-defined structure: each dataset consists of records (rows) described by a shared set of features (columns), arranged in a table. Such structured data is commonly stored in file formats such as:
- Excel files (.xlsx)
- CSV files (.csv)
- Apache Parquet files
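To make the record/feature terminology concrete, the sketch below parses a small CSV table with Python's standard-library `csv` module. The column names and values are made up for illustration, and `load_table` is a hypothetical helper. In day-to-day work, pandas (`pandas.read_csv`, `pandas.read_excel`, `pandas.read_parquet`) is the more usual tool for all three formats; reading .xlsx and Parquet files with pandas requires optional engines such as openpyxl and pyarrow.

```python
import csv
import io

# A tiny, made-up CSV table: the header line holds the feature (column) names,
# and every following line is one record (row).
CSV_TEXT = """name,age,city
Alice,34,Berlin
Bob,29,Lisbon
Chen,41,Toronto
"""

def load_table(text):
    """Hypothetical helper: parse CSV text into records plus feature names."""
    reader = csv.DictReader(io.StringIO(text))
    records = list(reader)        # each record is a dict mapping feature -> value
    features = reader.fieldnames  # column names taken from the header row
    return records, features

records, features = load_table(CSV_TEXT)
print(f"{len(records)} records x {len(features)} features")  # 3 records x 3 features
print(features)    # ['name', 'age', 'city']
print(records[0])  # {'name': 'Alice', 'age': '34', 'city': 'Berlin'}
```

Note that every value comes back as a string: CSV carries no type information, whereas Parquet files store a typed schema alongside the data.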
Structured data can also be stored in JSON files, but JSON itself does not enforce a tabular structure: records may be nested or may have differing fields. Such cases are covered in more detail in the Unstructured Data section.
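The gap between JSON and a table can be illustrated with Python's built-in `json` module. The hypothetical helper `is_tabular` below treats a parsed JSON document as tabular only if it is a non-empty list of objects that all share the same keys and hold only scalar values; this rule is an illustrative assumption for the sketch, not a standard definition.

```python
import json

def is_tabular(doc):
    """Hypothetical check: does a parsed JSON value look like a flat table?

    Assumed rule for this sketch: a non-empty list of objects that all share
    the same keys, with only scalar values (no nested lists or objects).
    """
    if not isinstance(doc, list) or not doc:
        return False
    if not all(isinstance(row, dict) for row in doc):
        return False
    keys = set(doc[0])
    scalars = (str, int, float, bool, type(None))
    return all(
        set(row) == keys and all(isinstance(v, scalars) for v in row.values())
        for row in doc
    )

# Flat records with identical keys map directly onto rows and columns.
flat = json.loads('[{"id": 1, "score": 0.9}, {"id": 2, "score": 0.7}]')

# Nested values and differing keys: valid JSON, but not a table as-is.
nested = json.loads('[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "meta": {"x": 1}}]')

print(is_tabular(flat))    # True
print(is_tabular(nested))  # False
```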