1. Introduction - Novel Lifecycle
Data Science has transformed how decisions are made in the modern world. Once driven largely by intuition, decisions are now increasingly guided by data. With massive amounts of data being produced and collected every day, data science has become a pivotal discipline for extracting meaning and actionable insights from this information.
Applying data science is an iterative process: we hypothesize, test, and refine our ideas continually. Each stage in the lifecycle brings us closer to converting ideas into measurable insights. Several data science lifecycles exist, including CRISP-DM, the Domino Data Science Lifecycle, and the Waterfall Model. You can explore more frameworks here.
For the purpose of this book, we will follow our own Data Science Lifecycle, inspired by these existing frameworks but tailored to align with industry practices and learning objectives.

Data Science Problem Formulation: This stage involves forming a hypothesis or identifying a problem that can be solved using data science.
Data Acquisition: Once we have defined the problem, we gather relevant data required to solve it.
Data Exploration: After collecting data, we study it closely, examining its features, limitations, and quality.
Data Wrangling: Here we clean, preprocess, and transform raw data into a usable format for modeling.
Data Mining: In this phase, we look for meaningful patterns, relationships, and hidden trends within the data.
Modeling: We build models using our processed data to make predictions or derive insights.
Data Science Applications: Finally, we make our models accessible to real-world users by integrating them into applications such as dashboards, tools, or APIs that enable others to benefit from our work.
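
To make the stages above more concrete, here is a minimal sketch that walks a toy problem through each of them using only the Python standard library. Everything in it is an assumption made for illustration: the question (do study hours relate to exam scores?), the made-up dataset, and the function names `acquire`, `explore`, `wrangle`, `mine`, `fit_model`, and `make_application`. Real projects would typically use dedicated libraries and far richer data, so treat this as an outline of how the stages connect, not as a recommended implementation.

```python
from statistics import mean

# Problem formulation (hypothetical): do more study hours go with higher exam scores?


def acquire():
    """Data acquisition: a made-up in-memory dataset stands in for a real source."""
    return [
        {"hours": 1.0, "score": 52.0},
        {"hours": 2.0, "score": 54.0},
        {"hours": None, "score": 70.0},  # incomplete record, handled during wrangling
        {"hours": 3.0, "score": 56.0},
        {"hours": 4.0, "score": 58.0},
    ]


def explore(rows):
    """Data exploration: summarize size and quality before changing anything."""
    missing = sum(1 for r in rows if None in r.values())
    return {"rows": len(rows), "rows_with_missing": missing}


def wrangle(rows):
    """Data wrangling: drop incomplete rows (one simple cleaning strategy)."""
    return [r for r in rows if None not in r.values()]


def _sums(rows):
    """Shared helper: means and centered sums used by mining and modeling."""
    xs = [r["hours"] for r in rows]
    ys = [r["score"] for r in rows]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return mx, my, cov, var_x, var_y


def mine(rows):
    """Data mining: measure the linear relationship (Pearson correlation)."""
    _, _, cov, var_x, var_y = _sums(rows)
    return cov / (var_x * var_y) ** 0.5


def fit_model(rows):
    """Modeling: fit score = slope * hours + intercept by least squares."""
    mx, my, cov, var_x, _ = _sums(rows)
    slope = cov / var_x
    intercept = my - slope * mx
    return slope, intercept


def make_application(slope, intercept):
    """Application: expose the model as a function that others can call."""
    def predict(hours):
        return slope * hours + intercept
    return predict


raw = acquire()
summary = explore(raw)
clean = wrangle(raw)
correlation = mine(clean)
slope, intercept = fit_model(clean)
predict = make_application(slope, intercept)

print(summary)
print(f"correlation={correlation:.2f}, slope={slope:.2f}, intercept={intercept:.2f}")
print(f"predicted score for 5 hours: {predict(5):.1f}")
```

Each function maps to one stage, and the output of one feeds the next. In practice the process loops back: a surprising result during exploration or modeling often sends us back to reformulate the problem or collect different data.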