2.1. Problem Formulation and Problem Scoping#
Before introducing the formal framework, it is important to spend time understanding what problem formulation means in the context of data science.
2.1.1. Problem Formulation#
The primary goal of problem formulation is to clearly define a problem that can be addressed using data science techniques. This process includes, but is not limited to, narrowing the domain, identifying missing or unknown elements, determining the type of problem to be solved, and outlining a high level plan for approaching the solution.
A data science project can take many forms depending on how the problem is framed. Some common problem types include:
Prediction: Forecasting future outcomes based on historical data, such as energy consumption or sales trends.
Classification: Assigning labels to observations, such as spam detection or medical diagnosis.
Exploration: Discovering hidden structures or patterns in data, such as clustering customers or identifying geographic hotspots.
Recommendation: Suggesting items based on observed preferences or behaviors, such as recommending movies or courses.
Decision Support: Assisting stakeholders in making informed choices, such as optimizing delivery routes or allocating limited resources.
2.1.1.1. Key Considerations#
When shaping an initial idea into a well defined problem, reflect on the following aspects:
Context and Motivation: Why does this problem matter, and who cares about the answer?
Variables: What information is required to make progress? Which variables act as inputs, and which represent outcomes?
Viability: Do you have realistic access to the necessary data? Can the problem be solved with available methods and within a reasonable timeframe?
Ethics and Impact: Could solving this problem introduce unintended consequences? Consider risks such as bias, privacy concerns, or misinterpretation of results.
2.1.1.2. Reflective Prompts#
To move from a vague idea to a clearly defined project, ask yourself the following questions:
What is the central question I want to answer using data?
Why is this question important within its context?
What variables are required, and why are they essential?
What would a meaningful or successful answer look like?
Who would use the results, and how would they benefit?
Is this problem realistic given the resources and constraints I have?
Reflecting on these questions strengthens your problem definition and helps develop a deeper understanding of both the challenge and its potential solutions.
2.1.2. Problem Scoping#
An essential component of problem formulation is problem scoping, which involves defining the boundaries and focus of the problem you are trying to solve. In real world settings, problems are often presented as broad or loosely defined challenges. It is the responsibility of the practitioner to analyze these challenges, identify the core issues, and determine what is explicitly within and outside the scope of the project.
Effective problem scoping helps avoid wasted effort, ensures alignment with goals, and enables measurable impact. When scoping a problem, consider the following:
Define clear boundaries: Specify what the solution will address and what it will intentionally exclude.
Identify stakeholders: Determine who will benefit from the solution and how.
Assess value versus effort: Evaluate whether the potential impact justifies the time and resources required.
Asking these questions early provides clarity and direction. In some cases, this analysis may reveal that a problem is not worth solving. For example, if a solution saves an employee only a few minutes per month but requires weeks of development, the return on effort may be too low to justify the work.
2.1.3. Summary#
With a solid understanding of the fundamentals, we can now define the problem we aim to solve. This step is critical because every stage of the data science lifecycle depends on it. Starting with an unclear problem often leads to repeated revisions, wasting time and resources.
By analyzing the domain and available data, we can refine our focus and state the problem precisely, setting a clear path for effective progress.