Building a data science project from the ground up

A data science project aims to answer a specific question. Often we want to predict a specific characteristic (value) of an observation, before the actual value can be obtained. Here’s how we approach this.

3 min readJun 1, 2018

A clear goal

For each question for which we want a prediction, we implement a dedicated project. Here at Bonitasoft we decided to work on the prediction of whether the time constraint of a service level agreement (SLA) will be met. Many projects face this type of time constraint, that is, of a specific time limit in which to complete a process. Whether for legal constraints or for business efficiency, it is crucial to make sure that processes are executed in a given time-frame.

How to build a data science project

All data science projects share the same high-level framework. One can describe this frame work as like building a rocket with several stages, one on top of the other. At Bonitasoft we like this analogy, because we like the idea that the application of data science to BPM is a bit like “rocket science.” This kind of project requires a team of people with technical, business and data analysis skills.

If the team wants to leverage data to gain information and ultimately knowledge on the system, it must have a fair amount of data and understand them deeply. Basic understanding of the data is not enough!

Gathering the data

Most systems nowadays generate huge amount of logs, events, database records, and sometimes even noSQL documents. It is only a matter of having developers extract them and provide them to the data scientist. Is it really that easy? Yes and no. Having a lot of information is good, and extracting data from a system is (often) pretty straightforward. But the data arenot always readable or usable as is. Data need to be “explained,” interpreted, or translated so that a data scientist can use them.

Cleaning the data

Once the team has the data, it tries to identify missing or invalid values. Inconsistent data must be ignored in further analysis. This is what we call cleaning the data.

Exploring the data

The team can now visualize data aggregated on dashboards with graphs or tables. Such a view of the dataset allows team members to get a better understanding of the execution of the system and the business. From those graphs the team (mainly data scientists and business people) can identify trends or correlations between events and values observed.

As a result the data scientist is able to provide the team with guidance on the Machine Learning techniques that are potentially applicable to their situation. This guidance is often the result of hypothesis and guesswork based on trends and patterns spotted in the data regarding the goal of the project.

Building a Machine Learning (ML) model

Next the developers implement Machine Learning techniques, leveraging the data previously cleaned and explored. They build an ML model, a kind of mathematical formula applied to the data considered to be meaningful from the dataset. After comparing the various techniques efficiency on the dataset, the team can decide which one gives the best results. Unfortunately there is no unique solution that is best for all datasets. The team has to experiment with several until they find a satisfying solution.

When they have found a good model, the team can start making predictions.

At Bonitasoft we are in the process of implementing a new module that will predict if a process will respect its SLA or not, and will then let the process users react to this prediction to correct the situation if necessary.

Stay tuned, for more articles on Data Science applied to BPM!

This story by Olan Anesini and Nicolas Chabanoles was originally published on towardsdatascience.com on June 1, 2018.

Building a data science project from the ground up

A data science project aims to answer a specific question. Often we want to predict a specific characteristic (value) of an observation, before the actual value can be obtained. Here’s how we approach this.

A clear goal

How to build a data science project

Gathering the data

Cleaning the data

Exploring the data

Building a Machine Learning (ML) model

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Bonitasoft

No responses yet

More from Bonitasoft

Automated visual regression testing with TypeScript, Puppeteer, Jest and Jest Image Snapshot

Modifications can sometimes change the appearance of an application. Here’s one way to check for visual regressions.

Embed Bonita Engine in a Spring Boot application

This tutorial is an example of how to embed the Bonita Engine (BPM workflow engine) in a Spring Boot application

Database Schema Versioning and Migrations Made Simpler For High Speed CI/CD

How Liquibase can be used in a Java project, with Spring / Hibernate, to version the database schema

3 simple ways to create connectors for the Bonita platform

Use connectors to execute read/write operations and “connect” the BPM platform to other services & systems

Recommended from Medium

AI Agents: Introduction (Part-1)

Discover AI agents, their design, and real-world applications.

15 AI Agent Business Ideas to Get Rich in 2025

Lists

Predictive Modeling w/ Python

Practical Guides to Machine Learning

Natural Language Processing

data science and AI

The 5 paid subscriptions I actually use in 2025 as a Staff Software Engineer

Tools I use that are cheaper than Netflix

Data Science All Algorithm Cheatsheet 2025

Stories, strategies, and secrets to choosing the perfect algorithm.

Jeff Bezos Says the 1-Hour Rule Makes Him Smarter. New Neuroscience Says He’s Right

Jeff Bezos’s morning routine has long included the one-hour rule. New neuroscience says yours probably should too.

15 Top Machine Learning Projects for Final Year Students

Machine learning (ML) has become one of the most in-demand skills in the world of technology. Whether it’s predicting house prices or…