What do you do when you are introduced to a complex and exciting data challenge in the insurance industry? You apply, of course! That is how the team at Digital Poirots entered the Croatia osiguranje & BIRD Incubator Data Challenge. To get a better understanding of the challenge and the process, we interviewed two of our data detectives, Renato and Ivo.
Can you tell us a bit more about the organizers and the challenge itself?
Croatia osiguranje, the leading insurance company on the Croatian market, and BIRD Incubator organized a data challenge competition. The main tasks of this challenge were to understand the B2B insurance market better, provide market segmentation, analyze market trends, and provide forecasts. They wanted participants to create a solution that would turn their data into actionable insights.
Can you delve more into those tasks of the challenge?
For the market segmentation, the market needed to be divided based on client size. The goal was a better understanding of current clients, which would allow better-targeted marketing and help discover the segment with the most room for growth. Ultimately, it should help increase sales and market share. In analyzing market trends, it was important to define the trends for different insurance products and for each market segment, and to see how they compare to each other. But the bigger challenge was working out how these trends would behave in the future and what inputs they could provide for business decisions. It was a wide range of questions that needed to be answered, and it was right up our alley. Ultimately, after months of hard work, our solution won us first place in the challenge.
What were your first thoughts when presented with the project?
The project seemed very challenging, but it also presented a rare opportunity to work with a real dataset owned by a corporation. Not everyone has the chance to work with such sensitive information and get an insight into someone’s company and market. So, to us, getting our hands on real data and having to do something with it is always exciting. Of course, to comply with data privacy laws, we got access only to anonymized data. Because of that, good communication was integral. We worked with someone else’s data, and we needed to understand what each piece of information meant and what its limitations were. Getting to know the insurance market was a starting point for us. We wanted to deliver a high-quality solution that would meet expectations, so having a clear flow of information proved to be the key to completing the tasks.
Why did you decide to apply?
To us as a company, this was an opportunity to show our skills and knowledge. We wanted to demonstrate the quality of our work and show that data science and data engineering are fields where we feel very comfortable. Also, cooperating with the leading insurance company on the market was something we did not want to pass up. This was a chance to apply what we know to solve a real business issue and to bring great value to the client.
Can you draw a line between the challenge’s problems and the solutions you wanted to provide?
It was clear to us that we would need different approaches for each of the tasks. Market segmentation required unsupervised learning, more precisely clustering algorithms. In contrast, trends and forecasting are perfect candidates for the time series modeling approach, so we went in that direction.
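To make the clustering idea concrete, here is a minimal sketch of k-means on a single client-size feature. It is an illustration only, not the code from the competition solution: the choice of k-means, the `kmeans_1d` helper, the employee counts, and the log scaling are all assumptions made for this example.

```python
import math

def kmeans_1d(values, k, iterations=50):
    """Plain k-means on one numeric feature; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread initial centroids evenly across the sorted values.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels

# Hypothetical employee counts for ten anonymized clients (made-up numbers).
employees = [3, 5, 8, 12, 40, 55, 70, 400, 650, 900]
# Cluster on a log scale so the few very large clients do not dominate distances.
log_sizes = [math.log10(e) for e in employees]
centroids, labels = kmeans_1d(log_sizes, k=3)
```

On this toy input the three clusters line up with small, medium, and large clients. In practice the number of clusters and the features would be chosen by experiment rather than fixed up front.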
What was your approach to this challenge?
As in any data project, we had to understand the dataset first. We needed to ask the right questions to extract only the relevant information. It’s important to understand the relationships within the dataset and then document the findings.
Then, we proceeded with experiments. It started with researching existing materials and benchmarks on the market to choose appropriate algorithm candidates. If the dataset was too large, we used a representative sample to test the candidates. Next, we set evaluation criteria, compared the results, and picked the best candidate.
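One common way to draw a representative sample is stratified sampling, which keeps the proportions of each group intact. The sketch below assumes that approach; the `stratified_sample` helper, the segment labels, and the record counts are hypothetical and not taken from the challenge data.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from every group so group proportions are preserved."""
    rng = random.Random(seed)  # fixed seed keeps experiments repeatable
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for members in groups.values():
        # Keep at least one record per group so small groups do not vanish.
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical records: 800 small, 150 medium, and 50 large clients.
records = ([("small", i) for i in range(800)]
           + [("medium", i) for i in range(150)]
           + [("large", i) for i in range(50)])
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
```

Candidate algorithms can then be tried on `sample` first, and only the most promising ones rerun on the full dataset.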
When we decided on the best solution, we needed to implement it, analyze the results, and fine-tune it. Afterward, we exposed it through an easy-to-use interactive graphical interface so the client could see it in action, try its features and visualize them. And for the final step, we documented it.
What challenges did you face? What were the limitations?
Every project has its challenges and limitations. To approach the project we needed to gain a bit of expertise in the subject matter. We needed to get to know the insurance market so we could understand pain points and the data itself.
As for restrictions, the main ones concerned privacy and security, since the data was anonymized. We couldn’t get as deep into it as we expected at first. For example, we couldn’t link the companies to the item purchases, so we could not use a client-based approach. Instead, we used a more general one in which all purchases were aggregated at a higher level (by date, segment, etc.).
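Aggregating at a higher level could look roughly like the sketch below, which sums premiums per month and segment without ever touching a client identifier. The record layout, field names, and amounts are made up for illustration; the real dataset's schema is not described here.

```python
from collections import defaultdict

# Hypothetical anonymized purchase records: (date, segment, product, premium).
purchases = [
    ("2021-01-14", "small", "property", 1200.0),
    ("2021-01-20", "small", "motor", 800.0),
    ("2021-01-25", "large", "property", 15000.0),
    ("2021-02-03", "small", "property", 1100.0),
    ("2021-02-17", "large", "motor", 9000.0),
]

def aggregate_by_month_and_segment(records):
    """Sum premiums per (month, segment); no client identifier is needed."""
    totals = defaultdict(float)
    for date, segment, _product, premium in records:
        month = date[:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(month, segment)] += premium
    return dict(totals)

monthly = aggregate_by_month_and_segment(purchases)
```

The resulting per-segment monthly totals form exactly the kind of series that trend analysis and forecasting can work on.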
Why did you choose certain models and methods over the others?
First, we identified a group of relevant methods that we could use for each of the tasks. We needed to see how well each of them suited the specific task. For the segmentation task, we started by visualizing our data and comparing it with benchmarks. By doing so, we narrowed the field to a set of algorithms that work well on similar clustering tasks. Additionally, if several algorithms yielded similar results, we picked the faster ones. In forecasting, the second task, the time dimension is very important. As we did not have huge datasets, we discarded all deep learning algorithms that might be used here. Instead, we turned to well-known statistical models that have proved themselves in many similar situations. Then we ran experiments to determine which one was the best fit.
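The experiment loop for forecasting can be sketched as follows: hold out the last few observations, let each candidate forecast them, and keep the one with the lowest error. The three candidates (naive, drift, simple exponential smoothing), the mean absolute error criterion, and the series itself are assumptions chosen for this example; the models actually compared in the challenge are not named in this interview.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon

def drift_forecast(history, horizon):
    """Extend the average historical change per step."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

def ses_forecast(history, horizon, alpha=0.5):
    """Simple exponential smoothing: flat forecast at the final smoothed level."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up monthly premium totals with a steady upward trend.
series = [100, 104, 109, 113, 118, 121, 127, 131, 136, 140, 145, 149]
# Keep the last three months aside as a holdout the models never see.
train, holdout = series[:9], series[9:]

candidates = {"naive": naive_forecast, "drift": drift_forecast, "ses": ses_forecast}
scores = {name: mean_absolute_error(holdout, model(train, len(holdout)))
          for name, model in candidates.items()}
best = min(scores, key=scores.get)
```

Because the toy series trends upward, the drift model scores best here. A different series shape, such as a seasonal one, would favor a different candidate, which is why the comparison is run per task rather than decided in advance.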
What can be improved in the solution to make it even better?
A full partnership with the client would make a significant difference, because it would remove the need for the data anonymization and restrictions that existed during this challenge. That way we would get more detailed information, or even new data sources that we might identify along the way. We believe that would result in a significant performance boost. A partnership also implies closer cooperation and access to subject matter expertise that could be applied to improve the solution. We could also refine the goals or divide the current objectives into more scenarios, which would lead to:
- Better problem understanding and clearer goals
- Ensuring that only relevant data is used
- Reduced complexity
- Better performance
- Faster implementation
Why do you think that data science & data engineering are an integral part of providing exceptional software solutions?
Data science and data engineering bring added value. If your application gathers data, why wouldn’t you use its full potential to solve business problems or to provide extra value for your customers? Identifying the relevant data you own, and especially analyzing it, can be challenging for companies, which is why companies like ours exist: to offer expertise and guidance on your journey.
If your application does not gather the data, or the data is unorganized and scattered across different platforms, your first goal would be to organize it. This is where data engineering comes into play.
Data science and data engineering complement each other, and both are an important part of building solutions that are based on data.
Working on this Croatia osiguranje and BIRD Incubator challenge was rewarding, not only because we won first place, but because we learned so much. This was our introduction to the insurance market, and we saw many opportunities to use data to improve business processes and to deliver solutions and results that could open new possibilities for the client. We managed to prove that we have the expertise and a fresh, innovative outlook on the matter. We also believe that the solution could be made even better if we get the chance to delve more deeply into the insurance world and its data. The possibilities are endless, and we would love to see data science and data engineering as guiding forces in creating smarter solutions.