In the world of supply chains and production, having a solution that can recognize future trends and outcomes is a clear advantage. When your business faces uncertainties linked to your product, it pays to have data that prepares you for them. Imagine a forecasting solution that uses historical and daily data with machine learning to predict future numbers! It could bring stability to your production, or your farm in our case, and help you make strategic decisions based on more accurate figures.
Analytical reports are usually based on historical data and what happened in the past: analysts draw conclusions from that data and present results derived from past figures. A forecasting module implemented on top of such a system adds another dimension to decision making. It presents future numbers and possibilities and upgrades the existing software solution, so the user is familiar not only with the past but also with the future.
In this post, you will see how we built an ML-based solution for milk yield forecasting that integrates on top of the My Dairy Dashboard platform. It builds on an existing platform but adds conclusions about future outcomes.
With a dashboard-based approach, the platform gives a complete overview of dairy farm processes. These cover all events that happen during the cow lifecycle, such as calving, breeding, feeding, milking, and health checks.
At the end of this post, you will understand how forecasting can enrich the existing platform, and find out the benefits it brings to the clients.
Forecasting Module
Prior to the forecasting module, dashboards mostly displayed the current state of a farm along with historical statistics. This is great for showing trends and gaining insight into current performance by comparing various KPIs with historical values. However, it does not answer what the future will look like.
Questions that this module can help answer are:
- How much milk can farmers sell weeks or even months in the future?
- How much milk will be available for milk processors to process?
- What are the trends in overall milk production?
Dairy farm income mainly depends on the amount and quality of the produced milk. Since our process needs numeric inputs, milk quality was expressed through the butterfat and protein percentages in the milk. Knowing the future values of these three figures therefore allows better planning and more confident decision-making. A similar approach can be applied to other farming processes as well; for example, another project we worked on was related to feeding process optimization based on the feed amount and cost ratio.
The forecasting module offers daily, weekly and monthly granularity, making it very intuitive for interpretation and enabling easier short/mid/long-term decision making.
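To make the granularity idea concrete, here is a minimal sketch, with illustrative column names and figures rather than the real pipeline output, of how daily forecasts can be rolled up to weekly and monthly views with pandas:

```python
import pandas as pd

# Hypothetical daily forecast for a single farm: predicted milk yield (kg)
# plus component percentages, indexed by date.
daily = pd.DataFrame(
    {
        "milk_kg": [31.2, 30.8, 31.5, 30.9, 31.1, 30.7, 31.0],
        "butterfat_pct": [3.9, 3.8, 3.9, 4.0, 3.9, 3.8, 3.9],
        "protein_pct": [3.2, 3.3, 3.2, 3.2, 3.3, 3.2, 3.2],
    },
    index=pd.date_range("2021-03-01", periods=7, freq="D"),
)

# Roll the daily forecast up to weekly and monthly granularity:
# yields are summed, component percentages are averaged.
agg = {"milk_kg": "sum", "butterfat_pct": "mean", "protein_pct": "mean"}
weekly = daily.resample("W").agg(agg)
monthly = daily.resample("M").agg(agg)
```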
The best forecasting performance is achieved with a farm-specific approach. This means that the forecasting module chooses the best forecasting method depending on inputs such as herd size, health condition, age, climate, food, environment, stress, and others. Data quality and availability are also considered. Furthermore, the module is self-improving as it learns from its mistakes and corrects them in the future. Performance is tracked automatically by comparing forecasts to true values once they become available.
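The platform's actual selection logic is more involved than we can show here, but the core idea of picking the best method per farm can be sketched as a simple backtest on a hold-out window. The candidate methods and function names below are stand-ins for illustration only:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, used to compare candidate methods."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# Two deliberately simple candidates; the real module uses richer ML models,
# these only stand in to show the per-farm selection mechanics.
def naive_last_week(train, horizon):
    return np.tile(train[-7:], horizon // 7 + 1)[:horizon]

def moving_average(train, horizon):
    return np.repeat(np.mean(train[-28:]), horizon)

def select_method_for_farm(history, holdout_days=28):
    """Backtest each candidate on the last `holdout_days` of one farm's
    daily yield history and keep the method with the lowest error."""
    candidates = {"naive_last_week": naive_last_week, "moving_average": moving_average}
    train, test = history[:-holdout_days], history[-holdout_days:]
    scores = {name: mape(test, fn(train, holdout_days)) for name, fn in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Because the selection is scored per farm, two farms with different herd sizes or data quality can end up with entirely different forecasting methods.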
Our Approach
Making sure the necessary data is in place
As is often the case, not all the data necessary to make quality predictions was in place at the beginning of the project. The existing data pipeline had not been designed to keep all historical records in an easily accessible way. Simply put, there was no such requirement in the past, and it was easier to reason about only the subset of data that was actually needed.
Once we realized there was not enough data for the new predictive capabilities of the system, we re-engineered parts of the existing data pipeline to make the necessary data available.
Understanding the data and making new discoveries
With the data in place, we approached the next challenge. While the end goal was clear, there were many unknowns at the beginning. What does this piece of data actually represent? How, and from where, can we extract the particular information we need for prediction? Why are there anomalies in behavior between herds? These were only some of the questions we faced.
Luckily, our data science team was able to analyze the data and answer all the critical questions. When information could not be extracted directly, we used approximations and heuristics, along with trial and error, until we got satisfying results.
Suddenly, we were able to get fairly accurate numbers, such as the count of lactating cows in a given period, as well as values like the so-called "days in milk" (the number of days since the most recent calving) for a specific cow.
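"Days in milk" itself is straightforward to derive once calving events are reliably available. A minimal sketch, with an illustrative function signature rather than our production code:

```python
from datetime import date

def days_in_milk(calving_dates, on_date):
    """Days in milk for one cow: days elapsed since her most recent
    calving on or before `on_date`; None if she has not calved yet."""
    previous = [d for d in calving_dates if d <= on_date]
    if not previous:
        return None
    return (on_date - max(previous)).days

# Example: a cow that calved on 2021-01-10 has DIM = 50 on 2021-03-01.
print(days_in_milk([date(2020, 2, 5), date(2021, 1, 10)], date(2021, 3, 1)))
```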
Moreover, during data analysis, we learned so much that we were able to help our client improve existing algorithms (and related insights) to make them even more accurate and valuable to the customers.
Custom-tailored prediction model per herd
Not all of the farms (and herds) involved are the same, and not all of them have quality data sets. The solution needed to be flexible and capable of supporting every herd via a custom-tailored prediction model based on the available data and the unique characteristics of the herd.
One such example was related to lactation duration. Different farms end a cow's lactation period at different times; some farms tend to milk cows longer than others for various reasons. Once we applied this learning, our prediction model started yielding even better results.
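As an illustration of how such a per-farm adjustment can look (the cutoff values and farm identifiers below are hypothetical, not the ones learned in production):

```python
# Hypothetical per-farm lactation-length cutoffs: the days-in-milk value
# beyond which a cow is no longer counted as lactating. In practice these
# were derived from each farm's historical dry-off behavior.
LACTATION_CUTOFF_DAYS = {"farm_a": 305, "farm_b": 340}
DEFAULT_CUTOFF_DAYS = 305

def is_lactating(farm_id, dim):
    """Treat a cow as lactating if her days in milk is within the
    farm-specific cutoff rather than a single global constant."""
    if dim is None:
        return False
    return dim <= LACTATION_CUTOFF_DAYS.get(farm_id, DEFAULT_CUTOFF_DAYS)
```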
We also adjusted for issues such as the lack of quality data caused by manually recorded (and therefore error-prone) events, late or completely missing data, unexpected correlations, and other sources of bad data.
Fully automated, self-evolving cloud solution
The overall solution is a fully automated process running in the cloud. As new data arrives, the system learns and becomes better at predicting production volumes. With the cloud approach, we unlocked the ability to scale the solution up or down to support any number of herds.
Continuous performance tracking
Just because the solution is deployed does not mean the work is finished. We built in mechanisms for continuous performance tracking so that we can react if the quality of the predictions ever drops.
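A simplified version of that tracking loop, assuming past forecasts and incoming actuals are stored as date-indexed series and using an alert threshold chosen here purely for illustration:

```python
import pandas as pd

ALERT_MAPE = 10.0  # illustrative threshold, not the production value

def track_forecast_quality(forecasts: pd.Series, actuals: pd.Series):
    """Compare past forecasts to the actual values that have since arrived
    and flag the farm if the rolling error drifts above the threshold."""
    joined = pd.concat({"forecast": forecasts, "actual": actuals}, axis=1).dropna()
    ape = (joined["forecast"] - joined["actual"]).abs() / joined["actual"] * 100
    rolling_mape = ape.rolling(window=7, min_periods=7).mean()
    needs_attention = bool((rolling_mape > ALERT_MAPE).any())
    return rolling_mape, needs_attention
```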
Where Does It Run?
The main goals of such solutions are:
- To use historical and current data with machine learning to predict trends and outcomes in output quantity
- To create a fully automated, self-evolving cloud solution
- To develop a system that learns through daily data intake and becomes better at predicting
As the module provides a farm-specific approach, we had to consider the scaling capabilities of our solution. To scale with the varying number of clients (farms) that could use the module, we decided to go with a cloud-based solution.
The key components of our solution include a data storage service for storing the input files of the process, intermediate results, and process outputs. The process itself had to be built around an easy-to-deploy machine learning service that supports typical ML tasks such as data preprocessing, model selection, and hyperparameter tuning. Finally, we needed a service to connect all of these tasks and the data storage into a single automated workflow. We also had to think about price: the goal was an efficient solution that is charged only for the time it actually runs, avoiding dedicated server instances.
We found all of these components in the AWS cloud ecosystem. More precisely, we used S3 as the storage service, SageMaker for the ML tasks, and Lambda functions to connect everything into a single automated workflow.
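To give a feel for how these pieces connect (the bucket names, role ARN, container image, and event shape below are placeholders, not our actual configuration), a Lambda handler can kick off a per-farm SageMaker training job on data already sitting in S3:

```python
import time
import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    """Triggered when new farm data lands in S3; starts a training job
    whose model artifacts are written back to S3 for the forecast step."""
    farm_id = event["farm_id"]  # placeholder event shape
    job_name = f"milk-forecast-{farm_id}-{int(time.time())}"
    sagemaker.create_training_job(
        TrainingJobName=job_name,
        RoleArn="arn:aws:iam::123456789012:role/example-sagemaker-role",  # placeholder
        AlgorithmSpecification={
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/forecast:latest",  # placeholder
            "TrainingInputMode": "File",
        },
        InputDataConfig=[{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://example-bucket/input/{farm_id}/",  # placeholder
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": f"s3://example-bucket/models/{farm_id}/"},
        ResourceConfig={"InstanceType": "ml.m5.large", "InstanceCount": 1, "VolumeSizeInGB": 10},
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )
    return {"training_job": job_name}
```

Because the Lambda functions only orchestrate and the training runs on short-lived SageMaker instances, we pay only while jobs are actually running, which was one of the pricing goals mentioned above.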
What Does the Future Hold?
After a couple of months of testing, we can say that our predictions hit the targets pretty well. The quality certainly differs from farm to farm, depending on the availability and quality of the underlying data, but overall analysis points to single-digit percentage errors for both milk and component predictions.
With all the phases covered, the end result of this solution is predictions that are 10% better than the baseline. This margin can grow further with continuous daily data intake: more data leads to more accuracy and, ultimately, to better machine learning. The outcome is a fully automated, self-evolving cloud solution that produces forecasts, enables faster decision making, keeps costs under control, and improves production management.
Encouraging results like these are a great motivator for our team and for potential clients, as we hope that insight into future milk production and quality can lead to significant benefits for their business.
We hope you enjoyed reading this post. If you have questions about the topic, we will be happy to discuss them in the comments. Feel free to contact us, and stay tuned for more details on the topic.