Apache Spark – How to create a powerful streaming application

After setting up Apache Kafka, the next step in our Retail Business Intelligence Platform project was to set up Spark, another widely used software solution from the Apache Software Foundation. Apache Spark, the real spark needed to ignite our project and turn it into a true stream processing system, is probably the most famous streaming engine […]

How to set up a Kafka streaming project

If you’ve read the previous blog post about this project, you should already know its main idea. We’re trying to estimate the unit sales of more than 3,000 distinct items in 10 different Walmart stores. The dataset provides us with more than 5 years of data about the dates on […]

An approach to building a business intelligence platform for retail companies

Let us introduce our new series of blog posts, in which you can discover how to build a business intelligence platform that combines streaming, real-time analysis, a predictive component, batch processing, and reporting. We will dive into our approach to a new challenge in which sales data plays the leading part. After winning […]

How to deal with loudest-guy-driven decisions in tech: Part II

In Part I, we described the issues caused by a wrong tech choice; now, in Part II, we discuss how we approached finding the best solution. There were two possible solutions to our problems. The first was to keep Ignite and invest effort in optimizing queries and improving the cluster configuration options. These include […]

How to deal with loudest-guy-driven decisions in tech: Part I

Technology and the tech era have brought many advantages to both individuals and companies. But no one is influenced more by technology and its trends than developers and engineers. Broad trends and hype shape their choices and cause constant switching between technologies. Technology, especially new technology, can significantly improve an engineer’s life, but in some […]

The endless possibilities of machine learning

Machine learning – such a powerful and popular term that it has taken over the software development industry, especially data-driven software solutions. It’s a method of data analysis based on the concept that systems can learn from data, identify patterns, and forecast future trends. For many companies that either generate or collect vast amounts […]

Data-driven startups – a new driving force of market changes

Startups are interesting because they require a clearly defined business model. They need to have their business plan and future strategies outlined in line with the state of the market. Usually, startups are based on innovative ideas that solve consumers’ or clients’ problems and answer their needs. Startups are […]

What are wearable devices without data science and engineering?

In the world of wearable devices, the user experience depends on the metrics and insights users get from those devices. The accuracy of those metrics depends on the processes and methods used in data science and data engineering. Our own data detective, Dario Bošnjak, had the pleasure of working with the producer […]

Data Integration Tools vs. Data Processing Frameworks/Libraries/Engines

For any data engineer, data pipelines are essential. Choosing how to develop one is a key decision for the person building it, since it determines the complexity of the process. When developing a data pipeline, you can take one of two approaches. One of them is to use a programming language and a data processing […]