If you are an enterprise or any organization that generates or collects enormous amounts of data, you already know that the more data you have, the more you will add. Once you recognize the value of data in your everyday operations, you will see that it touches every corner of your business. As that value and the demand for data grow, so does the data itself. This leads to a phenomenon called data gravity.
What actually is data gravity?
This term was coined in 2010 by Dave McCrory, a software engineer, to explain the notion that large amounts of data attract more data.
Data gravity is the observed tendency of large datasets to attract smaller datasets, services, and applications, thus creating an even larger data-heavy or data-rich system.
Why gravity? Just as larger bodies of mass attract smaller ones through gravitational pull, large data systems do the same. The most common comparison is with planets. Picture a data system as a planet. With its gravity, it pulls in moons and other bodies.
In the same way that it is hard to move bodies held by gravity, data in these situations becomes difficult to move as well. When data starts pooling in one place, it becomes inevitable that additional data and applications will “attach” to it, which often makes the system inflexible.
When gravity is not such a cool concept
Even though you might think, oh, great, all my data is in one place, it isn’t always such a great thing. As data and applications move closer to the central data store, they limit its flexibility, which makes it harder to scale or adopt new applications.
Data latency also becomes an issue. It takes longer and longer to get vital data from its source to the user. Data slows down and, due to flexibility issues, doesn’t reach the user as fast as it should. To improve latency, data should be moved closer to the applications that use it rather than being stored in remote locations.
Most companies don’t consider the fact that their data will grow and consequently outgrow the current system. This will require them to move data to another solution. In the case of data gravity, this becomes an issue since this data has now become harder to move and its portability is low.
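To get a feel for why a large dataset becomes hard to move, a rough back-of-the-envelope estimate can help. The sketch below is only an illustration: the function name, the 100 TB dataset, the 1 Gbps link, and the 70% link efficiency are all assumptions, not figures from any real migration.

```python
def transfer_time_hours(dataset_tb: float, bandwidth_gbps: float,
                        efficiency: float = 0.7) -> float:
    """Estimate the hours needed to copy a dataset over a network link.

    dataset_tb     -- dataset size in terabytes (decimal, 1 TB = 1e12 bytes)
    bandwidth_gbps -- nominal link speed in gigabits per second
    efficiency     -- assumed fraction of the link actually usable (0 to 1]
    """
    if dataset_tb < 0 or bandwidth_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    total_bits = dataset_tb * 1e12 * 8                  # bytes -> bits
    effective_bps = bandwidth_gbps * 1e9 * efficiency   # usable bits per second
    return total_bits / effective_bps / 3600            # seconds -> hours


if __name__ == "__main__":
    hours = transfer_time_hours(dataset_tb=100, bandwidth_gbps=1)
    print(f"{hours:.0f} hours, or about {hours / 24:.0f} days")
```

Under these assumptions, copying 100 TB takes about 13 days of continuous transfer, before counting validation, downtime, or egress fees. The longer a dataset is left to grow, the larger this number gets.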
But one of the biggest issues with data gravity, once data becomes harder to move, is vendor lock-in. When data can’t be moved to another technology, companies get stuck with the same vendor because switching to another service or software provider is too difficult. They become too dependent on their current vendors and let data accumulate and get heavy, so later on the cost and effort of moving to another vendor are too high.
Business downsides of data gravity
Data gravity doesn’t only affect data infrastructure and application performance. It influences the business side as well. Failing to plan how to centralize data while still keeping it close to applications will, in the long run, result in poor data utilization.
Slow response times
Due to low data agility and cluttered data, response time to changes is going to be much slower. If end users can’t access data on time or at the desired moment, their decisions or actions are going to be delayed. This means that some immediate actions won’t be performed, which can lead to extra costs, losses, or wrong decisions.
Inefficient use of resources
Data gravity can slow down or limit the digital transformation of companies. Overly centralized data or even data duplication (when multiple teams work on the same data) can lead to inefficient use of resources and applications. The result can be missed opportunities or fewer innovations, caused by improper or suboptimal use of those resources.
Information silos
One of the biggest problems for data management is data silos. A silo means that not all stakeholders can access the data, or that it is not easily accessible. This gets in the way of data democratization and full data utilization. Having one data store that is unavailable to other groups or departments in the organization defeats the purpose of interdepartmental collaboration. So, if someone outside of the group needs access to some information, they will have to jump through many hoops to get to it.
Slowing down growth
As said before, data gravity slows down the performance of applications and creates issues with data flexibility. In these cases, the people who need that data to do their work well aren’t getting what they require. Certain business operations will face issues that result in slower response times or delayed decisions. Growth and progress will slow down until the issues with data and data gravity are resolved.
Data gravity is not necessarily so bad
We did mention that data gravity can cause quite a few issues. But having data centralized is not a bad thing, per se. If your data management team knows how to handle it and keeps it under control in case data ever needs to be moved, then centralized data can be beneficial. The key is to weigh the benefits against the potential challenges and downsides. As long as it is managed correctly, data gravity won’t pose much of an issue until a company accumulates so much data that operations slow down.
Having data centralized and accessible can drive decision-making. Data can be easily analyzed and used to create metrics and insights that help teams take action and make decisions. The issue occurs when data becomes hard to access or is slow to get to the users.
Data centralization also fosters collaboration across companies and departments and supports data democratization. Unless there are data silos, data in one place can allow different teams to work together and share data from a single source of truth. If companies can avoid data duplication, efforts in working with data can be optimized and teams can work more efficiently.
By approaching data gravity management thoughtfully, teams can learn how to utilize data and use it with new technologies and applications that drive innovation and produce competitive advantage. Only when properly managed can data gravity be an asset rather than a challenge. Sometimes it’s more effective to manage it than to waste resources trying to eliminate it completely.
How to tackle data gravity before it hits
To successfully minimize, prevent or manage data gravity, there are some steps one can take.
First and foremost, understand, monitor, and assess your data gravity. When you know how your systems work and what their advantages and disadvantages are, you’ll know your course of action.
That’s why it’s important to understand your data: how it’s collected, how it’s stored, and in what form. You need to know how data originated, who created it, and how it is used. This will help in understanding the scope of data utilization and which parts of it are beneficial to your company. Knowing everything about your data implies evaluating your data management processes as well. By identifying what can be improved, data gravity can be managed far more effectively.
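One simple way to start assessing where data gravity is building up is to keep an inventory of your data stores and rank them. The sketch below is a minimal illustration with made-up store names and numbers. The "pull" score (size multiplied by the number of dependent applications) is a crude heuristic assumed here for demonstration, not an established formula.

```python
from dataclasses import dataclass


@dataclass
class DataStore:
    name: str            # hypothetical store name
    size_tb: float       # how much data it holds, in terabytes
    dependent_apps: int  # how many applications read from or write to it


def pull_score(store: DataStore) -> float:
    """Crude, assumed heuristic: bigger stores with more dependents pull harder."""
    return store.size_tb * store.dependent_apps


def rank_by_pull(stores: list[DataStore]) -> list[DataStore]:
    """Return the stores ordered from strongest to weakest pull."""
    return sorted(stores, key=pull_score, reverse=True)


if __name__ == "__main__":
    inventory = [
        DataStore("crm", size_tb=2, dependent_apps=3),
        DataStore("warehouse", size_tb=50, dependent_apps=12),
        DataStore("logs", size_tb=200, dependent_apps=1),
    ]
    for store in rank_by_pull(inventory):
        print(f"{store.name}: {pull_score(store):.0f}")
```

In this made-up inventory, the warehouse ranks first even though the log store is larger, because far more applications depend on it. Stores at the top of such a list are the ones that will be hardest to move later, so they deserve the closest monitoring.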
Try to centralize your data where doing so lets you use data gravity to your advantage. Avoid overly dispersed data and having too many points where data gravity can occur.
Two of the most common methods of battling data gravity are edge computing and a hybrid cloud approach. Edge computing moves processing closer to where data is generated, and the hybrid cloud approach helps you leverage the power of data gravity.
But, before all that, invest time and resources in understanding and implementing data governance and rules on how to tackle and manage data. And let’s not forget data quality management. Both of these areas control and monitor data while making sure your company gets the most out of it.
Don’t mistake data gravity for foreshadowing something ominous
If you read the whole post, you surely noticed that data gravity brings a lot of good but also some complications. The overall impact of data gravity will depend on the type of business you run, your data collection and processing infrastructure, and the data management systems you have established.
For some, data gravity will come as a blessing, and for others, it can prove to be tricky. The only way to be prepared for both scenarios is to preemptively consider both.