
It pays to be old school when it comes to really understanding your data.

As the world becomes more digital, the amount of data keeps growing: by 2020 there was approximately 30 times as much of it as in 2010. More and more information will need to be sifted through.
By Martin Connor

As such, more and more automated solutions for data-driven decision making will be utilised, with marketers and marketing analysts becoming increasingly reliant on Artificial Intelligence (AI) and Machine Learning (ML).

Nothing’s New, But It’s All Change

Marketers have always sought to target the right customer or audience and to personalise the content they receive, at the right time and in the best channel. Nothing new there, as that’s been the holy grail of marketing for many years, but now it’s becoming a reality. Platforms, processes and ways of working have evolved quickly in recent years to make it possible. Machine learning and AI algorithms can be set up to make decisions in real time, based on the latest interactions across any channel.


GIGO still applies!

From Star Wars to Terminator and Minority Report, Hollywood and the film industry have always led us to believe that one day the machines and AI will take over the world. It would be easy to think, then, that machines can take over the roles of humans in the world of data and analytics. There is some truth in this, and ML and AI can play a very important role in helping us to make sense of our data, but the old rules still apply: garbage in, garbage out (GIGO).


It’s certainly true that ML and AI can help ‘repair’ and make sense of poor-quality data but before we can get anywhere near the point of applying models to our data, there is still a whole load of cleaning and restructuring that is needed. Much of the data stored around the world today is in a right mess!

Structure. What Structure?

Data bombards us from many directions and at different rates. Most of it is unstructured, so it ends up stored in various formats that are not linked by common keys or identifiers, as they would be in a traditional database. There may be strange characters in the data, which can cause reading errors, crash scripts or hide where a line ends. There may be inconsistent field delimiters, which put data in the wrong place. There may be duplicated data, mixed date formats, and the list goes on. So, with any project, there is a huge amount of discovery work to do on the data before it can be made sense of and before any sort of structure can be designed. A good old-fashioned eye-balling of the data can reap dividends and save so much time! Then some data engineering is needed to derive pseudo keys that help us stitch the data together and create new, cleaned-up data layers that are friendlier to the systems that are going to ingest them.
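To make those clean-up steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the three-field layout (name, email, date), the list of delimiters and date formats, and the helper names such as `pseudo_key`. Real feeds will need their own rules, worked out by eye-balling the data first.

```python
import hashlib
import re
from datetime import datetime

# Hypothetical date layouts; day-first is assumed for the ambiguous ones.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def strip_control_chars(text):
    """Drop non-printable characters such as NULs or stray carriage returns."""
    return "".join(ch for ch in text if ch.isprintable())


def split_fields(line):
    """Split on comma, semicolon, pipe or tab, then tidy each field."""
    return [strip_control_chars(f).strip() for f in re.split(r"[,;|\t]", line)]


def parse_date(value):
    """Return an ISO date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def pseudo_key(name, email):
    """Derive a stable key from the normalised name and email."""
    basis = f"{name.lower()}|{email.lower()}"
    return hashlib.sha256(basis.encode("utf-8")).hexdigest()[:12]


def clean(lines):
    """Return one cleaned record per pseudo key, keeping the first occurrence."""
    seen = {}
    for raw in lines:
        fields = split_fields(raw)
        if len(fields) != 3:
            continue  # in practice, set these aside for manual review
        name, email, date = fields
        key = pseudo_key(name, email)
        if key not in seen:
            seen[key] = {
                "key": key,
                "name": name,
                "email": email.lower(),
                "date": parse_date(date),
            }
    return list(seen.values())


raw_lines = [
    "Ann Lee,ANN@example.com,2021-03-04",
    "ann lee;ann@example.com;04/03/2021\x00",  # duplicate, other delimiter
    "Bob Ray|bob@example.com|04.03.2021",
    "broken line with no delimiters",
]
records = clean(raw_lines)
```

The two "Ann Lee" rows collapse into one record because they share a pseudo key, and all three date styles end up in the same ISO format. The malformed line is skipped rather than guessed at, which is usually the safer default until a person has looked at it.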

You Still need Humanware!

Once we have a cleaned data layer with a structure, an analytical audit should be performed across all the variables to work out field distributions, field populations, where the anomalies are and how best to deal with them. Algorithms can then be set up to transform data based on these findings. After that, further analysis is needed to help answer business questions and to develop strategies for how the data is used and how it can be enhanced going forward. Only with the experience and intuition of humans can any sense be made of all of this.
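As a rough illustration of what such an audit might look like, the sketch below reports how well each field is populated, its most common values, and any numeric values that sit far from the rest. The field names, the sample records and the cut-off of five median absolute deviations are all assumptions chosen for the example, not fixed rules.

```python
from collections import Counter
from statistics import median


def audit(records, fields):
    """Summarise population and distribution for each field."""
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        present = [v for v in values if v not in (None, "")]
        numeric = [v for v in present if isinstance(v, (int, float))]
        summary = {
            "populated_pct": round(100 * len(present) / len(values), 1),
            "distinct": len(set(present)),
            "top_values": Counter(present).most_common(3),
        }
        if numeric:
            mid = median(numeric)
            # Median absolute deviation: a spread measure that a few
            # extreme values do not distort much.
            mad = median([abs(v - mid) for v in numeric])
            summary["anomalies"] = [
                v for v in numeric if mad and abs(v - mid) > 5 * mad
            ]
        report[field] = summary
    return report


# Hypothetical sample records.
sample = [
    {"age": 34, "channel": "email"},
    {"age": 29, "channel": "email"},
    {"age": 41, "channel": "sms"},
    {"age": 38, "channel": ""},
    {"age": None, "channel": "email"},
    {"age": 420, "channel": "post"},
]
report = audit(sample, ["age", "channel"])
```

The output flags the age of 420 as an anomaly and shows that both fields are only partly populated. What to do about that (drop, cap, impute or go back to the source) is still a human decision.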


Now the machines can rise!

Once all the preparation has been completed, we can think about applying machine learning algorithms in the right way and implementing them. They can be set up to retrain themselves and be left to do their thing for a period. However, they will always need monitoring and revisiting by people. As circumstances in a business change, new products are introduced, or new trends and factors in the market come into play, analysts and strategists are needed to see how this may impact the models and what changes might be needed in the data, so that the existing models can be modified, or even completely replaced.
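One common way to support that monitoring, not something specific to this article, is to compare the distribution of a model input or score today against the distribution it was trained on. The sketch below uses the Population Stability Index (PSI) for this. The bin edges, the sample scores and the 0.25 alert level are assumptions; 0.25 is a widely used rule of thumb rather than a hard limit.

```python
import math


def psi(expected, actual, edges):
    """Population Stability Index between a baseline sample and a recent one."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # The bin index is the number of edges the value has reached.
            counts[sum(v >= e for e in edges)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    base, recent = shares(expected), shares(actual)
    return sum((r - b) * math.log(r / b) for b, r in zip(base, recent))


# Hypothetical model scores at training time and from the latest period.
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.35]
recent_scores = [0.6, 0.7, 0.8, 0.9, 0.85, 0.95, 0.55, 0.65, 0.9, 0.3]
bin_edges = [0.25, 0.5, 0.75]

drift = psi(baseline_scores, recent_scores, bin_edges)
needs_review = drift > 0.25  # assumed alert level
```

A check like this can run automatically, but it only raises the flag. Working out why the scores have shifted, and whether the model should be modified or replaced, is the analyst's job.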

What if the steps above are skipped?

Data can still be fed into a model, and the model might still run and produce some predictive output, but there’s a high chance that the results will be poor, meaningless and unstable. As an experienced analyst, I have seen people new to the discipline skip straight to the machine learning part without realising how much work needs to go in beforehand. I believe this is, for the most part, due to the way data science and analytics are taught at universities, where pre-cleaned data is provided for students to develop their models. That is understandable, as it avoids many hours of work that isn’t the focus of the course they are on. It is up to people like me to help them develop these skills when they start out in their careers. The analogy with decorating is that you spend most of the time sanding, filling cracks and preparing before you even start to paint and make things look pretty!

Go ‘old school’ for Success!

The process of cleaning, transforming and understanding your data can typically be up to 90% of the work for any analyst on a project. However, I have often found that this is not how our clients perceive it, so we must help them understand the ‘old school’ processes required before we can get to the point of answering the real brief.

Martin Connor
Senior Analyst, ekino London.
An experienced Data Analyst who helps clients turn base data into actionable insight. Martin has a track record of managing successful data-driven projects for multinational companies.
