Vodafone is currently the second-largest telecommunications company in Hungary, and it recently acquired UPC Hungary to extend its mobile services with a fixed-line portfolio. Following the acquisition, Vodafone Hungary serves approximately 3.8 million residential and business subscribers. This story is about how Vodafone Hungary benefited from moving its data and analytics platform to Google Cloud.
To support this acquisition, Vodafone Hungary went through a large business transformation that required changes to many IT systems to create a future-ready IT architecture. The goal of the transformation was to provide future-proof services for customers in all segments of the Hungarian mobile market. During this transformation, Vodafone's core IT systems changed, which created the challenge of building a new data and analytics environment quickly and effectively. Data had to be moved from the previous on-premises analytics service to the cloud, which meant migrating existing data and merging it with data coming from the new systems in a short timeframe of around six months. Along the way, there were several changes in the source-system data structures that had to be adapted quickly on the analytics side to reach the go-live date.
Data and analytics in Google Cloud
To answer this challenge, Vodafone Hungary decided to partner with Google Cloud. The partnership was based on implementing a fully metadata-driven analytics environment in a multi-vendor project using cutting-edge Google Cloud solutions such as Data Fusion and BigQuery. The Vodafone Hungary Data Engineering team gained significant knowledge of the new Google Cloud solutions, which meant the team was able to support the company's long-term initiatives.
Based on data loaded by this metadata-driven framework, Vodafone Hungary built up a sophisticated data and analytics service on Google Cloud that helped it become a data-driven company.
By analyzing data from across the company with the help of Google Cloud, Vodafone was able to gain insights that provided a clearer picture of the business. The company now has a holistic view of customers across all segments.
Along with these core insights, the advanced analytics and Big Data models built on top of this data and analytics service ensure that customers get more personalized offers than was previously possible. It used to be the case that a business requester needed to define a project to bring new data into the data warehouse. The new metadata-driven framework allows the internal data engineering team to onboard new systems and new data within days, speeding up BI development and the decision-making process.
Technical solution
The solution uses several technical innovations to meet the requirements of the business. The local data extraction solution is built on top of CDAP and Hadoop technologies, implemented as CDAP pipelines, PySpark jobs, and Unix shell scripts. In this layer, the system receives data from several sources in several formats, including database extracts and different file types. The system needs to manage around 1,900 loads daily, with most data arriving in a five-hour time frame. The framework therefore needs to be highly scalable, handling high loading peaks without generating unexpected cost during low-traffic periods.
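To give a flavor of this extraction layer, here is a minimal PySpark sketch of a single load. The JDBC URL, table name, credentials, and output path are hypothetical placeholders; in the real framework these values come from metadata rather than being hard-coded, and the CDAP pipelines and shell scripts around the job are not shown.

```python
# Minimal sketch of one extraction job in the on-premises layer.
# All connection details and paths below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("daily-extract-subscribers")  # hypothetical job name
    .getOrCreate()
)

# Read one source extract over JDBC (URL and credentials are placeholders).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//source-db:1521/ORCL")
    .option("dbtable", "CRM.SUBSCRIBERS")
    .option("user", "extract_user")
    .option("password", "***")
    .load()
)

# Write the extract as compressed files, staged for upload to Cloud Storage.
(
    df.write.mode("overwrite")
    .option("compression", "gzip")
    .csv("/staging/exports/subscribers/2024-01-01")
)

spark.stop()
```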
Once collected, the data from the extraction layer goes to the cloud in an encrypted and anonymized format and lands in a Cloud Storage bucket. The arrival of a file triggers the Data Fusion pipelines in an event-based way using a Log Sink, Pub/Sub, a Cloud Function, and the REST API. After the data load is triggered, Cloud Composer controls the execution of the metadata-driven, template-based, auto-generated DAGs. Data Fusion ephemeral clusters were chosen because they adapt to the size of each data pipeline while also controlling costs during low-traffic periods.
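One plausible shape for the triggering Cloud Function is sketched below: it receives the Cloud Storage audit-log entry routed through the Log Sink and Pub/Sub, and starts a pipeline through the Data Fusion (CDAP) REST API. The instance endpoint, namespace, and file-to-pipeline mapping are hypothetical; the production function derives these from load metadata.

```python
# Sketch of a Pub/Sub-triggered Cloud Function (1st gen signature) that starts
# a Data Fusion pipeline via the CDAP REST API. Endpoint and mapping logic
# below are hypothetical placeholders.
import base64
import json

import google.auth
import google.auth.transport.requests
import requests

# Hypothetical Data Fusion instance API endpoint.
CDF_ENDPOINT = "https://my-instance-my-project-dot-euw1.datafusion.googleusercontent.com/api"
NAMESPACE = "default"


def trigger_pipeline(event, context):
    """Entry point: the Log Sink routes the object-creation log entry here."""
    log_entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    object_name = log_entry["protoPayload"]["resourceName"]  # the landed file

    # Map the landed file to a pipeline name (placeholder logic).
    pipeline = "load_" + object_name.split("/")[-1].split(".")[0]

    # Obtain an access token for the Data Fusion (CDAP) REST API.
    credentials, _ = google.auth.default()
    credentials.refresh(google.auth.transport.requests.Request())

    # Start the batch pipeline's workflow; runtime arguments carry the file path.
    url = (f"{CDF_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{pipeline}"
           "/workflows/DataPipelineWorkflow/start")
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {credentials.token}"},
        json={"input.path": object_name},
    )
    resp.raise_for_status()
```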
The principle of limited responsibility is important in this design. Each component has a relatively narrow range of responsibilities: the Cloud Function, DAGs, and pipelines contain only the minimum logic necessary to finish their own tasks.
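The auto-generated DAGs follow the same principle. The sketch below shows one way such template-based generation can look in Cloud Composer with Airflow 2; the load definitions, region, and instance name are hypothetical stand-ins for what the real framework reads from its control tables.

```python
# Sketch of metadata-driven DAG generation in Cloud Composer (Airflow 2).
# LOAD_DEFINITIONS and the resource names are hypothetical placeholders.
import pendulum

from airflow import DAG
from airflow.providers.google.cloud.operators.datafusion import (
    CloudDataFusionStartPipelineOperator,
)

# In the real framework these definitions come from control tables.
LOAD_DEFINITIONS = [
    {"load_id": "crm_subscribers", "pipeline": "load_crm_subscribers"},
    {"load_id": "billing_events", "pipeline": "load_billing_events"},
]

for load in LOAD_DEFINITIONS:
    dag_id = f"load_{load['load_id']}"
    with DAG(
        dag_id=dag_id,
        start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
        schedule_interval=None,  # event-triggered, not time-scheduled
        catchup=False,
    ) as dag:
        # Each generated DAG holds only the minimum logic: start the
        # Data Fusion pipeline that performs the actual load.
        CloudDataFusionStartPipelineOperator(
            task_id="start_pipeline",
            location="europe-west1",      # placeholder region
            instance_name="cdf-instance",  # placeholder instance
            pipeline_name=load["pipeline"],
        )
    # Expose each generated DAG at module level so the scheduler discovers it.
    globals()[dag_id] = dag
```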
After loading this data into a raw layer, several tasks are triggered in Data Fusion to build up a historical aggregated layer. The Vodafone Hungary data team can use this layer to create its own reports in a Qlik environment (which also runs on Google Cloud) and to build Big Data and advanced analytical models using the Vodafone standard Big Data framework.
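As an illustration of one raw-to-historical step, the snippet below runs a BigQuery MERGE from Python. The dataset, table, and column names are invented for the example; in the described architecture, equivalent logic runs inside the Data Fusion tasks.

```python
# Illustration of a raw-to-historical aggregation step as a BigQuery MERGE.
# All dataset, table, and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

merge_sql = """
MERGE `analytics.subscriber_history` AS hist
USING (
  SELECT subscriber_id, load_date, SUM(data_mb) AS data_mb
  FROM `raw.usage_events`
  WHERE load_date = @load_date
  GROUP BY subscriber_id, load_date
) AS src
ON hist.subscriber_id = src.subscriber_id AND hist.load_date = src.load_date
WHEN MATCHED THEN
  UPDATE SET hist.data_mb = src.data_mb
WHEN NOT MATCHED THEN
  INSERT (subscriber_id, load_date, data_mb)
  VALUES (src.subscriber_id, src.load_date, src.data_mb)
"""

job = client.query(
    merge_sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("load_date", "DATE", "2024-01-01"),
        ]
    ),
)
job.result()  # wait for the merge to finish
```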
The most critical point of the architecture is the custom triggering function, which handles the scheduling and execution of processes. It triggers more than 1,900 DAGs per day while moving and processing around 1 TB of anonymized data daily.
The way forward
After stabilization, optimization of the processes began, taking cost and efficiency into account. The architecture was upgraded to Airflow 2 and Composer 2 as these versions became available, which increased performance and manageability. Going forward, Vodafone Hungary will continue to look for ways to improve its processes with the help of the Google Support team.
To support fast and effective processing, Vodafone Hungary recently decided to move the framework's control tables to Cloud Spanner and keep only the business data in BigQuery. This delivered a significant improvement in processing performance.
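A minimal sketch of this split is shown below: the framework reads its control metadata from Cloud Spanner while the business data stays in BigQuery. The instance, database, table, and column names are hypothetical placeholders.

```python
# Sketch of reading framework control metadata from Cloud Spanner.
# Instance, database, and table names are hypothetical placeholders.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("etl-control").database("framework")

# Look up pending loads; control tables benefit from Spanner's low-latency
# transactional reads and writes, while business data remains in BigQuery.
with database.snapshot() as snapshot:
    rows = snapshot.execute_sql(
        "SELECT load_id, pipeline_name FROM load_control "
        "WHERE status = @status LIMIT 10",
        params={"status": "PENDING"},
        param_types={"status": spanner.param_types.STRING},
    )
    for load_id, pipeline_name in rows:
        print(load_id, pipeline_name)
```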
In the analytics area, Vodafone Hungary plans to move to more advanced, cutting-edge technologies that will allow the Big Data team to improve performance by using Google Cloud native machine learning tools such as AutoML and Vertex AI. These will further improve the effectiveness of targeted campaigns and offer the benefit of advanced data analysis.
To get started, we recommend you check out BigQuery’s free trial and BigQuery’s Migration Assessment.