Landing terabytes of data every month in an on-premises Data Lake does not scale well, especially given the volume of intermediate results produced by the different pipelines that consume the data. With the amount of data that vehicles generate nowadays, a Data Lake would need to double in size every year just to keep up.
This is why a large German automotive company, together with Data Reply, decided to migrate its existing Data Lake to AWS. The main challenge was carrying out this migration with zero downtime for the existing use cases, while ensuring a smooth transition for users and stakeholders. The team developed a set of reusable Data Services that support different input and output data formats as well as different types of actions. This allowed it to deploy 80 different data pipelines from the same piece of code, changing only their behaviour through configuration. Common actions supported by the Data Services are anonymization, data enrichment, data denormalization and flattening of complex structures.
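To illustrate the "same code, different configuration" idea, here is a minimal sketch of a configuration-driven service in plain Python. It is not the team's actual implementation. The action names, the `run_pipeline` function, the `EXAMPLE_CONFIG` structure and the choice of SHA-256 hashing for anonymization are all hypothetical assumptions. Records are modelled as in-memory dictionaries rather than files in a specific input format, and only three of the four actions mentioned above are shown.

```python
import hashlib
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Params = Dict[str, Any]
Action = Callable[[List[Record], Params], List[Record]]


def anonymize(records: List[Record], params: Params) -> List[Record]:
    """Replace the listed fields with a SHA-256 digest of their value (assumed scheme)."""
    out = []
    for record in records:
        copy = dict(record)  # never mutate the caller's input
        for field in params["fields"]:
            if field in copy:
                copy[field] = hashlib.sha256(str(copy[field]).encode("utf-8")).hexdigest()
        out.append(copy)
    return out


def enrich(records: List[Record], params: Params) -> List[Record]:
    """Merge attributes from a static reference table, joined on a key column."""
    key, lookup = params["key"], params["lookup"]
    return [{**record, **lookup.get(record.get(key), {})} for record in records]


def _flatten_dict(d: Record, sep: str, prefix: str = "") -> Record:
    """Turn nested dicts into a single level, joining key paths with `sep`."""
    flat: Record = {}
    for k, v in d.items():
        name = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict):
            flat.update(_flatten_dict(v, sep, name))
        else:
            flat[name] = v
    return flat


def flatten(records: List[Record], params: Params) -> List[Record]:
    """Flatten nested structures in every record."""
    return [_flatten_dict(r, params.get("sep", ".")) for r in records]


# Hypothetical registry: the code is shared, the configuration picks the steps.
ACTIONS: Dict[str, Action] = {"anonymize": anonymize, "enrich": enrich, "flatten": flatten}


def run_pipeline(records: List[Record], config: Dict[str, Any]) -> List[Record]:
    """Apply the configured actions in order."""
    for step in config["steps"]:
        name = step["action"]
        if name not in ACTIONS:
            raise ValueError(f"unknown action: {name}")
        records = ACTIONS[name](records, step.get("params", {}))
    return records


# One hypothetical pipeline definition; another pipeline would only change this dict.
EXAMPLE_CONFIG = {
    "steps": [
        {"action": "flatten", "params": {"sep": "."}},
        {"action": "enrich", "params": {"key": "vehicle_id", "lookup": {"V123": {"model": "A"}}}},
        {"action": "anonymize", "params": {"fields": ["vehicle_id"]}},
    ]
}

if __name__ == "__main__":
    sample = [{"vehicle_id": "V123", "signal": {"speed": 80, "rpm": 2000}}]
    print(run_pipeline(sample, EXAMPLE_CONFIG))
```

In this sketch, deploying a new pipeline means writing a new configuration (for example stored as JSON or YAML) rather than new code. Enrichment runs before anonymization here so that the join key is still readable when the lookup happens.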