A Scalable Architecture Designed for Time and Place
Our first step was to design an architecture that could quickly process and store large influxes of data. Scalability was crucial, because the volume of data to process and store would inevitably grow. The cloud was the obvious choice for scalability, and in this case the Amazon Web Services (AWS) platform offered the best data storage options. We created a database within AWS, along with a data pipeline that collects the acquired data on NO2, a gas that can form particulate nitrates, and writes it into the database.
When Caeli retrieves information from their own database, they often want to filter it by time and location: for example, data from Amsterdam during January 2021. Filtering by time is not a problem because the data is stored chronologically (ascending); the database system roughly ‘knows’ in which rows the January 2021 records are found.
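To illustrate why chronological storage makes time filtering cheap, here is a minimal Python sketch. It assumes the timestamps are held in an ascending list; the function name `time_slice` and the sample dates are made up for illustration and do not come from Caeli's actual database. A binary search finds the start and end of the January 2021 range without scanning every row.

```python
from bisect import bisect_left
from datetime import datetime


def time_slice(timestamps, start, end):
    """Return (lo, hi) so that timestamps[lo:hi] lie in [start, end).

    Assumes `timestamps` is sorted in ascending order, which lets a
    binary search locate both boundaries in O(log n) steps.
    """
    lo = bisect_left(timestamps, start)
    hi = bisect_left(timestamps, end)
    return lo, hi


# Hypothetical, already-sorted timestamps (illustrative only).
timestamps = [
    datetime(2020, 12, 30),
    datetime(2021, 1, 5),
    datetime(2021, 1, 20),
    datetime(2021, 2, 2),
]

lo, hi = time_slice(timestamps, datetime(2021, 1, 1), datetime(2021, 2, 1))
january = timestamps[lo:hi]  # the two January 2021 entries
```

A real database does the equivalent with a sorted index or sort key rather than an in-memory list, but the principle is the same: ordered data lets the system jump to the right rows.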
However, it becomes more complicated when you also want to filter the data by location. The visual data is not stored in order of its X and Y coordinates, and only about one record in a million in the database actually has coordinates in Amsterdam. Checking every row individually is very inefficient, so the challenge was to find an architecture that could filter efficiently across multiple dimensions.
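One common way to filter on both location and time is to bucket records into coarse grid cells and keep each bucket sorted by time. The sketch below shows that general idea only; the article does not describe the scheme actually used at Caeli. The cell size, function names, record layout, and sample values are all assumptions made for illustration.

```python
import math
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime

CELL_DEG = 0.5  # hypothetical grid cell size in degrees


def cell_of(lat, lon):
    """Map a coordinate to the (row, col) of its grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(records):
    """Group (time, lat, lon, value) records per cell, sorted by time."""
    index = defaultdict(lambda: ([], []))  # cell -> (times, records)
    for rec in sorted(records, key=lambda r: r[0]):
        times, recs = index[cell_of(rec[1], rec[2])]
        times.append(rec[0])
        recs.append(rec)
    return index


def query(index, lat_min, lat_max, lon_min, lon_max, start, end):
    """Return records inside the bounding box with start <= time < end."""
    r0, c0 = cell_of(lat_min, lon_min)
    r1, c1 = cell_of(lat_max, lon_max)
    hits = []
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            if (r, c) not in index:
                continue  # skip empty cells without creating them
            times, recs = index[(r, c)]
            # Binary search narrows the bucket to the time window.
            lo, hi = bisect_left(times, start), bisect_left(times, end)
            for rec in recs[lo:hi]:
                # Exact check, since cells only approximate the box.
                if lat_min <= rec[1] <= lat_max and lon_min <= rec[2] <= lon_max:
                    hits.append(rec)
    return hits


# Made-up sample records: (time, lat, lon, value).
records = [
    (datetime(2021, 1, 10), 52.37, 4.90, 21.5),  # Amsterdam, January
    (datetime(2021, 1, 10), 48.86, 2.35, 30.1),  # Paris, January
    (datetime(2021, 2, 3), 52.37, 4.90, 18.2),   # Amsterdam, February
]
index = build_index(records)
hits = query(index, 52.30, 52.45, 4.75, 5.05,
             datetime(2021, 1, 1), datetime(2021, 2, 1))
```

With this layout, a query touches only the few cells that overlap the requested area instead of every row, and within each cell it reads only the requested time window.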
Rob: applying my knowledge and skills at Caeli
I made intensive use of many of the tools and techniques that the Young Mavericks traineeship introduced me to. Both Amazon Web Services (AWS) and the Hadoop ecosystem, which together were the focus of this assignment, were practiced extensively during the traineeship.
The training prepared me to provide the best solution for Caeli’s data management. On the one hand, Caeli receives a working end product that delivers the required precision and accessibility of information. Nitrogen dioxide (NO2) data is now automatically collected and stored in a scalable database where records can be efficiently filtered by both date-time and location. On the other hand, clear documentation of the processes and transfer protocols allows Caeli to manage this product itself and to reuse it for other atmospheric data, such as particulate matter.
“The project helped us to migrate from an on-premise environment to the cloud environment. Young Mavericks and Rob put us on the right track so that we were able to migrate our environment to an AWS environment. As a result, our NO2 data has become available and we have been able to take a step to scale up to other products and to be ready for other countries.” – Martin and Tim from Caeli