Category: Data engineering

Caeli: The Sky Is Not the Limit in Measuring Particulate Matter in the Atmosphere

Measuring by Satellite

Caeli is a startup dedicated to providing insight into air quality with a view from above. The satellites orbiting our planet provide end-users with both historical and (near) real-time information. Satellite imagery can be a cheaper and more readily available option than ground-based sensor networks as a tool for measuring the molecular composition of our atmosphere. Generating maps displaying pollutants such as nitrogen dioxide (NO2), ammonia (NH3), methane (CH4) and ozone (O3) can help the public and governments understand how changes to the atmosphere may affect health or influence the climate.

An enormous amount of visual and quantitative satellite data must be processed to create these real-time insights. Copernicus Sentinel imagery gives Caeli the historical depth to observe changes in air quality over time. The raw multispectral, geographically calibrated source data must not only be processed quickly but also organized chronologically and stored over time in accessible files. The original Caeli database could not scale to support these data streams, so we needed a new architecture.
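
To give a feel for what one ingestion step might look like, here is a minimal Python sketch that opens a Copernicus Sentinel-5P Level-2 NO2 granule with xarray and keeps only the reliable pixels. The file name is hypothetical, and the group and variable names are assumptions based on the public Sentinel-5P product format, not on Caeli's actual pipeline.

import xarray as xr

# Hypothetical Sentinel-5P Level-2 NO2 granule; the path is an example only.
GRANULE = "S5P_OFFL_L2__NO2____20240101T120000.nc"

# Sentinel-5P Level-2 products keep their science variables in the "PRODUCT" group.
ds = xr.open_dataset(GRANULE, group="PRODUCT")

# Tropospheric NO2 vertical column (mol/m^2).
no2 = ds["nitrogendioxide_tropospheric_column"]

# The geolocation grid that map generation would be based on.
lat, lon = ds["latitude"], ds["longitude"]

# Keep only reliable retrievals; qa_value > 0.75 is the filter recommended
# for this product to screen out clouds and other problematic pixels.
no2_clean = no2.where(ds["qa_value"] > 0.75)

print(no2_clean.mean().item())  # crude scene average, for illustration only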

Read more

ELT, ETL and Data Pipelines: Loading Data in an Automated Manner and How to Do It Yourself

My name is Don and I am something between a data engineer and a data scientist. I automate repetitive tasks, generate insights from data, manage projects, and often act in an advisory role for these processes. I especially enjoy the creative process of solving problems with data.

Read more

Data Pipeline Implementation: How to Do It Yourself

These instructions build on what was discussed in ELT, ETL and Data Pipelines. In that guide, we discussed the problems that arise when a company stores and uses data. In response to those problems, we introduced the concept of Data Pipelines, which helps a company become more aware of its data loading steps and incorporate those steps in the most effective way to create a Data-driven Culture. We also discussed some specific tooling that can be used to deploy Data Pipelines properly.

Now that we understand the concepts behind Data Pipelines, we will apply them to implement a functioning Data Pipeline. As with most of our data engineering processes, we follow a step-by-step plan and provide an implementation strategy for each step.

Hopefully this step-by-step plan, together with the implementation methods, will give you a solid foundation for constructing your own Data Pipeline. You can find the complete code on our GitLab.
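
To give you a first impression before you open the repository, below is a minimal, self-contained Python sketch of the extract-transform-load pattern that the step-by-step plan is built around. The file names, column schema and SQLite target are illustrative assumptions, not the setup used in our actual code.

import csv
import sqlite3
from pathlib import Path

# Hypothetical source and target; in practice these come from your own systems.
SOURCE_CSV = Path("measurements.csv")   # columns: station, timestamp, value
TARGET_DB = Path("warehouse.db")

def extract(path):
    # Extract: stream raw rows from the source file.
    with path.open(newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: cast types and drop rows that fail basic validation.
    for row in rows:
        try:
            yield (row["station"], row["timestamp"], float(row["value"]))
        except (KeyError, ValueError):
            continue  # skip malformed rows instead of failing the whole run

def load(records, db_path):
    # Load: write the cleaned records into the target database.
    with sqlite3.connect(db_path) as con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS measurements "
            "(station TEXT, timestamp TEXT, value REAL)"
        )
        con.executemany("INSERT INTO measurements VALUES (?, ?, ?)", records)

if __name__ == "__main__":
    load(transform(extract(SOURCE_CSV)), TARGET_DB)

Because extract and transform are generators, rows stream through the pipeline one at a time, so memory use stays flat no matter how large the source file grows.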

Read more

How Docker and Kubernetes’ Open Source Technology Is Winning Over Businesses

By Young Mavericks’ Data Engineer Don de Lange

Data Engineers must ensure that the technological solutions promised to companies can actually be implemented. To fulfil these technical promises in an increasingly complex data-driven world, I, like many other Data Engineers, use a variety of software and data tools. Docker and Kubernetes, two leaders in the field of open source technology, help build, manage and scale applications in containers. In this blog I explain what Docker and Kubernetes are, why more and more companies are adopting them, and how you can take advantage of these platforms yourself.
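
To make this concrete, here is a minimal Dockerfile sketch that containerizes a hypothetical Python job; the script name and requirements file are examples, not an existing project.

# Start from a small official Python base image.
FROM python:3.12-slim

WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and define the container's entry point.
COPY pipeline.py .
CMD ["python", "pipeline.py"]

An image built from this file with docker build -t pipeline-job . runs the same way on any machine with Docker installed, and Kubernetes can then schedule and scale that identical image across a whole cluster.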

Read more

Discover how Young Mavericks’ Data Engineering Traineeship inspired Kevin’s (32) career switch

Hey everyone, I’m Kevin Bowey. Thanks to Young Mavericks’ October Data Engineering Traineeship, I am currently enjoying my work as a Junior Data Engineer. You might think to yourself: wow, starting a traineeship at 32? Hell yes! Many of my fellow trainees dove into the world of Big Data during or immediately after their studies. My path to machine learning, however, was a lot less straightforward – a ‘detour’ that gave me unique skills as a Data Engineer.

Read more