Category: Data engineering

Trainee Data Engineering Shama: ‘Give Yourself a Solid Technical Foundation with Young Mavericks’

During her bachelor's in Industrial Design and her master's in Computer Science, Shama (26) discovered her passion for data, technology, and engineering. After a simple application process, Shama began her Data Engineering traineeship at Young Mavericks. Now, eight weeks later, she reflects: “The traineeship helped me recognize my own value. I’m not just someone who invents and programs things – I am a person who adds value to companies.”


In Search of Solar Panels

Harnessing data technology for sustainability: New cutting-edge satellite photos were released to the public earlier this year, and with plenty of experience applying engineering and machine learning to the analysis and organization of remote sensing data, we were ready for them. At Young Mavericks, we had the tools and experience to hit the ground running and enrich the Basic Registration Topographical (BGT) large-scale maps of the Netherlands with unregistered objects, in order to understand and encourage the use of solar panels.


Caeli: The Sky Is Not the Limit in Measuring Particulate Matter in the Atmosphere

Measuring by Satellite

Caeli is a startup dedicated to providing insight into air quality with a view from above. The satellites orbiting our planet provide their end-users with chronological and (near) real-time information. Satellite imagery can be a cheaper and more readily available option than remote sensing as a tool for measuring the molecular composition of our atmosphere. Generating maps that display atmospheric pollutants such as nitrogen dioxide (NO2), ammonia (NH3), methane (CH4) and ozone (O3) can help the public and government understand how changes to the atmosphere may affect health or influence the climate.

An enormous amount of visual and quantitative satellite data needs to be processed to create these real-time insights. Copernicus Sentinel imagery offers Caeli historical insights to observe changes in air quality over time. The raw multispectral and geographically calibrated source data must not only be processed quickly but also organized chronologically and stored over time in accessible files. The original Caeli database was not scalable enough to support these data streams, so we needed a new architecture.


ELT, ETL and Data Pipelines: Loading Data in an Automated Manner and How to Do It Yourself

My name is Don and I am something between a data engineer and a data scientist. I automate repetitive tasks, generate insights from data, manage projects, and often act in an advisory role for these processes. I especially enjoy the creative process required to solve problems using data.


Data Pipeline Implementation: How to Do It Yourself

These instructions build on the guide ELT, ETL and Data Pipelines. In that guide, we discussed the problems that arise when a company stores and uses data. In response to those problems, we introduced the concept of Data Pipelines, which help a company become more aware of its data loading steps and incorporate those steps in the most effective way to create a Data-driven Culture. We also discussed some specific tooling that can be used to properly deploy Data Pipelines.

Now that we understand the concepts behind Data Pipelines, we will apply them to implement a functioning Data Pipeline. As with most of our data engineering processes, we follow a step-by-step plan and provide an implementation strategy for each step.

Hopefully, the step-by-step plan and the implementation methods will give you a solid foundation for constructing your own Data Pipeline. You can find the complete code on our GitLab.
