Machines Learning the Lay of the Land and Signaling Mutations

The Netherlands is defined by water and our ability to manage it. To manage it we have to monitor it, and monitoring requires large sets of input data and their analysis. Using machines to automate these data processes brings 21st-century tools to the long tradition of Dutch water management. Young Mavericks specializes in the design, data mining and analysis, and machine learning aspects of digital technology and then trains data scientists to apply the technology with an open mind to diverse contexts. Our model allows different companies and organizations to gain freshly trained talent ready to meet the demands of specific data and information analysis. One of our recent governmental partners required up-to-date analysis of geographic mapping representing the complex blend of land and water under its jurisdiction.

Water in the North Holland Quarter

Hoogheemraadschap Hollands Noorderkwartier (HHNK), the Water Board for the North Holland Quarter, is charged with the entire spectrum of hydrology, including about 20,000 km of waterways and 2,000 km of flood defense systems, in a sector that depends heavily on careful and sustainable management. HHNK’s maps must give a precise and detailed overview of the entire area, which requires making differing data sources agree and work together. We met this challenge with a project designed to automate topographical map-making in the long term while signaling short-term changes in the surroundings.

The solution demanded organizing the flow, segmentation and standardization of one of the largest and most diverse data sets ever processed by Young Mavericks. We called the project ‘Mutation Signaling Using Image Segmentation and Aerial Remote Sensing Data’. In plain terms, this means identifying changes (mutations) by combining aerial photos that capture separate light spectra, the digital government maps, and elevation sensor data, and comparing them over time.

The main idea of the project was to explore the possibilities of machine learning by starting with an ambitious goal and determining how close image processing methods can get to achieving it. The initial goal was to construct a map similar to the BGT (Basis Grootschalige Topografie), the digital map of the Netherlands maintained by the Dutch Government, using aerial photographs and remote sensing data.

This can be framed as an image segmentation problem: given a raster of input data in which each pixel represents a real-world location, the task is to assign a label to each pixel. Together, the labelled pixels form a labelled map of the input data.
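To make the per-pixel idea concrete, here is a minimal sketch, assuming a segmentation model has already produced a score per class for every pixel. The function name and the toy scores are illustrative and not taken from the project itself.

```python
import numpy as np

def label_map_from_scores(scores: np.ndarray) -> np.ndarray:
    """Turn per-pixel class scores of shape (H, W, C) into a label map of shape (H, W).

    Each pixel receives the index of its highest-scoring class.
    """
    return np.argmax(scores, axis=-1)

# Toy 2x2 raster with 3 hypothetical classes (made-up scores).
scores = np.array([
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]],
])
labels = label_map_from_scores(scores)
print(labels)
```

Every entry of the resulting array is a class index for one real-world location, which is exactly what a labelled map is.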

Three sources were available to base the new map on: RGB photographs, CIR (color-infrared) photographs and height maps constructed from laser altimeter data, all supplied as orthogonal photographs made from a small plane. This means that each pixel has 7 associated values: 3 from the channels of the RGB photograph, 3 from the channels of the infrared photograph and 1 from the height map. We chose a convolutional neural network for this project, as such networks are known to perform very well in image segmentation and image processing in general. Training such a network, however, requires labels as well as input data.
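The 7 values per pixel can be pictured as one stacked array. The sketch below assumes the three sources have already been resampled to the same pixel grid; the function name and the dummy data are hypothetical.

```python
import numpy as np

def stack_sources(rgb: np.ndarray, cir: np.ndarray, height: np.ndarray) -> np.ndarray:
    """Combine RGB (H, W, 3), CIR (H, W, 3) and height (H, W) into one (H, W, 7) array."""
    if rgb.shape[:2] != cir.shape[:2] or rgb.shape[:2] != height.shape:
        raise ValueError("all sources must cover the same pixel grid")
    return np.concatenate(
        [
            rgb.astype(np.float32),
            cir.astype(np.float32),
            height.astype(np.float32)[..., np.newaxis],  # add a channel axis
        ],
        axis=-1,
    )

# Dummy data standing in for real aerial imagery.
h, w = 4, 5
rgb = np.zeros((h, w, 3), dtype=np.uint8)
cir = np.ones((h, w, 3), dtype=np.uint8)
height = np.full((h, w), 2.5, dtype=np.float32)

stacked = stack_sources(rgb, cir, height)
print(stacked.shape)
```

Channels 0 to 2 hold the RGB values, 3 to 5 the infrared values and channel 6 the height.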

To get these labels the BGT was used. The BGT provides Key Registers, the result of labeling organizations, railways, roads, buildings, trees and waterways and matching them with their topographical images and locations. This allowed us to construct label masks, converting the shapefiles to rasters that correspond to the pixels of the input photographs, with a class per pixel. Before enhancing our own mapping imagery with the BGT labels, we had to find a way to make our local data sources work together. This was cumbersome due to the size of our files (2.2 GB per file and 300 MB per photo) and the various photo resolutions and file types.
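Converting a vector shape to a label mask can be sketched with plain NumPy. This is an illustration of the idea only, assuming polygon vertices are already expressed in pixel coordinates. It is not the tooling used in the project, and in practice a GIS library would normally handle rasterization and coordinate transforms.

```python
import numpy as np

def rasterize_polygon(mask: np.ndarray, polygon, class_id: int) -> np.ndarray:
    """Write class_id into mask for every pixel whose centre lies inside the polygon.

    polygon is a list of (x, y) vertices in pixel coordinates.
    Uses the even-odd (ray casting) rule, evaluated for all pixels at once.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5  # pixel centres
    inside = np.zeros((h, w), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal edge never crosses a horizontal ray
        crosses = (y1 > py) != (y2 > py)
        x_at_py = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_at_py)
    mask[inside] = class_id
    return mask

# Hypothetical example: a square "building" (class 5) on a 6x6 raster.
mask = np.zeros((6, 6), dtype=np.uint8)
rasterize_polygon(mask, [(1, 1), (4, 1), (4, 4), (1, 4)], class_id=5)
print(mask)
```

Repeating this for every shape, class by class, yields a raster with one class per pixel that lines up with the input photographs.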

Mutations in the Maps

Once we knew how to construct the underlying infrastructure of the maps we turned back to the BGT to align its locations with our own and select the necessary mask labels. Many of the BGT labels were irrelevant; human concepts such as municipality jurisdictions cannot be captured by aerial photography. Taking photos from such a distance also meant that many objects identified in the BGT were lost to our scale and scope. We settled on a set of 7 labels grouping 32 BGT categories. These were chosen to represent the physical appearance of the landscape: Open-Textured (sand, gravel), Half-Textured (brick, cobble), Closed-Textured (concrete, asphalt), Non-Grass Vegetation (bushes, trees), Water, Buildings and Grass. Additionally, the convolutional network also segmented watersides, a landscape feature based on the slope of the terrain near bodies of water.
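Grouping source categories into the 7 labels amounts to a lookup table. In the sketch below the keys are only the example materials named above, used as stand-ins; the actual 32 BGT category names are not listed here, and the class ordering is an assumption.

```python
# The 7 labels from the text; the index order is an arbitrary choice for illustration.
CLASSES = [
    "Open-Textured",
    "Half-Textured",
    "Closed-Textured",
    "Non-Grass Vegetation",
    "Water",
    "Buildings",
    "Grass",
]

# Stand-in category names (not the real BGT category list).
CATEGORY_TO_CLASS = {
    "sand": "Open-Textured",
    "gravel": "Open-Textured",
    "brick": "Half-Textured",
    "cobble": "Half-Textured",
    "concrete": "Closed-Textured",
    "asphalt": "Closed-Textured",
    "bushes": "Non-Grass Vegetation",
    "trees": "Non-Grass Vegetation",
    "water": "Water",
    "building": "Buildings",
    "grass": "Grass",
}

def class_index(category: str):
    """Return the class index for a category, or None if it is not mapped."""
    name = CATEGORY_TO_CLASS.get(category)
    return None if name is None else CLASSES.index(name)

print(class_index("asphalt"))                 # index of Closed-Textured
print(class_index("municipality boundary"))   # unmapped, so None
```

Categories that cannot be seen from the air simply have no entry and are skipped when the label mask is built.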

We broke down 100 km² of input/label blocks into smaller samples and then fed them into the network. Segmenting the larger data compilation into smaller chunks, along with standard image augmentation techniques, ensured that the network would almost never see the same input twice. The resulting segmentation proved precise and reliable, especially for pixel-classes like water and buildings.
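A sampling step of this kind could look roughly like the sketch below. It assumes random crops combined with flips and 90-degree rotations, which are common augmentation choices; the text does not specify which augmentations were actually used, and the function name is hypothetical.

```python
import numpy as np

def random_sample(inputs: np.ndarray, labels: np.ndarray, size: int, rng):
    """Cut a random size x size patch from a block and augment it.

    inputs has shape (H, W, C) and labels has shape (H, W).
    The same crop, flip and rotation are applied to both, so they stay aligned.
    """
    h, w = labels.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    x = inputs[top:top + size, left:left + size, :]
    y = labels[top:top + size, left:left + size]
    if rng.random() < 0.5:  # horizontal flip half of the time
        x, y = x[:, ::-1, :], y[:, ::-1]
    k = int(rng.integers(0, 4))  # rotate by 0, 90, 180 or 270 degrees
    x = np.rot90(x, k, axes=(0, 1))
    y = np.rot90(y, k)
    return x.copy(), y.copy()

# Dummy block: 7 channels, with channel 0 set equal to the labels
# so that alignment after augmentation is easy to check.
rng = np.random.default_rng(0)
block_labels = np.arange(400).reshape(20, 20)
block_inputs = np.zeros((20, 20, 7), dtype=np.float32)
block_inputs[..., 0] = block_labels

x, y = random_sample(block_inputs, block_labels, size=8, rng=rng)
print(x.shape, y.shape)
```

Because position, flip and rotation are all drawn at random, two identical samples are unlikely even when the same block is visited many times.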

Our customized segmentation and standardization provided mutation signaling: the process of identifying differences between the digital map and the real world. A mutation is signaled wherever the pixel classification and the BGT labels disagree, which means that one of the two is incorrect. Most often this shows up as an incongruity between the areas of water classified by the neural network and the original BGT water labels/locations. These differences can stem from various underlying causes, such as new bodies of water that have not yet been recorded in the BGT, but they can also result from misclassifications by the network. The same process can be applied to find mutations in man-made structures, such as new buildings and land developments. The final tool can be run every time a new set of aerial photographs becomes available, generating a map of possible mutations.
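At its core, the comparison is a per-pixel check between two label maps. The sketch below assumes both maps use the same class indices and the same pixel grid; the function name, the class index for water and the toy arrays are hypothetical.

```python
import numpy as np

WATER = 4  # assumed class index, for illustration only

def signal_mutations(predicted: np.ndarray, bgt: np.ndarray, class_id: int):
    """Compare the network's label map with the BGT label map for one class.

    Returns two boolean masks:
      appeared:    predicted as class_id but not labelled so in the BGT
      disappeared: labelled class_id in the BGT but not predicted
    """
    pred_mask = predicted == class_id
    bgt_mask = bgt == class_id
    return pred_mask & ~bgt_mask, bgt_mask & ~pred_mask

# Toy 2x3 maps (6 stands for another class, such as grass).
predicted = np.array([[4, 4, 6],
                      [6, 6, 6]])
bgt = np.array([[4, 6, 6],
                [6, 4, 6]])

appeared, disappeared = signal_mutations(predicted, bgt, WATER)
print(np.argwhere(appeared))      # pixels where water is seen but not mapped
print(np.argwhere(disappeared))   # pixels where water is mapped but not seen
```

A flagged pixel only says that the two sources disagree; deciding whether the map or the network is wrong still requires a closer look.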

The data analysis can help HHNK manage their territory, but will also prove useful in improving the BGT mapping. By automatically identifying new trees, foliage, land developments and waterways, HHNK can fulfill its obligations of alerting the BGT to new changes and help improve the government’s identification of water, land and urban development for both private and public use.
