
Machine Learning for Data Quality Problems

Machine Learning (ML) is sometimes referred to as a ‘black box’. In reality, ML simply refers to a process in which a machine is trained to learn from data and automate statistical analysis. This makes it particularly useful for quality assurance of raw real-time data feeds, such as those used by FloodMapp’s ForeCast and NowCast products.

FloodMapp's ForeCast and NowCast use the most up-to-date stream level data, drawn from over 80,000 gauges operated by various agencies across Australia and the USA. We translate raw stream level data points into a real-time flood extent, giving emergency managers situational awareness during unfolding flood events.


Utilising raw real-time data is powerful; however, the data has not been through a quality assurance process. Some of the issues routinely encountered are:

  • Stream gauge datums are missing, or change due to infrastructure calibration or replacement.

  • Stream gauge sensor data is noisy, registering incorrect data points (outliers).

  • Stream gauges stop recording during flood events when sensors are damaged or network infrastructure goes offline.

Below is a summary of how we have overcome some of these issues with ML.

  • Gauge elevation/datum inconsistencies are managed by characterising the distribution of the water level record relative to the gauge datum and using that distribution as a basis for comparison. This allows the gauge zero to be adjusted should there be a notable change in the cease-to-flow values being recorded, as sketched below.
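FloodMapp's exact method isn't published, but a minimal sketch of this kind of distribution check is shown below, assuming a datetime-indexed pandas Series of stage readings: the low-flow tail of the recent record is compared against the low-flow tail of the historical record, and a persistent offset between the two suggests the gauge zero has moved. The quantile, window, and threshold values here are illustrative assumptions, not production settings.

```python
import pandas as pd

def detect_datum_shift(levels: pd.Series,
                       baseline_quantile: float = 0.05,
                       recent_days: int = 90,
                       threshold_m: float = 0.15) -> float:
    """Estimate a gauge-zero offset by comparing the low-flow end of the
    recent level distribution against the historical record.

    `levels` is a datetime-indexed series of stage readings in metres.
    Returns the offset to apply, or 0.0 if no notable shift is found.
    """
    cutoff = levels.index.max() - pd.Timedelta(days=recent_days)
    recent = levels[levels.index > cutoff]
    historical = levels[levels.index <= cutoff]

    # The lowest few percent of readings approximate the cease-to-flow
    # level, which should be stable unless the gauge zero has moved.
    shift = recent.quantile(baseline_quantile) - historical.quantile(baseline_quantile)
    return float(shift) if abs(shift) > threshold_m else 0.0
```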

  • To fix noisy data and outliers, we use a combination of signal processing and classification methods. A signal-processing technique, the Hampel filter, is used to identify noise across the historical record of a given stream gauge. This approach works well for historical records, but it is comparatively weak at identifying noise and outliers in new data points. The historical results are therefore used to train classifiers, which can then discern in real time whether a new data point is noise or a true reading. This is an important process and is tested rigorously to ensure real data (such as a steep increase on the rising limb of a hydrograph) is not filtered out of the dataset.
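As an illustration of this two-stage approach (a sketch under assumed settings, not FloodMapp's production pipeline), the snippet below labels a gauge's historical record with a Hampel filter and then trains a scikit-learn classifier on those labels so that new points can be scored as they arrive. The window sizes, threshold, feature set, and the `gauge_history.csv` input are all hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def hampel_flags(levels: pd.Series, window: int = 11, n_sigmas: float = 3.0) -> pd.Series:
    """Hampel test: flag a reading as an outlier if it sits more than
    n_sigmas scaled-MADs away from the rolling median."""
    k = 1.4826  # scales the MAD to a Gaussian standard deviation
    med = levels.rolling(window, center=True, min_periods=1).median()
    mad = (levels - med).abs().rolling(window, center=True, min_periods=1).median()
    return (levels - med).abs() > n_sigmas * k * mad

def point_features(levels: pd.Series) -> pd.DataFrame:
    """Features a real-time classifier can compute without seeing the
    future: deviation from a trailing median and the last step size."""
    trailing_med = levels.rolling(11, min_periods=1).median()
    return pd.DataFrame({
        "dev_from_recent_median": levels - trailing_med,
        "step_from_prev": levels.diff().fillna(0.0),
    })

# Label the historical record with the (offline, centred) Hampel filter,
# then train a classifier that can score points as they arrive.
history = pd.read_csv("gauge_history.csv", index_col=0, parse_dates=True)["level"]
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(point_features(history), hampel_flags(history))

# In real time: is the latest reading noise or a true data point?
latest_is_noise = bool(clf.predict(point_features(history).tail(1))[0])
```

The key design point is that the Hampel filter can use a centred window (past and future points) when labelling history, while the classifier is restricted to features computable from trailing data only, which is what makes it usable in real time.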

Figure 1: Noisy data (outliers) for the Chambers Creek Alert Station which were filtered out using ML.

  • When stream gauges stop broadcasting, it is either because they have been decommissioned or because they have been damaged as a flood event unfolds. The latter is particularly problematic for us, as it prevents the generation of results when they are needed the most. To address this, we use ML to generate a resultant hydrograph based on both the historical series and the observed behaviour of upstream and downstream gauges. An example of this approach is the Freeman’s Reach gauge in north-west Sydney, which was damaged during the March 2022 flood event. Using ML techniques, FloodMapp was able to reconstruct the gauge record using nearby stream gauges. Because the reconstruction occurred automatically, our NowCast client experienced no loss of service.
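The specifics of the reconstruction model aren't published, so the sketch below shows one common way such a setup can work: a supervised regressor (scikit-learn's GradientBoostingRegressor here, an assumed choice) trained to predict the failed gauge's level from current and lagged readings at neighbouring gauges. `target`, `neighbours`, and `neighbours_during_outage` are hypothetical placeholders for the gauge data.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def lagged_features(neighbours: pd.DataFrame, lags=(0, 1, 2, 4)) -> pd.DataFrame:
    """Stack current and lagged neighbour readings as predictor columns.
    Assumes every gauge has been resampled onto a common time step."""
    return pd.DataFrame({
        f"{name}_lag{lag}": neighbours[name].shift(lag)
        for name in neighbours.columns for lag in lags
    })

# Train on the shared history, while the target gauge was still reporting.
X = lagged_features(neighbours).dropna()
y = target.loc[X.index]
model = GradientBoostingRegressor().fit(X, y)

# After the gauge fails, rebuild its hydrograph from the gauges that
# are still online.
X_outage = lagged_features(neighbours_during_outage).dropna()
reconstructed = pd.Series(model.predict(X_outage), index=X_outage.index)
```

In practice, the useful lags depend on flow travel time between the gauges, so the lag set would be tuned per gauge pair rather than fixed as it is in this sketch.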

Figure 2A: Freeman’s Reach gauged hydrograph, which failed after the first peak of the March 2022 event.

Figure 2B: Freeman’s Reach reconstructed hydrograph, enabling the second peak of the March 2022 event to be modelled in our NowCast flood extent product.

Figure 2C: NowCast result during the second peak of the March 2022 flood event.

Figure 3: FloodMapp PostCast inundation (leveraging reconstructed stream gauge data) validated using Sentinel-1 satellite imagery.

Figure 4: FloodMapp NowCast inundation at Freeman's Reach (leveraging reconstructed stream gauge data) validated using a traffic camera.
