Research Projects

The DARE team have a proven track record of applying data science methodologies to natural resource and environmental challenges across the domains of water, minerals and biodiversity. Learn about some of our recent projects below.

Where is All the Water?

Joshua Simmons, John Close, Subhas Mukhopadhyay, Willem Vervoort

In this multidisciplinary research project, DARE developed a probabilistic modelling framework, based on Bayesian inference, to explain and quantify unaccounted differences in the water balance of major rivers, up to catchment scale, in the Murray-Darling Basin.

The project helps NSW government agencies improve aspects of their water management by overcoming gaps and discrepancies in data on water assets. It also provides a research platform for integrating different types of sensors with the data analytics used to aid modelling, prediction and decision making.

A school of fish in the ocean with Minderoo Foundation logo in the foreground

Global Fishing Index

Past DARE members:
Roman Marchant, Vincent Chin, Sally Cripps

The Global Fishing Index is a comprehensive report on the state of marine fisheries around the world that assesses the governance and stability of fisheries in 142 coastal states.

DARE provided a thorough comparison of catch-only stock assessment methods using three different simulation frameworks (FLR, DLMtool, and Rpath). By using diverse simulation frameworks, the team aimed to better quantify and characterise the accuracy of each model as a function of stock characteristics, such as fish life histories.

Subterranean Fauna

Maria Lopes, Mark Lindsay, Ed Cripps, Mark Jessell

Little information is available about Western Australia’s subterranean fauna, which prevents environmental protection authorities from making informed decisions. Subterranean fauna includes aerobic species living above the underground water table (Troglofauna) and aquatic species living within the groundwater (Stygofauna), often tens of metres below ground. They are usually associated with the physical and chemical properties of the groundwater, and with some geological and climatic features.

In this project, we are analysing the relationship between subterranean fauna and the groundwater’s physical and chemical properties, along with geological features and climatic and geophysical datasets, to understand their habitat requirements.

Detecting streamflow trends

Joshua Simmons, Katie Silversides, Rajitha Athukorala, Willem Vervoort

WaterNSW has an important statutory role to protect and enhance the quality and quantity of water in declared catchment areas. As such, it aims to calculate more accurately the surface water losses experienced in mining-impacted catchments in the Metropolitan Special Areas.

DARE worked with the WaterNSW team to apply Bayesian Generalised Additive modelling approaches (among other probabilistic methods) to assess and quantify streamflow trends, and so evaluate mining impacts on undermined catchments in these areas.

A close-up image of wheat in a field, with CSIRO Data61 logo in the foreground.

Linduni Rodrigo, Matt Cleary
(and Sally Cripps, Nandini Ramesh)

Climate change is anticipated to have an adverse effect on yield and quality of wheat production in Australia, where extreme climate events can negatively impact farm profitability and cause significant challenges to farmers.

However, modelling crop production using climate variables is challenging due to the non-linear relationships between these quantities. Thus, a data-driven method is necessary to understand how these climate variables are connected to each other and how they collectively influence crop production.

This project utilises graphical models to model and understand the dependencies among key climate indices and the production of wheat in Australia. This could help identify the most suitable locations for growing wheat in Australia, and how current wheat-growing locations may shift under climate change, to support adaptation and planning efforts.

Causal Inference with Bayesian Networks

Emma Nguyen (and Sally Cripps)

Inferring causal structures from observational and experimental data is fundamental to developing impactful interventions in many domains, such as medicine, resource management, policy, and business. Structural Causal Models (SCMs) provide a comprehensive and intuitive framework for uncovering causal dependencies among covariates of interest, bringing together probabilistic graphical models, structural equations, and interventional and counterfactual logic.

This project explores the use of Bayesian Networks (BNs), one of the most widely used SCMs, for identifying causal relationships, particularly in the absence of interventional data. We aim to bridge methodological gaps in causal inference by combining BN structure learning with established approaches, such as the potential outcomes framework and score matching, to explore causal associations from observational data alone, with applications to social science and resource management.

This project is in collaboration with experts from DARE industry partner, McKinsey & Company, who are providing a valuable opportunity for PhD candidates to gain rich practical experience while undertaking important research for the organisation.

Cyclone trajectory & intensity prediction with uncertainty quantification

Arpit Kapoor, Lucy Marshall, Rohitash Chandra

Cyclone track forecasting is a critical climate science problem involving time-series prediction of cyclone location and intensity. Machine learning methods have shown much promise in this domain, especially deep learning methods such as recurrent neural networks (RNNs).

However, these methods generally make single-point predictions with little focus on uncertainty quantification. Although Markov Chain Monte Carlo (MCMC) methods have often been used for quantifying uncertainty in neural network predictions, these methods are computationally expensive. Variational Inference (VI) is an alternative to MCMC sampling that approximates the posterior distribution of parameters by minimizing a KL-divergence loss between the estimate and the true posterior.
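
As a brief illustration of the objective described above, the standard textbook form of variational inference can be written as follows. The symbols are assumptions introduced here for illustration and are not taken from the project: \theta denotes the network parameters, \mathcal{D} the observed data, and \mathcal{Q} the chosen family of approximating distributions.

```latex
q^{*}(\theta) = \arg\min_{q \in \mathcal{Q}}
  \mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid \mathcal{D})\big),
\qquad
\mathrm{KL}\big(q(\theta)\,\|\,p(\theta \mid \mathcal{D})\big)
  = \log p(\mathcal{D}) - \mathrm{ELBO}(q),
\qquad
\mathrm{ELBO}(q) = \mathbb{E}_{q}\big[\log p(\mathcal{D} \mid \theta)\big]
  - \mathrm{KL}\big(q(\theta)\,\|\,p(\theta)\big).
```

Because \log p(\mathcal{D}) does not depend on q, minimising the KL divergence to the posterior is equivalent to maximising the ELBO, which only requires the likelihood and the prior.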

In this project, we present variational RNNs for cyclone track and intensity prediction in four different regions across the globe. We utilise simple RNNs and long short-term memory (LSTM) RNNs and use the energy score (ES) to evaluate multivariate probabilistic predictions.
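
For readers unfamiliar with the energy score, the sketch below shows one common sample-based estimator for a single multivariate forecast. It is an illustration only, not the project's evaluation code: the function name `energy_score` and the array layout (one row per predictive sample) are assumptions made here.

```python
import numpy as np


def energy_score(samples, observation):
    """Sample-based energy score for one multivariate forecast (lower is better).

    samples:     array of shape (m, d), m draws from the predictive distribution
    observation: array of shape (d,), the value that was actually observed
    """
    samples = np.asarray(samples, dtype=float)
    observation = np.asarray(observation, dtype=float)
    m = samples.shape[0]

    # Mean Euclidean distance between each predictive sample and the observation.
    term_obs = np.linalg.norm(samples - observation, axis=1).mean()

    # Mean pairwise distance between samples, halved; rewards appropriate spread.
    pairwise = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
    term_spread = pairwise.sum() / (2.0 * m * m)

    return term_obs - term_spread
```

In this form, a forecast whose samples all coincide with the observation scores zero, and the score grows as the samples move away from the observed value.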

Particle based methods for generative modelling and Bayesian inference

Sahani Pathiraja

In many machine learning and Bayesian inference tasks, one is faced with needing to approximate complicated distributions and efficiently generate samples from them. This is a particularly challenging task when considered in the context of non-linear high dimensional dynamical systems, such as in weather prediction.

This research focuses on developing new methods for sampling and uncertainty quantification in such cases, and on understanding their theoretical properties. These methods are known collectively as interacting particle methods: they involve a set of particles that evolve in time according to a pre-defined set of equations.

We are interested in rigorously determining the performance characteristics of such interacting particle methods (e.g. control-based particle filters such as the feedback particle filter, and particle-based variational inference methods). Additionally, we are interested in developing new methods inspired by variational autoencoders and diffusion models to improve the way machine learning is used to extract information from medical images.

Uncertainty in Data-Driven Models

Megan Nguyen, Minh-Ngoc Tran, Rohit Chandra

Data-driven models can become highly complex due to the presence of confounding and mediating variables. In such systems, there are typically three types of variables:

  1. Input Variables: Characteristics or properties that are often used to initiate a process
  2. Control Variables: Parameters or actions that humans or automated systems introduce during the process to influence the system and achieve desired outcomes
  3. Output Variables: The results or products generated by the process

The complexity arises from the intricate relationships between these variables. Within each phase of a process, causal relationships emerge between the state variables (input and control variables) and the process variables (output variables). This interplay can make predicting outcomes a challenging task. However, evaluating these models based solely on their predictive performance is often inadequate. This is because the goal is not just to predict outcomes but to understand how control variables can be adjusted to optimise results in future operations.

In our approach, we implement a Gaussian Variational Bayes framework, which helps quantify uncertainty not only in the model’s predictions but also in the error term, allowing for a more robust and nuanced understanding of the system and its behavior.

This project is in collaboration with experts from DARE industry partner, McKinsey & Company, who are providing a valuable opportunity for PhD candidates to gain rich practical experience while undertaking important research for the organisation.

‘Stat on Pixels’: Automated Microplastic Counting

Rajitha Athukorala

Addressing microplastic pollution (plastic debris <5mm in size) in marine areas is an emerging priority for government and natural resource managers.

Studies such as the NSW state-wide marine debris threat and risk assessment underline the significance of this emerging issue for the NSW marine estate, and the threat microplastics present to social and environmental assets.

One of the biggest challenges for effectively managing microplastic pollution is a lack of data on the distribution, abundance, and sources of microplastics within NSW. The environmental microplastic assessments used to date are expensive, labour intensive, and lack standardisation in sampling and analysis.

To improve large-scale microplastic monitoring, DARE’s Rajitha Athukorala partnered with scientists from the Water, Wetlands, & Coastal Science (WWCS) team, from the Department of Planning and Environment. Under the supervision of Dr Shivanesh Rao and Samantha Lynch, Rajitha created an automated counting method for microplastics.

The WWCS team use Nile Red dye to stain microplastics, which glow under specific conditions, making them easier to identify and count. The dye is one tool being used to generate a state-wide baseline data set – measuring the microplastic density in 120 NSW estuaries.

An image of the Murray River with the WaterSENSE logo in the foreground

2021-2024 (Ongoing)
Willem Vervoort (and Richard Scalzo)

Remote sensing time series hold great promise for data-driven management of water resources at catchment scale. This research stream develops new uncertainty-enabled water accounting methodologies, fusing data from the microwave-band Sentinel satellite with in-situ measurements. Techniques used include probabilistic modelling with physical components, Gaussian process mixtures, and deep learning.

WaterSENSE will provide water-availability and mapping services for any place in the world at different temporal and spatial resolutions, based on integrated Copernicus data, hydrological models, and local data.

Learning with Noisy Labels

2020-2024 (Ongoing)
Tongliang Liu

In machine learning, labels are the foundational “truths” on which algorithms are trained. However, a significant number of real-world datasets contain noisy labels. When algorithms affected by label noise are deployed in critical environments (be it medical diagnostics, financial predictions, or safety-critical systems), the consequences can be not only costly but also life-altering or even life-threatening. Recognising and addressing label noise is, therefore, not just a technical necessity but a matter of utmost responsibility.

Our research is centered on understanding and identifying label errors, ensuring that machine learning models are built upon a foundation of accuracy and trustworthiness.

High Resolution NSW Aridity Index

Rajitha Athukorala

The aridity index, also known as the Budyko radiative index of dryness, is a dimensionless parameter that represents the long-term balance between net radiation and precipitation.
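
To make the definition concrete, the sketch below computes the index in its standard Budyko form, net radiation divided by the energy needed to evaporate the precipitation. This is an assumption-laden illustration, not the method used to build the dataset described here: the function name, the units, the latent heat constant, and the handling of cells with zero precipitation are all choices made for this example.

```python
import numpy as np

# Assumed latent heat of vaporisation of water, in MJ per kg.
LATENT_HEAT_MJ_PER_KG = 2.45


def aridity_index(net_radiation, precipitation):
    """Budyko radiative index of dryness, Rn / (lambda * P), per grid cell.

    net_radiation: long-term mean net radiation in MJ m^-2 yr^-1
    precipitation: long-term mean precipitation in mm yr^-1 (1 mm = 1 kg m^-2)
    Cells with no precipitation are returned as NaN rather than dividing by zero.
    """
    rn = np.asarray(net_radiation, dtype=float)
    p = np.asarray(precipitation, dtype=float)

    # Energy required to evaporate all precipitation, in MJ m^-2 yr^-1.
    evaporation_energy = LATENT_HEAT_MJ_PER_KG * p

    result = np.full(np.broadcast(rn, p).shape, np.nan)
    np.divide(rn, evaporation_energy, out=result, where=evaporation_energy > 0)
    return result
```

Values above 1 indicate that the available energy exceeds what is needed to evaporate the rainfall (drier, water-limited conditions), while values below 1 indicate wetter, energy-limited conditions.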

Dr Neda Sharifi Soltani and Dr Zacchary Larkin from the Estuaries and Catchment team in the Department of Planning and Environment created a high resolution (30m) aridity index dataset for NSW, based on the method proposed by Nyman et al. (2014).

Rajitha Athukorala helped overcome the computational complexity of generating the dataset at very high resolution (which was not feasible with ESRI ArcGIS or QGIS) by using Python geospatial programs and removing mathematical inconsistencies that were leading to computational errors.

This dataset serves as a valuable tool for understanding and managing water resources, assessing environmental conditions, and informing decision-making in a wide range of applications related to water management, land use, and climate change adaptation.

Application of model selection to water balance models and inputs

Katie Silversides, Willem Vervoort

Groundwater is a vital and dependable resource. However, pressure on groundwater from extraction is increasing, especially during drought.

A key problem is that groundwater is mostly unobservable, except through highly localised groundwater level points (groundwater wells).

Models are therefore essential tools to develop different sustainable water management scenarios and can be seen as integrators of knowledge and data. However, models are complex, and multiple conceptual model configurations can potentially fit the data equally well. In other words, different interpretations of how we visualise the underground soil, rock and water holding layers can all potentially explain the variation in observed groundwater levels.

This project focuses on using a Bayesian model selection framework to identify the most likely conceptual model(s) with quantified uncertainties. This approach takes advantage of the fact that multiple models can fit the data with different probabilities and will compare different combinations of inputs and model structures.
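
As a minimal sketch of the idea that several models can fit the data with different probabilities, the example below turns log marginal likelihoods (log evidences) for a set of candidate models into posterior model probabilities. It is illustrative only: the function name is hypothetical, and it assumes the log evidences have already been estimated by some other means.

```python
import math


def posterior_model_probabilities(log_evidences, prior_probs=None):
    """Posterior probability of each candidate model given its log evidence.

    log_evidences: list of log marginal likelihoods, one per model
    prior_probs:   optional prior model probabilities (defaults to uniform)
    """
    n = len(log_evidences)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n

    # Unnormalised log posterior: log evidence plus log prior.
    log_post = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]

    # Subtract the maximum before exponentiating to avoid numerical underflow.
    shift = max(log_post)
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

With a uniform prior, a model whose evidence is three times larger than its competitor's receives a posterior probability of 0.75, so the weaker model is down-weighted rather than discarded.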

Quantum Computing

Minh-Ngoc Tran
(and Anna Lopatnikova, Scott Sisson)

Quantum computing has emerged as the next computing technology paradigm, which promises to transform many critical fields. Because quantum computers function fundamentally differently from classical computers, the emergence of quantum computing technology will lead to a new evolutionary branch of statistical and data analytics methodologies.

Our goal is to develop a suite of quantum statistical methods for use in Bayesian machine learning, and to enable the statistician and data scientist communities to collaborate with quantum algorithm designers.

Spatio-temporal species distribution modelling

Sam Mason, David Warton

A fundamental problem in ecology is understanding how species relate to their environment. In particular, we wish to understand how species distribution is constrained by climate in order to understand how species respond to climate change.

Using long-term datasets and a spatio-temporal model, we can study how species distributions are changing as the climate changes.

In this project, we will develop tools to facilitate spatio-temporal modelling of species distribution data to understand the current and potential future effects of climate change on species distribution.

Post-Fire Camera Trap Image Data Analysis

Aaron Greenville, Travis Stenborg (and Elise Verhoeven)

A total of 24 million hectares burnt across south-east Australia during the 2019-20 megafires (Boer et al. 2020), raising concern that some threatened fauna may move closer to extinction. There have been few large on-ground studies of animals in northern NSW since the megafires, so we lack knowledge of how species and communities are recovering. The aims of this study were to:

  1. Assess the response of individual species to different levels of fire severity (high, low, unburnt) and how this varies across the landscape
  2. Determine if fire severity affects predator activity patterns
  3. Investigate changes in species assemblages with respect to fire severity

This was achieved by analysing approximately 396,000 images from 135 camera traps set up by the Conservation and Restoration Science Branch of NSW DPE. These were deployed according to a strict experimental design across three regions in north-eastern NSW following the 2019-20 fires. The AI model Wildlife Insights was used to assist in species identification in the camera trap images, with 58 species identified across the 135 locations, including threatened species such as the long-nosed potoroo.

Probabilistic Geophysical Inversions

Mark Jessell, Mark Lindsay
(and Richard Scalzo)

Inference of subsurface geological structure is a key part of decision-making in a range of contexts. While both geophysical imaging and geological modelling are mature areas, uncertainty-quantified inference of geological parameters or histories from geophysics is still uncommon, and poses interesting methodological challenges.

This research focuses on parametric representations of geology, including “implicit” and “kinematic” models that embody simplified descriptions of geological processes, and how to embed them within Bayesian statistical models for uncertainty-quantified inference.

Llara Landscape Rehydration

2021-2024 (Ongoing)
Willem Vervoort

In collaboration with the Sydney Institute of Agriculture, the project team aim to strengthen the knowledge around landscape rehydration and resilience with their research at Llara, Narrabri.

The team are working with the regenerative agriculture network in northwest NSW, including landholders delivering landscape rehydration research, by capturing and analysing hyperspectral drone imagery, sampling soil and pasture, gauging soil moisture, and more.

In 2023, the project received additional funding to continue for another four years.

Blockworlds 0.1.0

Mark Lindsay, Mark Jessell, Ed Cripps
(and Richard Scalzo, Guillaume Pirot, Jeremie Giraud, Sally Cripps)

Geological modeling of subsurface structures is critical to decision-making across numerous application areas, including mining, groundwater, resource exploration, natural hazard assessment, and engineering, yet is also subject to considerable uncertainty.

Blockworlds aids decision-making informed by models, and is a novel step in the increasingly active area of research in geology and geophysics.

Quantifying Biomass Production & Quality

January – February 2023
Willem Vervoort, Rajitha Athukorala

This project was undertaken with two undergraduate Denison scholars. The University of Sydney Science Summer Research Program provides students with an opportunity to engage in a formal supervised research project over the summer break.

Capturing feed production at large scales is complex due to spatial and temporal variation. Drones capture hyperspectral imagery across a range of scales, but methods for converting images to production estimates are still underdeveloped. This project developed a pipeline for image processing.
