Our PhD program

A PhD in Data Science

Applications with mathematical rigour

We are searching for committed and excited PhD students to work on Centre projects starting in 2021. Generous scholarships of up to $40,000 per annum are available to successful applicants. Applicants must have an understanding of the foundations of data science, for example through a qualification in mathematics, statistics or computer science, or a strong quantitative background in a field such as engineering, econometrics or earth sciences.

DARE PhD candidates will undertake a cohort-based learning and professional development program that includes advanced data science. Candidates may also undertake field work in rural and regional Australia as part of their one-year industry placement. Opportunities for conference presentations, exchange study with the Alan Turing Institute (UK) and employment in industry or government at the end of the program are also available. Stipends support candidatures of three years, and candidates may spend up to one year of that time in an industry placement program, applying their data science skills to support the best possible evidence-based management of the nation’s natural resources.

ABOUT THE COHORT-BASED PROGRAM

Every PhD candidate will be required to undertake a compulsory twelve-week program in data science within the first 12 months of candidature. The cohort-based program will cover two core modules:

DATA5710 – Applied Statistics for Complex Data

 

Unit Description

This unit will train students in Bayesian machine learning models for the analysis of complex datasets. Students will also carry out a small research project, which will allow them to put the concepts, knowledge and experience into practice. The project will be assessed via a journal-style paper, a short newspaper article or press/media release, and wrap-up sessions in which students explain their projects to the whole cohort and receive feedback, so that all students benefit from every project. Datasets will be provided by DARE partners to ensure the relevance of the work.

Topics

  • Bayesian linear regression and Gaussian process regression
  • Neural networks: the perceptron (SGD and Adam)
  • GLM (Regression and Classification)
  • Graphical models
  • Mixture models (finite and infinite)
  • Time series and longitudinal models, state space models, random effects
  • Deep learning – advanced neural networks (CNN, RNN)
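
As a rough illustration of the first topic above, the following is a minimal sketch of Bayesian linear regression in its simplest conjugate form. It is not taken from the unit materials. It assumes a single slope `w` with no intercept, a known noise standard deviation, and a zero-mean normal prior on `w`; the function name `posterior_slope`, the parameter names and the simulated data are all hypothetical.

```python
import random


def posterior_slope(xs, ys, noise_sd=1.0, prior_sd=10.0):
    """Posterior mean and sd of w in y = w * x + noise.

    Assumptions (illustrative only): noise ~ N(0, noise_sd^2) with
    noise_sd known, and prior w ~ N(0, prior_sd^2).
    """
    # Posterior precision = prior precision + data precision.
    precision = 1.0 / prior_sd**2 + sum(x * x for x in xs) / noise_sd**2
    # Posterior mean = precision-weighted data term (the prior mean is 0).
    mean = (sum(x * y for x, y in zip(xs, ys)) / noise_sd**2) / precision
    return mean, precision ** -0.5


# Simulated data with a hypothetical true slope of 2.0.
rng = random.Random(0)
xs = [i / 10 for i in range(1, 51)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

mean, sd = posterior_slope(xs, ys)
print(f"posterior slope: {mean:.3f} +/- {sd:.3f}")
```

The posterior standard deviation shrinks as more data are observed, which is the kind of uncertainty quantification the unit description refers to.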

DATA5711 – Bayesian Computational Statistics

 

Unit Description

In this unit you will learn the mathematical foundations and practical implementation of a variety of estimation algorithms. The unit will first introduce advanced statistical inference, followed by a detailed presentation of a variety of methods for performing inference. Students who complete this unit will develop the critical skills needed to use advanced statistical machine learning correctly in practice. This unit will not only improve analytical skills but also encourage work in a multidisciplinary team on real-world problems, with datasets provided by the ARC Centre on Data Analytics for Resources and Environment (DARE).

Topics

  • Introduction to Python/R and version control (Git)
  • Statistical inference and probability theory (Frequentist and Bayesian, Priors)
  • Simulation based methods (IS, MCMC, HMC, SMC, ABC, PT)
  • Bayesian optimisation and variational inference
  • Dimension reduction (Shrinkage, variable selection, feature extraction)
  • Model selection and model averaging
  • Spatio-temporal regression
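
To give a flavour of the simulation-based methods listed above, here is a minimal sketch of random-walk Metropolis sampling, one of the simplest MCMC algorithms. It is an illustration only, not the unit's own code. The function name `metropolis`, the step size, the seed and the standard normal target are all assumptions chosen for the example.

```python
import math
import random


def metropolis(log_density, start, n_samples, step_sd=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density returns the log of the (possibly unnormalised) target density.
    """
    rng = random.Random(seed)
    x = start
    logp = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Propose a symmetric Gaussian move around the current state.
        proposal = x + rng.gauss(0.0, step_sd)
        logp_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        samples.append(x)
    return samples


# Hypothetical target: a standard normal, specified up to a constant.
samples = metropolis(lambda x: -0.5 * x * x, start=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean {mean:.2f}, sample variance {var:.2f}")
```

Because only the ratio of densities enters the acceptance step, the normalising constant of the target is never needed, which is why methods of this kind suit Bayesian posteriors.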

WHAT ARE THE CHALLENGES?

The DARE Centre is developing data science skills that can be translated across each of the three domain areas of minerals, water and biodiversity. Capability will be built through three core data science areas beginning with understanding the priors (or history) and data of the domain, progressing to model construction and concluding with the interpretation and information for better decision making.

Studying the three domain areas jointly is crucial. It highlights that an action taken to maximise some payoff function in one domain area has consequences for the payoff function in another, thereby explicitly encoding the cumulative impact of an action.

Additionally, although different in surface structure, the three domain areas share a need for probabilistic thinking and proper uncertainty quantification for optimal decision making.

Biodiversity

Biodiversity, the diversity of living things on Earth, underpins and influences almost every product and service we value today and is essential to provide future generations of Australians with a healthy, sustainable economy and environment. Decisions about the management of biodiversity, including the prioritisation of actions and policy, must be informed by complex models based on relatively sparse measurement data, where uncertainty quantification is key to decision making.

Water

Water is a fundamental resource, vital to the genesis and sustainability of communities, ecosystems and industry. Understanding the drivers behind water supply and usage, and quantifying the joint uncertainty around supply and demand, is critical to many areas of the Australian economy.

The foremost challenge in water resource management today is to make uncertainty-quantified predictions in a changing environment for applications such as ecosystem management, flood warning and water allocation. This is a difficult task given the uncertainty associated with modelling daily rainfall, hydrologic observations and future climate.

Minerals

Mineral discovery underpins large parts of the Australian economy, with mining contributing over $60bn a year to national GDP. However, most of Australia’s current mineral production and exports are sourced from deposits discovered during an exploration high more than two decades ago. The grand data science challenge is to exploit the vast amount of minerals exploration data to discover what mineralogy exists beneath the 80% of Australia where favourable geology lies below regolith or other barren cover.

DARE is building the new skills and capabilities necessary to meet this challenge: in managing the vast amount of available geophysical information, in developing methods that can fuse this data into uncertainty-quantified geological models, and in developing evidence-based approaches to the characterisation of mineralisation.

APPLICATIONS

This page summarises possible PhD projects that DARE CIs and Partners are interested in. If you are interested in pursuing a PhD in any of the listed topics, please e-mail your CV and a copy of your academic transcript to .

We will forward your information to our CIs to review your eligibility to enrol in a PhD and the chances of obtaining a scholarship.

DARE is no longer providing scholarships, so interested students should be willing to apply for an RTP scholarship or equivalent.