Abstract:
Inverse parameter estimation of process-based
models is a long-standing problem in many scientific disciplines.
A key question for inverse parameter estimation
is how to define the metric that quantifies how well model
predictions fit the data. This metric can be expressed by
general cost or objective functions, but statistical inversion
methods require a particular metric, the probability of observing
the data given the model parameters, known as the
likelihood.
For technical and computational reasons, likelihoods for
process-based stochastic models are usually based on general
assumptions about variability in the observed data, and not
on the stochasticity generated by the model. Only in recent
years have new methods become available that allow the generation
of likelihoods directly from stochastic simulations.
Previous applications of these approximate Bayesian methods
have concentrated on relatively simple models. Here, we
report on the application of a simulation-based likelihood
approximation for FORMIND, a parameter-rich individual-based
model of tropical forest dynamics.
We show that approximate Bayesian inference, based on
a parametric likelihood approximation placed in a conventional
Markov chain Monte Carlo (MCMC) sampler, performs
well in retrieving known parameter values from virtual
inventory data generated by the forest model. We analyze
the results of the parameter estimation, examine its sensitivity
to the choice and aggregation of model outputs and
observed data (summary statistics), and demonstrate the application
of this method by fitting the FORMIND model to
field data from an Ecuadorian tropical forest. Finally, we discuss
how this approach differs from approximate Bayesian
computation (ABC), another method commonly used to generate
simulation-based likelihood approximations.
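To make the idea of a parametric likelihood approximation inside an MCMC sampler concrete, the following minimal Python sketch may help. It is an illustration under stated assumptions, not the implementation used in this study: the simulator `simulate_summaries` is a hypothetical toy stand-in rather than FORMIND, the parametric form is assumed to be an independent normal distribution per summary statistic, the prior is assumed flat, and all function names and tuning values are invented for the example.

```python
import numpy as np


def simulate_summaries(theta, rng):
    """Hypothetical toy stochastic model (NOT FORMIND).

    Draws 50 noisy values around theta and returns three summary statistics.
    """
    data = rng.normal(loc=theta, scale=1.0, size=50)
    return np.array([data.mean(), data.std(ddof=1), np.median(data)])


def synthetic_loglik(theta, observed, rng, n_rep=30):
    """Approximate log-likelihood from repeated stochastic simulations.

    Assumption: each summary statistic is treated as independently normal,
    with mean and standard deviation estimated from n_rep model runs.
    """
    sims = np.array([simulate_summaries(theta, rng) for _ in range(n_rep)])
    mu = sims.mean(axis=0)
    sigma = sims.std(axis=0, ddof=1)
    z = (observed - mu) / sigma
    # Sum of univariate normal log-densities evaluated at the observed summaries.
    return float(np.sum(-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)))


def metropolis(observed, theta0, n_iter, step, rng):
    """Random-walk Metropolis sampler using the approximated likelihood.

    Assumption: flat prior, so the acceptance ratio contains only likelihoods.
    """
    chain = np.empty(n_iter)
    theta = theta0
    loglik = synthetic_loglik(theta, observed, rng)
    for i in range(n_iter):
        proposal = theta + step * rng.normal()
        loglik_prop = synthetic_loglik(proposal, observed, rng)
        # Accept with probability min(1, L(proposal) / L(current)).
        if np.log(rng.uniform()) < loglik_prop - loglik:
            theta, loglik = proposal, loglik_prop
        chain[i] = theta
    return chain


# Usage: recover a known parameter from "virtual" data produced by the toy model.
rng = np.random.default_rng(0)
virtual_data = simulate_summaries(2.0, rng)
samples = metropolis(virtual_data, theta0=1.5, n_iter=500, step=0.3, rng=rng)
print("posterior mean estimate:", samples[100:].mean())
```

Because the likelihood is itself estimated from a finite number of stochastic runs, the acceptance step in this sketch is noisy. Increasing `n_rep` reduces that noise at the price of more simulations per iteration.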
Our results demonstrate that simulation-based inference,
which offers considerable conceptual advantages over more
traditional methods for inverse parameter estimation, can be
successfully applied to process-based models of high complexity.
The methodology is particularly suitable for heterogeneous
and complex data structures and can easily be adjusted
to other model types, including most stochastic population
and individual-based models. Our study therefore provides
a blueprint for a fairly general approach to parameter
estimation of stochastic process-based models.