Advertisement

Open Access

Peer-reviewed

Research Article

Phylodynamic Inference and Model Assessment with Approximate Bayesian Computation: Influenza as a Case Study

  • Oliver Ratmann ,

    * E-mail: oliver.ratmann@imperial.ac.uk

    Affiliations: Department of Biology, Duke University, Durham, North Carolina, United States of America, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

  • Gé Donker,

    Affiliation: NIVEL, Netherlands Institute for Health Services Research, Utrecht, The Netherlands

  • Adam Meijer,

    Affiliation: RIVM, National Institute for Public Health and the Environment, Centre for Infectious Disease Control, Bilthoven, The Netherlands

  • Christophe Fraser,

    Affiliation: Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

  • Katia Koelle

    Affiliations: Department of Biology, Duke University, Durham, North Carolina, United States of America, Fogarty International Center, National Institutes of Health, Bethesda, Maryland, United States of America

Phylodynamic Inference and Model Assessment with Approximate Bayesian Computation: Influenza as a Case Study

  • Oliver Ratmann, 
  • Gé Donker, 
  • Adam Meijer, 
  • Christophe Fraser, 
  • Katia Koelle
spacer
x
  • Published: December 27, 2012
  • DOI: 10.1371/journal.pcbi.1002835
  • Article
  • Authors
  • Metrics
  • Comments
  • Related Content

Abstract

A key priority in infectious disease research is to understand the ecological and evolutionary drivers of viral diseases from data on disease incidence as well as viral genetic and antigenic variation. We propose using a simulation-based, Bayesian method known as Approximate Bayesian Computation (ABC) to fit and assess phylodynamic models that simulate pathogen evolution and ecology against summaries of these data. We illustrate the versatility of the method by analyzing two spatial models describing the phylodynamics of interpandemic human influenza virus subtype A(H3N2). The first model captures antigenic drift phenomenologically with continuously waning immunity, and the second epochal evolution model describes the replacement of major, relatively long-lived antigenic clusters. Combining features of long-term surveillance data from the Netherlands with features of influenza A (H3N2) hemagglutinin gene sequences sampled in northern Europe, key phylodynamic parameters can be estimated with ABC. Goodness-of-fit analyses reveal that the irregularity in interannual incidence and H3N2's ladder-like hemagglutinin phylogeny are quantitatively only reproduced under the epochal evolution model within a spatial context. However, the concomitant incidence dynamics result in a very large reproductive number and are not consistent with empirical estimates of H3N2's population level attack rate. These results demonstrate that the interactions between the evolutionary and ecological processes impose multiple quantitative constraints on the phylodynamic trajectories of influenza A(H3N2), so that sequence and surveillance data can be used synergistically. ABC, one of several data synthesis approaches, can easily interface a broad class of phylodynamic models with various types of data but requires careful calibration of the summaries and tolerance parameters.

Author Summary

The infectious disease dynamics of many viral pathogens like influenza, norovirus and coronavirus are inextricably tied to their evolution. This interaction between evolutionary and ecological processes complicates our ability to understand the infectious disease behavior of rapidly evolving pathogens. Most statistical methods for the analysis of these “phylodynamics” require that the likelihood of the data can be explicitly calculated. Currently, this is not possible for many phylodynamic models, so that questions on the interaction between viral variants cannot be well-addressed within this framework. Simulation-based statistical methods circumvent likelihood calculations. Considering interpandemic human influenza A virus subtype H3N2, we here illustrate the effectiveness of these methods to fit and assess complex phylodynamic models against both sequence and surveillance data. We find that combining molecular genetic and epidemiological data is key to estimate phylodynamic parameters reliably. Moreover, the information in the available data taken together is enough to expose quantitative model inconsistencies. Methods such as ABC which can combine sequence and surveillance data appear to be well-suited to fit and assess mechanistic hypotheses on the phylodynamics of RNA viruses.

Citation: Ratmann O, Donker G, Meijer A, Fraser C, Koelle K (2012) Phylodynamic Inference and Model Assessment with Approximate Bayesian Computation: Influenza as a Case Study. PLoS Comput Biol 8(12): e1002835. doi:10.1371/journal.pcbi.1002835

Editor: Sergei L. Kosakovsky Pond, University of California San Diego, United States of America

Received: March 19, 2012; Accepted: October 19, 2012; Published: December 27, 2012

Copyright: © 2012 Ratmann et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: We gratefully accept financial support from the Wellcome Trust (www.wellcome.ac.uk), grant WR092311MF, the National Science Foundation (www.nsf.gov), grant NSF-EF-08-27416, and through the RAPIDD program of the Science and Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health (www.fic.nih.gov). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Many infectious pathogens, most notably RNA viruses, evolve on the same time scale as their ecological dynamics [1]. One of the perhaps best documented examples are human influenza A viruses, which cause substantial morbidity and mortality as they escape host immunity predominantly through the evolution of their surface antigens [2]. The resulting, dynamical interaction between the ecological and evolutionary processess can be better understood through the formulation and simulation of so-called “phylodynamic” mathematical models, e.g. [3]–[8]. However, while data on disease incidence as well as viral genetic and antigenic variation are increasing for many viruses, e.g. [9]–[13], fitting and assessing phylodynamic models to these data is still not commonly done.

Historically, epidemiological time series data have been pervasively used to analyze hypotheses of host-pathogen interactions at the population level [14]–[17]. However, time series data capture the underlying evolutionary processes of pathogens only very indirectly. For flu, this has limited the type of infectious disease models that can be statistically interfaced with time series data, and the number of epidemiological parameters that can be simultaneously estimated [18], [19]. Consequently, the disease behavior of rapidly evolving pathogens is increasingly studied under additional, complementary data sets [1], most typically in ways that attempt to qualitatively reproduce prominent disease attributes [3]–[8].

More recently, coalescent-based statistical methods have been used to elucidate the disease dynamics of RNA viruses from molecular genetic data alone [20]. These methods have been particularly useful to reconstruct epidemiological transmission histories, identifying when and where transmission occurred and how viral populations change over time. For example, coalescent-based analyses have highlighted the importance of the tropics in the complex circulation dynamics of human influenza A (H3N2) virus (in short: H3N2) [9], [21], [22]. However, most coalescent methods estimate past population dynamics within a class of flexible demographic functions including exponential and logistic growth as well as the nonparametric Bayesian skyride [23], [24]; but see also [25]. These demographic functions do not explicitly describe the non-linear population dynamics of RNA viruses. Thus, assessing which ecological interactions underlie observed patterns of sequence diversity, and estimating the respective strength of these interactions, is difficult within this framework.

Because of these limitations, we adopt a different statistical approach known as Approximate Bayesian Computation (ABC) to infer the phylodynamics of RNA viruses. ABC allows mechanistic phylodynamic models to be simultaneously fitted against both sequence and surveillance data. This method circumvents explicit likelihood calculations by simulating instead from the stochastic model that defines the likelihood [26]. Recent extensions of ABC allow for model assessment to be carried out at no further computational cost [27]. We further suggest incorporating variable selection procedures to quantify if and to what extent the data provide support for the inclusion of specific model components [28].

To demonstrate the utility of our approach, we consider the phylodynamics of interpandemic H3N2. We obtained weekly reports of H3N2 incidence in the Netherlands from 1994–2009 by combining influenza-like-illness (ILI) surveillance data with detailed records of associated, laboratory-confirmed cases of flu by type and subtype [29], [30], and similarly for France and the USA; see Figure 1 and the supplementary online material (Text S1). In addition, we reconstructed the ladder-like phylogeny of H3N2's haemagglutinin gene (HA) from dated European sequences collected in 1968–2009 (see Figure 1 and Text S1). To represent H3N2's global phylodynamics, we focus on a class of spatially structured phylodynamic compartmental models that formalize probabilistically how evolving, antigenic variants interact epidemiologically. These antigenic variants might correspond to the major antigenic clusters that are distinguishable in H3N2 antigenic maps [31], but can in principle also represent a different phenotypic resolution. The evolutionary dynamics of viral genotypes are separately formulated for each antigenic phenotype because genetic distances do not necessarily easily translate into phenotypic relationships [5]. Spatial substructure has been incorporated in several models of H3N2 phylodynamics to reflect the global circulation of the virus [4], [8], [32]. We adopt here a simple source-sink framework, where the sink is thought of as the Netherlands into which viral genetic diversity and antigenic strains are imported on a seasonal scale from a source population where the virus persists [9], [33]. We fit and assess two distinct models to the combined features of sequence and incidence data described in Figure 1 and Table 1. The first model captures H3N2's antigenic drift phenomenologically through gradual loss of immunity, and the second model describes the antigenic evolution of the virus explicitly with particular assumptions on the tempo of antigenic change.

spacer
Download:
  • PPT
    PowerPoint slide
  • PNG
    larger image ()
  • TIFF
    original image ()
Figure 1. Features of H3N2 sequence and incidence data.

(A) Weekly ILI time series from the Netherlands, and estimated time series of influenza A(H3N2) from weekly virological data. Type and subtype specific time series were estimated under an additive Negative Binomial regression model; see Text S1. (B) Reconstructed HA phylogeny from 776 European sequences with known times of isolation. The phylogeny was inferred with the BEAST program under a relaxed Exponential clock; see Text S1. (C) H3N2 seasonal attack rates (rATT), calculated from estimated H3N2 case report times series in the Netherlands in 1994–2009 (blue), and the USA (cyan) as well as France in 1997–2008 (black). (D) Ratio of consecutive case report attack rates on the log scale. (E) Autocorrelation of case report peaks. (F) Histogram of the duration of seasonal epidemics at half their peak size. (G) Number of estimated nucleotide substitutions of dated HA sequences from the root A/Bilthoven/16190/68 as in Smith et al. [31]. Nucleotide substitutions were estimated with BEAST under an Exponential clock (red) and Lognormal clock model (violet). (H) Histograms of pairwise nucleotide diversity among sequences collected in the same season. (I) Time series of the number of phylogenetic lineages circulating within the same month. (J) Time series of the time to the most recent common ancestor of phylogenetic lineages circulating within the same month. Colors from H to J are as in G.

doi:10.1371/journal.pcbi.1002835.g001

spacer
Download:
  • PPT
    PowerPoint slide
  • PNG
    larger image ()
  • TIFF
    original image ()
Table 1. Basic phylodynamic summaries of H3N2 surveillance data and phylogenies, and calibrated weighting schemes.

doi:10.1371/journal.pcbi.1002835.t001

Methods

Approximate Bayesian Computation

To perform phylodynamic inference and goodness-of-fit analyses for complex phylodynamic models, we adopt a simulation-based approach that has become known as Approximate Bayesian Computation (ABC) [26]. Our first goal is to estimate the posterior densityspacer (1)of epidemiological and evolutionary model parameters spacer under approximations to the likelihood spacer of observed population incidence and phylogenetic data spacer . The prior density spacer can be used to incorporate existing information or limit the range of plausible values of model parameters. Our second goal is to assess fitted phylodynamic models based on a recent extension of ABC [27].

ABC methods circumvent computations of the likelihood spacer by comparing the observed data spacer to simulated data spacer in terms of many, lower-dimensional summary statistics spacer , spacer , spacer such as those in Figure 1. Using a distance function spacer that compares summaries, each simulation spacer is weighted according to the magnitude of the summary error spacer under a weighting scheme spacer , and this value is used in place of the likelihood term in Monte Carlo algorithms. In essence, ABC is a particular auxiliary variable Monte Carlo method, where the spacer summary errors take on the role of auxiliary variables. Integrating these errors out, the ABC likelihood approximation spacer adopted here isspacer (2)where the weighting scheme is typically the Indicatorspacer (3)with tolerance parameter spacer or the Exponentialspacer (4)with spacer . Intuitively, the summary errors indicate how well a parameterized model reproduces the observed data. Once Monte Carlo algorithms such as the Markov Chain Monte Carlo (MCMC) sampler proposed by Marjoram et al. [34] have converged, the magnitude of the summary errors can be used to diagnose goodness-of-fit with respect to each of the summaries spacer . To use this detailed information on each summary, we prefer using (2) to the Mahanalobis approximation (see [26]). Although uncommon, we typically use the log ratio spacer so that the errors spacer can be uniformly interpreted as fold-deviations. Parameter inference using ABC is approximate in that the ABC target density spacer approaches the posterior density (1) as spacer tends to zero if the summaries are sufficient for spacer [26]. We use a Monte Carlo algorithm that is very similar to the MCMC sampler in Figure 2. A full specification of the algorithm is given in Text S1.

spacer
Download:
  • PPT
    PowerPoint slide
  • PNG
    larger image ()
  • TIFF
    original image ()
Figure 2. Overview of simulation-based phylodynamic inference and model assessment.

Phylodynamic hypotheses are formulated into evolving, dynamical systems models. We used a two-tier model formulation whose genetic component is tied to its ecological component through the flows through the prevalence class. Existing knowledge on model parameters is incorporated through the prior

gipoco.com is neither affiliated with the authors of this page nor responsible for its contents. This is a safe-cache copy of the original web site.