A journey into the most accurate earthquake forecast model (so far)

Angela Stallone
13 min read · Oct 17, 2019

A basic guide to the ETAS (Epidemic Type Aftershock Sequence) model, covering essential theory, applications and links to useful codes

Earthquakes tend to cluster in space and time. This well-known property of seismicity, which represents the strongest departure from a purely random process, is successfully reproduced by stochastic point process models based on a few empirical scaling laws such as the Gutenberg and Richter [1], the Omori [2] and the Omori-Utsu [3] relationships [4,5,6,7].
Among them, the ETAS (Epidemic Type Aftershock Sequence) model [4,5] represents the most accurate short-term earthquake forecast model available according to prospective tests carried out by the Collaboratory for the Study of Earthquake Predictability (CSEP) [see the SRL dedicated volume]. Figure 1 shows an example of the ETAS forecast for Italy:

Figure 1 — a) Historical seismicity in Italy (reprinted from [24]); b) Map of the expected number of events with ML ≥ 5.0 in the forecast period, for each cell (reprinted from [19]). The ETAS forecast reproduces very well the spatial clustering of the largest historical earthquakes in Italy.

The ETAS world

The ETAS model belongs to the class of branching point process models (also known as Hawkes or self-exciting point processes). The occurrence of an earthquake excites the ongoing seismic activity, increasing the probability of occurrence of new earthquakes in the near future; intuitively, the spatiotemporal extent of such excitation increases with the magnitude of the exciting event. This behavior, which represents the endogenous component of the seismic process, is responsible for the temporal clustering of events. In the context of Hawkes processes, such correlated earthquakes are called descendants. When picturing them, imagine linking each descendant to its closest ancestor (closest in the space-time domain): going back in time through the older generations, you will eventually reach the first event of the cluster, which, by definition, is an independent earthquake (otherwise, it too would be linked to an ancestor):

Figure 2 — On the left: schematic representation of an earthquake family tree (cluster). Events belonging to the same cluster are dependent on each other. On the right: schematic representation of the Simpsons family tree!

As for any branching model, for the process to be sustained, a background rate is necessary, which in many applications is fed by an external source (you can think of it as the fuel of the process engine). These background events, called immigrants in the context of Hawkes processes, are independent events which represent the exogenous component of the spatiotemporal process (see Figure 3). When dealing with earthquakes, the background seismicity is explained as the result of the constant tectonic loading due to plate motion and is commonly described as a Poisson process which is stationary in time.

Figure 3 — Representation of a Hawkes process as group of family trees. ■ = immigrants, ● = descendants, ✕ = generated point process. Reprinted from [25].

All this implies that, in the ETAS model, seismicity is the result of the sum of two components: the background seismicity (independent events) and the triggered seismicity (clustered events). Mathematically, this leads to the following expression for the conditional seismic intensity, i.e. the expected rate of earthquakes at time t given information about the past of the process (temporal process only) [4,5]:
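In standard notation (a reconstruction consistent with the definitions listed below, since the original formula is not reproduced here):

```latex
\lambda(t, M \mid H_t) \;=\; f(M)\left[\,\mu \;+\; \sum_{i:\, t_i < t} g(t - t_i, M_i)\right]
```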

where:

  • Ht is the seismic history up to time t;
  • 𝜇 represents the background seismicity rate (i.e., the rate of independent events), which is assumed to be constant;
  • f(M) is the function describing the magnitude-frequency distribution, typically represented by the Gutenberg-Richter exponential law;
  • g(t,M) is defined as the excitation function in the context of Hawkes processes; here, it represents the rate of triggered events, which results from the combination of the modified Omori-Utsu law [3] and the Utsu aftershock productivity law [8]. Note that the Omori-Utsu law needs to be suitably normalized to be a PDF when it is plugged into the equation for the conditional intensity. Both functions are written out right after this list.
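In a common parameterization (a reconstruction, not necessarily the exact normalization used in the original figures; β = b·ln 10 is the Gutenberg-Richter slope in natural-log form):

```latex
f(M) = \beta\, e^{-\beta (M - M_c)}, \qquad M \ge M_c
g(t, M) = K\, e^{\alpha (M - M_c)}\,(t + c)^{-p}
```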

In the equations above, Mc is the magnitude threshold; K, 𝛼, c, p are constants (see below for a more detailed description of these parameters). If the earthquake process is modeled as a spatiotemporal process, the conditional intensity is then defined as [5]:
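Schematically (a reconstruction, with x̄ denoting the epicentral location and the background rate allowed to vary in space, as in [5]):

```latex
\lambda(t, \bar{x}, M \mid H_t) \;=\; f(M)\left[\,\mu(\bar{x}) \;+\; \sum_{i:\, t_i < t} g(t - t_i, \bar{x} - \bar{x}_i, M_i)\right]
```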

In this case, g(t,x̄,M), the rate of triggered events, also includes a kernel describing the spatial distribution of triggered events. This means you need to estimate additional parameters, which depend on the specific distribution you use to model the locations of triggered events.

You may have noticed that f(M) multiplies both 𝜇 and g(t,M) (or g(t,x̄,M) in the spatiotemporal case). The separability of the magnitude distribution (f(M) is not influenced by past events) points to an important assumption underlying the ETAS model: the magnitude-independence assumption. This implies that all magnitudes are randomly drawn from the same magnitude-frequency distribution, regardless of whether the events have been triggered or not.

Important: in the ETAS world, a distinction is only made between dependent (triggered) and independent (background) events, not among foreshocks, mainshocks and aftershocks. This implies that all earthquakes are treated equally, i.e. each earthquake is a realization of the same physical process. While this may sound like a simplistic assumption, it is actually well supported from an evidence-based standpoint: to date, no physical characteristics distinguish foreshocks, mainshocks and aftershocks, and any discrimination of this type is only an a posteriori classification.

ETAS parameter estimation: what a hassle!

Before estimating them, let’s have a look at what the ETAS parameters represent:

  1. K and 𝛼 are constants of the Utsu aftershock productivity law:
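In symbols (the standard form, matching the description below):

```latex
g(M) = K\, e^{\alpha (M - M_c)}
```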

where g(M) represents the average number of events directly triggered by an earthquake of magnitude M. Intuitively, the higher the magnitude M, the larger the number of triggered events. In this equation, K quantifies the capability of triggering new earthquakes, regardless of the magnitude of the triggering event. By contrast, 𝛼 accounts for the role of magnitude in the triggering process: a large 𝛼 implies that large earthquakes have a much higher triggering capability than smaller ones, whereas a small 𝛼 gives a relatively higher weight to the triggering capability of small earthquakes. See Figure 4 for more details.

Figure 4 — Aftershock distribution modeled for 𝛼 large and 𝛼 small (the circle size represents the triggering capability of an earthquake): when 𝛼 is large, the sequence is dominated by large events, whereas when 𝛼 is small, smaller events play a role too. Reprinted from [11].

2. c and p are constants of the modified Omori-Utsu law:
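In symbols (the standard form, matching the description below):

```latex
g(t) = \frac{K}{(t + c)^p}
```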

g(t) represents the rate of triggered events as a function of time. This rate decays with the time t elapsed since the mother event according to a power law. The parameter c is a small constant (generally a fraction of a day) added to the elapsed time t; it is usually interpreted as a delay between the end of the rupture of the triggering event and the start of the triggered activity. The parameter p controls how fast the decay is.

𝜇, K, 𝛼, c, p in the ETAS model depend on the seismicity characteristics of the region under consideration. Therefore, the ETAS model needs to be parameterized for the specific earthquake catalog used in the analysis. The optimal parameters are commonly found by Maximum Likelihood Estimation (MLE), i.e. they correspond to the parameter values that maximize the (log-)likelihood function log L(𝜃). In the simple case of a purely temporal earthquake process, log L(𝜃) reduces to:
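In the usual point-process form (a reconstruction consistent with the surrounding text, where λ𝜃 denotes the conditional intensity with parameter set 𝜃):

```latex
\log L(\theta) \;=\; \sum_{i:\, t_i \in [0,T]} \log \lambda_\theta(t_i \mid H_{t_i}) \;-\; \int_0^T \lambda_\theta(t \mid H_t)\, dt
```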

where 𝜃 is the set of parameters to be estimated and [0,T] is the time interval for which the parameters need to be estimated. Finding the maximum of a function is a typical optimization problem, which has to be solved numerically in most cases. In [4], the Davidon-Fletcher-Powell (DFP) method [9,10] is used. Given a set of initial values for the parameters, the DFP algorithm iteratively builds a local quadratic approximation of the log-likelihood function until it converges to its maximum. Since the DFP algorithm does not require exact second derivatives (it only approximates them), it belongs to the class of quasi-Newton methods.
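To make this concrete, here is a minimal sketch of what maximizing the temporal log-likelihood above can look like in practice. Everything in it is an illustrative assumption (array names, starting values, bounds, and the use of scipy's L-BFGS-B quasi-Newton routine instead of DFP); it is not taken from any of the codes listed at the end of the post.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, times, mags, Mc, T):
    """Negative log-likelihood of the purely temporal ETAS model.

    times, mags: sorted occurrence times (days) and magnitudes above Mc.
    The magnitude distribution f(M) is fitted separately and omitted here.
    """
    mu, K, alpha, c, p = theta
    log_lam = np.empty(len(times))
    for i, t in enumerate(times):
        dt = t - times[:i]                               # lags to all past events
        trig = K * np.exp(alpha * (mags[:i] - Mc)) / (dt + c) ** p
        log_lam[i] = np.log(mu + trig.sum())
    # Integral of the conditional intensity over [0, T] (assumes p > 1)
    integral = mu * T + np.sum(
        K * np.exp(alpha * (mags - Mc))
        * (c ** (1.0 - p) - (T - times + c) ** (1.0 - p)) / (p - 1.0)
    )
    return -(log_lam.sum() - integral)

# Hypothetical usage on a toy catalog:
# theta0 = np.array([0.2, 0.02, 1.0, 0.01, 1.1])         # mu, K, alpha, c, p
# fit = minimize(neg_log_likelihood, theta0,
#                args=(times, mags, 2.5, 365.0),
#                method="L-BFGS-B",
#                bounds=[(1e-6, None)] * 4 + [(1.0 + 1e-3, None)])
# mu, K, alpha, c, p = fit.x
```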

Estimating ETAS parameters is a hassle for several reasons: e.g., they can be strongly biased [11], correlated [12] or unstable; furthermore, the DFP algorithm depends on the initial values, it may converge to a local maximum instead of the global one, and it struggles when dealing with flat regions of the likelihood function. Several alternative approaches have been proposed to mitigate some of these issues, such as the Expectation-Maximization algorithm and stochastic optimization methods (see [13] and references therein). However, you should always keep the limitations above in mind when interpreting the ETAS parameters in physical terms. In particular, always take their intrinsic variability into account before concluding that a difference in the parameters estimated for the groups of data you are comparing arises from different physical processes.

ETAS as a null model

Once you have estimated the parameters, you can employ ETAS as a reference model for seismicity, i.e. as the null model for testing hypotheses [14]. In practice, this implies that you need to simulate one or more catalogs based on the ETAS model. Due to its stochastic nature, each synthetic catalog represents a different realization of ETAS. Two approaches are typically used to draw samples from a stochastic model: the inversion method and the thinning method (see [15] for further details):

  1. Inversion method: it mainly consists in drawing a uniform random number and passing it through the inverse of the cumulative distribution function (if this can be obtained analytically);
  2. Thinning method: it is a type of acceptance-rejection method, which consists of simulating a random variable from a known, simpler proposal distribution g(x), which is an envelope of the unknown and more complicated distribution f(x) (i.e., f(x) < g(x)), and then accepting each outcome with probability f(x)/g(x), i.e. thinning (rejecting) it with probability 1 − f(x)/g(x). A minimal simulation sketch based on thinning follows right after this list.
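As an illustration of the thinning idea applied to ETAS, here is a minimal sketch of simulating a purely temporal catalog by Ogata-style thinning. Parameter values, function names and the Gutenberg-Richter magnitude sampler are assumptions for illustration only, not taken from the packages listed at the end of the post.

```python
import numpy as np

rng = np.random.default_rng(42)

def intensity(t, times, mags, mu, K, alpha, c, p, Mc):
    """Temporal ETAS conditional intensity at time t, given past events."""
    past = times < t
    dt = t - times[past]
    return mu + np.sum(K * np.exp(alpha * (mags[past] - Mc)) / (dt + c) ** p)

def simulate_etas(T, mu, K, alpha, c, p, beta, Mc):
    """Simulate a temporal ETAS catalog on [0, T] by thinning."""
    times, mags = np.array([]), np.array([])
    t = 0.0
    while True:
        # Between events the intensity only decays, so its value just after t
        # is a valid upper bound until the next accepted event.
        lam_max = intensity(t + 1e-9, times, mags, mu, K, alpha, c, p, Mc)
        t += rng.exponential(1.0 / lam_max)              # propose a candidate time
        if t >= T:
            break
        lam = intensity(t, times, mags, mu, K, alpha, c, p, Mc)
        if rng.uniform() <= lam / lam_max:               # thinning (accept/reject)
            m = Mc + rng.exponential(1.0 / beta)         # Gutenberg-Richter magnitude
            times = np.append(times, t)
            mags = np.append(mags, m)
    return times, mags

# Hypothetical parameter values, just for illustration (beta = b * ln 10):
# cat_t, cat_m = simulate_etas(T=365.0, mu=0.2, K=0.02, alpha=1.0,
#                              c=0.01, p=1.1, beta=np.log(10), Mc=2.5)
```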

Once you are done with the simulation, you can run your tests on both the synthetic and the real catalogs, to see whether there are features in the real seismicity that are not observed in the ETAS world. In case of discrepancies, you may want to propose your own modification of the model with the aim of improving the fit to the observations (under the assumption that your data are fully trustworthy and unbiased!).

ETAS as a diagnostic tool

You can also use ETAS as a benchmark for detecting anomalies in the seismicity of the period and region you are interested in. A standard procedure for this purpose is residual analysis [4]. The main idea is to transform the occurrence times t onto a new scale such that they follow a homogeneous Poisson process if and only if the model used for the rescaling (the ETAS model in this case) is the correct model. Specifically, each occurrence time is transformed through the integrated conditional intensity function 𝛬(t) (also called the compensator):
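In symbols (the standard definition):

```latex
\Lambda(t) = \int_0^t \lambda(s \mid H_s)\, ds
```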

𝛬(t) represents the expected number of events in the interval [0,t].

It can be shown that if N = {t₁, t₂, …, tₙ} is a point process with conditional intensity λ(t), then N* = {Λ(t₁), Λ(t₂), …, Λ(tₙ)} is a homogeneous Poisson process with unit rate [16]. It follows that, if we have correctly modeled the observed temporal process (i.e., if the ETAS model correctly captures the main features of the analyzed seismicity), the transformed times should behave like a Poisson process with rate 1. This transformation is very convenient, as it enables us to apply well-known statistical tests (e.g. the Kolmogorov-Smirnov test, the runs test or the conditional chi-square test) to check whether the transformed times follow a stationary Poisson process.
A simple way to use residual analysis as a diagnostic tool for seismicity is to plot the observed number of events as a function of the expected number of events (i.e., the transformed times): if the ETAS model is the correct model, we should get a straight line, since the observed number of events should resemble the expected number of events over time. Any departure from a straight line represents a deviation of the data from the ETAS model (Figure 5). A possible reason for such a deviation could be a temporal variation in the background seismicity rate.
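Here is a minimal sketch of this diagnostic for the purely temporal model, assuming a parameter set from a previous fit; function and variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kstest

def compensator(times, mags, mu, K, alpha, c, p, Mc):
    """Transformed times Lambda(t_i): expected number of events in [0, t_i]."""
    out = np.empty(len(times))
    for i, t in enumerate(times):
        dt = t - times[:i]
        out[i] = mu * t + np.sum(
            K * np.exp(alpha * (mags[:i] - Mc))
            * (c ** (1.0 - p) - (dt + c) ** (1.0 - p)) / (p - 1.0)
        )
    return out

# Hypothetical usage with parameters from a previous fit:
# tau = compensator(times, mags, mu, K, alpha, c, p, Mc)
# # Under a correct model, the increments of tau are Exp(1) distributed:
# print(kstest(np.diff(tau), "expon"))
# # Observed vs expected counts (should hug the diagonal):
# # plt.plot(tau, np.arange(1, len(tau) + 1)); plt.plot(tau, tau, "--")
```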

Figure 5 — a) Plot of the observed number of events as a function of the expected number of events. If the model is correct, we expect a straight-line graph; b) Cumulative number of M ≥ 3 events vs transformed time. The dashed black ellipses and dashed rectangles highlight two anomalies: the former has been interpreted as relatively low productivity of the aftershock activity of the 2007 Chuetsu-Oki earthquake, the latter as an activation of aftershock production following the 2011 M9.0 Tohoku-Oki earthquake. Reprinted from [26].

ETAS for declustering

The most common declustering approaches are deterministic, as they identify aftershock sequences based on user-defined spatiotemporal windows. These a priori constraints, together with the a posteriori classification into foreshocks, mainshocks and aftershocks, increase the degree of subjectivity in the declustering process.

The ETAS model has been used as the foundation of a stochastic declustering algorithm. The term “stochastic” stems from the idea that each event is assigned a probability of being triggered or background. The rationale behind this approach is that, in most cases, earthquakes are neither completely independent nor completely triggered, since they result from the combination of both the endogenous and exogenous components of seismicity. Within this method, the optimal spatiotemporal windows for the search of triggered events are found by fitting the ETAS model to the catalog under consideration, with the aim of reproducing the clustering characteristics of the studied area. See [18] for further details.
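As an illustration, here is a minimal sketch of the probabilities involved, written for the purely temporal model (a simplified analogue of the spatiotemporal method in [18]; names and usage are assumptions).

```python
import numpy as np

def background_probabilities(times, mags, mu, K, alpha, c, p, Mc):
    """P(event j is a background event) = mu / lambda(t_j), for each event j."""
    probs = np.empty(len(times))
    for j, t in enumerate(times):
        dt = t - times[:j]
        trig = K * np.exp(alpha * (mags[:j] - Mc)) / (dt + c) ** p
        probs[j] = mu / (mu + trig.sum())
    return probs

# Hypothetical usage: keep each event in a "declustered" catalog with a
# probability equal to its background probability.
# phi = background_probabilities(times, mags, mu, K, alpha, c, p, Mc)
# keep = np.random.default_rng(0).uniform(size=phi.size) < phi
```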

Useful codes

So now, I guess, you want to have a look at the codes. Here is a list that can help you. Enjoy!

  • SASeis2006 by Y. Ogata [17] (Fortran + R for diagnostic plots) — Temporal ETAS model only. Download: link1 or link2. Optimization based on the Davidon-Fletcher-Powell algorithm. *Note: the files are written in f77, which could cause some issues when compiling them with Fortran95/2003/2008.
  • etas8p by J. Zhuang and Y. Ogata [18] (Fortran) — Spatiotemporal ETAS model. Download it at this link. Optimization based on the Davidon-Fletcher-Powell algorithm. Includes the stochastic declustering algorithm.
  • ETAS by A. Jalilian and J. Zhuang (R) — Spatiotemporal ETAS model. Download it at this link. Optimization based on the Davidon-Fletcher-Powell algorithm.
  • SEDAv1.0 by A. M. Lombardi [19] (Fortran core codes with Matlab interface) — Spatiotemporal ETAS model. Download it at this link. Optimization based on Simulated Annealing.
  • etas_solve by Kasahara et al. [20] (Fortran) — Temporal ETAS model only. Download: link1 or link2. Optimization based on the steepest descent method.
  • etas_FLP by M. Chiodi and G. Adelfio (R) — Spatiotemporal ETAS model. Download it at this link. The optimization algorithm depends on the choice of initial values.
  • etas by Mizrahi et al. [21] (Python) — Spatiotemporal ETAS model. Download it at this link. Based on the Expectation-Maximization algorithm.
  • bayesianETAS by G. J. Ross [22] (R) — Temporal ETAS model. Download it at this link. Bayesian Markov chain Monte Carlo.
  • ETAS.inlabru by M. Naylor et al. [23] (R) — Temporal ETAS model. Download it at this link. Bayesian integrated nested Laplace approximation (INLA).
  • ETAS aftershock simulator by K. Felzer (Matlab) — Download it at this link. Some of the codes above also enable you to simulate a catalog based on the ETAS model, but if you are only interested in this task, then this is the program for you. *Note: earthquakes with a magnitude larger than 6.5 are modeled as planar sources; if you need to model all the events as point sources, you can easily achieve that by setting ‘LowerLimitOnPlanarSourceQuakes’ (= minimum magnitude for an earthquake to be simulated as a planar source) to a magnitude greater than ‘UpperLimitOnSimMags’ (= maximum simulated magnitude).

Acknowledgments

I would like to thank Anna Maria Lombardi (INGV) and Ilaria Spassiani (INGV) for their feedback!

References

¹ Gutenberg, B. & Richter, C. F., 1944. Frequency of earthquakes in California. Bulletin of the Seismological Society of America 34(4), 185–188.

² Omori, F., 1894. On the after-shocks of earthquakes (Vol. 7). The University.

³ Utsu, T., 1957. Magnitude of earthquakes and occurrence of their aftershocks. Zisin, Ser. 2, 10, 35–45.

⁴ Ogata, Y., 1988. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical association 83(401), 9–27.

⁵ Ogata, Y., 1998. Space-time point-process models for earthquake occurrences. Annals of the Institute of Statistical Mathematics 50(2), 379–402.

⁶ Reasenberg, P. A., & Jones, L. M., 1989. Earthquake hazard after a mainshock in California. Science 243(4895), 1173–1176. 10.1126/science.243.4895.1173

⁷ Gerstenberger, M. C., Wiemer, S., Jones, L. M., & Reasenberg, P. A., 2005. Real-time forecasts of tomorrow’s earthquakes in California. Nature 435(7040), 328. 10.1038/nature03622

⁸ Utsu, T. (1970). Aftershocks and earthquake statistics (1): Some parameters which characterize an aftershock sequence and their interrelations. Journal of the Faculty of Science, Hokkaido University. Series 7, Geophysics, 3(3), 129–195.

⁹ Davidon, W. C. (1991). Variable metric method for minimization. SIAM Journal on Optimization, 1(1), 1–17.

¹⁰ Fletcher, R., & Powell, M. J. (1963). A rapidly convergent descent method for minimization. The computer journal, 6(2), 163–168.

¹¹ Seif, S., Mignan, A., Zechar, J. D., Werner, M. J., & Wiemer, S. (2017). Estimating ETAS: The effects of truncation, missing data, and model assumptions. Journal of Geophysical Research: Solid Earth, 122(1), 449–469. 10.1002/2016JB012809

¹² Harte, D. S. (2015). Model parameter estimation bias induced by earthquake magnitude cut-off. Geophysical Journal International, 204(2), 1266–1287. 10.1093/gji/ggv524

¹³ Lombardi, A. M. (2015). Estimation of the parameters of ETAS models by Simulated Annealing. Scientific reports, 5, 8417. 10.1038/srep08417

¹⁴ Stallone, A., & Marzocchi, W. (2018). Empirical evaluation of the magnitude-independence assumption. Geophysical Journal International, 216(2), 820–839. 10.1093/gji/ggy459

¹⁵ Zhuang, J. & S. Touati (2015), Stochastic simulation of earthquake catalogs, Community Online Resource for Statistical Seismicity Analysis, 10.5078/corssa-43806322. Available at www.corssa.org

¹⁶ Vere-Jones, D., & Schoenberg, F. P. (2011). Rescaling marked point processes.

¹⁷ Ogata, Y. (2006). Statistical analysis of seismicity-Updated version (SASeis2006). Computer Science Monographs, 33, 1–28.

¹⁸ Zhuang, J., Ogata, Y., & Vere-Jones, D. (2002). Stochastic declustering of space-time earthquake occurrences. Journal of the American Statistical Association, 97(458), 369–380. 10.1198/016214502760046925

¹⁹ Lombardi, A. M. (2017). SEDA: A software package for the Statistical Earthquake Data Analysis. Scientific reports, 7, 44171. 10.1038/srep44171

²⁰ Kasahara, A., Yagi, Y., & Enescu, B. (2016). etas_solve: a robust program to estimate the ETAS model parameters. Seismological Research Letters, 87(5), 1143–1149.

²¹ Mizrahi, L., Nandan, S., & Wiemer, S. (2020). The Effect of Declustering on the Size Distribution of Mainshocks. arXiv preprint arXiv:2012.09053.

²² Ross, G. J. (2021). Bayesian estimation of the ETAS model for earthquake occurrences. Bulletin of the Seismological Society of America, 111(3), 1473–1480.

²³ Naylor, M., Serafini, F., Lindgren, F., & Main, I. G. (2023). Bayesian modeling of the temporal evolution of seismicity using the ETAS.inlabru package. Front. Appl. Math. Stat. 9:1126759.

²⁴ Valensise, G., Amato, A., Montone, P., & Pantosti, D. (2003). Earthquakes in Italy: Past, present and future. Episodes, 26(3), 245–249.

²⁵ Laub, P. J., Taimre, T., & Pollett, P. K. (2015). Hawkes processes. arXiv preprint arXiv:1507.02822.

²⁶ Kumazawa, T., & Ogata, Y. (2014). Nonstationary ETAS models for nonstandard earthquakes. The Annals of Applied Statistics, 8(3), 1825–1852. 10.1214/14-AOAS759
