Earthquake Cluster Analysis: K-means approach

Published in Saral Karki, May 24, 2019

Introduction

An earthquake is the shaking on the surface of the Earth, resulting from the sudden release of energy in the Earth’s lithosphere that creates seismic waves. Earthquakes can range in size from those that are so weak that they cannot be felt, to violent ones that can cause serious damage to entire cities. The seismicity or seismic activity of an area refers to the frequency, type and size of earthquakes experienced over a period of time.

Depending on its magnitude, an earthquake can be placed in one of several magnitude classes:

Fig: Earthquake magnitude classes (source: http://www.geo.mtu.edu/UPSeis/magnitude.html)

Nepal is one of the most seismically active regions in the world. The quakes in Nepal are a manifestation of the ongoing convergence between the Indo-Australian and Asian tectonic plates that have progressively built the Himalayas over the last 50 million years.¹

From 1994 to 2016, there were over 900 significant tremors in Nepal, and in 2015 alone there were well over 400. Most of the 2015 tremors were aftershocks of the April 25, 2015 earthquake with its epicentre in Gorkha, which the United States Geological Survey (USGS) registered at magnitude 7.8.²

Many researchers have studied and analysed earthquake clusters, and K-means clustering in particular has been proposed for partitioning earthquake source zones³ ⁴. Pepi Novianti, Dyah Setyorini and Ulfasari Rafflesia, in their paper ‘K-Means cluster analysis in earthquake epicentre clustering’, discuss in detail the use of the K-means algorithm for clustering earthquakes⁵. Kamat and Kamath also used the K-means algorithm in their paper ‘Earthquake Cluster Analysis: K-Means Approach’, where they likewise discuss the algorithm in detail.⁶

In this paper, I try to replicate the two papers mentioned above using an earthquake dataset from Nepal, obtained from the National Seismological Center. In addition to K-means clustering, the dataset has also been divided into three time-based clusters: 1994–2014, 2015 and 2016–2019.

The dataset contains 933 records; however, this study considers only earthquakes classified as moderate or stronger. This leaves 125 earthquakes of at least moderate magnitude, recorded between June 25, 1994 and May 11, 2019.
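The selection step amounts to a magnitude threshold. Below is a minimal sketch, assuming the moderate class begins at magnitude 5 (consistent with the magnitude-5 threshold used in the results) and using a hypothetical record layout with made-up sample rows; the real dataset's field names may differ.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout; the real dataset's column names may differ.
@dataclass
class Quake:
    day: date
    latitude: float
    longitude: float
    magnitude: float

# Assumed lower bound of the "moderate" magnitude class.
MODERATE_MIN = 5.0

def moderate_or_stronger(quakes):
    """Keep only events whose magnitude is at least MODERATE_MIN."""
    return [q for q in quakes if q.magnitude >= MODERATE_MIN]

# Made-up rows for illustration only (not taken from the real dataset).
sample = [
    Quake(date(2000, 1, 1), 27.0, 86.0, 4.2),
    Quake(date(2015, 5, 1), 28.0, 85.0, 6.1),
    Quake(date(2016, 1, 10), 29.0, 81.0, 5.0),
]
selected = moderate_or_stronger(sample)
print(len(selected))  # 2
```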

K-Means Cluster Analysis

Cluster analysis is a multivariate method that searches for patterns in a data set by grouping the observations into clusters. Its goal is to find an optimal grouping in which the observations within each cluster are similar (homogeneous), while the clusters themselves are dissimilar from one another (Novianti et al. 2017, p. 82).

The data were then clustered based on their geographical location and magnitude.

K-means clustering is a type of unsupervised learning, which means it is used when the data are unlabelled. The algorithm works iteratively to partition the data into groups (clusters), where the number of groups is given by the variable K. We use the Euclidean distance between each data point and each cluster centroid to decide which cluster the point belongs to. The Euclidean distance can be calculated as:

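In two dimensions, for points $p = (p_1, p_2)$ and $q = (q_1, q_2)$:

$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2}$$

In three dimensions, for points $p = (p_1, p_2, p_3)$ and $q = (q_1, q_2, q_3)$:

$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + (q_3 - p_3)^2}$$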
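The same calculation extends to any number of coordinates. Here is a minimal Python sketch. The helper name `euclidean` is hypothetical and the (latitude, longitude, magnitude) values are made up. Python 3.8+ also provides `math.dist`, which computes the same quantity.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Two dimensions: (x, y)
print(euclidean((0.0, 0.0), (3.0, 4.0)))  # 5.0

# Three dimensions, e.g. made-up (latitude, longitude, magnitude) values
a = (28.0, 85.0, 5.0)
b = (28.0, 87.0, 6.0)
print(euclidean(a, b))
```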

Steps for K-means clustering:

1. Decide on the number of clusters, K, and choose K random points from the dataset as the initial centroids, one per cluster.
2. Calculate the Euclidean distance between each data point and each centroid, and assign each point to the cluster whose centroid is nearest.
3. Compute a new centroid for each cluster as the mean of all points assigned to it.
4. Repeat steps 2 and 3 until the centroids no longer move and the cluster assignments stay the same.
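The four steps above can be sketched in NumPy as a plain implementation of Lloyd's algorithm. This is a minimal illustration, not the code from the author's notebook. The function name `kmeans`, the fixed random seed, the iteration cap and the rule that an empty cluster keeps its previous centroid are our own assumptions, and the toy points are made up.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means following steps 1-4; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids (a copy).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: distances from every point to every centroid, shape (n, k);
        # each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Two well-separated toy groups (made-up numbers).
pts = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centroids = kmeans(pts, k=2)
print(labels)
print(centroids)
```

scikit-learn's `KMeans`, the package used in this study, runs the same assign-and-update loop but uses k-means++ initialisation by default and can repeat the run from several starts via `n_init`.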

The optimal value of K

For an unlabelled dataset, finding the optimal value of K is a challenge, and here too the required number of clusters is not known in advance. We therefore used a heuristic, the elbow method, to choose K.

We identified K using the scikit-learn package. We ran K-means for a range of K values and plotted, for each K, the sum of squared distances between the points and their cluster centroids. The plot shows a clear elbow-like bend, and we take the K at that bend as the optimal value. Based on our plot, we pick K = 2.
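To make the elbow procedure concrete, here is a minimal NumPy sketch. It computes the within-cluster sum of squared distances (SSE) for K = 1 to 6 on made-up data containing two obvious groups. The helper `kmeans_sse` and its restart count are hypothetical choices for illustration. In scikit-learn, which the author used, the same quantity is exposed as the `inertia_` attribute of a fitted `KMeans` model.

```python
import numpy as np

def kmeans_sse(X, k, n_init=10, max_iter=100, seed=0):
    """Run K-means n_init times from random starting centroids and return
    the lowest within-cluster sum of squared distances (SSE) found."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Random initial centroids drawn from the data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Squared distance from every point to every centroid: shape (n, k).
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # New centroid = mean of assigned points (empty cluster keeps its centroid).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # SSE: each point's squared distance to its nearest final centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d2.min(axis=1).sum()))
    return best

# Made-up data: two well-separated groups of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
               rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2))])

sse = {k: kmeans_sse(X, k) for k in range(1, 7)}
for k, value in sse.items():
    print(f"K={k}: SSE={value:.1f}")
```

Plotting `sse` against K (for example with matplotlib) shows a sharp drop from K = 1 to K = 2 followed by a much flatter tail. That bend is the elbow.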

The elbow method has limitations: it is a heuristic, and locating the bend is a visual judgement that can be ambiguous.

Result and Discussion

Using the value of K suggested by the elbow method, we grouped the earthquakes in the dataset into two clusters. The red cluster contains considerably more earthquakes of magnitude 5 or higher: in total, 92 earthquakes make up the red cluster and 33 make up the blue cluster. Over these 25 years, the central region of Nepal appears to have been the most affected by earthquakes. The April 2015 earthquake and its aftershocks explain why there is such a cluster in the central region. In 2015 alone, 52 earthquakes of magnitude 5 or more took place.

The map also shows the centroids of the two clusters in yellow. Each earthquake was assigned to a cluster by calculating the Euclidean distance between it and each centroid and choosing the nearest one.

Fig: Earthquake epicentres divided into two clusters, with magnitudes

One interesting observation from the map is that the earthquake epicentres lie in either the hilly or the mountainous regions of Nepal. This is consistent with the origin of those hills and mountains. The same convergence of the Indo-Australian and Asian plates that built the Himalayas also produces the earthquakes, which explains why the epicentres are predominantly in the mountainous regions.

Next, the earthquake data were divided into three time-based clusters: before 2015, 2015, and 2016 onwards. All earthquakes from 1994 until the end of 2014 were placed in the “before 2015” cluster, all earthquakes in 2015 in the “2015” cluster, and all earthquakes after 2015 in the “2016 onwards” cluster.
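This split is a simple rule on the event date. Here is a minimal sketch, assuming each earthquake record carries a `datetime.date`. The function names are hypothetical and the dates are made up.

```python
from datetime import date

def period_label(d):
    """Map an event date to one of the three time-based clusters."""
    if d.year <= 2014:
        return "before 2015"
    if d.year == 2015:
        return "2015"
    return "2016 onwards"

def group_by_period(dates):
    """Return {label: [dates]}, keeping input order within each group."""
    groups = {"before 2015": [], "2015": [], "2016 onwards": []}
    for d in dates:
        groups[period_label(d)].append(d)
    return groups

# Made-up dates for illustration only.
events = [date(1998, 3, 1), date(2014, 12, 31), date(2015, 1, 1),
          date(2015, 6, 15), date(2016, 1, 1), date(2019, 5, 1)]
groups = group_by_period(events)
print({k: len(v) for k, v in groups.items()})
# {'before 2015': 2, '2015': 2, '2016 onwards': 2}
```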

Fig: Earthquake epicentre clusters for 1994–2014, 2015 and 2016–2019

From the map, we can observe that before 2015 the epicentres were fairly spread out, with roughly equal numbers of earthquakes in the eastern and western regions. In 2015, however, the epicentres are bunched together in the central region of Nepal. A couple of earthquakes occurred in the western region during 2015, but those were before April 25, 2015. From 2016 onwards, most of the moderate, strong and major earthquakes still have their epicentres clustered close to the central region.

The dataset can be found here

The notebook can be found here

The pdf version

References:

1. https://viewsweek.com/why-nepal-is-so-prone-to-earthquakes/
2. https://earthquake.usgs.gov/earthquakes/eventpage/us20002926/executive
3. G. Weatherill and P. W. Burton, “Delineation of Shallow Seismic Source Zones Using k-means Cluster Analysis, with Application to the Aegean Region”, Geophysical Journal International, Vol. 176, pp. 565–588, 2009.
4. K. Rehman, P. W. Burton and G. A. Weatherill, “K-Means Cluster Analysis and Seismicity Partitioning for Pakistan”, Journal of Seismology, Vol. 18, pp. 401–419, 2014.
5. P. Novianti, D. Setyorini and U. Rafflesia, “K-Means cluster analysis in earthquake epicentre clustering”, International Journal of Advances in Intelligent Informatics, Vol. 3, No. 2, pp. 81–89, July 2017.
6. R. K. Kamat and R. S. Kamath, “Earthquake Cluster Analysis: K-Means Approach”, Journal of Chemical and Pharmaceutical Sciences, Vol. 10, Issue 1, pp. 250–253, March 2017.

Written by weirdbutwired
