Abstract
South Africa is considered the murder capital of the world. The challenge for the South African government is to attract foreign investment to boost the economy in a country plagued by homicide. In this study, a change-point analysis was used to pinpoint significant changes in the murder trends in each of the nine provinces in South Africa from 2005 to 2015. This analysis will assist authorities to gain a better understanding of the big picture view in order to mitigate against this crime. Two methods were used in the analysis, namely, CUSUM and Bootstrap. CUSUM was used to analyse data trends, and Bootstrap was used to calculate the occurrence of change points based on the confidence level. The results of the analysis clearly show the abrupt shifts in murder data across the provinces of South Africa. In addition, we used the South African population statistic dataset from 2005 to 2015 to evaluate the relationship between population of the nine provinces and contextualise the murder crime rates year to year and province to province.
1. Introduction
Crime in South Africa is on the rise, and the government is desperately seeking ways to shake off the country's reputation as the crime capital of the world in order to boost investor confidence [1, 2]. Murder in South Africa is increasing, and the country has been ranked as one of the most murderous countries in the world [3, 4]. Previous research on the rise of murder in South Africa is reported in the literature [5–9]. Several reasons have been put forward for the high crime rate, including a low standard of education, alcohol abuse, a lack of social and vocational skills, poor housing and living conditions, and a lack of parenting skills [1]. Violent crime is increasing faster than any other category of crime in South Africa.
An abrupt change occurs suddenly, and the application of abrupt change detection in crime studies is highly important to give early notice of an impending crime and the consequences on the health of a nation. Traditionally, control charts were used to detect changes. A major difference between control charts and change-point analysis is that control charts are updated following the collection of each data point while change-point analysis is conducted only after all the data are collected. Control charts can detect isolated abnormal points and a major change quickly while change-point analysis can detect subtle changes frequently missed by control charts [10]. Recently, change-point analysis has been extensively used and proven to be a powerful analytic tool for time-series datasets and revealing underlying trends. Several studies show that change points are efficient in exposing the presence of hidden change points in sequence or series datasets [11–14].
The murder trends in the nine provinces of South Africa were investigated in this work, and change-point analysis was conducted to find a substantial change. Two powerful change-point analysis tools, namely, CUSUM and Bootstrap, were used to discover trends and occurrence of change points in the South African murder data for 10 years (2005–2015). This research will enable the South African government to see at a glance the murder trend and particular point at which change occurred upward or downward and work toward prevention of the crime.
Some of the related research found in the literature on murder in South Africa is discussed in Section 2.
2. Related Works
2.1. Murder in South Africa
Murder in South Africa mostly occurs as a result of conflict between different groups engaged in certain activities, including taxi-related disputes, illicit mining, political motives, and hostel-related violence [15]. More than 20 years after apartheid, South Africa still experiences an extreme homicide rate, which is among the world's highest outside war zones. According to the 2013 global report on homicide, Southern Africa and Central America have the highest murder rates [6]. The report by George Otieno et al. shows a high rate of homicide-related deaths in a typical rural South African population and calls for urgent attention in order to prevent loss of life [6].
According to Lindegaard, homicide is the leading cause of death in South Africa and is regarded as a serious health issue [8]. On average, the rate of death caused by violence in South Africa is almost twice the global average. Young men aged 15 to 29 years have been reported as culprits, suspects, and victims of homicide [7–9]. In the rural and township areas of South Africa, the victimization rate for men is considerably higher than in the big cities [6, 8]. In addition, homicides of South African women committed by their intimate partners occur at a rate six times the world average [8].
Sexual homicide is a form of gender-based violence that happens in one in five female homicides and one in ten child homicides in South Africa [16]. Abraham et al. indicated that the rate of adult women’s sexual homicide is on the increase, and sexual child homicides for boy and girl children show different patterns of risks, and the girl child has the highest risk. The pervasiveness of various aspects (forensic, social, and demographic characteristics of the victims and perpetrators.) of sexual homicide in adult women, male, and female children is reported by Abraham et al. [17].
According to McCafferty & Action, crime statistics in the new South Africa (post 1994) shows violent crime had the greatest increase of all crime categories. Murder is a subcategory of violent crime. Between 1994 and 2002, all other violent crimes such as attempted murder, serious assault, and rape were on the rise except for murder that was declining [1]. Statistics released by the South African police service in 2008 show that there were 18 487 documented murders in South Africa between 2007 and 2008. Although this statistic shows a decline of more than 19%, homicide in South Africa is still higher than the global average [16]. As crime rates increase in South Africa, conviction rates decrease adding to the culture of violence [1].
Data from the Crime Information Analysis Centre (CIAC) show that between 1994 and 1999, the postapartheid years, one out of every three crimes in South Africa was a violent crime. Interpol data from three countries, namely, Australia, South Africa, and Colombia, show that South Africa had the highest rates of violent theft, robbery, and murder [1]. According to Interpol statistics, South Africa has the highest per capita rates of murder and rape [18]. In the period from 1994 to 1999, all serious crimes increased in Pretoria and Durban by 19% and in Cape Town by 17% [18].
Since 1994, statistics have shown that Johannesburg is the leading city in South Africa for serious crime, followed by Pretoria, Cape Town, and Durban [1]. An analysis of murders in Johannesburg, Pretoria, Cape Town, and Durban shows that people in townships and poorer parts of the city were more at risk of murder [18].
There are cases where murder in South Africa is committed through police brutality. One such case that drew a lot of public attention was the Marikana massacre in the North West Province on 16 August 2012. In this incident, 34 striking miners were brutally murdered by the police, who claimed that they acted in self-defence [19].
This study will use a change-point analysis to assess the murder trends across the provinces of South Africa.
3. Materials and Methods
An experimental design with quantitative analysis was employed in this study. The crime statistics for South Africa dataset was analyzed using the change-point analysis data processing technique. Two change-point techniques were combined, namely, CUSUM and Bootstrap, as suggested by Taylor and Arif et al. [10, 20]. The approach was aimed at making a realistic interpretation of murder trends in South Africa. The CUSUM charts were used to detect significant changes and to indicate when the murder rate was out of control. The Bootstrap technique is a resampling technique; its application indicated that some change points had occurred in the data. The 1000 bootstraps used in the experiments are the recommended minimum number [10, 20]. All significant changes reported in the tables are pinpointed with 95% confidence. The change-point analysis uses a recursive algorithm to identify multiple changes.
The crime statistics for the South African dataset provide a history of crime statistics from 2005 to 2015 per province and station and are available online. The dataset provides a vast number of crime statistics from all South African provinces. The dataset was last updated on 18 November 2019 and with version 2 being the current version [21]. The change points obtained indicated the time where there was a noticeable deviation in the murder statistics in South Africa.
The South African population statistics [22] provides information on the South African population for each province (2005 to 2015). The comparison of the population and crime rates is shown in Figures 1(a)–9(b) of Supplementary Material-2 and discussed in Section 4.
3.1. Change-Point Analysis
In order to detect whether one or more changes occurred, a change-point analysis can be performed. Furthermore, change-point analysis can be used to determine when the changes occurred and with what confidence. The confidence level indicates the likelihood that a change occurred, and the confidence interval indicates when the change occurred. A change-point analysis can be applied to all types of time-ordered data [10].
Control charts have two horizontal lines (upper control limit and lower control limit) that indicate the maximum range that values are expected to vary. If points appear within these two horizontal lines, then it means no change has occurred. Points outside these limits indicate a change has occurred. While control charts are useful to detect changes, the analysis of changes is lacking [10].
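The limitation of control charts described above can be illustrated with a minimal sketch. The limits used here (mean plus or minus three sample standard deviations) are an assumption chosen for illustration, not necessarily the limits used in this study, and both series are hypothetical. The isolated spike is flagged, whereas the sustained step change stays inside the limits.

```python
import numpy as np

def control_limits(values, k=3.0):
    """Return (lower, upper) limits as mean -/+ k sample standard deviations.

    The choice of k = 3 and of the sample standard deviation is an
    illustrative assumption.
    """
    x = np.asarray(values, dtype=float)
    center, spread = x.mean(), x.std(ddof=1)
    return center - k * spread, center + k * spread

def out_of_control(values, k=3.0):
    """Return the indices of points falling outside the control limits."""
    lower, upper = control_limits(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical series: one isolated spike at the last position.
spike = [10, 11, 9] * 6 + [10, 60]
# Hypothetical series: a sustained step change halfway through.
step = [10, 12, 11, 10, 12, 20, 21, 19, 20, 21]

flagged_spike = out_of_control(spike)
flagged_step = out_of_control(step)
```

In this sketch the step change inflates the standard deviation, so every point remains within the limits even though the level of the series has clearly shifted.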
3.1.1. Change-Point Analysis Algorithm
The change-point analysis aims at detecting any change in the mean of a process in historical data such as murder crime datasets. Performing this analysis, the following questions can be adequately answered: Did a change occur? Did more than one change occur? When did the changes occur? and How confident are we that they are real changes? [10].
Suppose x1, x2, …, xn denote n data points in a time series, and let S0, S1, …, Sn represent the cumulative sums of the points. To perform the change-point analysis, the following steps are applied to the initial dataset D0 = {x1, …, xn} of size n (n0 = |D0|). The mean of x1, x2, …, xn is expressed by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
The cumulative sum always starts at zero; therefore, let S0 = 0.
Then, Si is calculated iteratively as follows:
$$S_i = S_{i-1} + (x_i - \bar{x}), \quad i = 1, 2, \ldots, n.$$
Before the bootstrap analysis is performed, an estimate of the magnitude of the change is needed; it is calculated as
$$S_{\mathrm{diff}} = S_{\max} - S_{\min}, \quad S_{\max} = \max_{0 \le i \le n} S_i, \quad S_{\min} = \min_{0 \le i \le n} S_i.$$
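The CUSUM construction and the magnitude of change can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available. The function names `cusum` and `change_magnitude` and the yearly counts are hypothetical and are not taken from the study's tables.

```python
import numpy as np

def cusum(values):
    """Return S_0, ..., S_n: cumulative sums of deviations from the mean."""
    x = np.asarray(values, dtype=float)
    # S_0 = 0, then S_i = S_{i-1} + (x_i - mean)
    return np.concatenate(([0.0], np.cumsum(x - x.mean())))

def change_magnitude(values):
    """Return the magnitude of change, max(S_i) - min(S_i)."""
    s = cusum(values)
    return s.max() - s.min()

# Hypothetical yearly counts with a step change after the fifth value.
counts = [10, 12, 11, 10, 12, 20, 21, 19, 20, 21]
s = cusum(counts)
s_diff = change_magnitude(counts)
```

Because the deviations from the mean sum to zero, the CUSUM series starts and ends at zero; a sustained shift in the mean appears as a change in the slope of the chart.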
After the magnitude of change has been computed, the bootstrap analysis is executed N times on D0. As described in [20], a single bootstrap is executed as follows:
(i) A bootstrap dataset Dl of size n, with points $x^{l}_{j}$ (j = 1, 2, 3, …, n), is generated by randomly reordering the original n values, which is also known as sampling without replacement (SWOR).
(ii) The bootstrap CUSUM, denoted $S^{l}_{j}$, is computed from the bootstrap sample by the same method.
(iii) The magnitude of change for the bootstrap CUSUM is calculated as follows:
$$S^{l}_{\mathrm{diff}} = S^{l}_{\max} - S^{l}_{\min}.$$
(iv) The bootstrap is counted if the original magnitude of change, $S_{\mathrm{diff}}$, exceeds the magnitude of change of the bootstrap CUSUM, that is, if $S^{l}_{\mathrm{diff}} < S_{\mathrm{diff}}$.
Let N be the number of bootstrap samples executed and K the number of bootstraps for which $S^{l}_{\mathrm{diff}} < S_{\mathrm{diff}}$. The confidence level that a change has occurred, as a percentage, is defined as
$$\text{Confidence level} = 100 \times \frac{K}{N}\,\%.$$
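The bootstrap procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the fixed random seed, and the example counts are hypothetical, and the default of 1000 bootstraps follows the minimum mentioned in Section 3.

```python
import numpy as np

def s_diff(values):
    """Magnitude of change: range of the CUSUM of deviations from the mean."""
    x = np.asarray(values, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x - x.mean())))
    return s.max() - s.min()

def bootstrap_confidence(values, n_bootstraps=1000, seed=0):
    """Return the confidence level (in percent) that a change occurred."""
    rng = np.random.default_rng(seed)
    original = s_diff(values)
    # K counts the random reorderings whose magnitude of change is
    # strictly below that of the original ordering.
    k = sum(s_diff(rng.permutation(values)) < original
            for _ in range(n_bootstraps))
    return 100.0 * k / n_bootstraps

# Hypothetical yearly counts with a step change after the fifth value.
counts = [10, 12, 11, 10, 12, 20, 21, 19, 20, 21]
confidence = bootstrap_confidence(counts)
```

Random reordering removes any time structure, so a series with a genuine shift should produce a larger magnitude of change than nearly all of its reorderings.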
The bootstrapping approach is distribution-free and rests on a single assumption, that of an independent error structure [20]. The independent error structure is the model
$$x_i = m_i + e_i,$$
where mi denotes the mean at time i and ei is the random error associated with the i-th value; the ei are assumed to be independent and identically distributed with zero mean. Usually, mi = mi−1, except for a small number of values of i, which are called change points. When a change is detected, an estimate of when the change happened can be computed. The CUSUM estimator is calculated as follows:
$$|S_m| = \max_{i = 0, \ldots, n} |S_i|,$$
where Sm is the point furthest from zero in the CUSUM chart; point m estimates the last point before the change occurred, while point m + 1 estimates the first point after the change occurred [20]. The mean square error (MSE) is used as a second estimator of when the change happened.
Let MSE(m) be defined for a given sub-dataset D as follows [20]:
$$\mathrm{MSE}(m) = \sum_{i=1}^{m} (x_i - \bar{x}_1)^2 + \sum_{i=m+1}^{n} (x_i - \bar{x}_2)^2, \quad \bar{x}_1 = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad \bar{x}_2 = \frac{1}{n-m}\sum_{i=m+1}^{n} x_i.$$
The value of m that minimises MSE(m) estimates the last point before the change.
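The two estimators of the change location can be sketched as below. This is an illustrative sketch, not the study's implementation: the function names and the example counts are hypothetical. Both functions return m, the 1-based position of the last point before the change.

```python
import numpy as np

def cusum_estimate(values):
    """Return m such that |S_m| is the CUSUM point furthest from zero."""
    x = np.asarray(values, dtype=float)
    s = np.concatenate(([0.0], np.cumsum(x - x.mean())))
    return int(np.argmax(np.abs(s)))

def mse_estimate(values):
    """Return the m in 1..n-1 that minimises the two-segment squared error."""
    x = np.asarray(values, dtype=float)

    def mse(m):
        before, after = x[:m], x[m:]
        return (((before - before.mean()) ** 2).sum()
                + ((after - after.mean()) ** 2).sum())

    return min(range(1, len(x)), key=mse)

# Hypothetical yearly counts with a step change after the fifth value.
counts = [10, 12, 11, 10, 12, 20, 21, 19, 20, 21]
m_cusum = cusum_estimate(counts)
m_mse = mse_estimate(counts)
```

For a clean step change such as this one, the two estimators agree; on noisier data they may differ by a point or two.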
The bootstrapping technique is adopted to detect multiple changes. A repetitive analysis must be done to get other significant change points at consequent levels and the confidence limits and levels [20].
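The repeated analysis for multiple change points can be sketched as a recursive split. The 95% threshold follows the confidence level mentioned in Section 3, while the minimum segment length of four points, the function names, the seed, and the example data are assumptions made only for this illustration.

```python
import numpy as np

def _cusum(x):
    """CUSUM series S_0..S_n of deviations from the segment mean."""
    return np.concatenate(([0.0], np.cumsum(x - x.mean())))

def _confidence(x, rng, n_bootstraps):
    """Bootstrap confidence level (percent) that the segment has a change."""
    def s_diff(v):
        s = _cusum(v)
        return s.max() - s.min()

    original = s_diff(x)
    k = sum(s_diff(rng.permutation(x)) < original
            for _ in range(n_bootstraps))
    return 100.0 * k / n_bootstraps

def find_change_points(values, threshold=95.0, n_bootstraps=1000, seed=0):
    """Return sorted 0-based indices of the first point after each change."""
    x = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    found = []

    def recurse(start, end):
        segment = x[start:end]
        if len(segment) < 4:  # assumed minimum segment length
            return
        if _confidence(segment, rng, n_bootstraps) < threshold:
            return
        # The CUSUM estimator gives the last point before the change.
        m = int(np.argmax(np.abs(_cusum(segment))))
        split = start + m
        found.append(split)
        # Repeat the analysis on the data before and after the change.
        recurse(start, split)
        recurse(split, end)

    recurse(0, len(x))
    return sorted(found)

# Hypothetical yearly counts with a step change after the fifth value.
counts = [10, 12, 11, 10, 12, 20, 21, 19, 20, 21]
change_points = find_change_points(counts)
```

Each detected change splits the series into two segments that are analysed again, which is how changes at subsequent levels would be found.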
Significant change points can be revealed by the application of this technique to time-series data on the murder crime dataset, which is considered for this study.
3.2. Research Methodology
Data preprocessing was conducted in the Python high-level programming language. Python supports modules and packages, which makes it attractive for rapid application development. The Python package pandas was used for practical real-world data analysis. Pandas is well suited to the dataset because the data are tabular with heterogeneously typed columns, as in an Excel spreadsheet. Furthermore, the data are time-series data. Data scientists follow several stages when working with data, namely, cleaning the data, analyzing or modeling the data, and finally organizing the results in the form of tables and graphs. Pandas is well suited to all these stages.
Figure 1 depicts the process flow diagram of the research conducted. As shown in Figure 1, the South African crime data were preprocessed using Python data analytics tools. CUSUM and Bootstrapping techniques were applied to the preprocessed data to detect abrupt changes that occurred at different points.

4. Results and Discussion
The murder statistics at each police station in every province from 2005 to 2015 was summed to provide provincial statistics on murder using python and displayed in Table 1. Table 1 shows the results for murder statistics in South Africa per province from 2005 to 2015.
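The aggregation from station level to provincial level can be sketched with pandas as follows. The column names (`Province`, `Station`, `Category`, and one column per reporting year) and all values are assumptions for illustration; the actual layout of the dataset may differ.

```python
import pandas as pd

# Hypothetical station-level rows; names and values are illustrative only.
stations = pd.DataFrame({
    "Province": ["Gauteng", "Gauteng", "Limpopo", "Limpopo"],
    "Station": ["A", "B", "C", "D"],
    "Category": ["Murder", "Murder", "Murder", "Burglary"],
    "2005-2006": [10, 20, 5, 7],
    "2006-2007": [12, 18, 6, 9],
})

year_columns = ["2005-2006", "2006-2007"]

# Keep only murder records, then sum the stations within each province.
murders = stations[stations["Category"] == "Murder"]
provincial = murders.groupby("Province")[year_columns].sum()

# Average number of murders per reporting year for each province.
provincial["Average"] = provincial[year_columns].mean(axis=1)
```

The resulting table has one row per province and one column per reporting year, which is the shape needed for the per-province change-point analysis.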
Table 1 shows that the province of KwaZulu-Natal has the highest average number of murders (4149.82) over the ten-year period from 2005 to 2015. This is followed by the provinces of Gauteng (3514.09) and Eastern Cape (3402.18). Table 1 also shows that, on average, more murders were committed in 2015 than in any other year. The graphical representation of the murder statistics of the 9 provinces of South Africa is depicted in Figures 1–9 of Supplementary Materials-1 of this article. Figures 1–9 show the number of crimes in each of the nine (9) provinces for the period of ten years (2005–2016). For instance, Figure 9 of Supplementary Materials-1 shows that 750 murders occurred in 2005-2006 and more than 900 murders occurred in 2015-2016 in the North West Province of South Africa. It is possible to display the murder crime data on a control chart, but the drawback is that significant change points will not be pinpointed [10].
Table 2 shows the total population of each of the nine provinces in South Africa from 2005 to 2015. The rates of population and murder crime for all the provinces were compared in our experiments, and the results are depicted in Figures 1(a)–9(b) of Supplementary Material-2. Figures 1(a) and 1(b) show that the crime rate from 2005 to 2015 in Western Cape Province does not track the population of the province. The population increased steadily each year, but the murder crime rose in some years and fell in others. For instance, the murder crime increased in 2006 compared to 2005 and dropped yearly from 2007 to 2009. It went up again in 2010 and dropped slightly in 2011. It then increased progressively from 2012 to 2015. The comparison results for the remaining 8 provinces (Figures 2(a)–9(b)) show the same pattern: the crime rate does not correspond with the population.
Conversely, comparing province to province in terms of high/low population and crime rates, the results of our experiment depicted in Figures 1(a)–9(b) of Supplementary Material-2 and Table 3 show that the four provinces (Gauteng, KwaZulu-Natal, Eastern Cape, and Western Cape) with highest population have the highest crime rates, and the province (Northern Cape) with the least population has the least crime rate.
Table 3 shows the average population and the murder crime in South Africa per province over a ten-year period.
Table 3 shows that the provinces with the highest populations, namely, Gauteng, KwaZulu-Natal, Eastern Cape, and Western Cape, have the highest murder rates, while provinces with small populations, such as Northern Cape, have low murder rates. Interestingly, KwaZulu-Natal, with the second largest population, has the highest murder rate.
In order to observe changes in the murder trends during the timed period such as noting when changes occurred and by how much it changed, a change-point analysis was performed. The results of the experiments are presented in the graphs and tables below.
Figure 2 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 3 displays the results of the CUSUM analysis.


The shaded background represents a region expected to contain all the values based on the current model that a change occurred. The one change is represented by the shifts in the shaded background. In Figure 2, the red line represents the upper and lower limits, and it can be observed from the figure that some points appear above the upper limit between 2005 and 2007 and below the lower limits around 2011. These points can be labelled as outliers as they fall outside the boundary. In Figure 3, the blue region of the CUSUM represents the existence of a change. Significant changes occurred in the period from 2009 to 2015. The CUSUM graph shows a descending trend.
Table 4 shows the results of the bootstrapping analysis of murder data of KwaZulu-Natal.
The analysis detected only one change, in 2011. This year represents the first year of the change. The confidence level, which indicates how confident the analysis is that the change actually happened, is 98%. Table 4 indicates that the murder statistic was 3324 prior to the change and 3626 after the change. Table 4 also gives a level associated with each change. Any number of levels can exist, depending on the number of changes found. The level 1 change is the change that is most visibly apparent in the plot in Figure 2. The level 1 change is also apparent in the CUSUM chart displayed in Figure 3.
Figure 4 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 5 displays the results of the CUSUM analysis.


Figure 4 shows that there is no point outside the control limits. However, there is one change represented by the shift in the blue region in the background. This change would have been missed by a control chart because all points are within the control limits. Figure 5 shows the CUSUM chart with background changes. The straightness of the line segments before and after the changes indicates that the changes were fairly sudden. The CUSUM chart shows a descending trend in the blue region. Determining the exact time of the reduction of murders is the key to solving the murder problem.
In order to get a better understanding of the number and timing of change points, bootstrapping was used. Table 5 shows the results of the bootstrapping analysis of murder data for Gauteng.
The analysis detected a level 1 change that occurred in 2011. As shown in Table 5, the number of murders fell from 3401 to 3012; the confidence interval for the change is 2009 to 2012, at a confidence level of 97%. Authorities need insight into why the change-point analysis pinpointed a drastic reduction in murders in 2011.
Figure 6 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 7 displays the results of the CUSUM analysis.


Figure 6 reveals one change point in the blue-shaded region. All points are within the control limits. In Figure 7, the CUSUM plot shows significant changes in the blue region. The CUSUM chart detects significant changes after 2012, and there is an ascending trend in the blue region.
Table 6 shows the results of the bootstrapping analysis of murder data given in Western Cape.
Table 6 shows a level 1 change in 2012. The confidence interval for the change is 2011 to 2013, at a confidence level of 97%. The number of murders in the Western Cape Province increased from 2070 to 3106.2. The level 1 change is the change that is most visibly apparent in the plot in Figure 6. The level 1 change is also apparent in the CUSUM chart displayed in Figure 7.
Figure 8 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 9 is the results of the CUSUM analysis.


Table 7 below shows the results of the change-point analysis on murder data given in Eastern Cape in Table 1.
A level 1 change was discovered in 2008, as illustrated in Table 7. The confidence interval for the change is 2008 to 2012, at a confidence level of 94%. The number of murders in the Eastern Cape Province decreased from 3284 to 3206.
Figure 10 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 11 is the results of the CUSUM analysis.


In Figure 10, the blue region shows one change point after 2009. All points are within the control limits. In Figure 11, the CUSUM chart concurs with the one change point. Significant changes are detected in the blue region. The CUSUM chart shows an ascending trend.
Table 8 shows the results of bootstrapping analysis of murder data given in Free State.
The analysis discovered a level 1 change in 2009, as shown in Table 8. The confidence interval for the change is 2009 to 2010, at a confidence level of 92%. The number of murders in the Free State Province increased from 891.3 to 961.17.
Figure 12 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 13 is the result of the CUSUM analysis.


In Figure 12, the change-point analysis shows one significant change in the blue region. The figure also shows one point below the lower limit that represents an extreme value. There was a downward trend in the murder statistics after 2009. In Figure 13, the CUSUM chart confirms this change with the presence of a blue region.
Table 9 below shows the results of the bootstrapping analysis of murder data given in Mpumalanga.
The results in Table 9 show a level 1 change in 2011. The confidence interval for the change is 2011 to 2012, at a confidence level of 92%. The number of murders in the Mpumalanga Province decreased from 711 to 686. The level 1 change is the change that is most visibly apparent in the plot in Figure 12. The level 1 change is also apparent in the CUSUM chart displayed in Figure 13.
Figure 14 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 15 is the results of the CUSUM analysis.


In Figure 14, there are no changes displayed in the blue region. All points appear within the control limits for the timed period. In Figure 15, the CUSUM chart shows no significant changes, as indicated by the absence of a blue region. The results of the bootstrapping analysis of murder data given in Northern Cape also showed no significant changes in murder data.
Figure 16 shows a graphical presentation of the results of the change-point analysis with background changes and control limits while Figure 17 is the results of the CUSUM.


In Figure 16, there are no changes displayed in the blue region. All points appear within the control limits for the timed period. In Figure 17, the CUSUM chart shows no significant changes as indicated by the absence of a blue region. The results of the bootstrapping analysis of murder data given in Limpopo also showed no significant changes in murder data.
Figure 18 shows a graphical presentation of the results of the change-point analysis with background changes and control limits and Figure 19 is the results of the CUSUM analysis.


In Figure 18, there are no changes displayed in the blue region. All points appear within the control limits for the timed period. In Figure 19, the CUSUM chart shows no significant changes as indicated by the absence of a blue region. The results of the bootstrapping analysis of murder data given in the North West Province also showed no significant changes in murder data.
From the experiment, it is clear that abrupt shifts in murder data across provinces were detected by analyzing change points and the level of changes. A level 1 change was detected in the analysis, and this was apparent in the CUSUM charts depicted. Table 10 below shows the instances of upward, downward, and no shift in the murder trends from 2005 to 2015.
In order to generate results that are more precise, the number of bootstraps must be increased, which doubles the duration of the analysis [20].
5. Conclusions
The change-point analysis of murder data over the ten-year period is preferred to the control chart because, in many instances, the control chart missed the changes. A review by authorities of when the changes occurred can provide valuable insight for reducing the number of murders committed in South Africa. The change-point analysis shows trends in the data and gives authorities a big-picture view of the data.
Data Availability
The South African crime and population datasets used to support the findings of this study are available at https://www.kaggle.com/slwessels/crime-statistics-for-south-africa and http://www.statssa.gov.za/publications/P0318/P03182018.pdf.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors acknowledge the Durban University of Technology for making funding opportunities and materials for experiments available for this research project. The research was funded by the Durban University of Technology.
Supplementary Materials
Supplementary Materials-1 contains Figures 1–9; each figure represents the murder crime statistics for one of the nine provinces in South Africa over the ten-year period (2005–2015). Supplementary Materials-2 contains Figures 1(a)–9(b); each set shows the total population and murder crime rates for the respective province in South Africa over the same 10-year period (2005–2015).