Visualisation of multi-dimensional data and biplots

This well-established research group collaborates with several international experts in the field, as well as with industry in South Africa. New students join the group continually to pursue their master’s or doctoral studies. Since 1996 the group has produced 30 master’s and 10 PhD graduates. Several further registered postgraduate students are currently engaged in research on the visualisation of multi-dimensional data and related biplots.

As a result, numerous R packages and fragments of R code for biplots have been developed by many different individuals. The research group was awarded the EMS faculty’s elite research grant for 2020, specifically to collate this existing software and enhance its capabilities with newer R packages such as Plotly and Shiny. The aim is to provide a coherent, user-friendly visualisation package for researchers and practitioners.

The application of biplot methodology to data sets originating from diverse fields of application such as archaeology, ecology, psychology, accountancy, chemistry, wood science, health sciences and industry stimulates the development of new theory and procedures which in turn set the scene for subsequent theoretical research.

- NJ le Roux
- S Lubbe
- CJ van der Merwe
- J Nienkemper-Swanepoel

Natural Selection Modelling

Molecular evolutionary studies offer effective methods for using genomic information to investigate biomedical phenomena. Phylogenetics – the study of the evolutionary relationships among living organisms using genetic data – is one of those methods. An aspect of phylogenetics that has gained significant research attention is natural selection. It includes the use of theoretical probability techniques to identify genetic loci that are crucial to the survival of living organisms. One utility of natural selection analyses is in the understanding of how pathogens evade treatments and the immunity of their hosts.

Given the lack of a cure for some existing pathogens such as HIV and the continuing outbreaks of new pathogens such as the coronavirus, natural selection remains an active research field. It has traditionally been studied as a measure of relative fitness among alleles in the field of population genetics. As a result, there is a substantial body of theoretical work on natural selection in the population genetics literature. This rich body of work is yet to be fully synchronised with its phylogenetic analogue because of mathematical complexities.

The focus of this group is to bridge the gap between the population genetic and phylogenetic fields with respect to natural selection research.

- H Saddiq

Application of measures of divergence in statistical inference

In the literature, several measures of divergence between two probability distributions have been proposed, such as Pearson’s chi-square divergence, density power divergence, phi measures of divergence and many others. Some of these measures have been applied to particular areas of statistical inference, but other areas have been “avoided” or “neglected”, receiving only marginal attention. In this research divergence measures are applied to some of these areas, including goodness-of-fit tests and extreme value theory. In particular, the classes of f-divergences associated with Arimoto’s and Tsallis’ entropies are studied, extended and applied to such problems.
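As an illustration of the general idea, the sketch below computes an f-divergence between two discrete distributions using the standard definition D_f(P || Q) = sum_i q_i f(p_i / q_i). The function names and the two probability vectors are hypothetical and chosen only for this example; it is not the group’s own code and does not cover the Arimoto or Tsallis classes.

```python
import math


def f_divergence(p, q, f):
    """Return D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete distributions.

    Assumes p and q are probability vectors of equal length with every q_i > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def pearson_chi_square(t):
    # Generator of Pearson's chi-square divergence: f(t) = (t - 1)^2
    return (t - 1.0) ** 2


def kullback_leibler(t):
    # Generator of the Kullback-Leibler divergence: f(t) = t log t, with f(0) = 0
    return t * math.log(t) if t > 0 else 0.0


# Hypothetical example distributions (illustrative values only)
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

chi2 = f_divergence(p, q, pearson_chi_square)
kl = f_divergence(p, q, kullback_leibler)
print(f"Pearson chi-square divergence: {chi2:.4f}")
print(f"Kullback-Leibler divergence:   {kl:.4f}")
```

Changing the generator function f is all that is needed to move between members of the family, which is one reason the f-divergence formulation is convenient for comparing measures.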

- T de Wet
- F Österreicher (University of Salzburg)

Rough Volatility Modelling in Energy Markets

Empirical evidence shows that market volatility is rough. The goal of this project is to take forward a volatility model for commodity markets in which the instantaneous volatility process is rough. This idea follows from our earlier paper with Christina Nikitopoulos, in which we showed empirically that commodity market volatility is rough, with a Hurst index of order 0.01. As an application, we focus on option pricing under the model and on the estimation of model parameters. The main challenge is to derive the bivariate Laplace transform via a sequence of rescaled Hawkes processes. Once this is in place, we hope to use commodity market data to carry out empirical studies, including pricing and estimation.

- M Alfeus
- Ludger Overbeck (University of Giessen, Germany)
- Christina Nikitopoulos (University of Technology Sydney, Australia)

Campanometry

Campanology, the study of bells, bell-casting and bellringing, is quite an old discipline. A major aspect of campanology is clearly the sound produced, and the discipline is therefore traditionally closely tied to physics, in particular to acoustics, the scientific study of sound and sound waves. In contrast, the study of the quantitative or statistical aspects of bells and their properties is much more recent. The term Campanometry was coined for this multi-disciplinary field of music, history, mathematics/statistics, acoustics and metallurgy. The famous campanologist André Lehr (1929–2007) is credited as the founder of Campanometry. Of particular interest is the measurement and statistical study of the different partials of bells and carillons. Since bells are usually tuned to have their partials at the ideal values, the deviations from these ideal values supply important information on the sound quality of a bell. Furthermore, measurements of their physical properties also provide useful data for analyses. In this research bells in the Western Cape are identified, photographed and measured physically and acoustically, and the information is stored in the SUNDigital Collections, the digital heritage repository of the Stellenbosch University Library. The data thus obtained are used to model different aspects of bells and carillons statistically, inter alia the extent to which they comply with certain standard design criteria, and to analyse the sound quality of the bells statistically. Furthermore, using statistical classification techniques, bells of unknown founders in the database can be attributed to a particular founder. This is analogous to the statistical attribution of manuscripts of unknown authorship.

- T de Wet
- PJU van Deventer
- JL Teugels (KUL, Belgium)

Open-Set Classifiers using Extreme Value Theory

This research proposes and studies open-set recognition classifiers that are based on the use of extreme value statistics. For this purpose, a distance ratio is introduced that expresses how dissimilar a target point is from known classes by considering the ratio of distances locally around the target point. The class of generalized Pareto distributions with bounded support is used to model the peaks of the distance ratio above a high threshold. Both first and second order approximations are used. Furthermore, numerical methods to select the optimal threshold used in the generalized Pareto distribution are studied. Ultimately, the generalized Pareto distribution is used to extend supervised classifiers so that they can detect categories in the data not previously seen by the model. The proposed methods are applied to several image and bioacoustics data sets, where they perform well compared to similar open-set recognition and anomaly detection methods.

- ML Steyn
- T de Wet
- S Luca (U Ghent)
- B De Baets (U Ghent)

Novelty detection using Extreme Value Theory

Novelty detection is a branch of statistics that concerns detecting deviations from the expected normal behaviour. It is generally the case that the anomalous events have catastrophic financial or social impacts and, therefore, only occur rarely. Consequently, the broad approach is to construct a model representing the normal behaviour of the underlying system. New observations are then tested against this model of normality. One approach to discriminate between expected and anomalous observations is to threshold the model of normality probabilistically. This method has the advantage that the certainty in discriminating between normal and novel observations is quantified. Recently, an approach based on extreme value theory has been formulated to threshold the model representing the normal state. Under the assumption that the probability density function of the variables in their normal state is well defined, extreme value theory is utilised to derive a limiting distribution for the minimum probability density of the data. A significant advantage of this approach is the ability to perform novelty detection in multimodal and multivariate data spaces. Further research is now being carried out in which the theory of second order regular variation is used to determine the rate of convergence of the extreme value-based novelty detection algorithm. This research extends current models by using the lower order statistics of the probability density values to approximate the limiting distribution of the minimum probability density. Consequently, the extreme value distribution is approximated by using more information than only the sample of minima.
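The basic workflow of building a model of normality and thresholding its density can be sketched as follows. This is a deliberately simplified illustration: it assumes a univariate Gaussian model and uses an empirical quantile of the training densities as the threshold, in place of the extreme value-based threshold described above. The data, the level alpha and the function name is_novel are all hypothetical.

```python
import random
from statistics import NormalDist

# Hypothetical training data representing the normal state of a system
rng = random.Random(0)
train = [rng.gauss(10.0, 2.0) for _ in range(2000)]

# Model of normality: a Gaussian fitted to the training data (an assumption)
model = NormalDist.from_samples(train)

# Density of each training point under the fitted model, sorted ascending
train_density = sorted(model.pdf(x) for x in train)

# Simplified threshold: the empirical alpha-quantile of the training densities.
# The research described above instead derives the threshold from the
# limiting distribution of the minimum density using extreme value theory.
alpha = 0.01
threshold = train_density[int(alpha * len(train_density))]


def is_novel(x):
    """Flag x as novel when its density under the model falls below the threshold."""
    return model.pdf(x) < threshold


print(is_novel(10.0))  # observation near the centre of the normal state
print(is_novel(30.0))  # observation far outside the normal state
```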

- ML Steyn
- T de Wet

A consistent polynomial factor model of the term structure of roll-over risk

Polynomial-based models offer computational efficiency. We extend the model in “A Consistent Stochastic Model of the Term Structure of Interest Rates for Multiple Tenors” (published in the Journal of Economic Dynamics and Control, JEDC) to model roll-over risk. Here, we explore the advantages of polynomial processes over affine processes as applied in finance.

- M Alfeus
- Martino Grasselli (University of Padova, Italy)
- Erik Schlögl (University of Technology Sydney, Australia)

Statistical inference of complex survey data

Most survey data analysed in practice originate from non-simple random sampling (non-SRS) designs. These designs typically combine different sampling methods, such as stratification and cluster sampling. This is known as complex sampling, a technique employed to ensure that the sample collected represents the target population as closely as possible. This project extends our previous research in the field of complex sampling to develop models and methods of analysis that account for the complex design of non-SRS multivariate data, a largely unexplored area. The newly developed models and methods will be evaluated in two ways: using simulated hierarchical data, so that the evaluation can be carried out under controllable circumstances, and using real-world data, to ensure that the developed models account for real-world anomalies.
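A small example shows why the sampling design has to enter the analysis. The sketch below compares a naive sample mean with the standard design-weighted mean for a stratified sample in which one stratum is heavily over-sampled. The strata, population sizes and observations are hypothetical, and the example covers only the simplest univariate case rather than the multivariate models this project develops.

```python
def stratified_mean(samples, population_sizes):
    """Design-weighted estimate of the population mean for a stratified sample.

    Each stratum mean is weighted by its share N_h / N of the population.
    """
    total = sum(population_sizes.values())
    return sum(
        population_sizes[h] * (sum(ys) / len(ys)) for h, ys in samples.items()
    ) / total


# Hypothetical population: 9000 urban and 1000 rural units
population_sizes = {"urban": 9000, "rural": 1000}

# Hypothetical sample with equal allocation, so rural units are over-sampled
samples = {
    "urban": [10.0, 12.0, 11.0, 13.0],
    "rural": [30.0, 34.0, 32.0, 28.0],
}

all_values = [y for ys in samples.values() for y in ys]
naive = sum(all_values) / len(all_values)
weighted = stratified_mean(samples, population_sizes)

print(f"naive mean (ignores design):   {naive:.2f}")
print(f"design-weighted mean:          {weighted:.2f}")
```

The naive mean treats the eight observations as a simple random sample and is pulled towards the over-sampled stratum, whereas the weighted estimate restores each stratum to its population share.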

- R Luus (UWC)
- A Neethling (UFS)
- T de Wet

Robust estimation of extreme value parameters

In this research robust estimators are proposed and studied for the parameters of interest in extreme value theory. Such estimators are obtained using the minimum density power divergence distance measure with an exponential regression model of log spacings under a second order condition on the relevant slowly varying component. For robustness, the influence functions and gross error sensitivities of these estimators are determined. The parameters considered are the extreme value index and extreme quantiles. The estimators obtained are compared to existing robust estimators. The comparisons are carried out on simulated contaminated samples from different distributions as well as on several well-known practical data sets.

- T de Wet
- R Minkah (University of Ghana)
- A Ghosh (Indian Statistical Institute)

Pricing Exotic Derivatives for Cryptocurrency Assets – A Monte Carlo Approach

We present an approach to pricing exotic derivatives, specifically lookback options, for cryptocurrencies. We propose a discretely monitored window-average lookback option, whose monitoring instants are randomly selected within the time to maturity, and whose monitoring price is the average asset price in a specified window surrounding each instant.
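The payoff structure described above can be sketched with a small Monte Carlo simulation. The code below is a minimal illustration under assumptions that are not stated in the project description: the asset follows geometric Brownian motion under the risk-neutral measure, the option is a fixed-strike lookback call on the maximum window average, and the monitoring instants are redrawn for every simulated path. All function names and parameter values are hypothetical.

```python
import math
import random


def window_average(path, index, half_width):
    """Average price over the window of grid points centred on `index`,
    clipped to the start and end of the path."""
    lo = max(0, index - half_width)
    hi = min(len(path) - 1, index + half_width)
    window = path[lo:hi + 1]
    return sum(window) / len(window)


def simulate_gbm_path(s0, r, sigma, maturity, n_steps, rng):
    """Simulate one risk-neutral geometric Brownian motion path (an assumption)."""
    dt = maturity / n_steps
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    path = [s0]
    for _ in range(n_steps):
        path.append(path[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
    return path


def price_window_lookback_call(s0, strike, r, sigma, maturity, n_steps=250,
                               n_monitor=12, half_width=3, n_paths=500, seed=1):
    """Monte Carlo price of a fixed-strike call on the maximum window average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        path = simulate_gbm_path(s0, r, sigma, maturity, n_steps, rng)
        # Monitoring instants drawn at random within the time to maturity
        instants = rng.sample(range(1, n_steps + 1), n_monitor)
        best = max(window_average(path, i, half_width) for i in instants)
        total += max(best - strike, 0.0)
    return math.exp(-r * maturity) * total / n_paths


# Hypothetical parameters with a high volatility, as is typical of crypto assets
price = price_window_lookback_call(s0=100.0, strike=100.0, r=0.02,
                                   sigma=0.8, maturity=0.5)
print(f"estimated option price: {price:.2f}")
```

Averaging over a window around each instant dampens the effect of a single extreme print on the payoff, which is one plausible motivation for the design in a market with sharp short-lived price spikes.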

- M Alfeus
- Shiam Kannan (Cornell University)

Extreme Value Theory

Saddlepoint approximations have been applied successfully in many areas of statistics (as well as in other sciences, e.g. physics, applied mathematics and engineering). However, very little work has been done on applying saddlepoint approximations in Extreme Value Theory (EVT). In recent research the authors have applied the technique to approximate the distribution of the Hill estimator, the well-known estimator of the extreme value index (EVI). The approximation in that case was extremely accurate. Further research is now being carried out in which the saddlepoint approximation is applied to other estimators of the EVI as well as to estimators of other relevant EVT parameters, e.g. quantiles. The saddlepoint approximation will also be used to find improved confidence intervals for these parameters.
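For readers unfamiliar with the Hill estimator mentioned above, the sketch below implements its standard form: the average log-excess of the k largest observations over the (k+1)-th largest. The simulated Pareto sample, the seed and the choice k = 500 are hypothetical and serve only to illustrate the estimator; the saddlepoint approximation itself is not shown.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimate of the extreme value index from the k largest observations.

    Assumes positive data and 1 <= k < len(sample).
    """
    if not 1 <= k < len(sample):
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest observation
    return sum(math.log(x / threshold) for x in ordered[:k]) / k


# Hypothetical sample from a strict Pareto distribution with tail index alpha,
# for which the true extreme value index is 1 / alpha
rng = random.Random(42)
alpha = 2.0
sample = [rng.paretovariate(alpha) for _ in range(5000)]

evi_hat = hill_estimator(sample, k=500)
print(f"Hill estimate: {evi_hat:.3f} (true value {1 / alpha:.3f})")
```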

- S Buitendag
- T de Wet
- J Beirlant (KUL, Belgium)

Optimality in Weighted L2-Wasserstein Goodness-of-Fit Statistics

In two recent papers, del Barrio et al. (1999) and del Barrio et al. (2000), the authors introduced a new class of goodness-of-fit statistics based on the L2-Wasserstein distance. It was shown that the desirable property of loss of degrees-of-freedom holds only under normality. Furthermore, these statistics have some serious limitations in their applicability to heavier-tailed distributions. To overcome these problems, the use of weight functions in these statistics was proposed and investigated by de Wet (2000, 2002) and Csörgő (2002). In the former the issue of loss of degrees-of-freedom was considered and in the latter the application to heavier-tailed distributions. In de Wet (2000, 2002) it was shown how the weight functions could be chosen in order to retain the loss of degrees-of-freedom property separately for location and scale. The weight functions that give this property are the ones that give asymptotically optimal estimators for respectively the location and scale parameters – thus estimation optimality. In this paper we show that in the location case, this choice of “estimation optimal” weight function also gives “testing optimality”, where the latter is measured in terms of approximate Bahadur efficiencies.

- T de Wet

Repairable systems in Reliability: Bayesian Approaches

This research concerns repairable systems and the evaluation of their performance in terms of reliability and availability. Multi-unit systems are investigated. A Bayesian method of assessing reliability is of primary interest, since very little has been published on this topic.

- PJ Mostert
- VSS Yadavalli
- A Bekker (University of Pretoria)

Bayesian analysis of cancer survival data using the lifetime model

Bayes estimators for some of the lifetime distribution parameters, such as the mean survival time, the hazard function and the survival distribution function, are derived for survival data from various lifetime models. The estimators are derived using a selection of loss functions. The survival data are usually censored, and the theory is based on right-censored data, although other types of censoring are also investigated, both non-parametrically and parametrically. Various types of prior distribution are used in this study.
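The simplest instance of such an estimator is sketched below: an exponential lifetime model with right-censored data, a conjugate gamma prior on the hazard rate, and squared error loss, under which the Bayes estimator is the posterior mean. The model choice, the prior parameters and the data are assumptions made for illustration and are not taken from the study.

```python
def gamma_posterior(times, observed, prior_shape, prior_rate):
    """Posterior Gamma(shape, rate) for the exponential hazard rate.

    `times` are follow-up times; `observed[i]` is True for a death and
    False for a right-censored observation.
    """
    deaths = sum(1 for event in observed if event)
    time_at_risk = sum(times)
    return prior_shape + deaths, prior_rate + time_at_risk


# Hypothetical survival times (years) and event indicators
times = [2.0, 3.5, 1.0, 4.0, 6.0]
observed = [True, False, True, True, False]

# Hypothetical Gamma(1, 1) prior on the hazard rate
shape, rate = gamma_posterior(times, observed, prior_shape=1.0, prior_rate=1.0)

# Bayes estimators under squared error loss are posterior means
hazard_estimate = shape / rate                # E[lambda | data]
mean_survival_estimate = rate / (shape - 1)   # E[1 / lambda | data], needs shape > 1


def survival_estimate(t):
    """Posterior mean of the survival function S(t) = exp(-lambda * t)."""
    return (rate / (rate + t)) ** shape


print(f"hazard rate:        {hazard_estimate:.4f}")
print(f"mean survival time: {mean_survival_estimate:.4f}")
print(f"S(5):               {survival_estimate(5.0):.4f}")
```

Censored observations contribute their follow-up time to the time at risk but not to the death count, which is how right-censoring enters the posterior.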

- PJ Mostert
- JJJ Roux (University of South Africa)
- A Bekker (University of Pretoria)

Forecasting by identification of linear structure in a time series

Forecasting is an important and difficult problem in time series analysis. Traditional methods are based on fitting a model to the available data, and extrapolating to future time points. An example is the class of Box-Jenkins models. In this research a new model-free approach is investigated, based on the principal components structure of the so-called time-delay matrix.
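To illustrate what linear structure in a time-delay matrix means, the sketch below builds the matrix for a hypothetical series that follows a second-order linear recurrence. Every row then satisfies the same linear relation, so the rows lie in a lower-dimensional subspace, which is the kind of structure a principal components analysis would expose. The series, the coefficients and the function name are assumptions for illustration, and the recurrence is taken as known rather than estimated; this is not the forecasting method under investigation.

```python
def time_delay_matrix(series, window):
    """Stack overlapping windows of the series as rows of a matrix."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]


# Hypothetical series obeying x_t = a * x_{t-1} + b * x_{t-2}
a, b = 1.5, -0.7
series = [1.0, 0.5]
for _ in range(30):
    series.append(a * series[-1] + b * series[-2])

rows = time_delay_matrix(series, window=3)

# Each row (x_{t-2}, x_{t-1}, x_t) is orthogonal to the vector (-b, -a, 1),
# so the rows span at most a two-dimensional subspace of R^3
residuals = [x2 - a * x1 - b * x0 for x0, x1, x2 in rows]
print(max(abs(r) for r in residuals))

# The same linear relation extends the series one step ahead
forecast = a * series[-1] + b * series[-2]
print(forecast)
```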

- H Viljoen

Pricing Non-linear Interest Rate Derivatives in a Multicurve Framework

This is an extension of the paper “A Consistent Stochastic Model of the Term Structure of Interest Rates for Multiple Tenors”, focused mainly on the pricing and calibration of non-linear interest rate instruments such as Caps/floors, Swaptions, Constant Maturity Swaptions (CMS) and Bermudan Swaptions. The project investigates in more depth which risk-free rate should be used for the market-consistent discounting of cash flows: OIS, the repo rate or SOFR. The ultimate objective is to derive the distribution of roll-over risk from the calibrated model and, possibly, a risk-neutral expectation of the roll-over risk.

- M Alfeus
- Alex Backwell (University of Cape Town)
- Andrea Macrina (University College London, UK)
- Erik Schlögl (University of Technology Sydney, Australia)
- David Skovmand (University of Copenhagen, Denmark)

Analysis of the performance of actuarial science students

Studies have been carried out to better understand the performance of actuarial science students, both in the various university modules and in the examinations of the actuarial profession. Performance has been analysed by degree programme, investigating the time taken to graduate, the number of exemptions obtained from the profession’s examinations, the programmes to which students who leave the actuarial programme migrate, and the influence on performance of factors such as school mathematics, language, etc. The perceptions of students regarding the factors that lead to success have been investigated. The performance of students in the profession’s examinations has also been investigated, taking into account the university attended, gender, race, examination centre, etc.

- PG Slattery

Digital soil mapping