Doctoral Studies

PhDs of the Department since 2001

2023

Label-dependent splitting for multi-label data

2022

2021

A quantitative analysis of investor over-reaction and under-reaction in the South African Equity Market: a mathematical statistical approach

One of the basic foundations of traditional finance is the theory underlying the efficient market hypothesis (EMH). The EMH states that stocks are fairly and accurately priced, making it impossible for…

2020

Extreme Quantile Inference

A novel approach to performing extreme quantile inference is proposed by applying ridge regression and the saddlepoint approximation to results in extreme value theory. To this end, ridge regression is…

Classify yield spread movements in sparse data through triplots

In many developing countries, including South Africa, all data that are required to calculate the fair values of financial instruments are not always readily available. Additionally, in some instances…

Feature selection on multi-label classification

The field of multi-label learning is a popular new research focus. In the multi-label setting, a data instance can be associated simultaneously with a set of labels instead of only a single label. This…

2019

Biplot methodology for analysing and evaluating missing multivariate nominal scaled data

This research aims at developing exploratory techniques that are specifically suitable for missing data applications. Categorical data analysis, missing data analysis and biplot visualisation are the…

2018

Regularised Gaussian belief propagation

Belief propagation (BP) has been applied as an approximation tool in a variety of inference problems. BP does not necessarily converge in loopy graphs and, even if it does, is not guaranteed to provide…

2017

A statistical analysis of student performance for the 2000-2013 period at the Copperbelt University in Zambia

Education in general, and tertiary education in particular, are the engines for sustained development of a nation. In this line, the Copperbelt University (CBU) plays a vital role in delivering the…

2016

Statistical inference of the multiple regression analysis of complex survey data

The quality of the inferences and results put forward from any statistical analysis is directly dependent on the correct method used at the analysis stage. Most survey data analyzed in practice originate from stratified multistage cluster samples or complex samples…

2015

Multivariate statistical process evaluation and monitoring for complex chemical processes

In this study, the development of an innovative fully integrated process monitoring methodology is presented for a complex chemical facility, originating at the coal feed from different mines up to the…

2014

The identification and application of common principal components

When estimating the covariance matrices of two or more populations, the covariance matrices are often assumed to be either equal or completely unrelated. The common principal components (CPC) model…

2013

Multi-label feature selection with application to musical instrument recognition

An area of data mining and statistics that is currently receiving considerable attention is the field of multi-label learning. Problems in this field are concerned with scenarios where each data case can…

2012

Bayesian approaches of Markov models embedded in unbalanced panel data

Multi-state models are used in this dissertation to model panel data, also known as longitudinal or cross-sectional time-series data. These are data sets which include units that are observed across two…

2011

Statistical inference for inequality measures based on semi-parametric estimators

Measures of inequality, also used as measures of concentration or diversity, are very popular in economics and especially in measuring the inequality in income or wealth within a population and…

2010

Improved estimation procedures for a positive extreme value index

In extreme value theory (EVT) the emphasis is on extreme (very small or very large) observations. The crucial parameter when making inferences about extreme quantiles is called the extreme value index…

2008

Assessing the influence of observations on the generalization performance of the kernel Fisher discriminant classifier

Kernel Fisher discriminant analysis (KFDA) is a kernel-based technique that can be used to classify observations of unknown origin into predefined groups. Basically, KFDA can be viewed as a non-linear extension of Fisher’s…

Variable selection for kernel methods with application to binary classification

The problem of variable selection in binary kernel classification is addressed in this thesis. Kernel methods are fairly recent additions to the statistical toolbox, having originated approximately two decades ago in…

A framework for estimating risk

We consider the problem of model assessment by risk estimation. Various approaches to risk estimation are considered in a unified framework. A discussion of various complexity dimensions and approaches to obtaining bounds…

2007

Some statistical aspects of LULU smoothers

The smoothing of time series plays a very important role in various practical applications. Estimating the signal and removing the noise is the main goal of smoothing. Traditionally linear smoothers were used, but nonlinear…

Aspects of model development using regression quantiles and elemental regressions

It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers) as well as data aberrations in the design space. The two major…

2003

Influential data cases when the C-p criterion is used for variable selection in multiple linear regression

In this dissertation we study the influence of data cases when the Cp criterion of Mallows (1973) is used for variable selection in multiple linear regression. The influence is investigated in terms…

2002

Time series forecasting and model selection in singular spectrum analysis

Singular spectrum analysis (SSA) originated in the field of Physics. The technique is non-parametric by nature and inter alia finds application in atmospheric sciences, signal processing and recently…

Edgeworth-corrected small-sample confidence intervals for ratio parameters in linear regression

In this thesis we construct a central confidence interval for a smooth scalar non-linear function of parameter vector β in a single general linear regression model Y = Xβ + ε. We do this by…

2001

Extensions of biplot methodology to discriminant analysis with applications of non-parametric principal components

Gower and Hand offer a new perspective on the traditional biplot. This perspective provides a unified approach to principal component analysis (PCA) biplots based on Pythagorean distance; canonical…