A quick tour of StMoE

Introduction

StMoE (Skew-t Mixture-of-Experts) provides a flexible and robust modelling framework for heterogeneous data with possibly skewed, heavy-tailed distributions, which may also be corrupted by atypical observations. StMoE consists of a mixture of K skew-t expert regressors (each of polynomial degree p) gated by a softmax gating network (of polynomial degree q), and is parameterized by the following (the resulting density is sketched below):

  • The gating network parameters: the coefficients alpha’s of the softmax net.
  • The experts network parameters: the location parameters (regression coefficients) beta’s, the scale parameters sigma’s, the skewness parameters lambda’s and the degrees-of-freedom parameters nu’s.

StMoE thus generalises mixtures of (normal, skew-normal, t, and skew-t) distributions and mixtures of regressions with these distributions. For example, when q = 0, we retrieve mixtures of (skew-t, t, skew-normal, or normal) regressions, and when both p = 0 and q = 0, it is a mixture of (skew-t, t, skew-normal, or normal) distributions. It also reduces to a single (normal, skew-normal, t, or skew-t) distribution when only one expert is used (K = 1).
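
Putting the gating and expert components together, the StMoE density can be sketched as follows. This is the usual mixture-of-experts formulation, with ST(·; μ, σ², λ, ν) denoting the skew-t density with location μ, scale σ, skewness λ and ν degrees of freedom, and x collecting the polynomial covariates of the relevant degree:

$$
f(y \mid \boldsymbol{x}; \boldsymbol{\Psi}) \;=\; \sum_{k=1}^{K} \pi_k(\boldsymbol{x}; \boldsymbol{\alpha})\,
\mathrm{ST}\!\big(y;\, \mu(\boldsymbol{x}; \boldsymbol{\beta}_k),\, \sigma_k^2,\, \lambda_k,\, \nu_k\big),
\qquad
\pi_k(\boldsymbol{x}; \boldsymbol{\alpha}) \;=\; \frac{\exp(\boldsymbol{\alpha}_k^{\top}\boldsymbol{x})}{\sum_{\ell=1}^{K}\exp(\boldsymbol{\alpha}_\ell^{\top}\boldsymbol{x})},
$$

where μ(x; β_k) = β_k^⊤ x is the polynomial mean of expert k and α_K is conventionally set to the null vector for identifiability.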

Model estimation/learning is performed by a dedicated expectation-conditional maximization (ECM) algorithm that maximizes the observed-data log-likelihood. We provide simulated examples to illustrate the use of the model in model-based clustering of heterogeneous regression data and in fitting non-linear regression functions.
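
Concretely, for a sample (x_1, y_1), …, (x_n, y_n), the ECM algorithm iteratively maximizes (in the notation of the density sketched above):

$$
\log L(\boldsymbol{\Psi}) \;=\; \sum_{i=1}^{n} \log \sum_{k=1}^{K} \pi_k(\boldsymbol{x}_i; \boldsymbol{\alpha})\,
\mathrm{ST}\!\big(y_i;\, \mu(\boldsymbol{x}_i; \boldsymbol{\beta}_k),\, \sigma_k^2,\, \lambda_k,\, \nu_k\big),
$$

with the E-step computing the posterior component memberships and the CM-steps updating the gating and expert parameters in turn.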

This document was written in R Markdown, using the knitr package for production.

See help(package="meteorits") for further details and references provided by citation("meteorits").

Application to a simulated dataset

Generate sample

library(meteorits)

n <- 500 # Size of the sample
alphak <- matrix(c(0, 8), ncol = 1) # Parameters of the gating network
betak <- matrix(c(0, -2.5, 0, 2.5), ncol = 2) # Regression coefficients of the experts
sigmak <- c(0.5, 0.5) # Standard deviations of the experts
lambdak <- c(3, 5) # Skewness parameters of the experts
nuk <- c(5, 7) # Degrees of freedom of the experts' t densities
x <- seq.int(from = -1, to = 1, length.out = n) # Inputs (predictors)

# Generate sample of size n
sample <- sampleUnivStMoE(alphak = alphak, betak = betak, sigmak = sigmak, 
                          lambdak = lambdak, nuk = nuk, x = x)
y <- sample$y
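
A quick base-R scatter plot of the simulated sample (nothing here is specific to meteorits) helps check the two skewed regimes before fitting:

plot(x, y, pch = 19, cex = 0.5,
     xlab = "x", ylab = "y",
     main = "Simulated StMoE sample (n = 500)") # raw data, no fitted model yet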

Set up StMoE model parameters

K <- 2 # Number of regressors/experts
p <- 1 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-5
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

stmoe <- emStMoE(X = x, Y = y, K, p, q, n_tries, max_iter, 
                 threshold, verbose, verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -288.886616832502
## EM - StMoE: Iteration: 2 | log-likelihood: -274.551939353402
## EM - StMoE: Iteration: 3 | log-likelihood: -266.497536933617
## EM - StMoE: Iteration: 4 | log-likelihood: -261.857813858145
## EM - StMoE: Iteration: 5 | log-likelihood: -259.069450275996
## EM - StMoE: Iteration: 6 | log-likelihood: -257.279029763355
## EM - StMoE: Iteration: 7 | log-likelihood: -255.993957131894
## EM - StMoE: Iteration: 8 | log-likelihood: -254.933320108311
## EM - StMoE: Iteration: 9 | log-likelihood: -253.933596553251
## EM - StMoE: Iteration: 10 | log-likelihood: -252.892640656719
## EM - StMoE: Iteration: 11 | log-likelihood: -251.743365035994
## EM - StMoE: Iteration: 12 | log-likelihood: -250.444614798949
## EM - StMoE: Iteration: 13 | log-likelihood: -248.977203634933
## EM - StMoE: Iteration: 14 | log-likelihood: -247.350575089965
## EM - StMoE: Iteration: 15 | log-likelihood: -245.596522092922
## EM - StMoE: Iteration: 16 | log-likelihood: -243.756307285109
## EM - StMoE: Iteration: 17 | log-likelihood: -241.864587222424
## EM - StMoE: Iteration: 18 | log-likelihood: -239.953951029685
## EM - StMoE: Iteration: 19 | log-likelihood: -238.056248150092
## EM - StMoE: Iteration: 20 | log-likelihood: -236.195802710322
## EM - StMoE: Iteration: 21 | log-likelihood: -234.387136977492
## EM - StMoE: Iteration: 22 | log-likelihood: -232.636346282879
## EM - StMoE: Iteration: 23 | log-likelihood: -230.951331041944
## EM - StMoE: Iteration: 24 | log-likelihood: -229.340514565141
## EM - StMoE: Iteration: 25 | log-likelihood: -227.802797852467
## EM - StMoE: Iteration: 26 | log-likelihood: -226.332157730926
## EM - StMoE: Iteration: 27 | log-likelihood: -224.911732359032
## EM - StMoE: Iteration: 28 | log-likelihood: -223.520115774558
## EM - StMoE: Iteration: 29 | log-likelihood: -222.132397318382
## EM - StMoE: Iteration: 30 | log-likelihood: -220.722977862227
## EM - StMoE: Iteration: 31 | log-likelihood: -219.273728685894
## EM - StMoE: Iteration: 32 | log-likelihood: -217.781217209287
## EM - StMoE: Iteration: 33 | log-likelihood: -216.247568957512
## EM - StMoE: Iteration: 34 | log-likelihood: -214.686630848182
## EM - StMoE: Iteration: 35 | log-likelihood: -213.124965232665
## EM - StMoE: Iteration: 36 | log-likelihood: -211.595228679404
## EM - StMoE: Iteration: 37 | log-likelihood: -210.133182860863
## EM - StMoE: Iteration: 38 | log-likelihood: -208.761733107594
## EM - StMoE: Iteration: 39 | log-likelihood: -207.505603093142
## EM - StMoE: Iteration: 40 | log-likelihood: -206.373329778521
## EM - StMoE: Iteration: 41 | log-likelihood: -205.372114352052
## EM - StMoE: Iteration: 42 | log-likelihood: -204.498244415933
## EM - StMoE: Iteration: 43 | log-likelihood: -203.738389227599
## EM - StMoE: Iteration: 44 | log-likelihood: -203.079067766075
## EM - StMoE: Iteration: 45 | log-likelihood: -202.513398753339
## EM - StMoE: Iteration: 46 | log-likelihood: -202.032282890454
## EM - StMoE: Iteration: 47 | log-likelihood: -201.623328696478
## EM - StMoE: Iteration: 48 | log-likelihood: -201.278496521231
## EM - StMoE: Iteration: 49 | log-likelihood: -200.990982392119
## EM - StMoE: Iteration: 50 | log-likelihood: -200.754407258189
## EM - StMoE: Iteration: 51 | log-likelihood: -200.559001222497
## EM - StMoE: Iteration: 52 | log-likelihood: -200.398549872961
## EM - StMoE: Iteration: 53 | log-likelihood: -200.267046004182
## EM - StMoE: Iteration: 54 | log-likelihood: -200.159885508749
## EM - StMoE: Iteration: 55 | log-likelihood: -200.072379210803
## EM - StMoE: Iteration: 56 | log-likelihood: -200.001038301145
## EM - StMoE: Iteration: 57 | log-likelihood: -199.942530915927
## EM - StMoE: Iteration: 58 | log-likelihood: -199.89522985542
## EM - StMoE: Iteration: 59 | log-likelihood: -199.857461309248
## EM - StMoE: Iteration: 60 | log-likelihood: -199.827905802677
## EM - StMoE: Iteration: 61 | log-likelihood: -199.805387299422
## EM - StMoE: Iteration: 62 | log-likelihood: -199.78849134333
## EM - StMoE: Iteration: 63 | log-likelihood: -199.776642765286
## EM - StMoE: Iteration: 64 | log-likelihood: -199.76904515713
## EM - StMoE: Iteration: 65 | log-likelihood: -199.765156830374
## EM - StMoE: Iteration: 66 | log-likelihood: -199.764043033773

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 2 experts:
## 
##  log-likelihood df      AIC       BIC       ICL
##        -199.764 12 -211.764 -237.0517 -237.1476
## 
## Clustering table (Number of observations in each expert):
## 
##   1   2 
## 249 251 
## 
## Regression coefficients:
## 
##     Beta(k = 1)  Beta(k = 2)
## 1   -0.01006586 -0.003688364
## X^1  2.53158068 -2.576650566
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2)
##      0.3765313     0.2719162
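
As a rough sanity check on the reported df, AIC and BIC, the free-parameter count of a univariate StMoE model can be recovered by hand. The sketch below assumes each expert contributes p + 1 regression coefficients plus one scale, one skewness and one degrees-of-freedom parameter, that the gating network contributes (K - 1)(q + 1) coefficients, and that the criteria are reported on the log-likelihood scale (AIC = L - df, BIC = L - (df/2) log n), which is consistent with the values above; the helper stmoe_criteria() is hypothetical and not part of meteorits:

# Hypothetical helper: free-parameter count and penalized criteria
# under the assumed conventions described above.
stmoe_criteria <- function(loglik, K, p, q, n) {
  df <- (K - 1) * (q + 1) + # gating network coefficients
    K * (p + 1) +           # expert regression coefficients
    3 * K                   # scale, skewness and df, one of each per expert
  c(df = df, AIC = loglik - df, BIC = loglik - (df / 2) * log(n))
}

stmoe_criteria(loglik = -199.764, K = 2, p = 1, q = 1, n = 500)
# df = 12, AIC = -211.764, BIC ≈ -237.05 (consistent with the summary above)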

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")

Application to a real dataset

Load data

library(MASS)
data("mcycle")
x <- mcycle$times
y <- mcycle$accel
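
Before fitting, a quick look at the raw motorcycle data (times in milliseconds after impact, head acceleration in g) shows the regime changes the experts will have to capture:

plot(x, y, pch = 19, cex = 0.5,
     xlab = "Time (ms)", ylab = "Acceleration (g)",
     main = "mcycle data")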

Set up StMoE model parameters

K <- 4 # Number of regressors/experts
p <- 2 # Order of the polynomial regression (regressors/experts)
q <- 1 # Order of the logistic regression (gating network)

Set up EM parameters

n_tries <- 1
max_iter <- 1500
threshold <- 1e-5
verbose <- TRUE
verbose_IRLS <- FALSE

Estimation

stmoe <- emStMoE(X = x, Y = y, K, p, q, n_tries, max_iter, 
                 threshold, verbose, verbose_IRLS)
## EM - StMoE: Iteration: 1 | log-likelihood: -593.23812442799
## EM - StMoE: Iteration: 2 | log-likelihood: -585.982322360787
## EM - StMoE: Iteration: 3 | log-likelihood: -584.203326518032
## EM - StMoE: Iteration: 4 | log-likelihood: -583.076174703954
## EM - StMoE: Iteration: 5 | log-likelihood: -579.739851320572
## EM - StMoE: Iteration: 6 | log-likelihood: -572.859817696828
## EM - StMoE: Iteration: 7 | log-likelihood: -567.000109614853
## EM - StMoE: Iteration: 8 | log-likelihood: -563.657911942832
## EM - StMoE: Iteration: 9 | log-likelihood: -562.096289384627
## EM - StMoE: Iteration: 10 | log-likelihood: -561.337697496131
## EM - StMoE: Iteration: 11 | log-likelihood: -560.773133833654
## EM - StMoE: Iteration: 12 | log-likelihood: -560.260481727092
## EM - StMoE: Iteration: 13 | log-likelihood: -559.886789309604
## EM - StMoE: Iteration: 14 | log-likelihood: -559.635872803354
## EM - StMoE: Iteration: 15 | log-likelihood: -559.434357246517
## EM - StMoE: Iteration: 16 | log-likelihood: -559.254164807977
## EM - StMoE: Iteration: 17 | log-likelihood: -559.090252118998
## EM - StMoE: Iteration: 18 | log-likelihood: -558.942402881181
## EM - StMoE: Iteration: 19 | log-likelihood: -558.810555576634
## EM - StMoE: Iteration: 20 | log-likelihood: -558.694578106762
## EM - StMoE: Iteration: 21 | log-likelihood: -558.59014276968
## EM - StMoE: Iteration: 22 | log-likelihood: -558.495193317621
## EM - StMoE: Iteration: 23 | log-likelihood: -558.408006858728
## EM - StMoE: Iteration: 24 | log-likelihood: -558.326993424879
## EM - StMoE: Iteration: 25 | log-likelihood: -558.250977779787
## EM - StMoE: Iteration: 26 | log-likelihood: -558.178701213986
## EM - StMoE: Iteration: 27 | log-likelihood: -558.109102696266
## EM - StMoE: Iteration: 28 | log-likelihood: -558.041463856591
## EM - StMoE: Iteration: 29 | log-likelihood: -557.974463308535
## EM - StMoE: Iteration: 30 | log-likelihood: -557.907506579817
## EM - StMoE: Iteration: 31 | log-likelihood: -557.839519655437
## EM - StMoE: Iteration: 32 | log-likelihood: -557.769979512929
## EM - StMoE: Iteration: 33 | log-likelihood: -557.697898658522
## EM - StMoE: Iteration: 34 | log-likelihood: -557.622632551957
## EM - StMoE: Iteration: 35 | log-likelihood: -557.543309527861
## EM - StMoE: Iteration: 36 | log-likelihood: -557.458940384955
## EM - StMoE: Iteration: 37 | log-likelihood: -557.368224640238
## EM - StMoE: Iteration: 38 | log-likelihood: -557.269684024249
## EM - StMoE: Iteration: 39 | log-likelihood: -557.161547292055
## EM - StMoE: Iteration: 40 | log-likelihood: -557.041956535991
## EM - StMoE: Iteration: 41 | log-likelihood: -556.909309546063
## EM - StMoE: Iteration: 42 | log-likelihood: -556.762050411908
## EM - StMoE: Iteration: 43 | log-likelihood: -556.59906379332
## EM - StMoE: Iteration: 44 | log-likelihood: -556.422077974346
## EM - StMoE: Iteration: 45 | log-likelihood: -556.242162161107
## EM - StMoE: Iteration: 46 | log-likelihood: -556.091341440487
## EM - StMoE: Iteration: 47 | log-likelihood: -556.004664239088
## EM - StMoE: Iteration: 48 | log-likelihood: -555.971257140171
## EM - StMoE: Iteration: 49 | log-likelihood: -555.960113259389
## EM - StMoE: Iteration: 50 | log-likelihood: -555.956489937821

Summary

stmoe$summary()
## ------------------------------------------
## Fitted Skew t Mixture-of-Experts model
## ------------------------------------------
## 
## StMoE model with K = 4 experts:
## 
##  log-likelihood df       AIC       BIC       ICL
##       -555.9565 30 -585.9565 -629.3117 -629.3075
## 
## Clustering table (Number of observations in each expert):
## 
##  1  2  3  4 
## 28 37 31 37 
## 
## Regression coefficients:
## 
##     Beta(k = 1) Beta(k = 2)  Beta(k = 3) Beta(k = 4)
## 1   -3.11670591 1282.733402 -1826.127568 292.5983661
## X^1  0.74184053 -139.318622   112.662390 -12.1614638
## X^2 -0.07175525    3.407279    -1.691592   0.1248084
## 
## Variances:
## 
##  Sigma2(k = 1) Sigma2(k = 2) Sigma2(k = 3) Sigma2(k = 4)
##       12.09729      1138.317      551.4754      572.9232
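
The same hand check as before, reusing the hypothetical stmoe_criteria() helper sketched after the first summary, reproduces the reported figures for this K = 4 fit (mcycle has n = 133 observations):

stmoe_criteria(loglik = -555.9565, K = 4, p = 2, q = 1, n = nrow(mcycle))
# df = 30, AIC ≈ -585.96, BIC ≈ -629.31 (consistent with the summary above)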

Plots

Mean curve

stmoe$plot(what = "meancurve")

Confidence regions

stmoe$plot(what = "confregions")

Clusters

stmoe$plot(what = "clusters")

Log-likelihood

stmoe$plot(what = "loglikelihood")