Calculate the conditional Mahalanobis distance of a set of dependent variables after controlling for a set of independent variables.

cond_maha(
  data,
  R,
  v_dep,
  v_ind = NULL,
  v_ind_composites = NULL,
  mu = 0,
  sigma = 1,
  use_sample_stats = FALSE,
  label = NA
)

## Arguments

data

Data.frame with the independent and dependent variables. Unless mu and sigma are specified, data are assumed to be z-scores.

R

Correlation matrix of all variables.

v_dep

Vector of names of the dependent variables in your profile.

v_ind

Vector of names of independent variables you would like to control for.

v_ind_composites

Vector of names of independent variables that are composites of dependent variables.

mu

A vector of means. A single value means that all variables have the same mean.

sigma

A vector of standard deviations. A single value means that all variables have the same standard deviation.

use_sample_stats

If TRUE, estimate R, mu, and sigma from data. Only complete cases are used (i.e., no missing values in v_dep, v_ind, v_ind_composites).

label

Optional tag for labeling output.
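
As a minimal sketch of what the mu and sigma arguments imply, the base R code below converts scores on a hypothetical index-score metric (mean 100, standard deviation 15) to z-scores. The data frame d_raw, its column names, and its values are made up for illustration, and the assumption that cond_maha standardizes data in an equivalent way is inferred from the argument descriptions rather than from the package source.

```r
# Hypothetical raw scores on a metric with mean 100 and SD 15 (made-up values)
d_raw <- data.frame(Gc = c(115, 85), Gf = c(100, 130))

# A single value stands for the same mean or SD in every variable
mu <- 100
sigma <- 15

# Expand the single values to one per column, then standardize
d_z <- as.data.frame(
  scale(d_raw,
        center = rep(mu, ncol(d_raw)),
        scale = rep(sigma, ncol(d_raw)))
)

d_z
```

Passing mu = 100 and sigma = 15 to cond_maha together with d_raw would presumably be equivalent to passing d_z with the default mu = 0 and sigma = 1.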

## Value

A list containing the conditional Mahalanobis distance and related results:

• dCM = Conditional Mahalanobis distance

• dCM_df = Degrees of freedom for the conditional Mahalanobis distance

• dCM_p = A proportion that indicates how unusual this profile is compared to profiles with the same independent variable values. For example, if dCM_p = 0.88, this profile is more unusual than 88 percent of profiles after controlling for the independent variables.

• dM_dep = Mahalanobis distance of just the dependent variables

• dM_dep_df = Degrees of freedom for the Mahalanobis distance of the dependent variables

• dM_dep_p = Proportion associated with the Mahalanobis distance of the dependent variables

• dM_ind = Mahalanobis distance of just the independent variables

• dM_ind_df = Degrees of freedom for the Mahalanobis distance of the independent variables

• dM_ind_p = Proportion associated with the Mahalanobis distance of the independent variables

• v_dep = Dependent variable names

• v_ind = Independent variable names

• v_ind_singular = Independent variables that can be perfectly predicted from the dependent variables (e.g., composite scores)

• v_ind_nonsingular = Independent variables that are not perfectly predicted from the dependent variables

• data = Data used in the calculations

• d_ind = Independent variable data

• d_ind_p = Assuming normality, cumulative distribution function of the independent variables

• d_dep = Dependent variable data

• d_dep_predicted = Predicted values of the dependent variables

• d_dep_deviations = d_dep - d_dep_predicted (i.e., residuals of the dependent variables)

• d_dep_residuals_z = Standardized residuals of the dependent variables

• d_dep_cp = Conditional proportions associated with standardized residuals

• d_dep_p = Assuming normality, cumulative distribution function of the dependent variables

• R2 = Proportion of variance in each dependent variable explained by the independent variables

• SEE = Standard error of the estimate for each dependent variable

• ConditionalCovariance = Covariance matrix of the dependent variables after controlling for the independent variables

• distance_reduction = 1 - (dCM / dM_dep) (Degree to which the independent variables decrease the Mahalanobis distance of the dependent variables. Negative reductions mean that the profile is more unusual after controlling for the independent variables. Returns 0 if dM_dep is 0.)

• variability_reduction = 1 - sum((X_dep - predicted_dep) ^ 2) / sum((X_dep - mu_dep) ^ 2) (Degree to which the independent variables decrease the variability of the dependent variables (X_dep). Negative reductions mean that the profile is more variable after controlling for the independent variables. Returns 0 if X_dep == mu_dep.)

• mu = Variable means

• sigma = Variable standard deviations

• d_person = Data frame consisting of Mahalanobis distance data for each person

• d_variable = Data frame consisting of variable characteristics

• label = Label supplied in the label argument
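
To make the relationships among several of these elements concrete, here is a base R sketch that applies the standard multivariate-normal conditioning formulas to a single profile of z-scores. It is not the package's internal code. The correlation matrix, the variable names (y1, y2, x), and the scores are made up for illustration, and the object names only mirror the list elements described above.

```r
# Hypothetical correlations among two dependent variables (y1, y2)
# and one independent variable (x); values are made up
vars <- c("y1", "y2", "x")
R <- matrix(c(1.0, 0.5, 0.6,
              0.5, 1.0, 0.4,
              0.6, 0.4, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(vars, vars))
v_dep <- c("y1", "y2")
v_ind <- "x"

# One person's z-scores (made up)
z <- c(y1 = 1.5, y2 = -0.5, x = 1.0)

# Partition the correlation matrix
R_dd <- R[v_dep, v_dep, drop = FALSE]
R_di <- R[v_dep, v_ind, drop = FALSE]
R_ii <- R[v_ind, v_ind, drop = FALSE]

# Regression weights and predicted dependent scores
B <- R_di %*% solve(R_ii)
predicted <- drop(B %*% z[v_ind])

# Covariance of the dependent variables after controlling for x
cond_cov <- R_dd - B %*% t(R_di)

# With unit variances, R2 and SEE follow from the diagonal
R2 <- 1 - diag(cond_cov)
SEE <- sqrt(diag(cond_cov))

# Residuals and the two Mahalanobis distances
deviations <- z[v_dep] - predicted
dCM <- sqrt(drop(t(deviations) %*% solve(cond_cov) %*% deviations))
dM_dep <- sqrt(drop(t(z[v_dep]) %*% solve(R_dd) %*% z[v_dep]))

# Proportion of profiles less unusual, from the chi-square distribution
dCM_p <- pchisq(dCM^2, df = length(v_dep))

# Reduction indices as defined in the list above (mu_dep is 0 for z-scores)
distance_reduction <- 1 - dCM / dM_dep
variability_reduction <- 1 - sum(deviations^2) / sum(z[v_dep]^2)
```

The sketch handles only nonsingular independent variables. Composite scores that are perfectly predicted from the dependent variables would make the conditional covariance matrix singular and need separate treatment.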

## Examples

library(unusualprofile)
library(simstandard)

m <- "
Gc =~ 0.85 * Gc1 + 0.68 * Gc2 + 0.8 * Gc3
Gf =~ 0.8 * Gf1 + 0.9 * Gf2 + 0.8 * Gf3
Gs =~ 0.7 * Gs1 + 0.8 * Gs2 + 0.8 * Gs3
Math =~ 0.4 * Math1 + 0.9 * Math2 + 0.7 * Math3
Gc ~ 0.6 * Gf + 0.1 * Gs
Gf ~ 0.5 * Gs
Read ~ 0.4 * Gc + 0.1 * Gf
Math ~ 0.2 * Gc + 0.3 * Gf + 0.1 * Gs"
# Generate 10 cases
d_demo <- simstandard::sim_standardized(m = m, n = 10)

# Get model-implied correlation matrix
R_all <- simstandard::sim_standardized_matrices(m)$Correlations$R_all

cond_maha(data = d_demo,
          R = R_all,
          v_dep = c("Math", "Read"),
          v_ind = c("Gf", "Gs", "Gc"))
#> Conditional Mahalanobis Distance = 0.9344, df = 2, p = 0.3537
#> Conditional Mahalanobis Distance = 1.6306, df = 2, p = 0.7354
#> Conditional Mahalanobis Distance = 0.7829, df = 2, p = 0.2640
#> Conditional Mahalanobis Distance = 1.6495, df = 2, p = 0.7435
#> Conditional Mahalanobis Distance = 1.7875, df = 2, p = 0.7976
#> Conditional Mahalanobis Distance = 1.2643, df = 2, p = 0.5503
#> Conditional Mahalanobis Distance = 0.8847, df = 2, p = 0.3239
#> Conditional Mahalanobis Distance = 0.4982, df = 2, p = 0.1167
#> Conditional Mahalanobis Distance = 2.7891, df = 2, p = 0.9795
#> Conditional Mahalanobis Distance = 0.3942, df = 2, p = 0.0748
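
The printed p values appear to be the chi-square cumulative distribution function evaluated at the squared distance, with the listed degrees of freedom. The base R check below uses three distance and p pairs copied from the output above. That cond_maha computes dCM_p this way internally is an inference from these numbers, not something the documentation states.

```r
# Distances and p values copied from three cases in the printed output
dCM <- c(0.9344, 1.6306, 2.7891)
p_printed <- c(0.3537, 0.7354, 0.9795)

# Chi-square CDF of the squared distance with df = 2
p_check <- pchisq(dCM^2, df = 2)

# Largest absolute difference between recomputed and printed p values
max(abs(p_check - p_printed))
```

Because the printed distances are rounded to four decimals, small discrepancies in the last digit would be expected.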