Calculate the conditional Mahalanobis distance for any set of variables.
cond_maha(
  data,
  R,
  v_dep,
  v_ind = NULL,
  v_ind_composites = NULL,
  mu = 0,
  sigma = 1,
  use_sample_stats = FALSE,
  label = NA
)
data
Data.frame with the independent and dependent variables. Unless mu and sigma are specified, data are assumed to be z-scores.
R
Correlation matrix of all variables.
v_dep
Vector of names of the dependent variables in your profile.
v_ind
Vector of names of independent variables you would like to control for.
v_ind_composites
Vector of names of independent variables that are composites of dependent variables.
mu
A vector of means. A single value means that all variables have the same mean.
sigma
A vector of standard deviations. A single value means that all variables have the same standard deviation.
use_sample_stats
If TRUE, estimate R, mu, and sigma from data. Only complete cases are used (i.e., no missing values in v_dep, v_ind, v_ind_composites).
label
Optional tag for labeling output.
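As a rough illustration of what supplying mu and sigma implies, the base R sketch below converts scores on a mean-100, SD-15 metric to z-scores, with a single value recycled across all variables. The data frame `d` and its scores are hypothetical, and this is only a conceptual equivalent, not the package's internal code.

```r
# Hypothetical standard scores (mean 100, SD 15) for two people
d <- data.frame(Read = c(115, 85), Math = c(100, 130))

# A single value for mu and sigma applies to every variable
mu <- 100
sigma <- 15

# Equivalent z-scores: subtract the mean and divide by the SD, column by column
z <- as.data.frame(scale(d,
                         center = rep(mu, ncol(d)),
                         scale = rep(sigma, ncol(d))))
z
```

Because the Mahalanobis distance is unchanged by this kind of linear rescaling, analyzing the raw scores with their mu and sigma or the equivalent z-scores with the defaults should describe the same profile.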
A list with the conditional Mahalanobis distance and related statistics:
dCM = Conditional Mahalanobis distance
dCM_df = Degrees of freedom for the conditional Mahalanobis distance
dCM_p = A proportion that indicates how unusual this profile is compared to profiles with the same independent variable values. For example, if dCM_p = 0.88, this profile is more unusual than 88 percent of profiles after controlling for the independent variables.
dM_dep = Mahalanobis distance of just the dependent variables
dM_dep_df = Degrees of freedom for the Mahalanobis distance of the dependent variables
dM_dep_p = Proportion associated with the Mahalanobis distance of the dependent variables
dM_ind = Mahalanobis distance of just the independent variables
dM_ind_df = Degrees of freedom for the Mahalanobis distance of the independent variables
dM_ind_p = Proportion associated with the Mahalanobis distance of the independent variables
v_dep = Dependent variable names
v_ind = Independent variable names
v_ind_singular = Independent variables that can be perfectly predicted from the dependent variables (e.g., composite scores)
v_ind_nonsingular = Independent variables that are not perfectly predicted from the dependent variables
data = Data used in the calculations
d_ind = Independent variable data
d_inp_p = Assuming normality, cumulative distribution function of the independent variables
d_dep = Dependent variable data
d_dep_predicted = Predicted values of the dependent variables
d_dep_deviations = d_dep - d_dep_predicted (i.e., residuals of the dependent variables)
d_dep_residuals_z = Standardized residuals of the dependent variables
d_dep_cp = Conditional proportions associated with standardized residuals
d_dep_p = Assuming normality, cumulative distribution function of the dependent variables
R2 = Proportion of variance in each dependent variable explained by the independent variables
SEE = Standard error of the estimate for each dependent variable
ConditionalCovariance = Covariance matrix of the dependent variables after controlling for the independent variables
distance_reduction = 1 - (dCM / dM_dep) (Degree to which the independent variables decrease the Mahalanobis distance of the dependent variables. Negative reductions mean that the profile is more unusual after controlling for the independent variables. Returns 0 if dM_dep is 0.)
variability_reduction = 1 - sum((X_dep - predicted_dep) ^ 2) / sum((X_dep - mu_dep) ^ 2) (Degree to which the independent variables decrease the variability of the dependent variables (X_dep). Negative reductions mean that the profile is more variable after controlling for the independent variables. Returns 0 if X_dep == mu_dep.)
mu = Variable means
sigma = Variable standard deviations
d_person = Data frame consisting of Mahalanobis distance data for each person
d_variable = Data frame consisting of variable characteristics
label = Label slot
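Several of the quantities listed above can be sketched in base R for a single profile. This is a minimal illustration under stated assumptions, not the package's source code: the correlation matrix, the z-scores, and the helper `maha()` are hypothetical, the data are assumed to be z-scores, and there are no composite independent variables, so the degrees of freedom are taken to equal the number of dependent variables.

```r
# Hypothetical correlation matrix: one independent variable (X)
# and two dependent variables (Y1, Y2)
v <- c("X", "Y1", "Y2")
R <- matrix(c(1.0, 0.5, 0.4,
              0.5, 1.0, 0.3,
              0.4, 0.3, 1.0),
            nrow = 3, dimnames = list(v, v))
v_ind <- "X"
v_dep <- c("Y1", "Y2")

# Hypothetical z-scores for one person
z <- c(X = 1, Y1 = 2, Y2 = -1)

# Partition the correlation matrix
R_ii <- R[v_ind, v_ind, drop = FALSE]
R_di <- R[v_dep, v_ind, drop = FALSE]
R_dd <- R[v_dep, v_dep, drop = FALSE]

# Regression weights, predicted dependent scores, and deviations (residuals)
B <- R_di %*% solve(R_ii)
predicted <- drop(B %*% z[v_ind])
deviations <- z[v_dep] - predicted

# Covariance of the dependent variables after controlling for X
cond_cov <- R_dd - B %*% t(R_di)
R2 <- 1 - diag(cond_cov)      # variance explained (variables are standardized)
SEE <- sqrt(diag(cond_cov))   # standard error of the estimate

# Hypothetical helper: Mahalanobis distance of x from 0 given covariance S
maha <- function(x, S) sqrt(drop(t(x) %*% solve(S) %*% x))

dCM <- maha(deviations, cond_cov)   # conditional distance
dM_dep <- maha(z[v_dep], R_dd)      # unconditional distance of dependent variables
dCM_p <- pchisq(dCM^2, df = length(v_dep))

distance_reduction <- 1 - dCM / dM_dep
variability_reduction <- 1 - sum(deviations^2) / sum(z[v_dep]^2)
```

One useful check on a sketch like this is that the squared distance of the full profile equals the squared distance of the independent variables plus the squared conditional distance, i.e. `maha(z, R)^2` should match `maha(z[v_ind], R_ii)^2 + dCM^2`.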
library(unusualprofile)
library(simstandard)
m <- "
Gc =~ 0.85 * Gc1 + 0.68 * Gc2 + 0.8 * Gc3
Gf =~ 0.8 * Gf1 + 0.9 * Gf2 + 0.8 * Gf3
Gs =~ 0.7 * Gs1 + 0.8 * Gs2 + 0.8 * Gs3
Read =~ 0.66 * Read1 + 0.85 * Read2 + 0.91 * Read3
Math =~ 0.4 * Math1 + 0.9 * Math2 + 0.7 * Math3
Gc ~ 0.6 * Gf + 0.1 * Gs
Gf ~ 0.5 * Gs
Read ~ 0.4 * Gc + 0.1 * Gf
Math ~ 0.2 * Gc + 0.3 * Gf + 0.1 * Gs"
# Generate 10 cases
d_demo <- simstandard::sim_standardized(m = m, n = 10)
# Get model-implied correlation matrix
R_all <- simstandard::sim_standardized_matrices(m)$Correlations$R_all
cond_maha(data = d_demo,
          R = R_all,
          v_dep = c("Math", "Read"),
          v_ind = c("Gf", "Gs", "Gc"))
#> Conditional Mahalanobis Distance = 1.3222, df = 2, p = 0.5828
#> Conditional Mahalanobis Distance = 1.0531, df = 2, p = 0.4256
#> Conditional Mahalanobis Distance = 0.2780, df = 2, p = 0.0379
#> Conditional Mahalanobis Distance = 0.6676, df = 2, p = 0.1998
#> Conditional Mahalanobis Distance = 0.9099, df = 2, p = 0.3390
#> Conditional Mahalanobis Distance = 0.0803, df = 2, p = 0.0032
#> Conditional Mahalanobis Distance = 0.9747, df = 2, p = 0.3782
#> Conditional Mahalanobis Distance = 0.8846, df = 2, p = 0.3238
#> Conditional Mahalanobis Distance = 2.7419, df = 2, p = 0.9767
#> Conditional Mahalanobis Distance = 0.9059, df = 2, p = 0.3366
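The printed p values appear consistent with evaluating the chi-square cumulative distribution function at the squared distance with the printed degrees of freedom. The base R check below copies a few distances and p values from the output above; the vector names `d_printed` and `p_printed` are hypothetical. Because the cases are simulated at random, a fresh run will print different numbers.

```r
# Distances and p values copied from the printed output (df = 2 for each)
d_printed <- c(1.3222, 1.0531, 0.2780, 0.0803, 2.7419)
p_printed <- c(0.5828, 0.4256, 0.0379, 0.0032, 0.9767)

# Proportion of profiles expected to be less unusual than each distance,
# assuming multivariate normality
p_check <- pchisq(d_printed^2, df = 2)

# Side-by-side comparison; small differences reflect rounding of the printed distances
round(cbind(d_printed, p_printed, p_check), 4)
```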