21 Diagnostic Analytics
Diagnostic analytics is the second stage in the analytics maturity model, following descriptive analytics and preceding predictive and prescriptive analytics. While descriptive analytics answers “What happened?”, diagnostic analytics focuses on “Why did it happen?” by identifying relationships, patterns, and root causes in data.
By applying data mining, correlation analysis, drill-down analysis, and statistical tests, diagnostic analytics helps businesses and researchers uncover likely root causes and hidden relationships in their data (bearing in mind that correlation alone does not establish causation).
Importance of Diagnostic Analytics
- Identifies the root causes of business outcomes.
- Helps optimize operations by understanding key influencers.
- Supports better decision-making through data-driven insights.
- Provides a bridge between descriptive analytics (what happened?) and predictive analytics (what will happen?).
Techniques Used in Diagnostic Analytics
Drill-Down Analysis
- Breaks down aggregated data into smaller subcategories to find specific causes.
- Example: If total sales drop, drill-down analysis can reveal whether a specific product category, region, or customer segment is responsible.
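As a minimal sketch of this idea, the pandas snippet below aggregates a toy sales table and then drills into finer-grained groupings; the DataFrame and its column names (`region`, `category`, `sales`) are hypothetical.

```python
# Minimal drill-down sketch with pandas; the data and column names
# ("region", "category", "sales") are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":   ["North", "North", "South", "South", "West", "West"],
    "category": ["Shoes", "Hats", "Shoes", "Hats", "Shoes", "Hats"],
    "sales":    [120, 80, 45, 95, 130, 88],
})

# Start at the top-level aggregate, then drill down one dimension at a time.
print(sales["sales"].sum())                                   # total sales
print(sales.groupby("region")["sales"].sum())                 # by region
print(sales.groupby(["region", "category"])["sales"].sum())   # by region and category
```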
Data Mining
- Identifies hidden patterns in data using machine learning and AI.
- Example: A company might discover that high customer churn is linked to late customer service response times.
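As a hedged illustration of this kind of pattern discovery, the sketch below fits a small scikit-learn decision tree to a toy churn table and reads off feature importances; the dataset and feature names (`response_time_hrs`, `monthly_spend`, `churned`) are invented.

```python
# Hedged sketch: surface which attribute is most associated with churn
# using a shallow decision tree. All data and column names are invented.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    "response_time_hrs": [1, 24, 48, 2, 36, 3, 72, 1],
    "monthly_spend":     [50, 40, 30, 55, 35, 60, 25, 45],
    "churned":           [0, 1, 1, 0, 1, 0, 1, 0],
})

X, y = df[["response_time_hrs", "monthly_spend"]], df["churned"]
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Feature importances hint at which attribute drives churn in this toy sample.
for name, score in zip(X.columns, tree.feature_importances_):
    print(f"{name}: {score:.2f}")
```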
Correlation and Regression Analysis
- Correlation Analysis measures the strength and direction of the association between two variables.
- Regression Analysis identifies how one or more independent variables influence a dependent variable.
- Example: Analyzing how marketing spend impacts sales revenue.
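A minimal sketch of both techniques with scipy.stats, using made-up spend and revenue figures:

```python
# Correlation and simple linear regression with scipy; the monthly
# marketing-spend and revenue figures below are made up.
from scipy import stats

marketing_spend = [10, 15, 20, 25, 30, 35, 40]       # e.g. $k per month
sales_revenue   = [95, 110, 118, 140, 151, 155, 170]

r, p = stats.pearsonr(marketing_spend, sales_revenue)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")

# Simple linear regression: revenue ~ intercept + slope * spend
fit = stats.linregress(marketing_spend, sales_revenue)
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, R^2 = {fit.rvalue**2:.2f}")
```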
Hypothesis Testing
- Uses statistical tests like the t-test, chi-square test, and ANOVA to validate assumptions.
- Example: Testing whether customer complaints are significantly higher in one region compared to others.
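As a hedged sketch of the region example, a two-sample t-test on invented daily complaint counts:

```python
# Two-sample t-test with scipy; the daily complaint counts are invented.
from scipy import stats

region_a = [12, 15, 14, 10, 13, 16, 12, 14]
region_b = [9, 11, 8, 10, 9, 7, 10, 8]

t, p = stats.ttest_ind(region_a, region_b)
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("Complaint levels differ significantly between the two regions.")
```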
Time Series Analysis
- Examines data over time to detect trends and anomalies.
- Example: A sustained drop in website traffic after a redesign can point to UX issues.
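One simple, hedged approach is to flag days that fall well below a trailing average; the traffic series and dates below are hypothetical:

```python
# Flag days where traffic drops more than 20% below the trailing 5-day
# mean; the series and the implied redesign date are hypothetical.
import pandas as pd

traffic = pd.Series(
    [1000, 1030, 990, 1010, 1020, 700, 690, 710, 705, 695],
    index=pd.date_range("2024-03-01", periods=10, freq="D"),
)

baseline = traffic.rolling(window=5, min_periods=5).mean().shift(1)
anomalies = traffic[traffic < 0.8 * baseline]
print(anomalies)   # flags the post-redesign drop in this toy series
```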
21.0.1 Example Use Cases
Business Intelligence
- Retailers analyze customer purchase history to find out why certain products perform better in specific seasons.
- Banks detect fraud patterns by analyzing transaction data.
Healthcare
- Hospitals use diagnostic analytics to find why patient readmission rates are high by identifying risk factors.
- Drug companies analyze clinical trial data to identify side effects and treatment effectiveness.
Human Resources (HR)
- HR departments use diagnostic analytics to understand why employee turnover is increasing.
- Surveys and performance data are analyzed to correlate engagement levels with attrition rates.
21.1 Parametric vs. Non-Parametric Tests
Parametric and non-parametric tests are two broad categories of statistical tests used in hypothesis testing. The choice between them depends on the type of data you’re analyzing and the assumptions you can make about that data. Here’s a comparison of the two:
Parametric Tests
Assumptions: Parametric tests assume that the data follows a specific distribution, usually a normal distribution. They also assume homogeneity of variances and that the data is measured on an interval or ratio scale.
Data Requirements: These tests require quantitative data that meets distributional assumptions (e.g., normality) and are best suited to data measured on an interval or ratio scale.
Examples: Common parametric tests include the t-test (used to compare the means of two groups), ANOVA (used to compare the means of three or more groups), and the Pearson correlation coefficient (used to assess the strength and direction of the linear relationship between two continuous variables).
Advantages: When their assumptions are met, parametric tests are generally more powerful than non-parametric tests, meaning they are more likely to detect a true effect when one exists.
Disadvantages: The main drawback is that if the assumptions are not met, the results of parametric tests may not be valid.
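One practical safeguard is to check the assumptions before trusting the result. A minimal sketch with scipy, on invented samples, using the Shapiro-Wilk test for normality and Levene's test for equal variances:

```python
# Check t-test assumptions before running it; the samples are invented.
from scipy import stats

group_a = [23, 25, 21, 24, 26, 22, 25, 23]
group_b = [30, 28, 31, 29, 27, 30, 32, 28]

# p > 0.05 means we fail to reject normality / equal variances.
print(stats.shapiro(group_a))
print(stats.shapiro(group_b))
print(stats.levene(group_a, group_b))
```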
21.1.1 Non-Parametric Tests
Assumptions: Non-parametric tests do not assume that the data follows a specific distribution. They are distribution-free tests and are less strict about data assumptions.
Data Requirements: These tests can be used on data that is not normally distributed, ordinal data, or when the sample size is small. They are more flexible in terms of the types of data they can handle.
Examples: Common non-parametric tests include the Mann-Whitney U test (used to compare two independent groups), the Kruskal-Wallis test (used to compare three or more independent groups), and the Spearman rank correlation coefficient (used to assess the strength and direction of the relationship between two variables that may not be linear).
Advantages: The main advantage is their flexibility; they can be used when parametric test assumptions are not met. They are also useful for ordinal data or for data with outliers that might affect parametric test results.
Disadvantages: Non-parametric tests are generally less powerful than parametric tests when the assumptions of the latter are met. This means they might not detect an effect that is actually there, especially if the sample size is not large enough.
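A minimal sketch of two of these tests in scipy, on invented data (independent satisfaction ratings for the Mann-Whitney U test, and paired response-time/rating observations for Spearman):

```python
# Non-parametric tests with scipy; all data below are invented.
from scipy import stats

# Mann-Whitney U: compare ordinal ratings from two independent groups.
ratings_old_ui = [3, 2, 4, 3, 2, 3, 1, 2]
ratings_new_ui = [4, 5, 3, 4, 5, 4, 3, 5]
u, p = stats.mannwhitneyu(ratings_old_ui, ratings_new_ui)
print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")

# Spearman: monotonic (not necessarily linear) association between
# support response time and a 1-5 satisfaction rating.
response_time_hrs = [1, 2, 3, 5, 8, 13, 21, 34]
satisfaction      = [5, 5, 4, 4, 3, 3, 2, 1]
rho, p = stats.spearmanr(response_time_hrs, satisfaction)
print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
```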
21.1.2 Choosing Between Them
- Data Distribution and Scale: If your data is normally distributed and measured on an interval or ratio scale, a parametric test might be more appropriate. If your data does not meet these criteria, consider a non-parametric test.
- Sample Size: Non-parametric tests are often preferred for small samples, where distributional assumptions are hard to verify; however, they generally need somewhat larger samples than parametric tests to achieve the same statistical power when parametric assumptions hold.
- Research Question: The choice can also be influenced by the specific research question and the nature of the data.
In summary, the choice between parametric and non-parametric tests depends on the characteristics of your data and your specific research needs. Understanding the assumptions and requirements of each type of test is crucial in making the right choice.
| Feature | Parametric Tests | Non-Parametric Tests |
|---|---|---|
| Assumptions | Specific distribution (usually normal); homogeneity of variances; interval/ratio scale | No specific distribution assumed; less strict about data assumptions |
| Data Requirements | Quantitative data meeting distributional assumptions; interval/ratio scale | Non-normally distributed data, ordinal data, or small samples; flexible |
| Examples | t-test, ANOVA, Pearson correlation coefficient | Mann-Whitney U test, Kruskal-Wallis test, Spearman rank correlation |
| Advantages | More powerful when assumptions are met; more likely to detect true effects | Flexible; usable when parametric assumptions are not met; robust for ordinal data and outliers |
| Disadvantages | Results may not be valid if assumptions are not met | Generally less powerful when parametric assumptions hold; may miss real effects |
| Choice Considerations | Use when data is normally distributed and on an interval or ratio scale | Use for non-normal or ordinal data, or small samples |
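To make the choice concrete, here is a hedged decision sketch: check both groups for normality with Shapiro-Wilk, then fall back to a non-parametric test if normality looks doubtful. The alpha threshold and data are illustrative only; in practice you would also inspect the data visually and consider variances and sample size.

```python
# Hedged decision sketch: pick a parametric or non-parametric comparison
# based on a normality check. Data and threshold are illustrative.
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    normal = (stats.shapiro(a).pvalue > alpha
              and stats.shapiro(b).pvalue > alpha)
    if normal:
        name, result = "t-test", stats.ttest_ind(a, b)             # parametric
    else:
        name, result = "Mann-Whitney U", stats.mannwhitneyu(a, b)  # fallback
    return name, result.pvalue

name, p = compare_groups([23, 25, 21, 24, 26, 22], [30, 28, 31, 29, 27, 30])
print(f"{name}: p = {p:.4f}")
```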