Minitab Machine Learning Training: Complete details about all topics

Machine Learning

This Minitab training course is designed to strengthen your data analysis skills through practical problem scenarios. Using real-world examples, it teaches you to explore and characterize relationships between variables. You will learn to apply supervised machine learning methods, such as CART®, to uncover patterns in historical data. These skills will help you gain deeper insights, spot potential risks, identify opportunities for improvement, and make predictions about the future.

In addition, you will learn to use unsupervised machine learning tools, such as Clustering, to uncover natural groupings in the data and to sort observations or variables into homogeneous sets. You will also learn to reduce the dimensionality of your data, for example with Principal Components Analysis, by transforming the original variables into a set of uncorrelated variables.
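Transforming data into uncorrelated variables is what Principal Components Analysis (PCA) does. The course text does not name PCA, so treating it as the intended technique is our assumption. Below is a minimal two-variable sketch in plain Python, not Minitab's implementation. It centers the data, computes the 2x2 covariance matrix, finds its eigenvalues in closed form, and projects each point onto the two orthogonal eigenvectors. The resulting component scores are uncorrelated. The data points are made up.

```python
import math

def pca_2d(points):
    """Return (eigenvalues, scores) for 2-D data; scores are uncorrelated components."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    dx = [p[0] - mx for p in points]
    dy = [p[1] - my for p in points]
    a = sum(u * u for u in dx) / (n - 1)               # var(x)
    c = sum(w * w for w in dy) / (n - 1)               # var(y)
    b = sum(u * w for u, w in zip(dx, dy)) / (n - 1)   # cov(x, y)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]]
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigvals = (half_trace + root, half_trace - root)
    # Unit eigenvector for the largest eigenvalue
    if b != 0:
        v1 = (b, eigvals[0] - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # orthogonal unit vector (second component)
    scores = [(u * v1[0] + w * v1[1], u * v2[0] + w * v2[1]) for u, w in zip(dx, dy)]
    return eigvals, scores

points = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
          (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
eigvals, scores = pca_2d(points)
print(eigvals)  # variance captured by the first and second components
```

Because the first eigenvalue is much larger than the second here, keeping only the first component would reduce two variables to one while preserving most of the variation.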

Topics Included:

Discriminant Analysis

Use Discriminant Analysis to classify observations into two or more groups when you have a sample whose group membership is already known.
With this analysis, you can:
- Assess how accurately observations are classified into the known groups.
- Evaluate how well the predictor variables discriminate between the groups.
- Predict group membership for observations whose group is unknown.
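The three uses listed above can be sketched as a minimal linear discriminant analysis in plain Python. The sketch assumes that all groups share one covariance matrix (the pooled within-group covariance) and that prior probabilities are proportional to group sizes. It illustrates the general linear discriminant function, not Minitab's implementation. The names `fit_lda` and `predict` are hypothetical, and the data are made up.

```python
import math
from collections import defaultdict

def invert(m):
    """Invert a small square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def fit_lda(X, y):
    """Estimate group means, the inverse pooled covariance, and priors from known groups."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    n, p, k = len(X), len(X[0]), len(groups)
    means = {g: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
             for g, rows in groups.items()}
    pooled = [[0.0] * p for _ in range(p)]
    for g, rows in groups.items():
        for r in rows:
            d = [r[j] - means[g][j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    pooled[i][j] += d[i] * d[j] / (n - k)
    priors = {g: len(rows) / n for g, rows in groups.items()}
    return means, invert(pooled), priors

def predict(x, model):
    """Assign x to the group with the largest linear discriminant score."""
    means, s_inv, priors = model
    p = len(x)
    def score(g):
        w = [sum(s_inv[i][j] * means[g][j] for j in range(p)) for i in range(p)]
        return (sum(w[i] * x[i] for i in range(p))
                - 0.5 * sum(w[i] * means[g][i] for i in range(p))
                + math.log(priors[g]))
    return max(means, key=score)

X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5],
     [5.0, 6.0], [5.5, 5.8], [6.0, 6.5], [5.2, 6.2]]
y = ["A"] * 4 + ["B"] * 4
model = fit_lda(X, y)
accuracy = sum(predict(x, model) == g for x, g in zip(X, y)) / len(y)
print(accuracy)                    # 1.0 (proportion classified correctly)
print(predict([5.8, 6.1], model))  # B (prediction for an unclassified observation)
```

Training accuracy on the same data tends to be optimistic. That is one reason to validate on held-out data, as the next two topics describe.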

Test Set Validation

Test set validation is the default method when the data contain more than 5000 rows. The data are typically split so that 70% of the rows are used to train the model and the remaining 30% are held out to test it.
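Here is a minimal sketch of a 70/30 test-set split in plain Python, assuming a random shuffle with a fixed seed so the result is reproducible. The function name `train_test_split` and its parameters are our own illustration and are not a Minitab feature. The model would be fit only on `train`, and its performance would be measured on `test`.

```python
import random

def train_test_split(rows, test_fraction=0.30, seed=1):
    """Shuffle rows and hold out test_fraction of them as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

rows = list(range(6000))  # hypothetical data set with more than 5000 rows
train, test = train_test_split(rows)
print(len(train), len(test))  # 4200 1800
```

The model is fit only once, which is why this method is cheaper than K-fold cross-validation on large data sets.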

K-fold Cross-Validation

Minitab selects K-fold cross-validation as the default method when the data set contains 2000 or fewer cases. The data are divided into K folds, and each fold is held out once as a test set while the model is fit on the remaining K - 1 folds. Because the model is fit K times, cross-validation generally takes longer than validation with a test set.
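The fold assignment described above can be sketched in plain Python. We assume K = 10 and a seeded random shuffle for illustration, and the helper name `k_fold_splits` is hypothetical. Each row lands in exactly one test fold, and each of the K iterations trains on the other K - 1 folds.

```python
import random

def k_fold_splits(n_rows, k=10, seed=1):
    """Yield (train_idx, test_idx) for each of k folds; every row is tested exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

splits = list(k_fold_splits(2000, k=10))
print(len(splits))                           # 10: the model would be fit 10 times
print(len(splits[0][0]), len(splits[0][1]))  # 1800 200
```

Averaging a performance measure over the K test folds uses every case for both training and testing. This is why the method suits smaller data sets despite its extra cost.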

CART Classification

Use CART Classification to build a decision tree for a binary or multinomial categorical response with many categorical and continuous predictors. The method reveals important patterns and relationships between a categorical response and key predictors in complex data sets, without requiring a parametric model.
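A classification tree is grown by repeatedly choosing the split that makes the child nodes as pure as possible. Below is a minimal sketch of one such step for a single continuous predictor. It assumes Gini impurity, one common CART splitting criterion, and is not Minitab's implementation. The function `best_split` is hypothetical and the data are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Return (threshold, weighted_gini) for the split x <= threshold with the lowest impurity."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

x = [1, 2, 3, 4, 5, 6]
y = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(x, y))  # (3.5, 0.0): both child nodes are pure
```

A full tree applies this search across every predictor at every node and then prunes the tree, typically using validation results.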

Correlation

Use Correlation to measure the strength and direction of the relationship between two variables. You can choose between two methods: the Pearson product-moment correlation and the Spearman rank-order correlation. The Pearson correlation (denoted "r"), the most commonly used, measures the linear relationship between two continuous variables.
When the relationship is not linear, you can use the Spearman rank-order correlation (also known as Spearman's rho), which evaluates the monotonic relationship between two continuous or ordinal variables.
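The difference between the two methods is easiest to see in code. This plain-Python sketch computes Pearson's r directly. It computes Spearman's rho as Pearson's r applied to the ranks, with tied values given their average rank. The function names are our own. For a perfectly monotonic but curved relationship, Spearman's rho is exactly 1 while Pearson's r is lower.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(pearson(x, y))     # about 0.943
print(spearman(x, y))    # 1.0
```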

CART Regression

Use CART Regression to build a decision tree for a continuous response with a mix of categorical and continuous predictors. The method reveals important patterns and relationships between a continuous response and key predictors in complex data sets, without requiring a parametric model.
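A regression tree chooses splits that make the response values within each child node as homogeneous as possible. Each terminal node then predicts the mean response of its cases. Below is a minimal sketch of one split search. It assumes the least-squares criterion (the sum of squared errors around each node's mean), one common CART regression criterion, and is not Minitab's implementation. The function `best_regression_split` is hypothetical and the data are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_regression_split(x, y):
    """Return (threshold, total_sse) for the split x <= threshold with the smallest SSE."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [b for _, b in pairs[:i]]
        right = [b for _, b in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

x = [1, 2, 3, 10, 11, 12]
y = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
print(best_regression_split(x, y))  # (6.5, 0.0): leaves would predict 5.0 and 20.0
```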

Cluster Analysis

Cluster Analysis forms clusters so that cases within a cluster are more similar to each other than to cases in other clusters. In essence, it uses data to sort objects into distinct groups, something we all do naturally: in everyday life, we group items by their characteristics. Cluster Analysis formalizes this idea by measuring the "similarity" or "distance" between cases. Minitab's hierarchical clustering procedures use an agglomerative method: each observation (or variable) starts as its own cluster, and the closest clusters are then merged step by step into larger clusters.
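The agglomerative process described above can be sketched in plain Python. This version assumes single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance. Minitab offers several linkage methods and distance measures, so these choices are ours. The function `single_linkage` is hypothetical and stops once a chosen number of clusters remains. The points are made up.

```python
import math

def single_linkage(points, n_clusters):
    """Agglomerative clustering: start with singletons and merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Single linkage: smallest Euclidean distance between members of a and b
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(single_linkage(points, 2))  # [[0, 1, 2], [3, 4, 5]]
```

Recording the distance at each merge, instead of stopping at a fixed count, would give the information that a dendrogram displays.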