Immersing myself in machine learning, I have found that regression and classification problems can be solved with a variety of algorithms. This week the focus is on:

What new skills have you learned?

📦 K Nearest Neighbors

📦 Decision Trees

📦 Random Forests

K Nearest Neighbors

KNN is a classification algorithm that classifies a data point based on the labels of the points closest (nearest) to it in feature space. K sets the number of nearest neighbors used to classify a point.

The key components used in creating the classifier are:

📭 Distance metric (for example, Euclidean distance)

🔎 Number of nearest neighbors (K) to look at

⛲ Optional weighting function (for example, giving closer neighbors more say)

💥 Method of aggregating the neighbors' labels; usually defaults to a simple majority vote
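To make the four components concrete, here is a minimal from-scratch sketch (not the code from the notebook). It assumes Euclidean distance, an optional 1/distance weighting, and a majority vote; the fruit measurements are hypothetical toy values invented for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Distance metric: straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3, weighted=False):
    """Classify `query` by a vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep only the k closest points (the K in KNN).
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter()
    for dist, label in neighbors:
        # Optional weighting (an assumption here): 1/distance, so closer
        # neighbors count more; the epsilon avoids dividing by zero.
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    # Aggregation: the label with the most (weighted) votes wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (mass in grams, width in cm) for two fruit types.
train_X = [(150, 7.0), (170, 7.5), (160, 7.2), (120, 6.0), (110, 5.8), (130, 6.2)]
train_y = ["apple", "apple", "apple", "lemon", "lemon", "lemon"]

print(knn_predict(train_X, train_y, (165, 7.3), k=3))  # apple
print(knn_predict(train_X, train_y, (115, 5.9), k=3, weighted=True))  # lemon
```

Because mass is in grams and width in centimetres, mass dominates the distance in this toy example; in practice features are usually scaled before running KNN.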

Here is a notebook on fruit classification 🍎 using K Nearest Neighbors.

Decision Trees

Decision trees are widely used models for classification and regression tasks. A set of splitting rules segments the predictor space through a hierarchy of “if-else” questions, leading to a decision.
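The post does not say how a splitting rule is chosen. A common criterion for classification trees is Gini impurity (it is the default `criterion` of scikit-learn's `DecisionTreeClassifier`). The sketch below, an illustration under that assumption, searches one numeric feature for the best "if value <= t" question; a full tree would repeat the search on each side of the split. The feature values and yes/no labels are hypothetical.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Try 'if value <= t' rules on one numeric feature and return the
    threshold t with the lowest size-weighted Gini impurity of the two sides."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best_t, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        t = (lo + hi) / 2  # candidate threshold: midpoint between neighbours
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: one numeric feature and a yes/no outcome.
values = [2, 3, 5, 10, 11, 14]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(values, labels))  # (7.5, 0.0)
```

A score of 0.0 means the question "value <= 7.5?" splits this toy data into two pure groups.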

🎋 Nodes: split the data on the value of an attribute.

🌴 Edges: the outcomes of a split, leading to the next node.

🌲 Root: the node that makes the first split.

🍃 Leaves: terminal nodes that predict the outcome.

Each node in the tree is either a question or a terminal node (also called a leaf) that contains the answer.

The edges connect the answers to a question with the next question you would ask.
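The node, edge, root and leaf vocabulary maps directly onto a small data structure. In this sketch an internal node stores a question (a feature and a threshold), its two child links play the role of edges, and a leaf stores the prediction. The tree and its fruit thresholds are hand-built and hypothetical, not learned from the notebook's data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Question (internal node): which feature to test and the threshold.
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None   # edge followed when value <= threshold
    right: Optional["Node"] = None  # edge followed when value > threshold
    # Answer (leaf / terminal node): the predicted outcome.
    prediction: Optional[str] = None

    def is_leaf(self) -> bool:
        return self.prediction is not None


def predict(node: Node, sample: dict) -> str:
    """Start at the root and follow edges until a leaf is reached."""
    while not node.is_leaf():
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical hand-built tree: the root asks about mass, then about width.
root = Node(
    feature="mass_g", threshold=140.0,
    left=Node(prediction="lemon"),
    right=Node(
        feature="width_cm", threshold=6.5,
        left=Node(prediction="lemon"),
        right=Node(prediction="apple"),
    ),
)

print(predict(root, {"mass_g": 165, "width_cm": 7.3}))  # apple
print(predict(root, {"mass_g": 150, "width_cm": 6.0}))  # lemon
```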

Random Forests

A random forest combines many decision trees and draws a fresh random sample of features for every tree at every split; the trees' predictions are then combined, for classification usually by majority vote.

Each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors. The split is allowed to use only one of those m predictors.

For classification, m is typically chosen as m ≈ √p; that is, the number of predictors considered at each split is approximately the square root of the total number of predictors p (for example, with p = 16 predictors, m = 4).
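The mechanism above can be sketched in a few dozen lines. To keep it short, this is a simplified forest of one-split trees (decision stumps) rather than full-depth trees, so "every split" and "every tree" coincide here. Each stump is grown on a bootstrap sample, may only consider m = ⌊√p⌋ randomly chosen predictors, and the forest predicts by majority vote. The dataset is hypothetical, and the helper names (`fit_stump`, `fit_forest`, `predict_forest`) are illustrative, not a library API.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, candidates):
    """One-split tree: best (feature, threshold) using only the candidate features."""
    best = None  # (score, feature, threshold, left_label, right_label)
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:
                continue  # t is the maximum value: not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # every candidate feature is constant: predict the majority
        majority = Counter(y).most_common(1)[0][0]
        return candidates[0], math.inf, majority, majority
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    p = len(X[0])
    m = max(1, math.isqrt(p))  # m ≈ √p predictors considered per split
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: n rows drawn with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Fresh random subset of m of the p predictors for this split.
        candidates = rng.sample(range(p), m)
        forest.append(fit_stump(Xb, yb, candidates))
    return forest


def predict_forest(forest, row):
    """Majority vote over the stumps' predictions."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: p = 4 predictors, so each split sees m = 2 of them.
X = [
    [1, 2, 1, 3], [2, 1, 3, 2], [3, 4, 2, 1], [4, 3, 4, 4],
    [10, 12, 11, 13], [11, 10, 13, 12], [12, 13, 10, 11], [13, 11, 12, 10],
]
y = ["a"] * 4 + ["b"] * 4
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 1, 1, 1]), predict_forest(forest, [13, 13, 13, 13]))
```

Using stumps keeps the example readable; a real random forest grows deeper trees and redraws the m candidates at each of their splits.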

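By randomly leaving out features, random forests decorrelate the trees, which gives an improvement over bagged trees (trees grown on bootstrap samples that may split on any feature). If one predictor is very strong, most bagged trees split on it first and end up looking alike, and averaging many highly correlated trees reduces variance much less than averaging decorrelated ones.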
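Why decorrelation helps can be quantified with a standard result: if B trees each have prediction variance σ² and every pair has correlation ρ, the variance of their average is ρσ² + (1 − ρ)σ²/B. Adding trees shrinks only the second term, so lowering ρ is what reduces the first. The sketch below assumes this idealized setting (identically distributed trees with one common correlation) and checks the formula with a small Monte Carlo simulation; the function names are illustrative.

```python
import math
import random
import statistics


def variance_of_average(sigma2, rho, n_trees):
    """Variance of the mean of n_trees predictions that each have variance
    sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees


def simulate_average(sigma2, rho, n_trees, trials=5000, seed=0):
    """Monte Carlo check of the formula: every tree's prediction is a shared
    part (common to all trees) plus an independent private part."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    means = []
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)
        preds = [
            sd * (math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(n_trees)
        ]
        means.append(sum(preds) / n_trees)
    return statistics.pvariance(means)


for rho in (0.9, 0.3):  # highly correlated vs. decorrelated trees
    print(rho, variance_of_average(1.0, rho, 50), simulate_average(1.0, rho, 50))
```

With σ² = 1 and 50 trees, the formula gives about 0.902 for ρ = 0.9 but about 0.314 for ρ = 0.3, and the simulated values should land close to those.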

Check out this Decision Trees and Random Forests notebook, which works on a sample dataset of kyphosis (an excessive outward curvature of the spine) among patients.

So that was the seventh week.