Diving deeper into machine learning, I found that regression and classification problems can be solved with a variety of approaches. This week the focus is on:
What new skills have you learned?
- K Nearest Neighbors
- Decision Trees
- Random Forests
K Nearest Neighbors
K Nearest Neighbors (KNN) is a classification algorithm that labels a data point based on the labels of the closest (nearest) points in feature space. K sets the number of nearest neighbors used to classify a point.
Key components used in creating the classifier are:
- Distance metric
- Number of nearest neighbors to look at
- Optional weighting function
- Method of aggregating the neighboring points; this usually defaults to a simple majority vote
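To make those four components concrete, here is a minimal from-scratch sketch in plain Python. It is not the code from the notebook: the function names (`euclidean`, `knn_predict`), the inverse-distance weighting, and the tiny fruit table are all made up for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Distance metric: straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3, metric=euclidean, weighted=False):
    """Classify `query` from the labels of its k nearest training points.

    train is a list of (features, label) pairs.
    """
    # Measure the distance to every training point and keep the k nearest.
    distances = [(metric(features, query), label) for features, label in train]
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]

    # Aggregate: each neighbor casts one vote, or (optionally) a vote
    # weighted by inverse distance so closer points count for more.
    votes = defaultdict(float)
    for dist, label in neighbors:
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical data: (mass in grams, width in cm) -> fruit name.
fruits = [
    ((150, 7.0), "apple"),
    ((160, 7.5), "apple"),
    ((170, 7.4), "apple"),
    ((120, 6.0), "mandarin"),
    ((110, 5.8), "mandarin"),
    ((115, 5.9), "mandarin"),
]

prediction = knn_predict(fruits, (155, 7.2), k=3)
print(prediction)
```

In practice the features would normally be scaled first, since here the mass column dominates the distance simply because its numbers are larger.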
Here is a notebook on fruit classification using K Nearest Neighbors.
Decision Trees
Decision trees are widely used models for classification and regression tasks. A set of splitting rules segments the predictor space via a hierarchy of "if-else" questions, leading to a decision.
- Nodes: split on the value of an attribute.
- Edges: the outcomes of a split, leading to the next node.
- Root: the node that does the first split.
- Terminal nodes (leaves): predict the outcome.
Each node in the tree is either a question or a terminal node (also called a leaf), which contains the answer.
The edges connect the answers to a question with the next question you would ask.
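As a rough sketch of that structure, a tree can be written as nested dictionaries and walked from the root to a leaf. The dictionary layout, the feature names, and the thresholds below are invented for illustration; they are not learned from any dataset.

```python
# Question node: {"feature": ..., "threshold": ..., "left": ..., "right": ...}
# Leaf node:     {"predict": ...}
# "left" and "right" play the role of edges: the two outcomes of a split.
tree = {
    "feature": "width_cm", "threshold": 6.5,       # root: the first split
    "left": {"predict": "mandarin"},               # leaf
    "right": {
        "feature": "mass_g", "threshold": 200,     # next question
        "left": {"predict": "apple"},              # leaf
        "right": {"predict": "orange"},            # leaf
    },
}


def predict(node, sample):
    """Follow if-else questions from the root until a leaf is reached."""
    while "predict" not in node:
        if sample[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["predict"]


print(predict(tree, {"width_cm": 7.2, "mass_g": 155}))
```

A real decision tree learner picks the features and thresholds itself by searching for the splits that best separate the training labels; this sketch only shows how a finished tree is used to answer a query.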
Random Forests
Random forests combine many trees, with a random sample of features considered at every split of every tree.
Each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors. The split is allowed to use only one of those m predictors.
For classification, m is typically chosen to be about the square root of p (m ≈ √p); that is, the number of predictors considered at each split is approximately equal to the square root of the total number of predictors.
By randomly leaving out features at each split, random forests decorrelate the trees, which makes their combined prediction more reliable than that of the individual trees.
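The feature sampling step can be sketched in a few lines of plain Python. This only shows how the m split candidates are drawn, not a full forest; the helper name `candidate_features` and the placeholder predictor names are assumptions made for the example.

```python
import math
import random


def candidate_features(features, rng):
    """Draw the m predictors that one split is allowed to choose from.

    m is set to round(sqrt(p)), the usual rule of thumb for classification.
    """
    p = len(features)
    m = max(1, round(math.sqrt(p)))
    # Sample without replacement, so the m candidates are all distinct.
    return rng.sample(features, m)


rng = random.Random(0)  # fixed seed so reruns give the same draws
predictors = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9"]  # p = 9, so m = 3

# Every split gets a fresh draw, so different splits see different features.
splits = [candidate_features(predictors, rng) for _ in range(3)]
for candidates in splits:
    print(candidates)
```

Because a strong predictor is left out of many of these draws, the trees cannot all lean on it at the top, which is where the decorrelation comes from.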
Check out this Decision Tree and Random Forests notebook, which works on a sample dataset on kyphosis (excessive outward curvature of the spine) among patients.
So that was the seventh week.