In the last article, we looked at the broad classification of Machine Learning problems. Supervised and unsupervised learning are the two most widely discussed categories, and each suits a different kind of problem. Let's look at both in a little more detail.
In real life, we come across a lot of grouping problems: grouping students into grades from A to D by their percentage of marks, grouping vehicles by size and utility so that the appropriate tax can be charged, grouping similar fruits from a fruit basket, and so on. Grouping a number of objects into sets is a relatively simple task for a human; a three-year-old can sort fruits into different baskets. For a computer, however, there are a number of hurdles. First of all, a computer has no prior knowledge of any object. We can provide it with features of the objects, such as the texture of a fruit or the shape of a vehicle. But if the computer relies on those features matching exactly, it fails as soon as it sees a new object that differs slightly from what it was given.
Classification is one type of grouping task. In classification, we have a predefined, finite number of classes, and every object in the problem set must be assigned to one of them. Consider a highway toll plaza. It can only have a finite number of gates for all the types of vehicles that might pass through. Suppose the gates are labelled CARS, VANS and TRUCKS, and vehicles arrive one by one. Now suppose an AUTO RICKSHAW arrives. Which gate is the most appropriate for it? Most likely the CARS gate. A human operator makes this decision because he knows that the features of an auto rickshaw match those of cars better than those of vans or trucks, even though an auto rickshaw is not exactly a car. To make this decision, the operator needs prior knowledge of the features of cars, vans and trucks. Without it, he might just as well send the auto rickshaw to the VANS or TRUCKS gate.
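The toll operator's decision can be sketched in a few lines of Python. This is only an illustration, assuming each vehicle is described by three made-up features (length in metres, weight in tonnes, number of seats) and using a simple nearest-centroid rule. The feature values and the names `LABELLED_VEHICLES`, `train` and `classify` are hypothetical, not taken from any real dataset or library.

```python
import math

# Hypothetical labelled training data:
# (length in m, weight in tonnes, seats) -> gate label.
LABELLED_VEHICLES = [
    ((4.2, 1.2, 5), "CARS"),
    ((4.5, 1.5, 5), "CARS"),
    ((3.8, 1.0, 4), "CARS"),
    ((5.2, 2.2, 10), "VANS"),
    ((5.5, 2.5, 12), "VANS"),
    ((8.0, 10.0, 2), "TRUCKS"),
    ((9.0, 12.0, 2), "TRUCKS"),
]


def train(samples):
    """Return one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def classify(centroids, features):
    """Pick the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = train(LABELLED_VEHICLES)   # the "prior knowledge" of the operator
auto_rickshaw = (2.7, 0.4, 3)          # a new, unlabelled vehicle (made-up values)
print(classify(centroids, auto_rickshaw))  # CARS
```

The auto rickshaw is not a car, but its features are closer to the average car than to the average van or truck, so it is sent to the CARS gate. Note that the raw features are not scaled here; a more careful version would normalise them so that no single feature dominates the distance.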
Now consider the same toll plaza again, which leads to a second type of grouping task: clustering. This time, instead of arriving one by one, 1000 vehicles arrive all at once, and the gates have no labels on them; there are just N unlabelled gates. Even if the toll operator has never seen a vehicle in his life, he can still group them by shape, size, number of seats, and so on. Auto rickshaws again end up grouped with cars, not because they have many features of a class called CARS (there are no classes here), but because cars and auto rickshaws share features such as size, weight and number of seats. The toll operator arrives at a result similar to classification, without any prior knowledge of vehicles.
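The grouping described above is usually called clustering. As a rough sketch, here is a small k-means style routine in plain Python that groups unlabelled vehicles into N = 3 groups. The feature values (length in metres, weight in tonnes, seats), the deterministic farthest-point starting rule, and the names `VEHICLES`, `initial_centres`, `assign` and `k_means` are all assumptions made for this illustration.

```python
import math

# Hypothetical unlabelled vehicles: (length in m, weight in tonnes, seats).
VEHICLES = [
    (4.2, 1.2, 5),
    (2.7, 0.4, 3),    # the auto rickshaw, but the algorithm is not told that
    (5.2, 2.2, 10),
    (8.0, 10.0, 2),
    (4.5, 1.5, 5),
    (5.5, 2.5, 12),
    (9.0, 12.0, 2),
    (3.8, 1.0, 4),
]


def initial_centres(points, k):
    """Deterministic start: the first point, then repeatedly the point
    farthest from every centre chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return centres


def assign(points, centres):
    """For each point, return the index of its nearest centre."""
    return [
        min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
        for p in points
    ]


def k_means(points, k, max_iterations=100):
    """Alternate between moving centres and reassigning points until stable."""
    centres = initial_centres(points, k)
    assignment = assign(points, centres)
    for _ in range(max_iterations):
        # Move each centre to the mean of the points currently assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centres[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
        new_assignment = assign(points, centres)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment


groups = k_means(VEHICLES, k=3)
print(groups)  # group index per vehicle, in input order
```

No labels are used anywhere. With these made-up numbers, the auto rickshaw (index 1) ends up in the same group as the car-like vehicles (indices 0, 4 and 7) simply because their features are close. The group numbers themselves carry no meaning, and the number of groups `k` still has to be chosen by whoever runs the algorithm.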
The classification discussed above is an example of supervised learning, and the clustering is an example of unsupervised learning. The table below compares the two.
| |Classification (supervised)|Clustering (unsupervised)|
|---|---|---|
|Prior knowledge of classes|YES|NO|
|Use cases|Put a new sample into a known class|Suggest groups by looking at the entire sample set at once|
|Data Needs|Samples with class labels for training, and an unlabelled sample for classification.|Samples without any labelling. All samples are provided to the algorithm at once.|
|Number of classes|KNOWN|UNKNOWN|
|Decision making|Based on training obtained|By trying to find patterns in the data|
The term “supervised” is often confusing. It gives the false impression that someone, typically a human, has to supervise the learning. That is not the case: the “supervision” comes from the class labels in the training data, not from a person watching the algorithm. Unsupervised learning, in turn, is arguably not “learning” in the everyday sense. It is a process of finding patterns in the data in order to explore and understand it better. I hope this article was helpful.