The Wine Dataset

The wine dataset is derived from the results of a chemical analysis of wines grown in Italy from three different types of grapes [8]. The analysis measures the amounts of 13 chemical constituents found in each of the three types of wine. The dataset consists of 178 samples in three classes: 59 instances in class 1, 71 in class 2, and 48 in class 3, so the classes are unevenly sized. Each sample has 13 continuous attributes: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline. The dataset has no missing attribute values [9]. Each wine sample belongs to exactly one of class 1, class 2, or class 3. In the scaled data given in the table below, the first sample belongs to class 1, which is taken here to indicate a good quality wine. Each sample gives the scaled constituent values of the thirteen attributes for one of the three types of wine. Class 1 is represented by α, class 2 by β, and class 3 by γ [2][3].
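The class sizes quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Class sizes as stated in the text (class label -> number of samples).
class_counts = {1: 59, 2: 71, 3: 48}

total = sum(class_counts.values())

# Share of each class in the whole dataset, as a fraction of the total.
class_share = {label: count / total for label, count in class_counts.items()}

for label, share in class_share.items():
    print(f"class {label}: {class_counts[label]} samples ({share:.1%})")
print(f"total: {total} samples")
```

Class 2 is the largest class and class 3 the smallest, which is worth keeping in mind when reading per-class accuracy figures.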
Table 1: Percentage composition of three different samples for the three classes
Attribute | α (class 1) | β (class 2) | γ (class 3) |
Alcohol | 0.678948 | -0.152631 | -0.0368422 |
Malic acid | 0.284585 | -0.754941 | -0.758893 |
Ash | 0.229946 | -0.294118 | 0.0267379 |
Alcalinity of ash | -0.731959 | -0.360825 | -0.237113 |
Magnesium | 0.26087 | -0.347826 | 0.130435 |
Total phenols | 0.393103 | -0.282759 | -0.634483 |
Flavanoids | 0.139241 | -0.548523 | -0.616034 |
Nonflavanoid phenols | 0.735849 | 0.509434 | -0.698113 |
Proanthocyanins | 0.0536277 | -0.867508 | -0.665615 |
Color intensity | 0.348123 | -0.237201 | -0.518771 |
Hue | -0.333333 | -0.186992 | -0.544715 |
OD280/OD315 of diluted wines | 0.655678 | -0.765568 | -0.985348 |
Proline | -0.312411 | -0.754636 | -0.49786 |
Table 1 above shows the percentage composition of three scaled samples, one from each class. In order to distinguish a wine in class 1 from one in class 2, the feature mask table identifies the features that are important for the differentiation.
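The values in Table 1 all lie between -1 and 1, which is consistent with per-attribute min-max scaling of the kind performed by LIBSVM's svm-scale tool. The text does not state which scaling was used, so the following is a minimal sketch under that assumption. The function name and the raw readings are made up for illustration.

```python
def scale_column(values, lower=-1.0, upper=1.0):
    """Linearly map a list of raw attribute values onto [lower, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map it to the midpoint.
        return [(lower + upper) / 2.0 for _ in values]
    span = hi - lo
    return [lower + (upper - lower) * (v - lo) / span for v in values]


# Made-up raw alcohol readings, for illustration only.
raw_alcohol = [11.0, 12.5, 13.0, 14.0]
scaled_alcohol = scale_column(raw_alcohol)
print(scaled_alcohol)
```

The smallest raw value maps to -1 and the largest to 1. The same minimum and maximum found on the training data would normally be reused when scaling the test data.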
Table 2: Feature mask table
Class | Feature mask by attribute number | ||||||||||||
No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
2 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
3 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
From the feature mask table, the feature subset {2, 4, 5, 6, 7, 9, 11, 12} is important for distinguishing class 1 from the other classes, the feature subset {3, 4, 5, 6, 7, 10, 11, 12, 13} is used to distinguish class 2 from the other two classes, and the feature subset {3, 4, 11, 12, 13} is used to discriminate class 3 from the other classes. The discriminating power for each class lies in its own subset. Different feature subsets are selected for different classes according to their ability to discriminate one class from the others. This provides a new direction for analysing the relationship between features and classes [1].
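Reading a feature subset off a mask row is mechanical and can be sketched as follows. The masks are copied as printed in Table 2; the helper names `mask_to_subset` and `apply_mask` are illustrative and not part of any library.

```python
# Feature masks copied from Table 2: one 0/1 flag per attribute (1 to 13).
feature_masks = {
    1: [0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    2: [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
}


def mask_to_subset(mask):
    """Return the 1-based attribute numbers whose mask flag is 1."""
    return [i for i, flag in enumerate(mask, start=1) if flag == 1]


def apply_mask(sample, mask):
    """Keep only the attribute values of `sample` selected by `mask`."""
    return [value for value, flag in zip(sample, mask) if flag == 1]


feature_subsets = {label: mask_to_subset(m) for label, m in feature_masks.items()}
for label, subset in feature_subsets.items():
    print(f"class {label}: {subset}")
```

`apply_mask` can then be used to reduce a 13-value sample to the attributes selected for a given class before training a class-specific classifier.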
LIBSVM version 2.92 was used for training, testing, and prediction on the wine dataset. For training, svm-train was run with -s 0, the default SVM type (C-SVC), and -t 2, which selects the radial basis function (RBF) kernel.
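LIBSVM's command-line tools read each sample as a text line of the form `label index:value ...` and take their options as flags. The sketch below formats one sample that way and assembles an svm-train argument list for C-SVC with an RBF kernel. The helper names, file names, and parameter values are placeholders chosen for illustration, not the ones used in the study.

```python
def to_libsvm_line(label, values):
    """Format one sample as 'label 1:v1 2:v2 ...' (indices are 1-based)."""
    features = " ".join(f"{i}:{v}" for i, v in enumerate(values, start=1))
    return f"{label} {features}"


def svm_train_command(cost, gamma, train_file, model_file):
    """Build the argument list for an svm-train call with C-SVC and an RBF kernel."""
    return [
        "svm-train",
        "-s", "0",          # C-SVC, the default SVM type
        "-t", "2",          # RBF kernel (in LIBSVM, 0 would be the linear kernel)
        "-c", str(cost),    # penalty parameter C
        "-g", str(gamma),   # kernel parameter gamma
        train_file,
        model_file,
    ]


# First three scaled attributes of the class 1 sample in Table 1.
print(to_libsvm_line(1, [0.678948, 0.284585, 0.229946]))

# Placeholder file names and parameter values, for illustration only.
print(" ".join(svm_train_command(1.0, 0.5, "wine.train", "wine.model")))
```

The resulting list could be passed to `subprocess.run` on a machine where the LIBSVM binaries are installed.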
An SVM with the RBF kernel requires two parameters, gamma and C, which are obtained from the cross-validation table (Table 3) below. Table 4 is a zoomed-in table used to obtain better values of gamma and C.
Table 3: Cross-validation accuracy table for the wine dataset
Table 4: Zoomed-in cross-validation accuracy table
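The coarse table followed by a zoomed-in table corresponds to a two-stage grid search over C and gamma. The sketch below shows one way to organise such a search. The power-of-two grids, the exponent ranges, and the step sizes are assumptions, and `cv_accuracy` is a hypothetical callback that would, in practice, run cross-validation (for example through svm-train's `-v` option) and return the accuracy. A toy scoring function stands in for it here so the example runs on its own.

```python
import math


def frange(start, stop, step):
    """Inclusive list of floats from start to stop in increments of step."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def grid(exponents_c, exponents_g):
    """All (C, gamma) pairs on a power-of-two grid."""
    return [(2.0 ** ec, 2.0 ** eg) for ec in exponents_c for eg in exponents_g]


def best_pair(pairs, cv_accuracy):
    """Return the (C, gamma) pair with the highest cross-validation accuracy."""
    return max(pairs, key=lambda pair: cv_accuracy(*pair))


def coarse_to_fine(cv_accuracy):
    # Coarse pass: exponents spaced 2 apart (assumed ranges).
    coarse = grid(frange(-5, 15, 2), frange(-15, 3, 2))
    c0, g0 = best_pair(coarse, cv_accuracy)
    # Zoomed pass: exponents spaced 0.25 apart, within 2 of the coarse optimum.
    ec, eg = math.log2(c0), math.log2(g0)
    fine = grid(frange(ec - 2, ec + 2, 0.25), frange(eg - 2, eg + 2, 0.25))
    return best_pair(fine, cv_accuracy)


def toy_accuracy(c, gamma):
    """Stand-in score that peaks at C = 2**3.5 and gamma = 2**-4.25."""
    return -((math.log2(c) - 3.5) ** 2 + (math.log2(gamma) + 4.25) ** 2)


best_c, best_gamma = coarse_to_fine(toy_accuracy)
print(f"log2(C) = {math.log2(best_c):.2f}, log2(gamma) = {math.log2(best_gamma):.2f}")
```

The coarse pass locates the promising region cheaply, and the zoomed pass refines the estimate without evaluating a fine grid over the whole range.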
After the model was obtained from the training examples, testing was performed by applying the model (classifier) to the test data set. This produced the classification result file and its accuracy value.
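The accuracy value is the fraction of test samples whose predicted class matches the true class. A minimal sketch of that computation follows; the function name is illustrative and the labels are made up, not results from the study.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class equals the true class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for illustration; not results from the study.
true_labels = [1, 1, 2, 2, 3, 3]
predicted_labels = [1, 1, 2, 3, 3, 3]
print(f"accuracy = {accuracy(true_labels, predicted_labels):.2%}")
```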