The Ionosphere Dataset
The Johns Hopkins University Ionosphere database is taken from the UCI Repository of Machine Learning Databases, to which it was donated by Vince Sigillito in 1989. Sigillito previously used this dataset to classify radar returns from the ionosphere with neural networks. The radar data were collected by a system in Goose Bay, Labrador, which consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts.
The targets were the free electrons in the ionosphere. "Good" radar returns are those that show evidence of some type of structure in the ionosphere, while "bad" returns are those that do not; their signals pass through the ionosphere. The received signals were processed using an autocorrelation function whose two arguments are the time of a pulse and the pulse number. Figure 1 shows the constituents of the ionosphere.
Figure 1: Ionosphere components[11]
The Goose Bay system has 17 pulse numbers. Each instance in this database is described by 2 attributes per pulse number, corresponding to the complex values returned by the function for the complex electromagnetic signal. In summary, the Ionosphere data set is a data frame with 351 instances and 35 variables (attributes). The first 34 continuous attributes are used for prediction and the last one is the class attribute. The 35th attribute is either g ("good") or b ("bad"), according to the definition summarized above [6].
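The record layout described above can be sketched in Python. This is a minimal illustration, assuming each record is a comma-separated line of 34 numeric values followed by the class label; the function name parse_record and the sample row are hypothetical, and pairing consecutive attributes as (real, imaginary) parts is an assumption based on the description above.

```python
def parse_record(line):
    """Split one comma-separated record into 17 complex values and a class label."""
    fields = line.strip().split(",")
    if len(fields) != 35:
        raise ValueError(f"expected 35 fields, got {len(fields)}")
    values = [float(v) for v in fields[:34]]
    label = fields[34]
    if label not in ("g", "b"):
        raise ValueError(f"unexpected class label: {label!r}")
    # Assumption: consecutive attributes hold the real and imaginary
    # parts of the complex value for one pulse number.
    pulses = [complex(values[i], values[i + 1]) for i in range(0, 34, 2)]
    return pulses, label


# Made-up record for illustration only; not taken from the dataset.
sample = ",".join(["0.5", "-0.25"] * 17) + ",g"
pulses, label = parse_record(sample)
print(len(pulses), label)
```

Keeping the two attributes of each pulse number together as one complex value makes the 17 x 2 = 34 structure explicit; for training, the flat list of 34 real values is what is passed to the classifier.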
In general, the Ionosphere data set describes a binary classification task in which radar signals target the free electrons in the ionosphere and the returns are of two types: those that show some structure (good) and those that do not (bad). The ionosphere is a small fraction of the atmospheric material, but it has a great influence on the passage of radio waves. Most of the ionosphere is electrically neutral, but when solar radiation strikes the chemical constituents of the atmosphere, electrons are knocked loose from atoms and molecules to produce the ionospheric plasma. The term “layer” refers to the ionization within a region of the ionosphere.
The ionosphere depends on the radiation emitted by the sun, so the movement of the Earth about the sun or changes in the sun's activity result in variations in the ionosphere. These variations are of two general types: (1) those that are more or less regular, occur in cycles, and can be predicted in advance with reasonable accuracy, and (2) those that are irregular because of abnormal behavior of the sun and therefore cannot be predicted in advance.
A few of the attributes of the ionosphere data set are (link: http://code.google.com/p/exegete/wiki/TutorialDataset): angle of the sun, distance of the communication, sudden ionization, ionization storms, sporadic E, the 11-year sunspot cycle, the 27-day sunspot cycle, etc. All these attributes have various effects on radio wave propagation in the ionosphere [5].
As with the wine data set, LIBSVM version 2.92 was used for training, testing, and prediction. For training, svm-train was used with the default SVM type (-s 0) and with the radial basis function kernel option (-t 2). Two parameters, gamma and C, are needed with this kernel. These two values are obtained from the cross-validation accuracy table shown below, which is used to select better values of gamma and C.
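For reference, the radial basis function kernel has the form K(x, y) = exp(-gamma * ||x - y||^2), and C is the penalty parameter of the C-SVC formulation selected by -s 0. The sketch below computes the kernel value directly and assembles a matching svm-train argument list. The helper names, the file name ionosphere.train, and the parameter values are hypothetical and are not the settings used in this experiment.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def svm_train_args(train_file, c, gamma, folds=None):
    """Build an svm-train argument list for C-SVC with an RBF kernel."""
    args = ["svm-train", "-s", "0", "-t", "2", "-c", str(c), "-g", str(gamma)]
    if folds is not None:
        # -v n switches svm-train to n-fold cross-validation mode.
        args += ["-v", str(folds)]
    args.append(train_file)
    return args


# Illustrative values only.
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))
print(" ".join(svm_train_args("ionosphere.train", c=2, gamma=0.5, folds=5)))
```

The argument list could be passed to subprocess.run if the LIBSVM binaries are installed; it is only printed here so the sketch runs without them.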
(Table 1) The cross-validation accuracy table
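One way such a table can be produced and read is a grid search: the cross-validation accuracy is evaluated for each (C, gamma) pair and the best pair is kept. The sketch below assumes candidates spaced as powers of two, which is a common choice but not necessarily the grid used here. The scoring function toy_accuracy is a made-up stand-in so that the example runs without LIBSVM; its values are not experimental results.

```python
import math
from itertools import product


def power_of_two_grid(lo, hi, step):
    """Return 2**e for exponents e = lo, lo + step, ... up to hi."""
    return [2.0 ** e for e in range(lo, hi + 1, step)]


def best_parameters(c_values, gamma_values, cv_accuracy):
    """Fill the accuracy table and return the best (C, gamma) pair with it."""
    table = {(c, g): cv_accuracy(c, g) for c, g in product(c_values, gamma_values)}
    best = max(table, key=table.get)
    return best, table


def toy_accuracy(c, gamma):
    # Hypothetical stand-in for a real cross-validation run;
    # it simply peaks at C = 2**3 and gamma = 2**-3.
    return 100.0 - abs(math.log2(c) - 3) - abs(math.log2(gamma) + 3)


c_values = power_of_two_grid(-5, 15, 2)
gamma_values = power_of_two_grid(-15, 3, 2)
best, table = best_parameters(c_values, gamma_values, toy_accuracy)
print(best, len(table))
```

In a real run, cv_accuracy would invoke svm-train in cross-validation mode for each pair and return the reported accuracy.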
After the model (classifier) was obtained from the training examples, testing was performed by applying the model to the test data set. This produced the classification output file and its accuracy value.
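The accuracy reported at this stage is the percentage of test instances whose predicted label matches the true label. A minimal sketch follows, assuming the prediction output holds one numeric label per line (the default output of svm-predict without probability estimates); the function names and the labels are made up for illustration.

```python
def read_labels(text):
    """Read the leading numeric label from each non-empty line."""
    return [int(float(line.split()[0])) for line in text.splitlines() if line.strip()]


def accuracy(true_labels, predicted_labels):
    """Percentage of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("no labels to compare")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Made-up labels, not results from the experiment.
true_labels = [1, -1, 1, 1]
predicted = read_labels("1\n-1\n-1\n1\n")
print(accuracy(true_labels, predicted))
```

Because read_labels takes only the first token of each line, the same helper can also extract the true labels from a test file in LIBSVM format, where each line starts with the label.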
Related links :