RF-OEX

Usage description

Outlier detection is available in the Explorer part of basic Weka. Once the data set is loaded in the Preprocess screen, switch to the Outlier panel. The Outlier panel is used to search for the most outlying instances and their Outlier Score.

Make sure that the Number of Trees parameter is high enough (≈1000) to ensure an accurate result. You can deal with overfitting by setting the minimal number of instances per node (Min. per Node parameter) or the maximum depth of a tree (Maximum Depth of Tree parameter). The Class parameter must be set to the class attribute of the data set. We recommend keeping Count with mistaken class penalty and Count with ambiguous classification penalty checked, so that the similarity of a given instance to the rest of the samples is considered when the outlier factor is calculated. The Bootstraping parameter helps to make the trees more varied.

Once the setup is ready, click the Start button. After the computation is done, the resulting outlier scores appear in the left part of the panel. If Output summary information is checked, you can see the Summary Outlier Score section with the instances sorted by their outlier factor. In the Outlier Score section you can find more details about each instance and its outlier score.
The Interpretation button becomes available now. Click it to open the Interpretation panel.

Choose one of the two interpretation methods. Two parameters are common to both methods, and the second method has two additional parameters.

The screenshot below shows the result of outlier interpretation for the Iris dataset.

The first rows give an overview of the parameter settings. The second section describes the outlier interpretation. Let's look at the interpretation of the most outlying instance, number 71:

Instance number: 71, Class: Iris-versicolor
petalwidth=1.8, 0.88

These lines mean that 88% of the outlierness of instance number 71 is caused by the value 1.8 of the attribute petalwidth.

Now let's take a look at the third most outlying instance, number 84:

Instance number: 84, Class: Iris-versicolor
petallength=5.1, 0.74
sepallength=6 && petallength=5.1, 0.26

74% of the outlierness of this instance is caused by the value of petallength. The outlierness also increases significantly when the attribute petallength is combined with the attribute sepallength. This combination accounts for 26% of the outlierness.
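
If you want to process the interpretation output outside of Weka, the lines shown above can be parsed with a few lines of Python. This is only a minimal sketch: it assumes the output has been copied to plain text in exactly the layout quoted above (a header line "Instance number: N, Class: C" followed by lines of the form "attr=value && attr=value, weight"). The names Term and parse_interpretation are hypothetical helpers and are not part of RF-OEX or Weka.

```python
import re
from dataclasses import dataclass


@dataclass
class Term:
    """One interpretation line: attribute conditions and their share of outlierness."""
    conditions: dict  # attribute name -> value, kept as strings
    weight: float     # e.g. 0.88 means 88%


# Assumed header layout, taken from the example output quoted in the text.
HEADER = re.compile(r"^Instance number:\s*(\d+),\s*Class:\s*(.+)$")


def parse_interpretation(text):
    """Return {instance_number: (class_label, [Term, ...])}."""
    result = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = int(match.group(1))
            result[current] = (match.group(2).strip(), [])
            continue
        if current is None or "=" not in line:
            continue  # skip anything that is not an interpretation line
        # The weight follows the last comma; conditions are joined by "&&".
        left, _, right = line.rpartition(",")
        try:
            weight = float(right)
        except ValueError:
            continue  # not in the assumed "conditions, weight" layout
        conditions = {}
        for part in left.split("&&"):
            name, _, value = part.strip().partition("=")
            conditions[name.strip()] = value.strip()
        result[current][1].append(Term(conditions, weight))
    return result


sample = """Instance number: 71, Class: Iris-versicolor
petalwidth=1.8, 0.88

Instance number: 84, Class: Iris-versicolor
petallength=5.1, 0.74
sepallength=6 && petallength=5.1, 0.26
"""

parsed = parse_interpretation(sample)
for number, (label, terms) in parsed.items():
    for term in terms:
        attrs = " and ".join(f"{k}={v}" for k, v in term.conditions.items())
        print(f"instance {number} ({label}): {attrs} accounts for {term.weight:.0%}")
```

Splitting on the last comma keeps the sketch independent of how many conditions a line contains, so single attributes and combinations are handled the same way.
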
The picture below shows both instances together with the other instances from class Iris-versicolor. You can see that the value of petalwidth for instance 71 is really high compared to the other instances.
For instance 84, the value of petallength is relatively high, although there are five other instances with petallength >= 4.8. If we look at the combination of the attributes petallength and sepallength, we see that although there are several instances with sepallength≈6, the combination with a high value of petallength is quite unique.
Notice that the interpretations from RF-OEX correspond with the observations above.
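
The same kind of check can be done numerically instead of by eye. The sketch below counts how many instances of one class satisfy each condition alone and both together: a combination is rare when the joint count is much smaller than the single counts. The rows are made-up placeholder values, not the real Iris measurements, and count_matching and the 0.15 tolerance used for "≈6" are assumptions made for illustration only.

```python
def count_matching(rows, predicate):
    """Count the rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row))


# Placeholder rows for a single class (NOT real Iris data).
rows = [
    {"sepallength": 6.0, "petallength": 5.1},
    {"sepallength": 6.0, "petallength": 4.5},
    {"sepallength": 6.1, "petallength": 4.0},
    {"sepallength": 5.5, "petallength": 4.9},
    {"sepallength": 6.7, "petallength": 5.0},
    {"sepallength": 5.9, "petallength": 4.2},
]


def high_petal(row):
    # "relatively high" petallength, using the threshold from the text
    return row["petallength"] >= 4.8


def near_six(row):
    # sepallength approximately 6; the tolerance is an arbitrary choice
    return abs(row["sepallength"] - 6.0) <= 0.15


def both(row):
    return high_petal(row) and near_six(row)


n_high = count_matching(rows, high_petal)
n_near = count_matching(rows, near_six)
n_both = count_matching(rows, both)
print(f"high petallength: {n_high}, sepallength near 6: {n_near}, both: {n_both}")
```

With real data, load the instances of the class in question into rows and adjust the thresholds to the values reported in the interpretation.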