Pedagogical symbolic knowledge extraction (SKE) algorithms require a dataset to extract symbolic knowledge from a predictor. The goal of this thesis is to investigate how the extracted knowledge is affected by the choice of this dataset. Very often, the same training set used to train the predictor is also used for the extraction. The knowledge is then evaluated through its fidelity w.r.t. the predictor, i.e., an accuracy computed not against the original labels of the test set, but against the labels/values that the predictor outputs on the test set.
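For a classification task, fidelity uses the same formula as accuracy; only the reference labels change (predictor outputs instead of ground truth). The following is a minimal Python sketch of that distinction. The function names `fidelity` and `accuracy` and the toy labels are hypothetical and are not part of PSyKE. For regression, the agreement count would be replaced by a regression metric (e.g., mean absolute error) between the knowledge outputs and the predictor outputs.

```python
from typing import Hashable, Sequence


def fidelity(knowledge_preds: Sequence[Hashable], predictor_preds: Sequence[Hashable]) -> float:
    """Fraction of test instances on which the extracted knowledge agrees with the predictor."""
    if len(knowledge_preds) != len(predictor_preds):
        raise ValueError("prediction sequences must have the same length")
    if len(predictor_preds) == 0:
        raise ValueError("cannot compute fidelity on an empty test set")
    agree = sum(k == p for k, p in zip(knowledge_preds, predictor_preds))
    return agree / len(predictor_preds)


def accuracy(preds: Sequence[Hashable], true_labels: Sequence[Hashable]) -> float:
    # Same formula, but the reference is the ground-truth labels of the test set.
    return fidelity(preds, true_labels)


# Hypothetical toy labels, for illustration only.
y_true = ["a", "a", "b", "b", "b"]       # original test-set labels
y_predictor = ["a", "b", "b", "b", "a"]  # black-box outputs on the test set
y_knowledge = ["a", "b", "b", "a", "a"]  # extracted rules' outputs on the test set

print(accuracy(y_predictor, y_true))       # 0.6  (predictor accuracy)
print(fidelity(y_knowledge, y_predictor))  # 0.8  (knowledge fidelity)
print(accuracy(y_knowledge, y_true))       # 0.4  (knowledge accuracy)
```

In this toy case the knowledge agrees with the predictor on 4 of 5 instances while matching the true labels on only 2 of 5: high fidelity does not imply high accuracy when the predictor itself is inaccurate.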
Some research questions to be answered are:
1) How does the fidelity change if the dataset is not representative of the population?
2) How do the knowledge and its fidelity change across different kinds of SKE algorithms?
3) Are there SKE algorithms that are (more) robust to changes in the dataset?
4) How do SKE algorithms behave if the predictor has low accuracy? Does the knowledge still have high fidelity?
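Question 1 can be illustrated with a toy one-dimensional experiment, sketched below in Python under loudly stated assumptions. The black-box predictor (class 1 inside the hypothetical interval [3, 7]), the sampling ranges, and the extractor `extract_interval_rule` are all hypothetical. The extractor is a deliberately minimal, hypercube-like rule learner (it keeps the smallest interval covering the points the predictor labels as 1); it is not ITER, GridEx, or any PSyKE API. The extraction is pedagogical: the predictor is only queried, never inspected. Fidelity is always measured on a representative test set, while the extraction dataset is either representative or biased towards half of the input space.

```python
import random


def black_box(x: float) -> int:
    # Hypothetical stand-in for a trained predictor: class 1 inside [3, 7].
    return 1 if 3.0 <= x <= 7.0 else 0


def extract_interval_rule(xs):
    """Toy pedagogical extraction: label xs by querying the predictor, then keep
    the smallest interval covering every point the predictor labels as 1."""
    positives = [x for x in xs if black_box(x) == 1]
    if not positives:
        # No positive example in the extraction dataset: predict 0 everywhere.
        def rule(x: float) -> int:
            return 0
        return rule
    lo, hi = min(positives), max(positives)

    def rule(x: float) -> int:
        return 1 if lo <= x <= hi else 0
    return rule


def fidelity(rule, test_xs) -> float:
    # Agreement between the extracted rule and the predictor on the test set.
    return sum(rule(x) == black_box(x) for x in test_xs) / len(test_xs)


rng = random.Random(42)
test_set = [rng.uniform(0, 10) for _ in range(10_000)]     # representative test set
representative = [rng.uniform(0, 10) for _ in range(500)]  # same distribution as the population
biased = [rng.uniform(0, 5) for _ in range(500)]           # covers only half of the input space

fid_repr = fidelity(extract_interval_rule(representative), test_set)
fid_biased = fidelity(extract_interval_rule(biased), test_set)
print(f"fidelity, representative dataset: {fid_repr:.3f}")
print(f"fidelity, biased dataset:         {fid_biased:.3f}")
```

Because the biased dataset never contains points above 5, the extracted rule cannot cover the region (5, 7] where the predictor outputs 1, so its fidelity drops to roughly 0.8, whereas the representative dataset yields fidelity close to 1. Replacing `black_box` with a noisier function would give a similar starting point for question 4.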
Using different SKE algorithms, predictors and datasets is both interesting and mandatory. As for SKE algorithms, choose at least four of them (one from each category):
a) one based on decision trees (e.g., CART, C4.5);
b) one based on hypercubes (e.g., ITER, GridEx);
c) one NOT based on hypercubes or decision trees (e.g., REAL, Trepan);
d) one decompositional SKE algorithm, to be compared with the pedagogical ones.
Keywords
eXplainable AI; symbolic knowledge extraction; PSyKE