Discovering New Gene Functionalities from Random Perturbations of Known Gene Ontological Annotations

Giacomo Domeniconi, Marco Masseroli, Gianluca Moro, Pietro Pinoli

Proceedings of the 6th International Conference on Knowledge Discovery and Information Retrieval

2014

Genomic annotations describing functional features of genes and proteins through controlled terminologies and ontologies are extremely valuable, especially for computational analyses aimed at inferring new biomedical knowledge. Thanks to the revolution in biology brought about by novel DNA sequencing technologies, several repositories of such annotations have become available in the last decade; among them, those including Gene Ontology annotations are the most relevant. Nevertheless, the available set of genomic annotations is incomplete, and only some of the available annotations represent highly reliable, human-curated information.
In this paper we propose a novel representation of the annotation discovery problem that enables the application of supervised algorithms to predict Gene Ontology annotations of genes of different organisms.
To use supervised algorithms even though labeled data for training the prediction model are not available, we propose a method that randomly perturbs the training set, creating a new annotation matrix used to train the model to recognize new annotations.
We tested the effectiveness of our approach on nine Gene Ontology annotation datasets.
The results show that our technique improves novel annotation predictions with respect to state-of-the-art unsupervised methods.
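
The perturbation idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the annotations form a binary gene-by-term matrix, that perturbation means deleting known annotations at random, and that the perturbed rows serve as features while the original matrix supplies the labels. The names `perturb_annotations`, `build_training_set`, `drop_prob` and `term_index` are hypothetical.

```python
import random


def perturb_annotations(matrix, drop_prob, seed=0):
    """Return a copy of a binary gene-by-term matrix with some 1s set to 0.

    Assumption: a perturbation deletes each known annotation (a 1)
    independently with probability drop_prob; 0s are never changed.
    """
    rng = random.Random(seed)
    return [
        [0 if cell == 1 and rng.random() < drop_prob else cell for cell in row]
        for row in matrix
    ]


def build_training_set(original, perturbed, term_index):
    """Pair each perturbed gene row with the original value of one term.

    Assumption: a supervised model trained on these pairs learns to
    recover annotations that the perturbation removed.
    """
    features = [list(row) for row in perturbed]
    labels = [row[term_index] for row in original]
    return features, labels


# Toy annotation matrix: 3 genes (rows) by 4 ontology terms (columns).
annotations = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]

perturbed = perturb_annotations(annotations, drop_prob=0.3, seed=42)
features, labels = build_training_set(annotations, perturbed, term_index=2)
```

Under these assumptions, a classifier fitted on `features` and `labels` could then be applied to the unperturbed matrix, and the annotations it predicts that are not yet recorded would be treated as candidate new annotations.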

Keywords: Gene ontology, Biomolecular annotation prediction, Bioinformatics, Knowledge discovery, Supervised learning, Data representation