Iterative Refining of Category Profiles for Nearest Centroid Cross-Domain Text Classification


Giacomo Domeniconi, Gianluca Moro, Roberto Pasolini, Claudio Sartori

In cross-domain text classification, topic labels for documents in a target domain are predicted by leveraging knowledge from labeled documents in a source domain that covers the same or similar topics, possibly expressed with different words. Existing methods either adapt the source-domain documents to the target domain or represent both domains in a common space. These methods mostly rely on advanced statistical techniques and often require parameter tuning to reach optimal performance. We propose a more straightforward approach based on nearest centroid classification: profiles of topic categories are extracted from the source domain and then adapted through iterative refining steps that use the most similar documents in the target domain. Experiments on common benchmark datasets show that this approach, despite its simplicity, achieves accuracy better than or comparable to that of other methods, using fixed empirical values for its few parameters.
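The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words weighting, the cosine similarity, the `min_sim` threshold used to pick the "most similar" target documents, the choice to rebuild each profile from target documents only, and all function names (`vectorize`, `cosine`, `centroid`, `refine_profiles`) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Turn text into an L2-normalised bag-of-words vector (assumed weighting)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}


def refine_profiles(source_docs, source_labels, target_docs,
                    min_sim=0.2, max_iter=10):
    """Hypothetical refinement loop: start from source centroids, then
    rebuild each profile from the target documents assigned to it with
    similarity >= min_sim, until the predictions stop changing."""
    source = [vectorize(d) for d in source_docs]
    target = [vectorize(d) for d in target_docs]

    # Initial category profiles: one centroid per source label.
    grouped = defaultdict(list)
    for vec, label in zip(source, source_labels):
        grouped[label].append(vec)
    profiles = {label: centroid(vecs) for label, vecs in grouped.items()}

    predictions = None
    for _ in range(max_iter):
        # Nearest centroid step: similarity of every target doc to every profile.
        sims = [{c: cosine(v, p) for c, p in profiles.items()} for v in target]
        new_predictions = [max(sorted(s), key=s.get) for s in sims]
        if new_predictions == predictions:
            break  # predictions are stable, stop refining
        predictions = new_predictions
        # Refining step (assumed rule): keep only sufficiently similar documents.
        for c in profiles:
            confident = [v for v, s, p in zip(target, sims, predictions)
                         if p == c and s[c] >= min_sim]
            if confident:
                profiles[c] = centroid(confident)
    return profiles, predictions


# Toy data (invented for this example): the two domains share topics
# but only part of the vocabulary.
source_docs = ["football match goal team", "team wins match goal",
               "election vote government", "government vote parliament"]
source_labels = ["sports", "sports", "politics", "politics"]
target_docs = ["goal striker league team", "striker league transfer",
               "vote senate ballot government", "senate ballot campaign"]

profiles, predictions = refine_profiles(source_docs, source_labels, target_docs)
print(predictions)
```

In this toy run, the second and fourth target documents share no words with the source domain, so the source profiles alone cannot place them reliably. After one refining step the profiles absorb target vocabulary such as "striker" and "senate", which lets the next nearest centroid pass assign those documents by similarity rather than by an arbitrary tie-break.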

Knowledge Discovery, Knowledge Engineering and Knowledge Management, Communications in Computer and Information Science 553, pp. 50-67, 2015.
Ana Fred, Jan L. G. Dietz, David Aveiro, Kecheng Liu, Joaquim Filipe (eds.), Springer International Publishing.

@article{,
booktitle = {Knowledge Discovery, Knowledge Engineering and Knowledge Management},
year = 2015,
status = {Published},
url = {http://dx.doi.org/10.1007/978-3-319-25840-9_4},
editor = {Fred, Ana and Dietz, Jan L. G. and Aveiro, David and Liu, Kecheng and Filipe, Joaquim},
series = {Communications in Computer and Information Science},
publisher = {Springer International Publishing},
author = {Domeniconi, Giacomo and Moro, Gianluca and Pasolini, Roberto and Sartori, Claudio},
title = {Iterative Refining of Category Profiles for Nearest Centroid Cross-Domain Text Classification},
isbn = {978-3-319-25839-3},
abstract = {In cross-domain text classification, topic labels for documents of a target domain are predicted by leveraging knowledge of labeled documents of a source domain, having equal or similar topics with possibly different words. Existing methods either adapt documents of the source domain to the target or represent both domains in a common space. These methods are mostly based on advanced statistical techniques and often require tuning of parameters in order to obtain optimal performances. We propose a more straightforward approach based on nearest centroid classification: profiles of topic categories are extracted from the source domain and are then adapted by iterative refining steps using most similar documents in the target domain. Experiments on common benchmark datasets show that this approach, despite its simplicity, obtains accuracy measures better or comparable to other methods, obtained with fixed empirical values for its few parameters.},
pages = {50-67},
volume = 553,
doi = {10.1007/978-3-319-25840-9_4}}

Publication

— authors

Giacomo Domeniconi, Gianluca Moro, Roberto Pasolini, Claudio Sartori

— edited by

Ana Fred, Jan L. G. Dietz, David Aveiro, Kecheng Liu, Joaquim Filipe

— status

published

— type

article in proceedings

Publication venue

— volume

Knowledge Discovery, Knowledge Engineering and Knowledge Management

— series

Communications in Computer and Information Science 553

— publication date

2015

— pages

50-67

URL & ID

original page

— DOI

10.1007/978-3-319-25840-9_4

— print ISBN

978-3-319-25839-3

BibTeX

— BibTeX category
article