Automatic domain-specific learning: towards a methodology for ontology enrichment

Pedro Ureña Gómez-Moreno; Eva M. Mestre-Mestre

Authors

Pedro Ureña Gómez-Moreno Universidad de Las Palmas de Gran Canaria
Eva M. Mestre-Mestre Universitat Politècnica de València

Keywords:

Ontology learning, FunGramKB, Corpus, Terminology, Biology

Abstract

At the current rate of technological development, in a world where enormous amounts of data are constantly created and the Internet is the primary means of information exchange, there is a need for tools that help process, analyze and use that information. However, while the growth of information creates many opportunities for social and scientific advance, it has also highlighted the difficulty of extracting meaningful patterns from massive data. Ontologies have been claimed to play a major role in the processing of large-scale data, as they serve as universal models of knowledge representation, and they are being studied as a possible solution to this difficulty. This paper presents a method for the automatic expansion of ontologies based on the exploitation of corpus and terminological data. The proposed “ontology enrichment method” (OEM) consists of a sequence of tasks aimed at automatically classifying an input keyword under its corresponding node within a target ontology. Results show that the method can be successfully applied to the automatic classification of specialized units into a reference ontology.

Author biographies

Pedro Ureña Gómez-Moreno, Universidad de Las Palmas de Gran Canaria

Pedro Ureña Gómez-Moreno is an assistant professor in the Department of Didactics of Language and Literature at the University of Granada (Spain), where he carries out most of his teaching and research activity. His teaching focuses on Natural Language Processing, Corpus Linguistics and English as a Second Language, both at the University of Granada and the UNED. His main areas of research are Morphosyntax and Lexicology within the frameworks of Corpus Linguistics and Natural Language Processing, with a special interest in Terminology and Knowledge Engineering applied to the development of the FunGramKB knowledge base. A second line of research concerns the application of new technologies to language teaching and the development of virtual courses. He has authored and co-authored a number of refereed book chapters published by Mouton de Gruyter and John Benjamins, as well as several articles in national and international journals, including The International Journal of Corpus Linguistics, Onomázein and The LSP Journal.

Eva M. Mestre-Mestre, Universitat Politècnica de València

Eva M. Mestre-Mestre is an associate professor at Universitat Politècnica de València. Since her Ph.D. thesis on the pragmatic implications of errors in English as a second language, her research has focused on pragmatics, English learning in higher education, and corpus management, including computational linguistics, resulting in publications in prestigious nationally and internationally indexed journals, such as RESLA and the Yearbook of Pragmatics. In addition to several book chapters, she has co-edited Understanding Meaning and Knowledge Representation for Cambridge Scholars Press. She has been a visiting researcher at several European and American universities. She is currently the director of the panel on pragmatics of the Spanish Society for Applied Linguistics and director of the panel on ESP of the Spanish Society for Corpus Linguistics.
