Automatic domain-specific learning: towards a methodology for ontology enrichment
Keywords:
Ontology learning, FunGramKB, Corpus, Terminology, Biology
Abstract
At the current rate of technological development, in a world where enormous amounts of data are constantly created and in which the Internet is used as the primary means for information exchange, there is a need for tools that help process, analyze and use that information. However, while the growth of information offers many opportunities for social and scientific advance, it has also highlighted the difficulties of extracting meaningful patterns from massive data. Ontologies have been claimed to play a major role in the processing of large-scale data, as they serve as universal models of knowledge representation, and are being studied as possible solutions to this problem. This paper presents a method for the automatic expansion of ontologies based on the exploitation of corpus and terminological data. The proposed “ontology enrichment method” (OEM) consists of a sequence of tasks aimed at automatically classifying an input keyword under its corresponding node within a target ontology. Results show that the method can be successfully applied to the automatic classification of specialized units into a reference ontology.
License
Revista de Lenguas para Fines Específicos is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.