A diachronic approach to scientific lexicon in English: Evidence from Late Modern English corpora

Pascual Cantos; Nila Vázquez

Authors

Pascual Cantos Universidad de Murcia
Nila Vázquez Universidad de Murcia

Keywords:

Diachronic corpus-based research, corpus linguistics, astronomy, lexical complexity, lexical specificity

Abstract

This research focuses on the Corpus of English Texts on Astronomy (CETA), the first sub-corpus of the Coruña Corpus (CC), and presents a diachronic approach to the astronomy-specific lexicon found in texts from 1710 to 1920. The goal is to determine how lexical astronomy-domain specificity evolves in the CETA, that is, how many astronomy-like lexical features occur in the English astronomy texts it gathers. This might shed some light on: (i) the rate at which new astronomy-specific vocabulary is introduced over time, (ii) lexical richness in English astronomy texts, (iii) the rate of new astronomy-specific vocabulary over time, (iv) the potential lexically specific features of English astronomy texts, and (v) the lexico-semantic text difficulty of English astronomy texts.
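
As a rough illustration of the kind of measures named above, the following Python sketch computes a type-token ratio (one common proxy for lexical richness) and a per-period rate of newly introduced domain terms. It is not the procedure used in this study: the tokenizer, the function names, the toy samples, and the small term list are all assumptions made for the example, and none of the data comes from the CETA.

```python
import re

# Toy data invented for illustration only; not taken from the CETA.
SAMPLES = {
    1710: "the comet moves near the planet",
    1850: "the nebula and the comet show a spectrum",
}
# Hypothetical list of astronomy-specific terms.
DOMAIN_TERMS = {"comet", "planet", "nebula", "spectrum"}


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens; 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def new_term_rate(samples_by_year, domain_terms):
    """For each year, in chronological order, return the number of domain
    terms not seen in any earlier sample, divided by that sample's tokens."""
    seen = set()
    rates = {}
    for year in sorted(samples_by_year):
        tokens = tokenize(samples_by_year[year])
        new_terms = {t for t in tokens if t in domain_terms and t not in seen}
        rates[year] = len(new_terms) / len(tokens) if tokens else 0.0
        seen |= new_terms
    return rates


if __name__ == "__main__":
    for year, text in sorted(SAMPLES.items()):
        print(year, "TTR:", round(type_token_ratio(tokenize(text)), 3))
    print("New-term rates:", new_term_rate(SAMPLES, DOMAIN_TERMS))
```

Raw type-token ratios depend heavily on sample length, so comparisons across samples of different sizes would need a length-normalized measure; the sketch leaves that out for brevity.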
