A lexical database for public textual cyberbullying detection

Authors

  • Aurelia Power Institute of Technology Blanchardstown, Dublin
  • Anthony Keane Institute of Technology Blanchardstown, Dublin
  • Brian Nolan Institute of Technology Blanchardstown, Dublin
  • Brian O'Neill Dublin Institute of Technology

Keywords:

cyberbullying, lexical database, linguistic analysis, natural language processing

Abstract

Public textual cyberbullying has become one of the most prevalent issues associated with the online safety of young people, particularly on social networks. To address this issue, we argue that the boundaries of what constitutes public textual cyberbullying need to be identified first, and that a corresponding linguistically motivated definition needs to be advanced. Thus, we propose a definition of public textual cyberbullying that contains three necessary and sufficient elements: the personal marker, the dysphemistic element and the cyberbullying link between the first two elements. Subsequently, we argue that one of the cornerstones of the overall process of mitigating the effects of cyberbullying is the design of a cyberbullying lexical database that specifies what linguistic and cyberbullying-specific information is relevant to the detection process. In this vein, we propose a novel cyberbullying lexical database based on our definition of public textual cyberbullying. The overall architecture of the database is determined semantically and, to facilitate cyberbullying detection, each lexical entry encapsulates two new semantic dimensions derived from our definition: cyberbullying function and cyberbullying referential domain. In addition, the lexical entry encapsulates other semantic and syntactic information, such as sense and syntactic category; this information not only aids the detection process but also allows us to expand the cyberbullying database using WordNet (Miller, 1993).
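The lexical entry and the three-element definition described in the abstract can be illustrated with a small data structure and a co-occurrence check. This is a minimal sketch, not the authors' implementation: the `LexicalEntry` fields follow the dimensions named above (sense, syntactic category, cyberbullying function, cyberbullying referential domain), but the enum labels, the domain names such as `"intelligence"`, the sense identifiers, the hypothetical `SYNONYMS` table standing in for WordNet synsets, and the token-window proxy for the cyberbullying link are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class CBFunction(Enum):
    # Hypothetical labels for the two lexically realised definitional elements.
    PERSONAL_MARKER = "personal_marker"
    DYSPHEMISTIC_ELEMENT = "dysphemistic_element"


@dataclass(frozen=True)
class LexicalEntry:
    lemma: str
    syntactic_category: str   # e.g. "noun", "pronoun", "adjective"
    sense: str                # WordNet-style sense identifier (assumed format)
    cb_function: CBFunction   # cyberbullying function
    referential_domain: str   # cyberbullying referential domain (invented labels)


# Tiny illustrative lexicon; entries and domain labels are invented for the example.
LEXICON = {
    "you": LexicalEntry("you", "pronoun", "you.pron.01",
                        CBFunction.PERSONAL_MARKER, "addressee"),
    "ur": LexicalEntry("ur", "pronoun", "your.pron.01",
                       CBFunction.PERSONAL_MARKER, "addressee"),
    "idiot": LexicalEntry("idiot", "noun", "idiot.n.01",
                          CBFunction.DYSPHEMISTIC_ELEMENT, "intelligence"),
    "ugly": LexicalEntry("ugly", "adjective", "ugly.a.01",
                         CBFunction.DYSPHEMISTIC_ELEMENT, "appearance"),
}

# Hypothetical synonym sets keyed by sense, standing in for WordNet synsets.
SYNONYMS = {"idiot.n.01": ["moron", "imbecile"]}


def expand(lexicon, synonyms):
    """Copy each entry's annotations onto the synonyms that share its sense."""
    expanded = dict(lexicon)
    for entry in lexicon.values():
        for word in synonyms.get(entry.sense, []):
            expanded.setdefault(word, LexicalEntry(
                word, entry.syntactic_category, entry.sense,
                entry.cb_function, entry.referential_domain))
    return expanded


def flag(message, lexicon, window=4):
    """Return the referential domains of dysphemistic elements that occur
    within `window` tokens after a personal marker. The proximity window is
    only a crude stand-in for the cyberbullying link."""
    tokens = [t.strip(".,!?").lower() for t in message.split()]
    domains = []
    for i, tok in enumerate(tokens):
        entry = lexicon.get(tok)
        if entry is None or entry.cb_function is not CBFunction.PERSONAL_MARKER:
            continue
        for later in tokens[i + 1:i + 1 + window]:
            hit = lexicon.get(later)
            if hit is not None and hit.cb_function is CBFunction.DYSPHEMISTIC_ELEMENT:
                domains.append(hit.referential_domain)
    return domains
```

For example, `flag("You are such a moron!", expand(LEXICON, SYNONYMS))` returns `["intelligence"]`: "moron" inherits the annotations of "idiot" through the shared sense during expansion and falls within four tokens of the personal marker "you". A fuller system would presumably replace the proximity window with a syntactic or semantic test for the link and draw synonyms from WordNet itself.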


Author Biographies

Aurelia Power, Institute of Technology Blanchardstown, Dublin

Aurelia Power is an assistant lecturer in the Department of Informatics and Creative Digital Media at the Institute of Technology Blanchardstown, Dublin. She recently completed her PhD in computational linguistics, entitled “A Linguistic Approach to Detecting Public Textual Cyberbullying”. Her research interests include computational linguistics, linguistic theory, language acquisition, philosophy of language, artificial intelligence, data mining, cyberbullying and behaviour analysis. She is also a Board Certified Behaviour Analyst and a certified Professional Java Programmer.

Anthony Keane, Institute of Technology Blanchardstown, Dublin

Dr. Anthony Keane is currently the Head of the School of Informatics and Engineering at the Institute of Technology Blanchardstown, Dublin. He is also a principal investigator in the Security Research Lab, located in the Learning & Innovation Centre at ITB, where he has several doctoral research students working with industrial partners, both SMEs and international companies. His main research areas cover Network Resilience, Cyber Security, Digital & Cloud Forensics, Cyber Bullying and Security Intelligence. Dr. Keane has multiple conference and journal publications and is a frequent invited speaker at technology conferences and industrial seminars.

Brian Nolan, Institute of Technology Blanchardstown, Dublin

Dr. Brian Nolan is Head of the Department of Informatics and Creative Digital Media at the Institute of Technology Blanchardstown, Dublin, in Ireland. His research interests include linguistic theory at the morpho-syntactic/semantic interface, argument structure and valence, constructions in grammar, event structure in language, the architecture of the lexicon, and computational approaches to language processing and computational linguistics. His linguistic work has been in the functional linguistic model of Role and Reference Grammar, and he has published extensively internationally. His computing and computational linguistics research has concentrated on: 1) the development of a framework and supporting application suite for mobile and distributed command and control of robotic devices, using speech recognition as the core enabling technology; 2) the development of a rule-based Arabic to English machine translation engine that uses Role and Reference Grammar as the linguistic model supporting an interlingua bridge; and 3) the investigation of linguistic models to underpin conversational agents. In 2012, Dr. Nolan published his book with Equinox UK on the linguistic structure of Irish in a Role and Reference Grammar account, entitled ‘The structure of Modern Irish: A functional account’. In 2013, Benjamins published his co-edited volume ‘Linking constructions into functional linguistics – The role of constructions in grammar’ in their Studies in Language Companion series. His co-edited Benjamins volume on computational linguistics and linguistic theory, ‘Language processing and grammars: The role of functionally oriented computational models’, was published in 2014, also in their Studies in Language Companion series. He also co-edited a Benjamins book on ‘Causation, transfer and permission’ in linguistic theory, which appeared in early 2015.
In January 2017, Benjamins published his co-edited book on complex predication, entitled ‘Argument realisation in complex predicates and complex events: Verb verb constructions at the syntax semantic interface’. Dr. Nolan has over 40 years’ experience, nationally and internationally, in the computer industry, with over two decades in academia, in a variety of senior roles, and is also a widely published professional linguist. Dr. Nolan is a Fellow of the Irish Computer Society.

Brian O'Neill, Dublin Institute of Technology

Professor Brian O’Neill is Director of the Research, Enterprise and Innovation Services at Dublin Institute of Technology. His research focuses on media policy and digital technologies; media and information literacy; and e-safety and information society policy for children. He has an international profile as a researcher on young people’s use of new media and the internet. He is a member of the Management Group of the EU Kids Online network, responsible for overseeing its policy work package. He sits on Ireland's Internet Safety Advisory Committee and also chaired the Irish government’s task force on Internet Content Governance, reporting to the Minister for Communications, Energy and Natural Resources. He has undertaken research for the European Commission, UNICEF, the Broadcasting Authority of Ireland and the ICT Coalition. He is a member of the Council of Europe’s Expert Group on Digital Citizenship Education.

References

Allan, K. & Burridge, K. (2006). Forbidden words: Taboo and censoring of language. Cambridge: Cambridge University Press.

Babble, D. (2016). Baby Names. <http://www.babble.com/baby-names/> [12/02/2017].

Birner, B. (2013). Introduction to pragmatics. Oxford: Wiley-Blackwell Publishing.

Boyd, D. (2007). Why youth (heart) social network sites: The role of networked publics in teenage social life. In D. Buckingham (Ed.), MacArthur foundation series on digital learning, youth, identity, and digital media (pp. 1–26). Cambridge, MA: MIT Press.

Chen, Y., Zhou, Y., Zhu, S. & Xu, H. (2012). Detecting offensive language in social media to protect adolescent online safety. In Proceedings of the 4th ASE/IEEE International Conference on Social Computing (pp. 71–80). <http://www.cse.psu.edu/~sxz16/papers/SocialCom2012.pdf> [12/02/2017].

Dadvar, M., de Jong, F., Ordelman, R. & Trieschnigg, D. (2012). Improved cyberbullying detection using gender information. In Proceedings of the 21st International Conference Companion on World Wide Web, ACM (pp. 121–126). <http://wwwhome.ewi.utwente.nl/~dadvarm/Maral/uploads/Main/DIR2012.pdf> [12/02/2017].

Dinakar, K., Jones, B., Havasi, C., Lieberman, H. & Picard, R. (2012). Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems, 2(3), 18:1-18:30, DOI 10.1145/2362394.2362400.

Dinakar, K., Reichart, R. & Lieberman, H. (2011). Modeling the detection of textual cyberbullying. In Proceedings of the 5th AAAI International Conference on Weblog and Social Media (pp. 11–17). <http://web.media.mit.edu/~kdinakar/3841-16937-1-PB.pdf> [12/02/2017].

Fellbaum, C. (1993). English verbs as a semantic net. <http://wordnetcode.princeton.edu/5papers.pdf> [12/02/2017].

Fellbaum, C., Gross, D. & Miller, K. (1993). Adjectives in WordNet. [12/02/2017].

Free Web Headers (2016). Full list of bad words and top swear words banned by google. <http://www.freewebheaders.com/full-list-of-bad-words-banned-by-google/> [12/02/2017].

Hinduja, S. & Patchin, J.W. (2009). Bullying beyond the schoolyard: preventing and responding to cyber-bullying. Thousand Oaks, CA: Corwin Press.

Honjo, M., Hasegawa, T., Hasegawa, T., Mishima, K., Suda, T. & Yoshida, T. (2011). A framework to identify relationships among students in school bullying using digital communication media. In Proceedings of the 3rd IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing (pp. 1474–1479). <http://www.proceedings.com/13745.html> [12/02/2017].

Hosseinmardi, H., Han, R., Lv, Q., Mishra, S. & Ghasemianlangroodi, A. (2014). Towards understanding cyberbullying behavior in a semi-anonymous social network. In Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 244–252). <https://arxiv.org/pdf/1404.3839.pdf> [12/02/2017].

Huddleston, R. & Pullum, G.K. (2005). A student’s introduction to English grammar. Cambridge: Cambridge University Press.

Kansara, K.B. & Shekokar, N.M. (2015). A framework for cyberbullying detection in social network. International Journal of Current Engineering and Technology, 5(1), 494–498.

Kontostathis, A., Reynolds, K., Garron, A. & Edwards, L. (2013). Detecting cyberbullying: Query terms and techniques. In Proceedings of the WebSci’13 Conference. [12/02/2017].

Langos, C. (2012). Cyberbullying: The challenge to define. Cyberpsychology, Behavior, and Social Networking, 15(6), 285–289.

Li, M. & Tagami, A. (2014). A study of contact network generation for cyber-bullying detection. In Proceedings of the 28th International Conference on Advanced Information Networking and Applications Workshops (pp. 431–437). <http://ieeexplore.ieee.org/document/6844675/> [12/02/2017].

Lipka, L. (1992). An outline of English lexicology: Lexical structure, word semantics and word-formation. 2nd ed. Tübingen: Niemeyer.

Livingstone, S., Haddon, L., Görzig, A. & Ólafsson, K. (2011). Risks and safety on the Internet: The perspective of European children, full findings. EU Kids Online, London School of Economics and Political Science. <http://eprints.lse.ac.uk/33731/> [12/02/2017].

Luis von Ahn's Research Group (2016). Useful resources: Offensive/Profane word list. <http://www.cs.cmu.edu/~biglou/resources/bad-words.txt> [12/02/2017].

Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D. & Miller, K. (1993). Introduction to WordNet: An on-line lexical database. <http://wordnetcode.princeton.edu/5papers.pdf> [12/02/2017].

Miller, G.A. (1993). Nouns in WordNet: A lexical inheritance system. <http://wordnetcode.princeton.edu/5papers.pdf> [12/02/2017].

Miller, G.A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.

Munezero, M., Mozgovoy, M., Kakkonen, T., Klyuev, V. & Sutinen, E. (2013). Antisocial behaviour corpus for harmful language detection. In Proceedings of the 2013 Federated Conference on Computer Science and Information Systems (pp. 261–265). [12/02/2017].

Nahar, V., Li, X. & Pang, C. (2013). An effective approach for cyberbullying detection. Communications in Information Science and Management Engineering, 3(5), 238–247.

Nitta, T., Masui, F., Ptaszynski, M., Kimura, Y., Rzepka, R. & Araki, K. (2013). Detecting cyberbullying entries on informal school websites based on category relevance maximization. In Proceedings of the 6th International Joint Conference on Natural Language Processing (pp. 579–586). <http://www.aclweb.org/anthology/I13-1066> [12/02/2017].

Nocentini, A., Calmaestra, J., Schultze-Krumbholz, A., Scheithauer, H., Ortega, R. & Menesini, E. (2010). Cyberbullying: Labels, behaviours and definition in three European countries. Australian Journal of Guidance and Counselling, 20(2), 129–142.

Norvig, P. (2007). How to write a spelling corrector. <http://norvig.com/spell-correct.html> [12/02/2017].

NoSwearing (2016). List of swear words and curse words. <http://www.noswearing.com/dictionary> [12/02/2017].

Princeton University (2016). WordNet: A lexical database for English. [12/02/2017].

Ptaszynski, M., Dybala, P., Matsuba, T., Rzepka, R. & Araki, K. (2010). Machine learning and affect analysis against cyber-bullying. In Proceedings of the Linguistic and Cognitive Approaches to Dialog Agents Symposium, at the AISB 2010 Convention (pp. 7–16). <http://arakilab.media.eng.hokudai.ac.jp/~ptaszynski/data/AISB2010_Cyberbullying_paper.pdf> [12/02/2017].

Reynolds, K., Kontostathis, A. & Edwards, L. (2011). Using machine learning to detect cyberbullying. In Proceedings of the 10th International Conference on Machine Learning and Applications Workshops (ICMLA 2011) (pp. 241–244). <http://webpages.ursinus.edu/akontostathis/ReynoldsKontostathisEdwardsFINAL.pdf> [12/02/2017].

Sebastiani, F. (2002). Machine learning in automated text categorisation. ACM Computing Surveys, 34(1), 1–47.

Slonje, R. & Smith, P.K. (2008). Cyberbullying: Another main type of bullying? Scandinavian Journal of Psychology, 49(2), 147–154.

Sullivan, K.S. (2007). Grammar in metaphor: A construction grammar account of metaphoric language. PhD thesis, University of California, Berkeley. <http://linguistics.berkeley.edu/dissertations/Sullivan_dissertation_2007.pdf> [12/02/2017].

Urban Dictionary (2016). Urban Dictionary. <http://www.urbandictionary.com/> [12/02/2017].

Van Valin, R.D. Jr. (2001). An introduction to syntax. Cambridge: Cambridge University Press.

Vocabulary University (2016). Violence vocabulary word list. <https://myvocabulary.com/word-list/violence-vocabulary/> [12/02/2017].

Wikipedia the Free Encyclopaedia (2016). List of ethnic slurs by ethnicity. [12/02/2017].

Wikipedia the Free Encyclopaedia (2016). List of religious slurs. <https://en.wikipedia.org/wiki/List_of_religious_slurs> [12/02/2017].

Wilks, Y. (1978). Making preferences more active. Artificial Intelligence, 11(3), 197–223.

Wilks, Y. (2007). Making preferences more active. In K. Ahmad, C. Brewster & M. Stevenson (Eds.), Words and intelligence I (pp. 141–166). New York: Springer.

Xu, J., Jun, K., Zhu, X. & Bellmore, A. (2012). Learning from bullying traces in social media. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 656–666). <http://pages.cs.wisc.edu/~jerryzhu/pub/naaclhlt2012.pdf> [12/02/2017].

Yin, D., Xue, Z., Hong, L., Davison, B.D., Kontostathis, A. & Edwards, L. (2009). Detection of harassment on Web 2.0. In Proceedings of the Content Analysis in the WEB 2.0 Workshop. <http://webpages.ursinus.edu/akontostathis/harassment.pdf> [12/02/2017].

Published

2017-12-05

How to Cite

Power, A., Keane, A., Nolan, B., & O'Neill, B. (2017). A lexical database for public textual cyberbullying detection. Revista de Lenguas para Fines Específicos, 23(2), 157–186. Retrieved from https://ojsspdc.ulpgc.es/ojs/index.php/LFE/article/view/923

Section

Special Issue