A lexical database for public textual cyberbullying detection
Keywords:
cyberbullying, lexical database, linguistic analysis, natural language processing
Abstract
Public textual cyberbullying has become one of the most prevalent issues associated with the online safety of young people, particularly on social networks. To address this issue, we argue that the boundaries of what constitutes public textual cyberbullying first need to be identified and that a corresponding linguistically motivated definition needs to be advanced. Thus, we propose a definition of public textual cyberbullying that contains three necessary and sufficient elements: the personal marker, the dysphemistic element and the cyberbullying link between the previous two elements. Subsequently, we argue that one of the cornerstones in the overall process of mitigating the effects of cyberbullying is the design of a cyberbullying lexical database that specifies what linguistic and cyberbullying-specific information is relevant to the detection process. In this vein, we propose a novel cyberbullying lexical database based on the definition of public textual cyberbullying. The overall architecture of our cyberbullying lexical database is determined semantically and, in order to facilitate cyberbullying detection, the lexical entry encapsulates two new semantic dimensions that are derived from our definition: cyberbullying function and cyberbullying referential domain. In addition, the lexical entry encapsulates other semantic and syntactic information, such as sense and syntactic category. This information not only aids the process of detection but also allows us to expand the cyberbullying database using WordNet (Miller, 1993).
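As a rough illustration of the lexical entry described in the abstract, the following Python sketch models an entry with the four kinds of information named there: sense, syntactic category, cyberbullying function and cyberbullying referential domain. The field names, the label values (`personal_marker`, `dysphemistic_element`, `intelligence`), the toy entries and the helper functions are all assumptions made for this example; they are not taken from the authors' database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class LexicalEntry:
    """One entry of a hypothetical cyberbullying lexical database."""
    lemma: str
    sense: str                          # short gloss identifying the sense
    syntactic_category: str             # e.g. "pronoun", "adjective"
    cyberbullying_function: str         # assumed label set, see lead-in
    referential_domain: Optional[str]   # assumed labels; None if not applicable


# Toy entries; every value here is an illustrative assumption.
ENTRIES: List[LexicalEntry] = [
    LexicalEntry("you", "the person addressed", "pronoun",
                 "personal_marker", None),
    LexicalEntry("stupid", "lacking intelligence", "adjective",
                 "dysphemistic_element", "intelligence"),
]


def build_index(entries: List[LexicalEntry]) -> Dict[str, List[LexicalEntry]]:
    """Group entries by lemma so that one word form can carry several senses."""
    index: Dict[str, List[LexicalEntry]] = {}
    for entry in entries:
        index.setdefault(entry.lemma, []).append(entry)
    return index


def functions_in(tokens: List[str],
                 index: Dict[str, List[LexicalEntry]]) -> Set[str]:
    """Return the cyberbullying functions of all tokens found in the index."""
    found: Set[str] = set()
    for token in tokens:
        for entry in index.get(token.lower(), []):
            found.add(entry.cyberbullying_function)
    return found


INDEX = build_index(ENTRIES)
print(functions_in(["You", "are", "stupid"], INDEX))
```

This lookup only reports which of the first two elements of the definition occur in a message. It does not establish the cyberbullying link between them, which would require further syntactic and semantic analysis, and it does not attempt the WordNet-based expansion mentioned in the abstract.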
References
Allan, K. & Burridge, K. (2006). Forbidden words: Taboo and censoring of language. Cambridge: Cambridge University Press.
Babble, D. (2016). Baby Names. <http://www.babble.com/baby-names/> [12/02/2017].
Birner, B. (2013). Introduction to pragmatics. Oxford: Wiley-Blackwell Publishing.
Boyd, D. (2007). Why youth (heart) social network sites: The role of networked publics in teenage social life. In D. Buckingham (Ed.), MacArthur foundation series on digital learning, youth, identity, and digital media (pp. 1–26). Cambridge, MA: MIT Press.
Chen, Y., Zhou, Y., Zhu, S. & Xu, H. (2012). Detecting offensive language in social media to protect adolescent online safety. In Proceedings of the 4th ASE/IEEE International Conference on Social Computing (pp. 71–80). <http://www.cse.psu.edu/~sxz16/papers/SocialCom2012.pdf> [12/02/2017].
Dadvar, M., de Jong, F., Ordelman, R. & Trieschnigg, D. (2012). Improved cyberbullying detection using gender information. In Proceedings of the 21st International Conference Companion on World Wide Web, ACM (pp. 121–126). <http://wwwhome.ewi.utwente.nl/~dadvarm/Maral/uploads/Main/DIR2012.pdf> [12/02/2017].
Dinakar, K., Jones, B., Havasi, C., Lieberman, H. & Picard, R. (2012). Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Transactions on Interactive Intelligent Systems, 2(3), 18:1–18:30. DOI 10.1145/2362394.2362400.
Dinakar, K., Reichart, R. & Lieberman, H. (2011). Modeling the detection of textual cyberbullying. In Proceedings of the 5th AAAI International Conference on Weblog and Social Media (pp. 11–17). <http://web.media.mit.edu/~kdinakar/3841-16937-1-PB.pdf> [12/02/2017].
Fellbaum, C. (1993). English verbs as a semantic net. <http://wordnetcode.princeton.edu/5papers.pdf> [12/02/2017].
Fellbaum, C., Gross, D. & Miller, K. (1993). Adjectives in WordNet. [12/02/2017].
Free Web Headers (2016). Full list of bad words and top swear words banned by google. <http://www.freewebheaders.com/full-list-of-bad-words-banned-by-google/> [12/02/2017].
Hinduja, S. & Patchin, J.W. (2009). Bullying beyond the schoolyard: preventing and responding to cyber-bullying. Thousand Oaks, CA: Corwin Press.
Honjo, M., Hasegawa, T., Hasegawa, T., Mishima, K., Suda, T. & Yoshida, T. (2011). A framework to identify relationships among students in school bullying using digital communication media. In Proceedings of the 3rd IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing (pp. 1474–1479). <http://www.proceedings.com/13745.html> [12/02/2017].
Hosseinmardi, H., Han, R., Lv, Q., Mishra, S. & Ghasemianlangroodi, A. (2014). Towards understanding cyberbullying behavior in a semi-anonymous social network. In Proceedings of 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 244–252). <https://arxiv.org/pdf/1404.3839.pdf> [12/02/2017].
Huddleston, R. & Pullum, G.K. (2005). A student’s introduction to English grammar. Cambridge: Cambridge University Press.
Kansara, K.B. & Shekokar, N.M. (2015). A framework for cyberbullying detection in social network. International Journal of Current Engineering and Technology, 5(1), 494–498.
Kontostathis, A., Reynolds, K., Garron, A. & Edwards, L. (2013). Detecting cyberbullying: Query terms and techniques. In Proceedings of the WebSci’13 Conference. [12/02/2017].
Langos, C. (2012). Cyberbullying: The challenge to define. Cyberpsychology, Behavior, and Social Networking, 15(6), 285–289.
Li, M. & Tagami, A. (2014). A study of contact network generation for cyber-bullying detection. In Proceedings of the 28th International Conference on Advanced Information Networking and Applications Workshops (pp. 431–437). <http://ieeexplore.ieee.org/document/6844675/> [12/02/2017].
Lipka, L. (1992). An outline of English lexicology: Lexical structure, word semantics and word-formation. 2nd ed. Tübingen: Niemeyer.
Livingstone, S., Haddon, L., Görzig, A. & Ólafsson, K. (2011). Risks and safety on the Internet: The perspective of European children, full findings. EU Kids Online, London School of Economics and Political Science. <http://eprints.lse.ac.uk/33731/> [12/02/2017].
Luis von Ahn's Research Group (2016). Useful resources: Offensive/Profane word list. <http://www.cs.cmu.edu/~biglou/resources/bad-words.txt> [12/02/2017].
Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D. & Miller, K. (1993). Introduction to WordNet: An on-line lexical database. <http://wordnetcode.princeton.edu/5papers.pdf> [12/02/2017].
Miller, G.A. (1993). Nouns in WordNet: A lexical inheritance system. <http://wordnetcode.princeton.edu/5papers.pdf> [12/02/2017].
Miller, G. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Munezero, M., Mozgovoy, M., Kakkonen, T., Klyuev, V. & Sutinen, E. (2013). Antisocial behaviour corpus for harmful language detection. In Proceedings of the 2013 Federated Conference on Computer Science and Information Systems (pp. 261–265). [12/02/2017].
Nahar, V., Li, X. & Pang, C. (2013). An effective approach for cyberbullying detection. Communications in Information Science and Management Engineering, 3(5), 238–247.
Nitta, T., Masui, F., Ptaszynski, M., Kimura, Y., Rzepka, R. & Araki, K. (2013). Detecting cyberbullying entries on informal school websites based on category relevance maximization. In Proceedings of the 6th International Joint Conference on Natural Language Processing (pp. 579–586). <http://www.aclweb.org/anthology/I13-1066> [12/02/2017].
Nocentini, A., Calmaestra, J., Schultze-Krumbholz, A., Scheithauer, H., Ortega, R. & Menesini, E. (2010). Cyberbullying: Labels, behaviours and definition in three European countries. Australian Journal of Guidance and Counselling, 20(2), 129–142.
Norvig, P. (2007). How to write a spelling corrector. <http://norvig.com/spell-correct.html> [12/02/2017].
NoSwearing (2016). List of swear words and curse words. <http://www.noswearing.com/dictionary> [12/02/2017].
Princeton University (2016). WordNet: A lexical database for English. [12/02/2017].
Ptaszynski, M., Dybala, P., Matsuba, T., Rzepka, R. & Araki, K. (2010). Machine learning and affect analysis against cyber-bullying. In Proceedings of the Linguistic and Cognitive Approaches to Dialog Agents Symposium, at the AISB 2010 Convention (pp. 7–16). <http://arakilab.media.eng.hokudai.ac.jp/~ptaszynski/data/AISB2010_Cyberbullying_paper.pdf> [12/02/2017].
Reynolds, K., Kontostathis, A. & Edwards, L. (2011). Using machine learning to detect cyberbullying. In Proceedings of the 10th International Conference on Machine Learning and Applications Workshops (ICMLA 2011) (pp. 241–244). <http://webpages.ursinus.edu/akontostathis/ReynoldsKontostathisEdwardsFINAL.pdf> [12/02/2017].
Sebastiani, F. (2002). Machine learning in automated text categorisation. ACM Computing Surveys, 34(1), 1–47.
Slonje, R. & Smith, P.K. (2008). Cyberbullying: Another main type of bullying? Scandinavian Journal of Psychology, 49(2), 147–154.
Sullivan, K.S. (2007). Grammar in metaphor: A construction grammar account of metaphoric language. PhD thesis, University of California, Berkeley. <http://linguistics.berkeley.edu/dissertations/Sullivan_dissertation_2007.pdf> [12/02/2017].
Urban Dictionary (2016). Urban Dictionary. <http://www.urbandictionary.com/> [12/02/2017].
Van Valin, R.D. Jr. (2001). An introduction to syntax. Cambridge: Cambridge University Press.
Vocabulary University (2016). Violence vocabulary word list. <https://myvocabulary.com/word-list/violence-vocabulary/> [12/02/2017].
Wikipedia the Free Encyclopaedia (2016). List of ethnic slurs by ethnicity. [12/02/2017].
Wikipedia the Free Encyclopaedia (2016). List of religious slurs. <https://en.wikipedia.org/wiki/List_of_religious_slurs> [12/02/2017].
Wilks, Y. (1978). Making preferences more active. Artificial Intelligence, 11(3), 197–223.
Wilks, Y. (2007). Making preferences more active. In K. Ahmad, C. Brewster & M. Stevenson (Eds.), Words and intelligence I (pp. 141–166). New York: Springer.
Xu, J., Jun, K., Zhu, X. & Bellmore, A. (2012). Learning from bullying traces in social media. In Proceedings of 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 656–666). <http://pages.cs.wisc.edu/~jerryzhu/pub/naaclhlt2012.pdf> [12/02/2017].
Yin, D., Xue, Z., Hong, L., Davison, B.D., Kontostathis, A. & Edwards, L. (2009). Detection of harassment on Web 2.0. In Proceedings of the Content Analysis in the WEB 2.0 Workshop. <http://webpages.ursinus.edu/akontostathis/harassment.pdf> [12/02/2017].
License
Revista de Lenguas para fines específicos is licensed under a Creative Commons Reconocimiento-NoComercial-SinObraDerivada 4.0 Internacional License.