Authors:
(1) Yi-Ling Chung, The Alan Turing Institute (ychung@turing.ac.uk);
(2) Gavin Abercrombie, The Interaction Lab, Heriot-Watt University (g.abercrombie@hw.ac.uk);
(3) Florence Enock, The Alan Turing Institute (fenock@turing.ac.uk);
(4) Jonathan Bright, The Alan Turing Institute (jbright@turing.ac.uk);
(5) Verena Rieser, The Interaction Lab, Heriot-Watt University and now at Google DeepMind (v.t.rieser@hw.ac.uk).
Table of Links
6 Computational Approaches to Counterspeech and 6.1 Counterspeech Datasets
6.2 Approaches to Counterspeech Detection and 6.3 Approaches to Counterspeech Generation
8 Conclusion, Acknowledgements, and References
7 Future Perspectives
Across the many promising abuse intervention experiments that we review, results are not always consistent: claims are often weak, or success is limited to particular settings. Possible reasons include short-term experiments, small sample sizes and non-standardised experimental designs. To improve on this, effective interventions should be scalable, durable, reliable and specific. In this section, we highlight key distinctions and overlaps between the areas that have and have not been explored in the social sciences and computer science, discuss ethical issues related to evaluating counterspeech in real-life settings and to automating counterspeech generation, and identify best practices for future research.
Distinctions and overlaps across areas
By recognising the commonalities and differences between the social sciences and computer science, we pinpoint the unique contributions of each discipline and encourage interdisciplinary collaborations to address complex societal challenges and better understand human behaviour with the help of computational systems.
• Terminological clarity. Throughout the counterspeech literature, terminology is used inconsistently. Terms such as counterspeech and counter-narrative are often used interchangeably, or used to refer to similar concepts. In the social sciences, counterspeech refers to content that disagrees with abusive discourse, while counter-narratives typically entail criticism of an ideology through logical reasoning. As a result, the counter-narrative stimuli designed for social experiments are generally long form (Bélanger et al., 2020). In computer science, on the other hand, the distinction between counterspeech and counter-narratives has remained vague, and training data is generally short form (though this may be a consequence of character limits on social media platforms). For instance, short and generic responses such as ‘How can you say that about a faith of 1.6 billion people?’ are common in counter-narrative datasets (Chung et al., 2019).
• The focus of evaluation. Social scientists and counterspeech practitioners generally seek to understand and assess the impact of counterspeech on reducing harm (e.g., which strategies are effective, and how the public perceives counterspeech), whereas computer scientists focus more on the technical exploration of automated systems and on testing their performance in producing counterspeech (e.g., comparing system outputs with a pre-established ground truth or a supposedly ideal output). One commonality between the social science and computer science literatures is that most findings are drawn from controlled, small-scale studies. Applying interventions to real-world scenarios is a critical next step.
• Datasets. Dataset creation is an important component of computer science research, providing the material for developing machine learning models that generate counterspeech. Such contributions are less common in the social sciences, which rely on experiments using hand-crafted stimuli and one-off analyses of their effectiveness.
• Scope of research. We observe that, while computer scientists have focused on responses to abusive language and hate speech, social science studies address a wider range of phenomena, in particular radicalisation and terrorist extremism. It can be difficult to measure the effectiveness of counterspeech in challenging these over the short term, leading to some of the differences in evaluation metrics across disciplines.
• Lack of standardised methodologies. A variety of methodologies have been adopted in the literature, making comparisons across studies difficult. Without standardised evaluations, it is hard to situate individual results and draw robust conclusions.

Ethical Issues, Risks and Challenges of Conducting Counterspeech Studies
Effective evaluation of counterspeech not only identifies users who may need help, but also safeguards human rights and reinforces a stronger sense of responsibility in the community. The discussion below reflects the authors’ opinions and does not stem from the review itself.
• Evaluating counterspeech in real-life settings. Evaluating counterspeech in real-world scenarios promises a proactive and timely view of its performance in mitigating hate. From an ethical perspective, however, the debate surrounding such evaluation is ongoing, and reaching agreement can be difficult: one side questions the morality of exposing participants to harm, while the other points to the importance of internet safety. Counterspeech undertaken in good faith to mitigate online abuse may be exempt from liability on several legal grounds. For example, Good Samaritan laws provide indemnity to people who assist others in danger (Smits, 2000). In 2017, the EU Commission released a communication on tackling illegal content online, stating that ‘This Communication ... aims to provide clarifications to platforms on their liability when they take proactive steps to detect, remove or disable access to illegal content (the so-called “Good Samaritan” actions)’ (Commission, 2017). Section 230(c)(2) of Title 47 of the United States Code extends this protection to the good-faith removal or moderation of third-party material deemed “obscene, lewd, lascivious, filthy, excessively violent, harassing, or otherwise objectionable, whether or not such material is constitutionally protected”, shielding online computer services from liability for moderating harmful third-party material (Ardia, 2009; Goldman, 2018). The aim of these safeguards is to ensure that individuals are not deterred from helping others in distress by the fear of legal consequences should they unintentionally make errors in their efforts to provide support.
Responsible open-source research can facilitate the reproducibility and transparency of science. Reproducible research has recently been deemed critical in both the social sciences (Stroebe et al., 2012; Derksen and Morawski, 2022) and computer science, yet replication success has been low even when the materials provided in the original papers are used (Belz et al., 2023; Open Science Collaboration, 2015). To tackle this issue, several initiatives for transparent research have been proposed, encouraging researchers to state succinctly in their papers how experiments are conducted (e.g., stimuli, mechanisms for data selection) and evaluated; these include A 21 Word Solution (Simmons et al., 2012) and the Open Science Framework.[5] Furthermore, the practice of data sharing encourages researchers to take responsibility for fair and transparent experimental designs, and to avoid subtle selection biases that might affect the substantive research questions under investigation (Dennis et al., 2019). At the same time, when handling sensitive or personal information, data sharing should adhere to research ethics and privacy standards (Dennis et al., 2019; de la Cueva and Méndez, 2022). In the case of hate speech, for instance, using synthetic examples or de-identification techniques is considered good general practice for protecting the safety of individuals (Kirk et al., 2022); a minimal illustration follows.
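The sketch below shows one such de-identification pass: direct identifiers are replaced with neutral placeholders before data is shared. The regex patterns and placeholder tokens are our own illustrative assumptions, not a standard prescribed by Kirk et al. (2022).

```python
import re

# Illustrative de-identification pass for social media text before sharing.
# The patterns and placeholder tokens are assumptions made for this sketch.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
URL = re.compile(r"https?://\S+")
HANDLE = re.compile(r"@\w+")

def deidentify(text: str) -> str:
    """Replace direct identifiers with neutral placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)  # before HANDLE, so emails stay intact
    text = URL.sub("[URL]", text)
    text = HANDLE.sub("[USER]", text)
    return text

print(deidentify("@someuser see https://example.org or write to jo@mail.com"))
# -> "[USER] see [URL] or write to [EMAIL]"
```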
• Automating counterspeech generation. There are several ethical challenges related to automating the task of counterspeech generation. First, there is the danger of dual use: the same methodology could also be used to silence other voices.
Furthermore, effective and ethical counterspeech relies on the accurate and robust detection of online hate speech: an innocent speaker may be publicly targeted and shamed if an utterance is falsely classified as hate speech, whether directly or indirectly, as in end-to-end response generation. For example, Google Jigsaw’s Perspective API (Google Jigsaw, 2022), a widely used tool for detecting toxic language, makes predictions that align with racist beliefs and biases: it is less likely to rate anti-Black language as toxic, but more likely to mark African American English as toxic (Sap et al., 2022). It is therefore important to ensure that the underlying tool is unbiased and well calibrated to the likelihood that an utterance was indeed intended as hate speech. For example, the ‘tone’ of the counterspeech could be used to reflect the model’s confidence, as sketched below.
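As a rough sketch of this idea, a calibrated detector’s confidence could gate the tone of any automated reply. Here, `toxicity_score` is a toy stand-in for a real, bias-audited classifier, and the thresholds are illustrative assumptions rather than tested values.

```python
# Sketch: let the counterspeech 'tone' reflect detector confidence.

def toxicity_score(text: str) -> float:
    """Toy placeholder returning P(hate); a real system would call a
    calibrated, bias-audited hate speech classifier instead."""
    cues = ("go back", "subhuman")  # illustrative lexical cues only
    hits = sum(cue in text.lower() for cue in cues)
    return min(1.0, 0.45 * (1 + hits)) if hits else 0.1

def choose_tone(p_hate: float) -> str:
    """Map calibrated confidence to an intervention style."""
    if p_hate >= 0.9:
        return "direct"   # high confidence: firm, direct counterspeech
    if p_hate >= 0.6:
        return "hedged"   # moderate confidence: tentative, question-based reply
    return "abstain"      # low confidence: avoid shaming an innocent speaker

print(choose_tone(toxicity_score("They are subhuman and should go back.")))
# -> "direct"
```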
In sum, there is a trade-off between the risks and benefits of counterspeech generation. Under a ‘Good Samaritan’ reading, automated counterspeech provides timely help to victims in an emergency and is protected against prosecution even if it goes wrong. Similar legislation has been adopted in other jurisdictions, including the European Union, Australia and the UK. Under this interpretation, well-intentioned counterspeech (by humans and machines) is better than doing nothing at all.
Best practices
We provide the following recommendations for developing successful intervention tools.
- Bear in mind the practical use cases and scenarios of hate-countering tools. A single intervention strategy is unlikely to diminish online harm on its own. To design successful counterspeech tools, it is important to consider the purpose of counter messages (e.g., supporting victims or debunking stereotypes), the speakers (e.g., practitioners, authorities and high-profile figures), the recipients (e.g., ingroup/outgroup, political background and education level), the content (e.g., strategy, style and tone), the intensity (e.g., one message per week or month), and the communication medium (e.g., videos, text, and platforms).
- Look beyond automated metrics, and consider deployment settings when evaluating the performance of generation systems. Generation systems are generally evaluated on test sets in a controlled environment using accuracy-based metrics (e.g., ROUGE and BLEU) that cannot capture the social implications of a system (see the sketch after this list). Drawing on social science studies, metrics assessing social impact (e.g., user engagement), behavioural change (e.g., measuring abuse reduction in online discourse) and attitude change (e.g., through self-description questionnaires) can be considered. A good intervention system is expected to have long-lasting effects.
- Be clear about the methodology employed in experiments, open-source the experimental materials (e.g., stimuli, questionnaires and codebooks), and describe the desired criteria for evaluating counterspeech interventions. As standardised procedures for assessing counterspeech interventions are not yet established, examining the impact of interventions is difficult. A meaningful description of the experimental design therefore enhances reproducibility and helps capture the limitations of existing research.
- Establish interdisciplinary collaboration across areas such as counter-terrorism, political science, psychology and computer science. AI researchers can help guide policymakers and practitioners to, for instance, identify long-term interventions by performing large-scale data analysis using standardised procedures on representative and longitudinal samples. With their expertise in theories of human behaviour change and in experimental design, social science researchers can conduct qualitative evaluations of AI intervention tools in real-life scenarios to understand their social impact.
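To make the second recommendation concrete, the sketch below computes the kind of accuracy-based overlap scores that generation systems are typically evaluated with, using the third-party sacrebleu and rouge-score packages; the example texts are invented, and high scores here would still reveal nothing about social impact.

```python
# Accuracy-based evaluation of a generated counterspeech reply: surface
# overlap with a reference (BLEU, ROUGE-L). Texts invented for illustration.
import sacrebleu                      # pip install sacrebleu
from rouge_score import rouge_scorer  # pip install rouge-score

reference = "Generalising about 1.6 billion people is not an argument."
hypothesis = "You cannot generalise about a faith of 1.6 billion people."

# corpus_bleu takes a list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu([hypothesis], [[reference]])

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, hypothesis)["rougeL"]

print(f"BLEU: {bleu.score:.1f}")
print(f"ROUGE-L F1: {rouge_l.fmeasure:.2f}")
# Neither number tells us whether the reply changed attitudes or behaviour;
# impact metrics (engagement, abuse reduction, attitude change) require
# human-centred studies.
```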
8 Conclusion
Online hate speech is a pressing global issue, prompting scientists and practitioners to examine potential solutions. Counterspeech, content that directly rebuts hateful content, is one promising avenue. While AI researchers are already beginning to explore opportunities to automate the generation of counterspeech and so mitigate hate at scale, research from the social sciences points to many nuances concerning the impact of counterspeech that need to be considered before this intervention is deployed. Taking an interdisciplinary approach, we have attempted to synthesise the growing body of work in the field. Through our analysis of extant work, we suggest that findings regarding the efficacy of counterspeech depend heavily on several factors, including methodological ones, such as study design and outcome measures, and features of the counterspeech itself, such as the speaker, the target of hate, and the strategy employed. While some work finds counterspeech effective in reducing further hate from the perpetrator and in raising feelings of empowerment in bystanders and targets, other work finds that counterspeech can backfire and encourage more hate. To understand the advantages and disadvantages of counterspeech more deeply, we suggest that empirical research focus on testing counterspeech interventions that are scalable, durable, reliable, and specific in real-world settings. Researchers should also agree on the key outcome variables of interest in order to understand the optimal social conditions for producing counterspeech at scale by automating its generation. We hope that this review helps make sense of the variety of types of counterspeech studied to date and prompts future collaborations between social and computer scientists working to ameliorate the negative effects of online hate.
Acknowledgements
We thank Bertie Vidgen for the valuable feedback on the initial structure of this manuscript and Hannah Rose Kirk for her help with the collection of target literature.
References
Adak, S., Chakraborty, S., Das, P., Das, M., Dash, A., Hazra, R., Mathew, B., Saha, P., Sarkar, S., and Mukherjee, A. (2022). Mining the online infosphere: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(5):e1453.
Alsagheer, D., Mansourifar, H., and Shi, W. (2022). Counter hate speech in social media: A survey. arXiv preprint arXiv:2203.03584.
Ardia, D. S. (2009). Free speech savior or shield for scoundrels: an empirical study of intermediary immunity under section 230 of the communications decency act. Loy. LAL Rev., 43:373.
Ashida, M. and Komachi, M. (2022). Towards automatic generation of messages countering online hate speech and microaggressions. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH).
Bartlett, J. and Krasodomski-Jones, A. (2015). Counter-speech examining content that challenges extremism online. DEMOS, October.
Belz, A., Thomson, C., and Reiter, E. (2023). Missing information, unresponsive authors, experimental flaws: The impossibility of assessing the reproducibility of previous human evaluations in NLP. In The Fourth Workshop on Insights from Negative Results in NLP, pages 1–10, Dubrovnik, Croatia. Association for Computational Linguistics.
Benesch, S. (2014a). Countering dangerous speech: New ideas for genocide prevention. Washington, DC: US Holocaust Memorial Museum
Benesch, S. (2014b). Defining and diminishing hate speech. State of the world’s minorities and indigenous peoples, 2014:18–25.
Benesch, S., Ruths, D., Dillon, K. P., Saleem, H. M., and Wright, L. (2016). Counterspeech on Twitter: A field study. Dangerous Speech Project.
Berend, G. (2022). Combating the curse of multilinguality in cross-lingual WSD by aligning sparse contextualized word representations. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2459–2471, Seattle, United States. Association for Computational Linguistics.
Bertoldi, N., Cettolo, M., and Federico, M. (2013). Cache-based online adaptation for machine translation enhanced computer assisted translation. In MT-Summit, pages 35–42.
Bilewicz, M., Tempska, P., Leliwa, G., Dowgiałło, M., Tańska, M., Urbaniak, R., and Wroczyński, M. (2021). Artificial intelligence against hate: Intervention reducing verbal aggression in the social network environment. Aggressive Behavior, 47(3):260–266.
Birhane, A., Isaac, W., Prabhakaran, V., Diaz, M., Elish, M. C., Gabriel, I., and Mohamed, S. (2022). Power to the people? Opportunities and challenges for participatory AI. In Equity and Access in Algorithms, Mechanisms, and Optimization, EAAMO ’22, New York, NY, USA. Association for Computing Machinery.
Blodgett, S. L., Barocas, S., Daumé III, H., and Wallach, H. (2020). Language (technology) is power: A critical survey of “bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476, Online. Association for Computational Linguistics.
Bonaldi, H., Dellantonio, S., Tekiroğlu, S. S., and Guerini, M. (2022). Human-machine collaboration approaches to build a dialogue dataset for hate speech countering. arXiv preprint arXiv:2211.03433.
Buerger, C. (2021a). Counterspeech: A literature review. Available at SSRN 4066882.
Buerger, C. (2021b). #iamhere: Collective counterspeech and the quest to improve online discourse. Social Media + Society, 7(4):20563051211063843.
Buerger, C. (2022). Why they do it: Counterspeech theories of change. Available at SSRN 4245211.
Bélanger, J. J., Nisa, C. F., Schumpe, B. M., Gurmu, T., Williams, M. J., and Putra, I. E. (2020). Do counter-narratives reduce support for ISIS? Yes, but not for their target audience. Frontiers in Psychology, 11.
Carthy, S. L., Doody, C. B., Cox, K., O’Hora, D., and Sarma, K. M. (2020). Counter-narratives for the prevention of violent radicalisation: A systematic review of targeted interventions. Campbell Systematic Reviews, 16(3):e1106.
Carthy, S. L. and Sarma, K. M. (2021). Countering terrorist narratives: Assessing the efficacy and mechanisms of change in counter-narrative strategies. Terrorism and Political Violence, 0(0):1–25.
Cettolo, M., Bertoldi, N., and Federico, M. (2014). The repetition rate of text as a predictor of the effectiveness of machine translation adaptation. In Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014), pages 166–179.
Chakravarthi, B. R. (2022). Multilingual hope speech detection in English and Dravidian languages. International Journal of Data Science and Analytics, 14(4):389–406.
Chaudhary, M., Saxena, C., and Meng, H. (2021). Countering online hate speech: An NLP perspective. arXiv preprint arXiv:2109.02941.
Chung, Y.-L., Guerini, M., and Agerri, R. (2021a). Multilingual counter narrative type classification. In Proceedings of the 8th Workshop on Argument Mining, pages 125–132, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Chung, Y.-L., Kuzmenko, E., Tekiroglu, S. S., and Guerini, M. (2019). CONAN - COunter NArratives through nichesourcing: a multilingual dataset of responses to fight online hate speech. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2819–2829, Florence, Italy. Association for Computational Linguistics.
Chung, Y.-L., Tekiroğlu, S. S., and Guerini, M. (2020). Italian counter narrative generation to fight online hate speech. In Proceedings of the Seventh Italian Conference on Computational Linguistics, Online.
Chung, Y.-L., Tekiroğlu, S. S., and Guerini, M. (2021b). Towards knowledge-grounded counter narrative generation for hate speech. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 899–914, Online. Association for Computational Linguistics.
Chung, Y.-L., Tekiroğlu, S. S., Tonelli, S., and Guerini, M. (2021c). Empowering NGOs in countering online hate messages. Online Social Networks and Media, 24:100150.
Citron, D. K. and Norton, H. (2011). Intermediaries and hate speech: Fostering digital citizenship for our information age. BUL Rev., 91:1435.
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251):aac4716.
Commission, E. (2017). Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions.
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020). Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics.
de la Cueva, J. and Méndez, E. (2022). Open science and intellectual property rights. How can they better interact? State of the art and reflections. Report of study. European Commission.
Dennis, S., Garrett, P., Yim, H., Hamm, J., Osth, A. F., Sreekumar, V., and Stone, B. (2019). Privacy versus open science. Behavior research methods, 51:1839–1848.
Derksen, M. and Morawski, J. (2022). Kinds of replication: Examining the meanings of “conceptual replication” and “direct replication”. Perspectives on Psychological Science, 17(5):1490–1505. PMID: 35245130.
Dinan, E., Abercrombie, G., Bergman, A., Spruit, S., Hovy, D., Boureau, Y.-L., and Rieser, V. (2022). SafetyKit: First aid for measuring safety in open-domain conversational systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4113–4133, Dublin, Ireland. Association for Computational Linguistics.
Dušek, O. and Kasner, Z. (2020). Evaluating semantic accuracy of data-to-text generation with natural language inference. In Proceedings of the 13th International Conference on Natural Language Generation, pages 131–137, Dublin, Ireland. Association for Computational Linguistics.
Ernst, J., Schmitt, J. B., Rieger, D., Beier, A. K., Vorderer, P., Bente, G., and Roth, H.-J. (2017). Hate beneath the counter speech? A qualitative content analysis of user comments on youtube related to counter speech videos. Journal for Deradicalization, (10):1–49.
Fanton, M., Bonaldi, H., Tekiroğlu, S. S., and Guerini, M. (2021). Human-in-the-loop for data collection: a multi-target counter narrative dataset to fight online hate speech. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3226–3240, Online. Association for Computational Linguistics.
Ferguson, K. (2016). Countering violent extremism through media and communication strategies: A review of the evidence.
Fortuna, P., Soler-Company, J., and Wanner, L. (2021). How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets? Information Processing & Management, 58(3):102524.
Frenkel, S. and Conger, K. (2022). Hate Speech’s Rise on Twitter Is Unprecedented, Researchers Find. The New York Times.
Garland, J., Ghazi-Zahedi, K., Young, J.-G., Hébert-Dufresne, L., and Galesic, M. (2020). Countering hate on social media: Large scale classification of hate and counter speech. In Proceedings of the Fourth Workshop on Online Abuse and Harms, pages 102–112, Online. Association for Computational Linguistics.
Garland, J., Ghazi-Zahedi, K., Young, J.-G., Hébert-Dufresne, L., and Galesic, M. (2022). Impact and dynamics of hate and counter speech online. EPJ Data Science, 11(1):3.
Gehman, S., Gururangan, S., Sap, M., Choi, Y., and Smith, N. A. (2020). RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3356–3369, Online. Association for Computational Linguistics.
Goffredo, P., Basile, V., Cepollaro, B., and Patti, V. (2022). Counter-TWIT: An Italian corpus for online counterspeech in ecological contexts. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 57–66, Seattle, Washington (Hybrid). Association for Computational Linguistics.
Goldman, E. (2018). An overview of the united states’ section 230 internet immunity. Available at SSRN 3306737.
Google Jigsaw (2022). Perspective API. Accessed: 26 May 2023.
Hangartner, D., Gennaro, G., Alasiri, S., Bahrich, N., Bornhoft, A., Boucher, J., Demirci, B. B., Derksen, L., Hall, A., Jochum, M., Munoz, M. M., Richter, M., Vogel, F., Wittwer, S., Wüthrich, F., Gilardi, F., and Donnay, K. (2021). Empathy-based counterspeech can reduce racist hate speech in a social media field experiment. Proceedings of the National Academy of Sciences, 118(50):e2116310118.
He, B., Ziems, C., Soni, S., Ramakrishnan, N., Yang, D., and Kumar, S. (2022). Racism is a virus: Anti-Asian hate and counterspeech in social media during the COVID-19 crisis. In Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’21, pages 90–94, New York, NY, USA. Association for Computing Machinery.
Iqbal, K., Zafar, S. K., and Mehmood, Z. (2019). Critical evaluation of Pakistan’s counter-narrative efforts. Journal of Policing, Intelligence and Counter Terrorism, 14(2):147–163.
Jay, T. (2009). Do offensive words harm people? Psychology, public policy, and law, 15(2):81.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y. J., Madotto, A., and Fung, P. (2023). Survey of hallucination in natural language generation. ACM Comput. Surv., 55(12).
Kennedy, C. J., Bacon, G., Sahn, A., and von Vacano, C. (2020). Constructing interval variables via faceted rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277.
Kirk, H., Birhane, A., Vidgen, B., and Derczynski, L. (2022). Handling and presenting harmful text in NLP research. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 497–510, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Leader Maynard, J. and Benesch, S. (2016). Dangerous speech and dangerous ideology: An integrated model for monitoring and prevention. Genocide Studies and Prevention, 9(3).
Lee, H., Na, Y. J., Song, H., Shin, J., and Park, J. C. (2022). ELF22: A context-based counter trolling dataset to combat internet trolls. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, June 20-25, 2022, pages 3530–3541. European Language Resources Association.
Leonhard, L., Rueß, C., Obermaier, M., and Reinemann, C. (2018). Perceiving threat and feeling responsible. How severity of hate speech, number of bystanders, and prior reactions of others affect bystanders’ intention to counterargue against hate speech on Facebook. Studies in Communication and Media, 7(4):555–579.
Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
Lin, H., Nalluri, P., Li, L., Sun, Y., and Zhang, Y. (2022). Multiplex anti-Asian sentiment before and during the pandemic: Introducing new datasets from Twitter mining. In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis, pages 16–24, Dublin, Ireland. Association for Computational Linguistics.
Litaker, J. R., Lopez Bray, C., Tamez, N., Durkalski, W., and Taylor, R. (2022). COVID-19 vaccine acceptors, refusers, and the moveable middle: A qualitative study from Central Texas. Vaccines, 10(10).
Liu, C.-W., Lowe, R., Serban, I., Noseworthy, M., Charlin, L., and Pineau, J. (2016). How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132, Austin, Texas. Association for Computational Linguistics.
Mathew, B., Kumar, N., Goyal, P., Mukherjee, A., et al. (2018). Analyzing the hate and counter speech accounts on Twitter. arXiv:1812.02712.
Mathew, B., Saha, P., Tharad, H., Rajgaria, S., Singhania, P., Maity, S. K., Goyal, P., and Mukherjee, A. (2019). Thou shalt not hate: Countering online hate speech. In Proceedings of the International AAAI Conference on Web and Social Media, volume 13, pages 369–380.
Moher, D., Liberati, A., Tetzlaff, J., and Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Annals of Internal Medicine, 151(4):264–269. PMID: 19622511.
Munger, K. (2017). Tweetment effects on the tweeted: Experimentally reducing racist harassment. Political Behavior, 39(3):629–649.
BBC News (2018). MPs ‘being advised to quit Twitter’ to avoid online abuse. BBC News.
Nie, F., Yao, J.-G., Wang, J., Pan, R., and Lin, C.-Y. (2019). A simple recipe towards reducing hallucination in neural surface realisation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2673–2679, Florence, Italy. Association for Computational Linguistics.
Novikova, J., Dušek, O., Cercas Curry, A., and Rieser, V. (2017). Why we need new evaluation metrics for NLG. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2241–2252, Copenhagen, Denmark. Association for Computational Linguistics.
Obermaier, M., Schmuck, D., and Saleem, M. (2021). I’ll be there for you? Effects of Islamophobic online hate speech and counter speech on Muslim in-group bystanders’ intention to intervene. New Media & Society, 0(0):14614448211017527.
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.
Poole, E., Giraud, E. H., and de Quincey, E. (2021). Tactical interventions in online hate speech: The case of #stopislam. New Media & Society, 23(6):1415–1442.
Priyadharshini, R., Chakravarthi, B. R., Cn, S., Durairaj, T., Subramanian, M., Shanmugavadivel, K., U Hegde, S., and Kumaresan, P. (2022). Overview of abusive comment detection in Tamil-ACL 2022. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 292–298, Dublin, Ireland. Association for Computational Linguistics.
Procter, R., Webb, H., Jirotka, M., Burnap, P., Housley, W., Edwards, A., and Williams, M. (2019). A study of cyber hate on Twitter with implications for social media governance strategies. arXiv preprint arXiv:1908.11732.
Qian, J., Bethke, A., Liu, Y., Belding, E., and Wang, W. Y. (2019). A benchmark dataset for learning to intervene in online hate speech. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4755–4764, Hong Kong, China. Association for Computational Linguistics.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
Reynolds, L. and Tuck, H. (2016). The counter-narrative monitoring & evaluation handbook. Institute for Strategic Dialogue.
Riedl, M. J., Masullo, G. M., and Whipple, K. N. (2020). The downsides of digital labor: Exploring the toll incivility takes on online comment moderators. Computers in Human Behavior, 107:106262.
Saha, K., Chandrasekharan, E., and De Choudhury, M. (2019). Prevalence and psychological effects of hateful speech in online college communities. In Proceedings of the 10th ACM Conference on Web Science, WebSci ’19, page 255–264, New York, NY, USA. Association for Computing Machinery.
Saha, P., Singh, K., Kumar, A., Mathew, B., and Mukherjee, A. (2022). CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech.
Saltman, E., Kooti, F., and Vockery, K. (2021). New models for deploying counterspeech: Measuring behavioral change and sentiment analysis. Studies in Conflict & Terrorism, 0(0):1–24.
Saltman, E. M. and Russell, J. (2014). White paper–the role of Prevent in countering online extremism. Quilliam publication.
Sap, M., Swayamdipta, S., Vianna, L., Zhou, X., Choi, Y., and Smith, N. A. (2022). Annotators with attitudes: How annotator beliefs and identities bias toxic language detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5884–5906, Seattle, United States. Association for Computational Linguistics.
Schieb, C. and Preuss, M. (2016). Governing hate speech by means of counterspeech on Facebook. In 66th ICA annual conference, at Fukuoka, Japan, pages 1–23.
Siegel, A. A. (2020). Online hate speech. Social media and democracy: The state of the field, prospects for reform, pages 56–88.
Silverman, T., Stewart, C. J., Birdwell, J., and Amanullah, Z. (2016). The impact of counter-narratives. Institute for Strategic Dialogue.
Simmons, J. P., Nelson, L. D., and Simonsohn, U. (2012). A 21 word solution. Available at SSRN 2160588.
Smits, J. M. (2000). The Good Samaritan in European private law; on the perils of principles without a programme and a programme for the future.
Snyder, C. R., Rand, K. L., and Sigmon, D. R. (2018). Hope Theory: A Member of the Positive Psychology Family. In The Oxford Handbook of Hope, pages 257–276. Oxford University Press.
Solaiman, I., Brundage, M., Clark, J., Askell, A., Herbert-Voss, A., Wu, J., Radford, A., Krueger, G., Kim, J. W., Kreps, S., et al. (2019). Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203.
Stroebe, W. (2008). Strategies of attitude and behaviour change.
Stroebe, W., Postmes, T., and Spears, R. (2012). Scientific misconduct and the myth of self-correction in science. Perspectives on Psychological Science, 7(6):670–688.
Stroud, S. R. and Cox, W. (2018). The Varieties of Feminist Counterspeech in the Misogynistic Online World, pages 293–310. Springer International Publishing, Cham.
Tekiroğlu, S. S., Bonaldi, H., Fanton, M., and Guerini, M. (2022). Using pre-trained language models for producing counter narratives against hate speech: a comparative study. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3099–3114, Dublin, Ireland. Association for Computational Linguistics.
Tekiroğlu, S. S., Chung, Y.-L., and Guerini, M. (2020). Generating counter narratives against online hate speech: Data and strategies. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1177–1190, Online. Association for Computational Linguistics.
Toliyat, A., Levitan, S. I., Peng, Z., and Etemadpour, R. (2022). Asian hate speech detection on Twitter during COVID-19. Frontiers in Artificial Intelligence, 5.
Tuck, H. and Silverman, T. (2016). The counter-narrative handbook. Institute for Strategic Dialogue.
Vidgen, B., Hale, S., Guest, E., Margetts, H., Broniatowski, D., Waseem, Z., Botelho, A., Hall, M., and Tromble, R. (2020). Detecting East Asian prejudice on social media. In Proceedings of the Fourth Workshop on Online Abuse and Harms, pages 162–172, Online. Association for Computational Linguistics.
Vidgen, B., Margetts, H., and Harris, A. (2019). How much online abuse is there? A systematic review of evidence for the UK. Alan Turing Institute Policy Briefing.
Vidgen, B., Nguyen, D., Margetts, H., Rossini, P., and Tromble, R. (2021). Introducing CAD: the contextual abuse dataset. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2289–2303, Online. Association for Computational Linguistics.
Wang, K. and Wan, X. (2018). SentiGAN: Generating sentimental texts via mixture adversarial networks. In IJCAI, pages 4446–4452.
Wright, L., Ruths, D., Dillon, K. P., Saleem, H. M., and Benesch, S. (2017). Vectors for counterspeech on Twitter. In Proceedings of the First Workshop on Abusive Language Online, pages 57–62.
Yin, W. and Zubiaga, A. (2021). Towards generalisable hate speech detection: a review on obstacles and solutions. PeerJ Computer Science, 7:e598.
Yu, X., Blanco, E., and Hong, L. (2022). Hate speech and counter speech detection: Conversational context does matter. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5918–5930, Seattle, United States. Association for Computational Linguistics.
Zellers, R., Holtzman, A., Rashkin, H., Bisk, Y., Farhadi, A., Roesner, F., and Choi, Y. (2019). Defending against neural fake news. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.
Zhou, C., Neubig, G., Gu, J., Diab, M., Guzmán, F., Zettlemoyer, L., and Ghazvininejad, M. (2021). Detecting hallucinated content in conditional neural sequence generation. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1393–1404, Online. Association for Computational Linguistics.
Zhu, W. and Bhat, S. (2021). Generate, prune, select: A pipeline for counterspeech generation against online hate speech. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 134–149, Online. Association for Computational Linguistics.
This paper is available on arXiv under a CC BY-SA 4.0 DEED license.
[5] https://osf.io/