ABSTRACT
The past 3 years of work in NLP have been characterized by the development and deployment of ever larger language models, especially for English. BERT, its variants, GPT-2/3, and others, most recently Switch-C, have pushed the boundaries of the possible both through architectural innovations and through sheer size. Using these pretrained models and the methodology of fine-tuning them for specific tasks, researchers have extended the state of the art on a wide array of tasks as measured by leaderboards on specific benchmarks for English. In this paper, we take a step back and ask: How big is too big? What are the possible risks associated with this technology and what paths are available for mitigating those risks? We provide recommendations including weighing the environmental and financial costs first, investing resources into curating and carefully documenting datasets rather than ingesting everything on the web, carrying out pre-development exercises evaluating how the planned approach fits into research and development goals and supports stakeholder values, and encouraging research directions beyond ever larger language models.
- Hussein M Adam, Robert D Bullard, and Elizabeth Bell. 2001. Faces of environmental racism: Confronting issues of global justice. Rowman & Littlefield.Google Scholar
- Chris Alberti, Kenton Lee, and Michael Collins. 2019. A BERT Baseline for the Natural Questions. arXiv:1901.08634 [cs.CL]Google Scholar
- Larry Alexander. 1992. What makes wrongful discrimination wrong? Biases, preferences, stereotypes, and proxies. University of Pennsylvania Law Review 141, 1 (1992), 149--219.Google ScholarCross Ref
- American Psychological Association. 2019. Discrimination: What it is, and how to cope. https://www.apa.org/topics/discrimination (2019).Google Scholar
- Dario Amodei and Daniel Hernandez. 2018. AI and Compute. https://openai. com/blog/ai-and-compute/Google Scholar
- David Anthoff, Robert J Nicholls, and Richard SJ Tol. 2010. The economic impact of substantial sea-level rise. Mitigation and Adaptation Strategies for Global Change 15, 4 (2010), 321--335.Google ScholarCross Ref
- Mikhail J Atallah, Victor Raskin, Christian F Hempelmann, Mercan Karahan, Radu Sion, Umut Topkara, and Katrina E Triezenberg. 2002. Natural Language Watermarking and Tamperproofing. In International Workshop on Information Hiding. Springer, 196--212.Google Scholar
- Alexei Baevski and Abdelrahman Mohamed. 2020. Effectiveness of Self-Supervised Pre-Training for ASR. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 7694--7698.Google Scholar
- Michael Barera. 2020. Mind the Gap: Addressing Structural Equity and Inclusion on Wikipedia. (2020). Accessible at http://hdl.handle.net/10106/29572.Google Scholar
- Russel Barsh. 1990. Indigenous peoples, racism and the environment. Meanjin 49, 4 (1990), 723.Google Scholar
- Christine Basta, Marta R Costa-jussà , and Noe Casas. 2019. Evaluating the Underlying Gender Bias in Contextualized Word Embeddings. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing. 33--39.Google ScholarCross Ref
- Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3615--3620. https://doi.org/10.18653/v1/D19-1371Google Scholar
- Emily M. Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6 (2018), 587--604.Google ScholarCross Ref
- Emily M. Bender and Alexander Koller. 2020. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5185--5198. https://doi.org/10.18653/v1/2020.acl-main.463Google Scholar
- Ruha Benjamin. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Polity Press, Cambridge, UK.Google Scholar
- Elettra Bietti and Roxana Vatanparast. 2020. Data Waste. Harvard International Law Journal 61 (2020).Google Scholar
- Steven Bird. 2016. Social Mobile Technologies for Reconnecting Indigenous and Immigrant Communities.. In People.Policy.Place Seminar. Northern Institute, Charles Darwin University, Darwin, Australia. https://www.cdu.edu.au/sites/default/files/the-northern-institute/ppp-bird-20160128-4up.pdfGoogle Scholar
- Abeba Birhane and Vinay Uday Prabhu. 2021. Large Image Datasets: A Pyrrhic Win for Computer Vision?. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 1537--1547.Google ScholarCross Ref
- Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (Technology) is Power: A Critical Survey of "Bias" in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5454--5476. https://doi.org/10.18653/v1/2020.acl-main.485Google ScholarCross Ref
- Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large Language Models in Machine Translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Association for Computational Linguistics, Prague, Czech Republic, 858--867. https://www.aclweb.org/anthology/D07-1090Google Scholar
- Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E Peters, Ashish Sabharwal, and Yejin Choi. 2020. Adversarial Filters of Dataset Biases. In Proceedings of the 37th International Conference on Machine Learning.Google Scholar
- Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. 2019. Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 1664--1674. https://doi.org/10.18653/v1/D19-1176Google Scholar
- Susan E Brennan and Herbert H Clark. 1996. Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition 22, 6 (1996), 1482.Google ScholarCross Ref
- Robin Brewer and Anne Marie Piper. 2016. "Tell It Like It Really Is" A Case of Online Content Creation and Sharing Among Older Adult Bloggers. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 5529--5542.Google ScholarDigital Library
- Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc'Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.htmlGoogle Scholar
- Cristian Buciluă, Rich Caruana, and Alexandru Niculescu-Mizil. 2006. Model Compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Philadelphia, PA, USA) (KDD '06). Association for Computing Machinery, New York, NY, USA, 535--541. https://doi.org/10.1145/1150402.1150464Google ScholarDigital Library
- Robert D Bullard. 1993. Confronting environmental racism: Voices from the grassroots. South End Press.Google Scholar
- Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. 2020. Extracting Training Data from Large Language Models. arXiv:2012.07805 [cs.CR]Google Scholar
- Herbert H. Clark. 1996. Using Language. Cambridge University Press, Cambridge.Google Scholar
- Herbert H. Clark and Adrian Bangerter. 2004. Changing ideas about reference. In Experimental Pragmatics. Springer, 25--49.Google Scholar
- Herbert H. Clark and Meredyth A Krych. 2004. Speaking while monitoring addressees for understanding. Journal of Memory and Language 50, 1 (2004), 62--81.Google ScholarCross Ref
- Herbert H. Clark, Robert Schreuder, and Samuel Buttrick. 1983. Common ground at the understanding of demonstrative reference. Journal of Verbal Learning and Verbal Behavior 22, 2 (1983), 245--258. https://doi.org/10.1016/S0022-5371(83)90189-5Google ScholarCross Ref
- Herbert H. Clark and Deanna Wilkes-Gibbs. 1986. Referring as a collaborative process. Cognition 22, 1 (1986), 1--39. https://doi.org/10.1016/0010-0277(86) 90010-7Google ScholarCross Ref
- Kimberlé Crenshaw. 1989. Demarginalizing the intersection of race and sex: A Black feminist critique of antidiscrimination doctrine, feminist theory and antiracist politics. The University of Chicago Legal Forum (1989), 139.Google Scholar
- Benjamin Dangl. 2019. The Five Hundred Year Rebellion: Indigenous Movements and the Decolonization of History in Bolivia. AK Press.Google Scholar
- Christian Davenport. 2009. Media bias, perspective, and state repression: The Black Panther Party. Cambridge University Press.Google Scholar
- Ferdinand de Saussure. 1959. Course in General Linguistics. The Philosophical Society, New York. Translated by Wade Baskin.Google ScholarDigital Library
- Terrance de Vries, Ishan Misra, Changhan Wang, and Laurens van der Maaten. 2019. Does object recognition work for everyone?. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 52--59.Google Scholar
- Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 4171--4186. https://doi.org/10.18653/v1/N19-1423Google Scholar
- Maeve Duggan. 2017. Online Harassment 2017. Pew Research Center.Google Scholar
- Jennifer Earl, Andrew Martin, John D. McCarthy, and Sarah A. Soule. 2004. The use of newspaper data in the study of collective action. Annual Review of Sociology 30 (2004), 65--80.Google ScholarCross Ref
- Ethan Fast, Tina Vachovsky, and Michael Bernstein. 2016. Shirtless and Dangerous: Quantifying Linguistic Signals of Gender Bias in an Online Fiction Writing Community. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 10.Google Scholar
- William Fedus, Barret Zoph, and Noam Shazeer. 2021. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. arXiv:2101.03961 [cs.LG]Google Scholar
- Anjalie Field, Doron Kliger, Shuly Wintner, Jennifer Pan, Dan Jurafsky, and Yulia Tsvetkov. 2018. Framing and Agenda-setting in Russian News: a Computational Analysis of Intricate Political Strategies. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 3570--3580. https://doi.org/10.18653/v1/D18-1393Google ScholarCross Ref
- Darja Fišer, Ruihong Huang, Vinodkumar Prabhakaran, Rob Voigt, Zeerak Waseem, and Jacqueline Wernimont (Eds.). 2018. Proceedings of the 2nd Workshop on Abusive Language Online (ALW2). Association for Computational Linguistics, Brussels, Belgium. https://www.aclweb.org/anthology/W18-5100Google Scholar
- Susan T Fiske. 2017. Prejudices in cultural contexts: shared stereotypes (gender, age) versus variable stereotypes (race, ethnicity, religion). Perspectives on psychological science 12, 5 (2017), 791--799.Google Scholar
- Antigoni Founta, Constantinos Djouvas, Despoina Chatzakou, Ilias Leontiadis, Jeremy Blackburn, Gianluca Stringhini, Athena Vakali, Michael Sirivianos, and Nicolas Kourtellis. 2018. Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. In Proceedings of the International AAAI Conference on Web and Social Media, Vol. 12.Google ScholarCross Ref
- Batya Friedman and David Hendry. 2012. The Envisioning Cards: A Toolkit for Catalyzing Humanistic and Technical Imaginations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, Texas, USA) (CHI '12). Association for Computing Machinery, New York, NY, USA, 1145--1148. https://doi.org/10.1145/2207676.2208562Google ScholarDigital Library
- Batya Friedman and David G. Hendry. 2019. Value Sensitive Design: Shaping Technology with Moral Imagination. MIT Press.Google Scholar
- Batya Friedman, Peter H. Kahn, Jr., and Alan Borning. 2006. Value sensitive design and information systems. In Human-Computer Interaction in Management Information Systems: Foundations, P Zhang and D Galletta (Eds.). M. E. Sharpe, Armonk NY, 348--372.Google Scholar
- Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. 2020. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv:2101.00027 [cs.CL]Google Scholar
- Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2020. Datasheets for Datasets. arXiv:1803.09010 [cs.DB]Google Scholar
- Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, and Noah A. Smith. 2020. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 3356--3369. https://doi.org/10.18653/v1/2020.findings-emnlp.301Google Scholar
- Wei Guo and Aylin Caliskan. 2020. Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases. arXiv preprint arXiv:2006.03955 (2020).Google Scholar
- Melissa Hart. 2004. Subjective decisionmaking and unconscious discrimination. Alabama Law Review 56 (2004), 741.Google Scholar
- Deborah Hellman. 2008. When is Discrimination Wrong? Harvard University Press.Google Scholar
- Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, and Joelle Pineau. 2020. Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning. Journal of Machine Learning Research 21, 248 (2020), 1--43. http://jmlr.org/papers/v21/20-312.htmlGoogle Scholar
- Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).Google Scholar
- Chao-Wei Huang and Yun-Nung Chen. 2019. Adapting Pretrained Transformer to Lattices for Spoken Language Understanding. In Proceedings of 2019 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2019). Sentosa, Singapore, 845--852.Google ScholarCross Ref
- Hongzhao Huang and Fuchun Peng. 2019. An Empirical Study of Efficient ASR Rescoring with Transformers. arXiv:1910.11450 [cs.CL]Google Scholar
- Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, and Stephen Denuyl. 2020. Social Biases in NLP Models as Barriers for Persons with Disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5491--5501. https://doi.org/10.18653/v1/2020.acl-main.487Google ScholarCross Ref
- Eun Seo Jo and Timnit Gebru. 2020. Lessons from archives: strategies for collecting sociocultural data in machine learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 306--316.Google ScholarDigital Library
- Leslie Kay Jones. 2020. #BlackLivesMatter: An Analysis of the Movement as Social Drama. Humanity & Society 44, 1 (2020), 92--110.Google ScholarCross Ref
- Leslie Kay Jones. 2020. Twitter wants you to know that you're still SOL if you get a death threat --- unless you're President Donald Trump. (2020). https://medium.com/@agua.carbonica/twitter-wants-you-to-know-that-youre-still-sol-if-you-get-a-death-threat-unless-you-re-a5cce316b706.Google Scholar
- Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. 2020. The State and Fate of Linguistic Diversity and Inclusion in the NLP World. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 6282--6293. https://doi.org/10.18653/v1/2020.acl-main.560Google ScholarCross Ref
- Nurul Shamimi Kamaruddin, Amirrudin Kamsin, Lip Yee Por, and Hameedur Rahman. 2018. A Review of Text Watermarking: Theory, Methods, and Applications. IEEE Access 6 (2018), 8011--8028. https://doi.org/10.1109/ACCESS.2018. 2796585Google ScholarCross Ref
- Brendan Kennedy, Drew Kogon, Kris Coombs, Joseph Hoover, Christina Park, Gwenyth Portillo-Wightman, Aida Mostafazadeh Davani, Mohammad Atari, and Morteza Dehghani. 2018. A typology and coding manual for the study of hate-based rhetoric. PsyArXiv. July 18 (2018).Google Scholar
- Gary Klein. 2007. Performing a project premortem. Harvard business review 85, 9 (2007), 18--19.Google Scholar
- Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. 2019. Measuring Bias in Contextualized Word Representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing. 166--172.Google ScholarCross Ref
- Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv preprint arXiv:1909.11942 (2019).Google Scholar
- Amanda Lazar, Mark Diaz, Robin Brewer, Chelsea Kim, and Anne Marie Piper. 2017. Going gray, failure to hire, and the ick factor: Analyzing how older bloggers talk about ageism. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 655--668.Google ScholarDigital Library
- Christopher A Le Dantec, Erika Shehan Poole, and Susan P Wyche. 2009. Values as lived experience: evolving value sensitive design in support of value discovery. In Proceedings of the SIGCHI conference on human factors in computing systems. 1141--1150.Google ScholarDigital Library
- Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2020. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. arXiv:2006.16668 [cs.CL]Google Scholar
- Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019).Google Scholar
- Kadan Lottick, Silvia Susai, Sorelle A. Friedler, and Jonathan P. Wilson. 2019. Energy Usage Reports: Environmental awareness as part of algorithmic accountability. arXiv:1911.08354 [cs.LG]Google Scholar
- Mette Edith Lundsfryd. 2017. Speaking Back to a World of Checkpoints: Oral History as a Decolonizing Tool in the Study of Palestinian Refugees from Syria in Lebanon. Middle East Journal of Refugee Studies 2, 1 (2017), 73--95.Google ScholarCross Ref
- Marianna Martindale and Marine Carpuat. 2018. Fluency Over Adequacy: A Pilot Study in Measuring User Trust in Imperfect MT. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track). Association for Machine Translation in the Americas, Boston, MA, 13--25. https://www.aclweb.org/anthology/W18-1803Google Scholar
- Sally McConnell-Ginet. 1984. The Origins of Sexist Language in Discourse. Annals of the New York Academy of Sciences 433, 1 (1984), 123--135.Google ScholarCross Ref
- Sally McConnell-Ginet. 2020. Words Matter: Meaning and Power. Cambridge University Press.Google ScholarCross Ref
- Kris McGuffie and Alex Newhouse. 2020. The Radicalization Risks of GPT-3 and Advanced Neural Language Models. Technical Report. Center on Terrorism, Extremism, and Counterterrorism, Middlebury Institute of International Studies at Monterrey. https://www.middlebury.edu/institute/sites/www.middlebury.edu.institute/files/2020-09/gpt3-article.pdf.Google Scholar
- Douglas M McLeod. 2007. News coverage and social protest: How the media's protect paradigm exacerbates social conflict. Journal of Dispute Resolution (2007), 185.Google Scholar
- Oren Melamud, Jacob Goldberger, and Ido Dagan. 2016. context2vec: Learning Generic Context Embedding with Bidirectional LSTM. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning. Association for Computational Linguistics, Berlin, Germany, 51--61. https://doi.org/10. 18653/v1/K16-1006Google ScholarCross Ref
- Julia Mendelsohn, Yulia Tsvetkov, and Dan Jurafsky. 2020. A Framework for the Computational Linguistic Analysis of Dehumanization. Frontiers in Artificial Intelligence 3 (2020), 55. https://doi.org/10.3389/frai.2020.00055Google ScholarCross Ref
- Kaitlynn Mendes, Jessica Ringrose, and Jessalynn Keller. 2018. # MeToo and the promise and pitfalls of challenging rape culture through digital feminist activism. European Journal of Women's Studies 25, 2 (2018), 236--246.Google ScholarCross Ref
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (Lake Tahoe, Nevada) (NIPS'13). Curran Associates Inc., Red Hook, NY, USA, 3111--3119.Google ScholarDigital Library
- Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency. 220--229.Google ScholarDigital Library
- Robert C. Moore and William Lewis. 2010. Intelligent Selection of Language Model Training Data. In Proceedings of the ACL 2010 Conference Short Papers. Association for Computational Linguistics, Uppsala, Sweden, 220--224. https://www.aclweb.org/anthology/P10-2041Google Scholar
- Kevin L. Nadal. 2018. Microaggressions and Traumatic Stress: Theory, Research, and Clinical Treatment. American Psychological Association. https://books.google.com/books?id=ogzhswEACAAJGoogle Scholar
- Clifford Nass, Jonathan Steuer, and Ellen R Tauber. 1994. Computers are social actors. In Proceedings of the SIGCHI conference on Human factors in computing systems. 72--78.Google ScholarDigital Library
- Lisa P. Nathan, Predrag V. Klasnja, and Batya Friedman. 2007. Value Scenarios: A Technique for Envisioning Systemic Effects of New Technologies. In CHI'07 Extended Abstracts on Human Factors in Computing Systems. ACM, 2585--2590.Google ScholarDigital Library
- Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Muhammad, Salomon Kabongo Kabenamualu, Salomey Osei, Freshia Sackey, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa Berhe, Mofetoluwa Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkadir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Murhabazi Espoir, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Chinenye Emezue, Bonaventure F. P. Dossou, Blessing Sibanda, Blessing Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, and Abdallah Bashir. 2020. Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, 2144--2160. https://doi.org/10.18653/v1/2020.findings-emnlp.195Google Scholar
- Maggie Nelson. 2015. The Argonauts. Graywolf Press, Minneapolis.Google Scholar
- Timothy Niven and Hung-Yu Kao. 2019. Probing Neural Network Comprehension of Natural Language Arguments. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4658--4664. https://doi.org/10.18653/v1/P19-1459Google ScholarCross Ref
- Safiya Umoja Noble. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. NYU Press.Google Scholar
- Debora Nozza, Federico Bianchi, and Dirk Hovy. 2020. What the [MASK]? Making Sense of Language-Specific BERT Models. arXiv:2003.02912 [cs.CL]Google Scholar
- David Ortiz, Daniel Myers, Eugene Walls, and Maria-Elena Diaz. 2005. Where do we stand with newspaper data? Mobilization: An International Quarterly 10, 3 (2005), 397--419.Google Scholar
- Charlotte Pennington, Derek Heim, Andrew Levy, and Derek Larkin. 2016. Twenty Years of Stereotype Threat Research: A Review of Psychological Mediators. PloS one 11 (01 2016), e0146487. https://doi.org/10.1371/journal.pone.0146487Google Scholar
- Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, 1532--1543. https://doi.org/10.3115/v1/D14-1162Google ScholarCross Ref
- Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, 2227--2237. https://doi.org/10.18653/v1/N18-1202Google ScholarCross Ref
- Pew. 2018. Internet/Broadband Fact Sheet. (2 2018). https://www.pewinternet. org/fact-sheet/internet-broadband/Google Scholar
- Aidan Pine and Mark Turin. 2017. Language Revitalization. Oxford Research Encyclopedia of Linguistics.Google Scholar
- Francesca Polletta. 1998. Contending stories: Narrative in social movements. Qualitative sociology 21, 4 (1998), 419--446.Google Scholar
- Vinodkumar Prabhakaran, Ben Hutchinson, and Margaret Mitchell. 2019. Perturbation Sensitivity Analysis to Detect Unintended Model Biases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 5740--5745. https://doi.org/10.18653/v1/D19-1578Google Scholar
- Laura Pulido. 2016. Flint, environmental racism, and racial capitalism. Capitalism Nature Socialism 27, 3 (2016), 1--16.Google ScholarCross Ref
- Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, and Xuanjing Huang. 2020. Pre-trained Models for Natural Language Processing: A Survey. arXiv:2003.08271 [cs.CL]Google Scholar
- Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog 1, 8 (2019), 9.Google Scholar
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1--67. http://jmlr.org/papers/v21/20-074.htmlGoogle Scholar
- Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, 2383--2392. https://doi.org/10.18653/v1/D16-1264Google ScholarCross Ref
- Sarah T. Roberts, Joel Tetreault, Vinodkumar Prabhakaran, and Zeerak Waseem (Eds.). 2019. Proceedings of the Third Workshop on Abusive Language Online. Association for Computational Linguistics, Florence, Italy. https://www.aclweb.org/anthology/W19-3500Google Scholar
- Anna Rogers, Olga Kovaleva, and Anna Rumshisky. 2021. A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics 8 (2021), 842--866.Google ScholarCross Ref
- Ronald Rosenfeld. 2000. Two decades of statistical language modeling: Where do we go from here? Proc. IEEE 88, 8 (2000), 1270--1278.Google ScholarCross Ref
- Corby Rosset. 2020. Turing-NLG: A 17-billion-parameter language model by Microsoft. Microsoft Blog (2020).Google Scholar
- Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).Google Scholar
- Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 5477--5490. https://doi.org/10.18653/v1/2020.acl-main.486Google ScholarCross Ref
- Roy Schwartz, Jesse Dodge, Noah A. Smith, and Oren Etzioni. 2020. Green AI. Commun. ACM 63, 12 (Nov. 2020), 54--63. https://doi.org/10.1145/3381831Google ScholarDigital Library
- Sabine Sczesny, Janine Bosak, Daniel Neff, and Birgit Schyns. 2004. Gender stereotypes and the attribution of leadership traits: A cross-cultural comparison. Sex roles 51, 11-12 (2004), 631--645.Google Scholar
- Claude Elwood Shannon. 1949. The Mathematical Theory of Communication. University of Illinois Press, Urbana.Google Scholar
- Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2019. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. arXiv:1909.05840 [cs.CL]Google Scholar
- Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. 2019. The Woman Worked as a Babysitter: On Biases in Language Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3407--3412. https://doi.org/10.18653/v1/D19-1339Google Scholar
- Katie Shilton, Jes A Koepfler, and Kenneth R Fleischmann. 2014. How to see values in social computing: methods for studying values dimensions. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. 426--435.Google ScholarDigital Library
- Joonbo Shin, Yoonhyung Lee, and Kyomin Jung. 2019. Effective Sentence Scoring Method Using BERT for Speech Recognition. In Asian Conference on Machine Learning. 1081--1093.Google Scholar
- Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-lm: Training multi-billion parameter language models using gpu model parallelism. arXiv preprint arXiv:1909.08053 (2019).Google Scholar
- Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, et al. 2019. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203 (2019).Google Scholar
- Karen Spärck Jones. 2004. Language modelling's generative model: Is it rational? Technical Report. Computer Laboratory, University of Cambridge.Google Scholar
- Robyn Speer. 2017. ConceptNet Numberbatch 17.04: better, less-stereotyped word vectors. (2017). Blog post, https://blog.conceptnet.io/2017/04/24/conceptnet-numberbatch-17-04-better-less-stereotyped-word-vectors/.Google Scholar
- Steven J. Spencer, Christine Logel, and Paul G. Davies. 2016. Stereotype Threat. Annual Review of Psychology 67, 1 (2016), 415--437. https://doi.org/10.1146/annurev-psych-073115-103235 arXiv:https://doi.org/10.1146/annurev-psych-073115-103235 PMID: 26361054.Google ScholarCross Ref
- Katrina Srigley and Lorraine Sutherland. 2019. Decolonizing, Indigenizing, and Learning Biskaaybiiyang in the Field: Our Oral History Journey1. The Oral History Review (2019).Google Scholar
- Greg J. Stephens, Lauren J. Silbert, and Uri Hasson. 2010. Speaker-listener neural coupling underlies successful communication. Proceedings of the National Academy of Sciences 107, 32 (2010), 14425--14430. https://doi.org/10.1073/pnas. 1008662107 arXiv:https://www.pnas.org/content/107/32/14425.full.pdfGoogle ScholarCross Ref
- Emma Strubell, Ananya Ganesh, and Andrew McCallum. 2019. Energy and Policy Considerations for Deep Learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 3645--3650.Google ScholarCross Ref
- Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, and Hua Wu. 2019. ERNIE: Enhanced Representation through Knowledge Integration. arXiv:1904.09223 [cs.CL]Google Scholar
- Yu Sun, Shuohuan Wang, Yu-Kun Li, Shikun Feng, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 8968--8975. https://aaai.org/ojs/index.php/AAAI/article/view/6428Google ScholarCross Ref
- Yi Chern Tan and L Elisa Celis. 2019. Assessing social and intersectional biases in contextualized word representations. In Advances in Neural Information Processing Systems. 13230--13241.Google Scholar
- Ian Tenney, Dipanjan Das, and Ellie Pavlick. 2019. BERT Rediscovers the Classical NLP Pipeline. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4593--4601. https://doi.org/10.18653/v1/P19-1452Google ScholarCross Ref
- Trieu H. Trinh and Quoc V. Le. 2019. A Simple Method for Commonsense Reasoning. arXiv:1806.02847 [cs.AI]Google Scholar
- Marlon Twyman, Brian C Keegan, and Aaron Shaw. 2017. Black Lives Matter in Wikipedia: Collective memory and collaboration around online social movements. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing. 1400--1412.Google ScholarDigital Library
- Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Å?ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems. 5998--6008.Google Scholar
- Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov. 2018. RtGender: A Corpus for Studying Differential Responses to Gender. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan. https://www.aclweb.org/anthology/L18-1445Google Scholar
- Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics, Brussels, Belgium, 353--355. https://doi.org/10. 18653/v1/W18-5446Google ScholarCross Ref
- Zeerak Waseem, Thomas Davidson, Dana Warmsley, and Ingmar Weber. 2017. Understanding Abuse: A Typology of Abusive Language Detection Subtasks. In Proceedings of the First Workshop on Abusive Language Online. Association for Computational Linguistics, Vancouver, BC, Canada, 78--84. https://doi.org/10.18653/v1/W17-3012Google ScholarCross Ref
- Joseph Weizenbaum. 1976. Computer Power and Human Reason: From Judgment to Calculation. WH Freeman & Co.Google ScholarDigital Library
- Monnica T Williams. 2019. Psychology Cannot Afford to Ignore the Many Harms Caused by Microaggressions. Perspectives on Psychological Science 15 (2019), 38--43.Google ScholarCross Ref
- Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38--45. https://doi.org/10.18653/v1/2020.emnlp-demos.6Google ScholarCross Ref
- World Bank. 2018. Indiviuals Using the Internet. (2018). https://data.worldbank. org/indicator/IT.NET.USER.ZS?end=2017amp;locations=USamp;start=2015Google Scholar
- Shijie Wu and Mark Dredze. 2020. Are All Languages Created Equal in Multilingual BERT?. In Proceedings of the 5th Workshop on Representation Learning for NLP. Association for Computational Linguistics, Online, 120--130. https://doi.org/10.18653/v1/2020.repl4nlp-1.16Google ScholarCross Ref
- Dongling Xiao, Han Zhang, Yukun Li, Yu Sun, Hao Tian, Hua Wu, and Haifeng Wang. 2020. ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation. arXiv preprint arXiv:2001.11314 (2020).Google Scholar
- Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, and Ming Zhou. 2020. BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 7859--7869. https://doi.org/10.18653/v1/2020.emnlp-main.633Google ScholarCross Ref
- Peng Xu, Chien-Sheng Wu, Andrea Madotto, and Pascale Fung. 2019. Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 3065--3075. https://doi.org/10.18653/v1/D19-1303Google Scholar
- Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. 2020. mT5: A massively multilingual pre-trained text-to-text transformer. arXiv:2010.11934 [cs.CL]Google Scholar
- Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, and Jimmy Lin. 2019. End-to-End Open-Domain Question Answering with BERTserini. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics, Minneapolis, Minnesota, 72--77. https://doi.org/10.18653/v1/N19-4013Google Scholar
- Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems. 5753--5763.Google Scholar
- Ze Yang, Can Xu, Wei Wu, and Zhoujun Li. 2019. Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, 5077--5089. https://doi.org/10.18653/v1/D19-1512Google Scholar
- Meg Young, Lassana Magassa, and Batya Friedman. 2019. Toward Inclusive Tech Policy Design: A Method for Underrepresented Voices to Strengthen Tech Policy Documents. Ethics and Information Technology (2019), 1--15.Google Scholar
- Ofir Zafrir, Guy Boudoukh, Peter Izsak, and Moshe Wasserblat. 2019. Q8BERT: Quantized 8Bit BERT. arXiv:1910.06188 [cs.CL]Google Scholar
- Nico Zazworka, Rodrigo O. SpÃnola, Antonio Vetro', Forrest Shull, and Carolyn Seaman. 2013. A Case Study on Effectively Identifying Technical Debt. In Proceedings of the 17th International Conference on Evaluation and Assessment in Software Engineering (Porto de Galinhas, Brazil) (EASE '13). Association for Computing Machinery, New York, NY, USA, 42--47. https://doi.org/10.1145/2460999.2461005Google ScholarDigital Library
- Rowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. 2018. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 93--104. https://doi.org/10.18653/v1/D18-1009Google ScholarCross Ref
- Haoran Zhang, Amy X Lu, Mohamed Abdalla, Matthew McDermott, and Marzyeh Ghassemi. 2020. Hurtful words: quantifying biases in clinical contextual word embeddings. In Proceedings of the ACM Conference on Health, Inference, and Learning. 110--120.Google ScholarDigital Library
- Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. 2019. Gender Bias in Contextualized Word Embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, Minnesota, 629--634. https://doi.org/10.18653/v1/N19-1064Google ScholarCross Ref
- Li Zhou, Jianfeng Gao, Di Li, and Heung-Yeung Shum. 2020. The Design and Implementation of XiaoIce, an Empathetic Social Chatbot. Computational Linguistics 46, 1 (March 2020), 53--93. https://doi.org/10.1162/coli_a_00368Google ScholarDigital Library
Index Terms
- On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜
Recommendations
POS tagger for Urdu using Stochastic approaches
ICTCS '16: Proceedings of the Second International Conference on Information and Communication Technology for Competitive StrategiesPart-of-Speech tagging is a problem of Natural language processing. It is a process of labeling an accurate part of speech for each word of a given corpus sentence. There are various approaches like rule based, stochastic and hybrid that are mainly used ...
Multilingual stochastic n-gram class language models
ICASSP '96: Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 01Stochastic language models are widely used in continuous speech recognition systems where a priori probabilities of word sequences are needed. These probabilities are usually given by n-gram word models, estimated on very large training texts. When n ...
Automatic stochastic tagging of natural language texts
Five language and tagset independent stochastic taggers, handling morphological and contextual information, are presented and tested in corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of ...
Comments