Contact: esindurmus AT cs DOT stanford DOT edu
[Google Scholar] [Semantic Scholar] [CV]
Hi! I am Esin Durmus. I am a Research Scientist on the Societal Impacts team at Anthropic. Previously, I was a Postdoctoral Scholar in the Stanford NLP Group, working with Tatsunori Hashimoto and Dan Jurafsky. I received my PhD from Cornell University, where I was advised by Claire Cardie.
My research interests lie at the intersection of Natural Language Processing, Machine Learning, and Computational Social Science. I am interested in developing evaluation methods and metrics to study the reliability and social impact of NLP/AI systems.
Publications
-
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nguyen, Thomas I Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
Preprint, 2023.
[paper]
-
Opportunities and Risks of LLMs for Scalable Deliberation with Polis
Christopher T Small, Ivan Vendrov, Esin Durmus, Hadjar Homaei, Elizabeth Barry, Julien Cornebise, Ted Suzman, Deep Ganguli, Colin Megill
Preprint, 2023.
[paper]
-
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
Myra Cheng, Esin Durmus, Dan Jurafsky
In Proceedings of ACL, 2023.
Social Impact Award
[paper]
-
Tracing and Removing Data Errors in Natural Language Generation Datasets
Faisal Ladhak, Esin Durmus, Tatsunori Hashimoto
In Proceedings of ACL, 2023.
[paper]
-
Whose Opinions Do Language Models Reflect?
Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto
In Proceedings of ICML, 2023.
[paper]
-
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan
In Proceedings of FAccT, 2023.
[paper]
-
Benchmarking Large Language Models for News Summarization
Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B Hashimoto
Preprint, 2023.
[paper]
-
When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization
Faisal Ladhak, Esin Durmus, Mirac Suzgun, Tianyi Zhang, Dan Jurafsky, Kathleen McKeown, Tatsunori B Hashimoto
In Proceedings of EACL, 2023.
[paper]
-
Evaluating Human-Language Model Interaction
Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
Preprint, 2022.
[paper]
-
Holistic Evaluation of Language Models
Preprint, 2022.
[paper]
-
Improving Faithfulness by Augmenting Negative Summaries from Fake Documents
Tianshu Wang, Faisal Ladhak, Esin Durmus, He He
In Proceedings of EMNLP, 2022.
-
Spurious Correlations in Reference-Free Evaluation of Text Generation
Esin Durmus, Faisal Ladhak, Tatsunori Hashimoto
In Proceedings of ACL, 2022.
[paper]
-
GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
2022.
[paper]
-
Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization
Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown
In Proceedings of ACL, 2022.
[paper]
-
Language Modeling via Stochastic Processes
Rose E Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto
In Proceedings of ICLR, 2022.
[paper]
-
On the Opportunities and Risks of Foundation Models
[paper] [bib]
-
Towards Understanding Persuasion in Computational Argumentation
PhD Dissertation
[paper] [bib]
-
Leveraging Topic Relatedness for Argument Persuasion
Xinran Zhao, Esin Durmus, Hongming Zhang, Claire Cardie
In Findings of ACL, 2021.
[paper] [bib]
-
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
[Team] [paper] [bib] [website]
-
WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown.
In Findings of EMNLP, 2020.
[paper] [data] [bib]
-
Exploring the Role of Argument Structure in Online Debate Persuasion
Jialu Li, Esin Durmus and Claire Cardie.
In Proceedings of EMNLP, 2020.
[paper] [bib]
-
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
Esin Durmus, He He and Mona Diab.
In Proceedings of ACL, 2020.
[paper] [code] [bib]
-
The Role of Pragmatic and Discourse Context in Determining Argument Impact
Esin Durmus, Faisal Ladhak and Claire Cardie.
In Proceedings of EMNLP, 2019.
[paper] [bib]
-
Determining Relative Argument Specificity and Stance for Complex Argumentative Structures
Esin Durmus, Faisal Ladhak and Claire Cardie.
In Proceedings of ACL, 2019.
[paper] [bib]
-
A Corpus for Modeling User and Language Effects in Argumentation on Online Debating
Esin Durmus and Claire Cardie.
In Proceedings of ACL, 2019.
[paper] [bib] [dataset]
-
Persuasion of the Undecided: Language vs. the Listener
Liane Longpre, Esin Durmus and Claire Cardie.
In Proceedings of the 6th Workshop on Argument Mining, 2019.
[paper] [bib] [dataset]
-
Modeling the Factors of User Success in Online Debate
Esin Durmus and Claire Cardie.
In Proceedings of the World Wide Web Conference (WWW), 2019.
[paper] [bib] [dataset]
Cornell Chronicle Story
-
Exploring the Role of Prior Beliefs for Argument Persuasion
Esin Durmus and Claire Cardie.
In Proceedings of NAACL, 2018.
[paper] [bib] [dataset]
-
Understanding the Effect of Gender and Stance on Opinion Expression in Debates on "Abortion"
Esin Durmus and Claire Cardie.
In Proceedings of the PEOPLES 2018 Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (co-located with NAACL), 2018.
[paper] [bib]
-
Cornell Belief and Sentiment System at TAC 2016
Vlad Niculae, Kai Sun, Xilun Chen, Yao Cheng, Xinya Du, Esin Durmus, Arzoo Katiyar and Claire Cardie.
Text Analysis Conference (TAC), 2016.
[paper] [bib]
Published Datasets
- WikiLingua
- DDO (Debate.org) corpus
- Kialo Dataset: available on request via email.
Teaching
- Instructor for Introduction to Natural Language Processing, Cornell University. Fall 2020.
- Teaching Assistant for Introduction to Natural Language Processing, Cornell University. Fall 2016, Fall 2017, Fall 2019.
- Teaching Assistant for Machine Learning for Data Science, Cornell University. Spring 2016.
- Teaching Assistant for Introduction to Web Design, Cornell University. Fall 2015.
Industry Experience
- Research Intern at Google AI Research. Summer 2020 - December 2020.
- Applied Scientist Intern at Amazon Web Services (AWS). Summer 2019 - December 2019.
- Applied Scientist Intern at Amazon Alexa. Summer 2017.