Esin Durmus

Contact: esindurmus AT cs DOT stanford DOT edu

[Google Scholar] [Semantic Scholar] [CV]

Hi! I am Esin Durmus, a Research Scientist on the Societal Impacts team at Anthropic. Previously, I was a Postdoctoral Scholar in the Stanford NLP group, working with Tatsunori Hashimoto and Dan Jurafsky. I received my PhD from Cornell University, where I was advised by Claire Cardie.

My research interests lie at the intersection of Natural Language Processing, Machine Learning, and Computational Social Science. I am interested in developing evaluation methods and metrics to study the reliability and social impact of NLP/AI systems.

Publications

  1. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
    Myra Cheng, Esin Durmus, Dan Jurafsky
    To appear at ACL 2023.
    [paper]

  2. Tracing and Removing Data Errors in Natural Language Generation Datasets
    Faisal Ladhak, Esin Durmus, Tatsunori Hashimoto
    To appear at ACL 2023.
    [paper]

  3. Whose opinions do language models reflect?
    Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto
    To appear at ICML 2023.
    [paper]

  4. Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
    Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan
    To appear at FAccT 2023.
    [paper]

  5. Benchmarking large language models for news summarization
    Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto
    Preprint, 2023.
    [paper]

  6. When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization
    Faisal Ladhak, Esin Durmus, Mirac Suzgun, Tianyi Zhang, Dan Jurafsky, Kathleen McKeown, Tatsunori B. Hashimoto
    EACL, 2023.
    [paper]

  7. Evaluating Human-Language Model Interaction
    Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
    Preprint, 2022.
    [paper]

  8. Holistic Evaluation of Language Models
    Preprint, 2022.
    [paper]

  9. Improving Faithfulness by Augmenting Negative Summaries from Fake Documents
    Tianshu Wang, Faisal Ladhak, Esin Durmus, He He
    EMNLP 2022.

  10. Spurious Correlations in Reference-Free Evaluation of Text Generation
    Esin Durmus, Faisal Ladhak, Tatsunori Hashimoto
    In Proceedings of ACL 2022 (main conference).
    [paper]

  11. GEMv2: Multilingual NLG Benchmarking in a Single Line of Code
    2022.
    [paper]

  12. Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization
    Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown
    In Proceedings of ACL 2022 (main conference).
    [paper]

  13. Language Modeling via Stochastic Processes
    Rose E Wang, Esin Durmus, Noah Goodman, Tatsunori Hashimoto
    In Proceedings of ICLR 2022.
    [paper]

  14. On the Opportunities and Risks of Foundation Models
    [paper] [bib]

  15. Towards Understanding Persuasion in Computational Argumentation
    PhD Dissertation
    [paper] [bib]

  16. Leveraging Topic Relatedness for Argument Persuasion
    Xinran Zhao, Esin Durmus, Hongming Zhang, Claire Cardie
    In Findings of ACL, 2021.
    [paper] [bib]

  17. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
    [Team] [paper] [bib] [website]

  18. WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
    Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown
    In Findings of EMNLP, 2020.
    [paper] [data] [bib]

  19. Exploring the Role of Argument Structure in Online Debate Persuasion
    Jialu Li, Esin Durmus and Claire Cardie.
    In Proceedings of EMNLP, 2020.
    [paper] [bib]

  20. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
    Esin Durmus, He He and Mona Diab.
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
    [paper] [code] [bib]

  21. The Role of Pragmatic and Discourse Context in Determining Argument Impact
    Esin Durmus, Faisal Ladhak and Claire Cardie.
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019.
    [paper] [bib]

  22. Determining Relative Argument Specificity and Stance for Complex Argumentative Structures
    Esin Durmus, Faisal Ladhak and Claire Cardie.
    In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
    [paper] [bib]

  23. A Corpus for Modeling User and Language Effects in Argumentation on Online Debating
    Esin Durmus and Claire Cardie.
    In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
    [paper] [bib] [dataset]

  24. Persuasion of the Undecided: Language vs. the Listener
    Liane Longpre, Esin Durmus and Claire Cardie.
    In Proceedings of the 6th Workshop on Argument Mining, 2019.
    [paper] [bib] [dataset]

  25. Modeling the Factors of User Success in Online Debate
    Esin Durmus and Claire Cardie.
    In Proceedings of the World Wide Web Conference (WWW), 2019.
    [paper] [bib] [dataset]
    Cornell Chronicle Story

  26. Exploring the Role of Prior Beliefs for Argument Persuasion
    Esin Durmus and Claire Cardie.
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), 2018.
    [paper] [bib] [dataset]

  28. Understanding the Effect of Gender and Stance on Opinion Expression in Debates on "Abortion"
    Esin Durmus and Claire Cardie.
    In Proceedings of the PEOPLES 2018 workshop (co-located with NAACL) on computational modeling of people's opinions, personality, and emotions in social media.
    [paper] [bib]

  28. Cornell Belief and Sentiment System at TAC 2016
    Vlad Niculae, Kai Sun, Xilun Chen, Yao Cheng, Xinya Du, Esin Durmus, Arzoo Katiyar and Claire Cardie.
    Text Analysis Conference (TAC), 2016.
    [paper] [bib]

Published Datasets

Teaching

Industry Experience