Esin Durmus

About me

Hi! I am Esin Durmus. I am a Research Scientist at Anthropic. Previously, I was a Postdoctoral Scholar in the Stanford NLP group, working with Tatsunori Hashimoto and Dan Jurafsky. I received my PhD from Cornell University, where I was advised by Claire Cardie.

I am interested in understanding how language models may impact our society and how we can build models that are safe and helpful. In particular, my research interests include:

  • Socio-technical alignment: I explore which values are (and should be) incorporated into AI systems, and I study mechanisms for incorporating more diverse values into AI development. Additionally, I build evaluation frameworks to assess the real-world impact of these systems.
  • Policy-relevant evaluations: I build policy-relevant evaluations on topics such as election integrity, persuasion, political bias, and misinformation. I work closely with our Policy and Trust & Safety teams to build reliable evaluations and improve our models.
  • Evaluating consistency and faithfulness of generated text: I have developed methods to assess the consistency and faithfulness of generated text, particularly in the context of summarization. I proposed evaluation frameworks and metrics that quantify how accurately generated summaries capture and convey the key information from the source material.

Selected Work

Measuring Model Persuasiveness

Esin Durmus, Liane Lovitt, Alex Tamkin, Stuart Ritchie, Jack Clark, Deep Ganguli

Blog Post, 2024.

Many-shot Jailbreaking

Cem Anil, Esin Durmus, and other contributors from Anthropic, University of Toronto, Vector Institute, Constellation, Stanford, and Harvard.

Preprint, 2024.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Esin Durmus, Karina Nguyen, Thomas I Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli

Preprint, 2023.

Opportunities and Risks of LLMs for Scalable Deliberation with Polis

Christopher T Small, Ivan Vendrov, Esin Durmus, Hadjar Homaei, Elizabeth Barry, Julien Cornebise, Ted Suzman, Deep Ganguli, Colin Megill

Preprint, 2023.

Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models

Myra Cheng, Esin Durmus, Dan Jurafsky

In Proceedings of ACL, 2023.

Social Impact Award

Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale

Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan

In Proceedings of FAccT, 2023.

Benchmarking large language models for news summarization

Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B Hashimoto

In TACL, 2024.

Evaluating Human-Language Model Interaction

Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang

In TMLR, 2023.

Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization

Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown

In Proceedings of ACL, 2022.

Spurious Correlations in Reference-Free Evaluation of Text Generation

Esin Durmus, Faisal Ladhak, Tatsunori Hashimoto

In Proceedings of ACL, 2022.

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

Faisal Ladhak, Esin Durmus, Claire Cardie, Kathleen McKeown

In Findings of EMNLP, 2020.





