Esin Durmus

About me

Hi! I am Esin Durmus. I am a Research Scientist at Anthropic. Previously, I was a Postdoctoral Scholar in the Stanford NLP Group, working with Tatsunori Hashimoto and Dan Jurafsky. I received my PhD from Cornell University, where I was advised by Claire Cardie.

I am interested in understanding how language models may impact our society and how we can build models that are safe and helpful. In particular, my research interests include:

  • Socio-technical alignment: I explore the question of what values are (and should be) incorporated into AI systems. I study mechanisms for incorporating more diverse values into AI development. Additionally, I build evaluation frameworks to assess how these values impact model behavior in real-world settings.
  • Economic and social impact of AI systems: I'm interested in understanding how AI systems impact the economy, reshape our conception of work, and transform our society, particularly how we derive meaning and purpose as we increasingly incorporate these systems into our lives.
  • Policy-relevant evaluations: I build policy-relevant evaluations on topics such as election integrity, persuasion, political bias, and misinformation. I work closely with our Policy and Safeguards teams to build reliable evaluations and improve the models.
  • Evaluating consistency and faithfulness of generated text: I have worked extensively on methods for assessing the consistency and faithfulness of generated text, particularly in the context of summarization. I have proposed evaluation frameworks and metrics to quantify the degree to which generated summaries accurately capture and convey the key information from the source material.

Selected Work

Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations

Kunal Handa, Alex Tamkin, Miles McCain, Saffron Huang, Esin Durmus, Sarah Heck, Jared Mueller, Jerry Hong, Stuart Ritchie, Tim Belonax, Kevin K Troy, Dario Amodei, Jared Kaplan, Jack Clark, Deep Ganguli

Preprint, 2024.

Evaluating Feature Steering: A Case Study in Mitigating Social Biases

Esin Durmus, Alex Tamkin, Jack Clark, Jerry Wei, Jonathan Marcus, Joshua Batson, Kunal Handa, Liane Lovitt, Meg Tong, Miles McCain, Oliver Rausch, Saffron Huang, Sam Bowman, Stuart Ritchie, Tom Henighan, Deep Ganguli

Anthropic Blog Post, 2024.

Collective Constitutional AI: Aligning a Language Model with Public Input

Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli

In Proceedings of FAccT, 2024.

Measuring Model Persuasiveness

Esin Durmus, Liane Lovitt, Alex Tamkin, Stuart Ritchie, Jack Clark, Deep Ganguli

Anthropic Blog Post, 2024.

Many-shot Jailbreaking

Cem Anil, Esin Durmus, and other contributors from Anthropic, University of Toronto, Vector Institute, Constellation, Stanford, and Harvard.

NeurIPS, 2024.

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Esin Durmus, Karina Nguyen, Thomas I Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli

COLM, 2024.

Opportunities and Risks of LLMs for Scalable Deliberation with Polis

Christopher T Small, Ivan Vendrov, Esin Durmus, Hadjar Homaei, Elizabeth Barry, Julien Cornebise, Ted Suzman, Deep Ganguli, Colin Megill

Preprint, 2023.

Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models

Myra Cheng, Esin Durmus, Dan Jurafsky

In Proceedings of ACL, 2023.

Social Impact Award

Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale

Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan

FAccT, 2023.

Benchmarking Large Language Models for News Summarization

Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B Hashimoto

TACL, 2024.

Evaluating Human-Language Model Interaction

Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang

TMLR, 2023.

Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization

Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown

ACL, 2022.

Spurious Correlations in Reference-Free Evaluation of Text Generation

Esin Durmus, Faisal Ladhak, Tatsunori Hashimoto

ACL, 2022.

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

Faisal Ladhak, Esin Durmus, Claire Cardie, Kathleen McKeown

EMNLP Findings, 2020.

Datasets

LLM persuasiveness

GlobalOpinionQA

WikiLingua

DDO (Debate.org) corpus

Career

Feb 2023 - Present

Anthropic

Research Scientist

May 2021 - Feb 2023

Stanford NLP Group

Postdoc

Worked with Tatsunori Hashimoto and Dan Jurafsky

August 2015 - May 2021

Cornell University

CS PhD

Advised by Claire Cardie

Sep 2010 - May 2015

Koç University

Undergrad

Contact