
About me
Hi! I am Esin Durmus. I am a Research Scientist at Anthropic. Previously, I was a Postdoctoral Scholar in the Stanford NLP Group, working with Tatsunori Hashimoto and Dan Jurafsky. I received my PhD from Cornell University, where I was advised by Claire Cardie.
I am interested in understanding how language models may impact our society and how we can build models that are safe and helpful. In particular, my research interests include:
- Socio-technical alignment: I explore what values are (and should be) incorporated into AI systems, study mechanisms for bringing more diverse values into AI development, and build evaluation frameworks to assess how these values impact model behavior in real-world settings.
- Economic and social impact of AI systems: I'm interested in understanding how AI systems impact the economy, reshape our conception of work, and transform our society, particularly how we derive meaning and purpose as we increasingly incorporate these systems into our lives.
- Policy-relevant evaluations: I build evaluations on policy-relevant topics such as election integrity, persuasion, political bias, and misinformation. I work closely with our Policy and Safeguards teams to build reliable evaluations and improve the models.
- Evaluating consistency and faithfulness of generated text: I have worked extensively on methods to assess the consistency and faithfulness of generated text, particularly in the context of summarization. I have proposed evaluation frameworks and metrics to quantify how accurately generated summaries capture and convey the key information from the source material.
Selected Work
Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
Kunal Handa, Alex Tamkin, Miles McCain, Saffron Huang, Esin Durmus, Sarah Heck, Jared Mueller, Jerry Hong, Stuart Ritchie, Tim Belonax, Kevin K Troy, Dario Amodei, Jared Kaplan, Jack Clark, Deep Ganguli
Preprint, 2025.
Evaluating Feature Steering: A Case Study in Mitigating Social Biases
Esin Durmus, Alex Tamkin, Jack Clark, Jerry Wei, Jonathan Marcus, Joshua Batson, Kunal Handa, Liane Lovitt, Meg Tong, Miles McCain, Oliver Rausch, Saffron Huang, Sam Bowman, Stuart Ritchie, Tom Henighan, Deep Ganguli
Anthropic Blog Post, 2024.
Collective Constitutional AI: Aligning a Language Model with Public Input
Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, Deep Ganguli
FAccT, 2024.
Measuring the Persuasiveness of Language Models
Esin Durmus, Liane Lovitt, Alex Tamkin, Stuart Ritchie, Jack Clark, Deep Ganguli
Anthropic Blog Post, 2024.
Many-shot Jailbreaking
Cem Anil, Esin Durmus, and other contributors from Anthropic, University of Toronto, Vector Institute, Constellation, Stanford, and Harvard.
NeurIPS, 2024.
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nguyen, Thomas I Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
COLM, 2024.
Opportunities and Risks of LLMs for Scalable Deliberation with Polis
Christopher T Small, Ivan Vendrov, Esin Durmus, Hadjar Homaei, Elizabeth Barry, Julien Cornebise, Ted Suzman, Deep Ganguli, Colin Megill
Preprint, 2023.
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
Myra Cheng, Esin Durmus, Dan Jurafsky
ACL, 2023.
Social Impact Award
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, Aylin Caliskan
FAccT, 2023.
Benchmarking Large Language Models for News Summarization
Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B Hashimoto
TACL, 2024.
Evaluating Human-Language Model Interaction
Mina Lee, Megha Srivastava, Amelia Hardy, John Thickstun, Esin Durmus, Ashwin Paranjape, Ines Gerard-Ursin, Xiang Lisa Li, Faisal Ladhak, Frieda Rong, Rose E Wang, Minae Kwon, Joon Sung Park, Hancheng Cao, Tony Lee, Rishi Bommasani, Michael Bernstein, Percy Liang
TMLR, 2023.
Faithful or Extractive? On Mitigating the Faithfulness-Abstractiveness Trade-off in Abstractive Summarization
Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown
ACL, 2022.
Spurious Correlations in Reference-Free Evaluation of Text Generation
Esin Durmus, Faisal Ladhak, Tatsunori Hashimoto
ACL, 2022.
WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
Faisal Ladhak, Esin Durmus, Claire Cardie, Kathleen McKeown
EMNLP Findings, 2020.
Career
Feb 2023 - Present | Anthropic, Research Scientist
May 2021 - Feb 2023 | Stanford NLP Group, Postdoc (worked with Tatsunori Hashimoto and Dan Jurafsky)
August 2015 - May 2021 | Cornell University, CS PhD (advised by Claire Cardie)
Sep 2010 - May 2015 | Koc University, Undergrad
Contact
- Email: esin at company_name dot com
- Twitter: @esindurmusnlp
- LinkedIn: Esin Durmus