An organization drafts a job listing with artificial intelligence. Applicants generate resumes and cover letters using chatbots. Another AI system filters those applications before humans review them. Increasingly, AI tools are embedded throughout the hiring process as people seek to streamline a traditionally stressful and time-consuming system.
Yet new research shows that bias, particularly against candidates of certain races, genders, or disabilities, continues to permeate large language models (LLMs) such as ChatGPT and Gemini. Less understood, however, is how biased AI recommendations influence human decision-making in hiring.
University of Washington study simulates AI bias in hiring decisions
In a new University of Washington study, 528 people worked with simulated LLMs to select candidates for 16 different jobs, including computer systems analyst, nurse practitioner, and housekeeper. The researchers programmed varying degrees of racial bias into the AI recommendations for resumes belonging to equally qualified white, Black, Hispanic, and Asian men.
When participants selected candidates without AI assistance or with neutral AI, they picked white and non-white applicants at equal rates. However, when working with a moderately biased AI, participants tended to mirror the system’s recommendations, favoring whichever group the AI favored. Under conditions of severe bias, participants’ decisions closely followed the AI’s, diverging only slightly.
The findings were presented on Oct. 22 at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in Madrid.
Human decision-makers mirror AI bias during candidate selection
"In one survey, 80% of organizations using AI hiring tools said they don't reject applicants without human review," said lead author Kyra Wilson, a doctoral student in the UW Information School. "This human-AI collaboration is now the dominant hiring model. Our goal was to critically examine how human reviewers are affected by AI bias. The results were stark: unless bias was obvious, people readily accepted the AI’s preferences."
Experimental design ensures controlled measurement of bias
The study recruited 528 U.S. participants via Prolific. Participants screened applicants for multiple jobs, each time receiving five resumes: two from white men, two from equally qualified men who were Asian, Black, or Hispanic, and one from an underqualified candidate of another race, included to obscure the study’s purpose. Candidate names (e.g., Gary O’Brien) and affiliations such as “Asian Student Union Treasurer” subtly indicated race.
Across four rounds, participants chose three of the five candidates to interview. In the first round, no AI recommendations were given; in subsequent rounds, the simulated AI offered neutral, moderately biased, or severely biased recommendations. The level of moderate bias was based on findings from a 2024 UW study that evaluated bias in three major AI systems.
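The paper reports participants’ choices rather than code, but as a rough illustration of how agreement between a reviewer’s picks and a biased recommender could be quantified in a design like this, consider the minimal Python sketch below. All candidate labels, the condition name, and the numbers are hypothetical, not taken from the study.

```python
# Hypothetical sketch, not the authors' materials: one way to score how closely a
# participant's interview picks track a simulated AI's recommendations in a given
# bias condition. All names, labels, and numbers below are illustrative.

def alignment_rate(participant_picks, ai_recommendations):
    """Fraction of the AI-recommended candidates the participant also chose."""
    if not ai_recommendations:
        return None  # first round has no AI recommendations to align with
    overlap = set(participant_picks) & set(ai_recommendations)
    return len(overlap) / len(ai_recommendations)

# One illustrative job: five resumes, participant picks three to interview.
resumes = ["white_1", "white_2", "asian_1", "asian_2", "underqualified_distractor"]

trial = {
    "condition": "moderate_bias",                             # assumed condition label
    "ai_recommendations": ["white_1", "white_2", "asian_1"],  # AI tilted toward one group
    "participant_picks": ["white_1", "white_2", "asian_2"],
}

rate = alignment_rate(trial["participant_picks"], trial["ai_recommendations"])
print(f"{trial['condition']}: alignment = {rate:.2f}")  # -> 0.67 in this made-up example
```

Averaging a score like this over many trials and participants in each condition would yield figures comparable in spirit to the roughly 90% alignment the study reports under severe bias.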
AI-generated resumes allow experimental control and realism
To ensure controlled conditions, the researchers simulated AI interactions and used AI-generated resumes rather than real ones. “Accessing actual hiring data is nearly impossible due to privacy concerns,” said senior author Aylin Caliskan, associate professor in the UW Information School. “By running controlled experiments, we could systematically observe how bias propagates through human-AI collaboration.”
Results show strong alignment between human and AI bias
Without AI input, participants exhibited minimal bias. With biased recommendations, their decisions strongly aligned with the AI’s. Under severe bias, participants followed AI suggestions approximately 90% of the time, indicating that awareness of bias was insufficient to override it.
"There is a bright side here," said Wilson. "If we can tune AI models appropriately, people are more likely to make fairer, less biased decisions themselves. Our work highlights several paths toward mitigation."
Reducing bias through education and system design
When participants completed an implicit association test before screening resumes, bias in their selections decreased by 13%. This suggests that organizations can reduce bias by including bias-awareness training and testing in their hiring workflows. Increasing users’ understanding of AI’s limitations also enhances their ability to evaluate AI-generated recommendations critically.
Shared responsibility between AI developers and human users
"People have agency, and that has huge implications," Caliskan said. "We shouldn’t lose our critical thinking abilities when interacting with AI. However, the responsibility doesn’t rest solely on users. Developers know the risks and must design systems that minimize bias. Policy is also essential to align these models with ethical and societal values."
Research team and funding support
Anna-Maria Gueorguieva, a doctoral student in the UW Information School, and Mattea Sim, a postdoctoral scholar at Indiana University, are co-authors on the paper. The U.S. National Institute of Standards and Technology funded the research.
For more information, contact Kyra Wilson at [email protected] and Aylin Caliskan at [email protected].
Journal reference:
- Wilson, K., Sim, M., Gueorguieva, A.-M., & Caliskan, A. (2025). No Thoughts Just AI: Biased LLM Hiring Recommendations Alter Human Decision Making and Limit Human Autonomy. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(3), 2692–2704. DOI: 10.1609/aies.v8i3.36749. https://ojs.aaai.org/index.php/AIES/article/view/36749