Is ChatGPT 3.5 Funnier than Humans?

In an article published in the journal PLOS One, researchers examined how well ChatGPT 3.5 (Chat Generative Pre-trained Transformer 3.5) produces humor compared with humans. In two studies, human participants rated the funniness of jokes and satirical headlines generated by ChatGPT 3.5 and by humans. The findings indicated that ChatGPT 3.5's humor was rated as funny as or funnier than that produced by humans, regardless of the comedic task or the expertise of the human comedy writers.

Study: Is ChatGPT 3.5 Funnier than Humans? Image Credit: This image was created with the assistance of DALL·E 3



The ability of large language models (LLMs) like OpenAI’s ChatGPT to generate humor is an intriguing and underexplored area. While humor necessitates a delicate balance of surprise and benignity, LLMs lack emotional perception, raising questions about their capacity for humor production. Existing research highlights ChatGPT’s wide-ranging competencies but also notes its tendency to present false information as fact. This issue is less critical in comedy, where accuracy is secondary to entertainment value.

Past studies have offered mixed and anecdotal evidence on ChatGPT's humor capabilities, lacking comprehensive, comparative evaluations. This paper addressed these gaps by systematically comparing the quality of jokes produced by ChatGPT 3.5 to those created by humans. Employing standardized comedic tasks and assessing humor through laypeople's evaluations, the study aimed to provide empirical insights into how LLM-generated humor stacked up against human creativity, potentially informing the entertainment industry and our understanding of artificial creativity.

ChatGPT 3.5 and Laypeople in Humor Production

In this study, ChatGPT 3.5's humor production abilities were compared to those of laypeople using three diverse tasks: acronym completion, fill-in-the-blank, and roast jokes. Participants were recruited via Amazon Mechanical Turk (MTurk). Of the 123 who initially participated, 18 were excluded for using external sources, resulting in a final sample of 105. Each participant generated humorous responses to nine prompts across the three tasks, yielding 945 human-produced jokes.

ChatGPT 3.5 was given the same tasks, producing 20 humorous responses per prompt, resulting in 180 artificial intelligence (AI)-generated jokes. The study then recruited 200 additional MTurk workers to rate the funniness of these responses. Each rater evaluated 54 jokes, 27 human-produced and 27 AI-produced, on a seven-point Likert scale. The source of each joke was not disclosed, to keep the ratings unbiased.
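The sample sizes above follow from simple multiplication, and the blinded rating set can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the article does not say how each rater's 54 jokes were selected, so the random, balanced draw in the hypothetical `build_rating_set` function is an assumption, and the joke records are placeholders rather than real study data.

```python
import random

N_PARTICIPANTS = 105   # final human sample reported in the article
N_PROMPTS = 9          # nine prompts across the three tasks
AI_PER_PROMPT = 20     # ChatGPT 3.5 responses per prompt

# Placeholder records: (source, prompt index, response index).
human_jokes = [("human", p, i) for p in range(N_PROMPTS) for i in range(N_PARTICIPANTS)]
ai_jokes = [("ai", p, i) for p in range(N_PROMPTS) for i in range(AI_PER_PROMPT)]


def build_rating_set(rng, per_source=27):
    """Assumed procedure: draw equal numbers of jokes from each source,
    then shuffle so presentation order does not reveal the source."""
    sample = rng.sample(human_jokes, per_source) + rng.sample(ai_jokes, per_source)
    rng.shuffle(sample)
    return sample


rating_set = build_rating_set(random.Random(0))
print(len(human_jokes), len(ai_jokes), len(rating_set))  # 945 180 54
```

In an actual blinded survey, only the joke text would be shown to raters; the source label is kept here solely so the balance of the sample can be checked.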

The researchers aimed to compare, empirically and systematically, the quality of humor produced by ChatGPT 3.5 with that produced by humans. The tasks and rating procedures were pre-registered and approved by the University of Southern California Institutional Review Board, and all data and materials are available on the Open Science Framework (OSF).

Comparative Analysis

AI-generated responses were rated funnier than human responses, with significant differences across all tasks. ChatGPT outperformed 73% of humans in the acronym task, 63% in fill-in-the-blank, and 87% in roast jokes. Additionally, 69.5% of participants preferred AI-generated humor. Variance analysis showed less agreement on AI-generated roast jokes, indicating mixed reactions.
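A figure such as "outperformed 73% of humans" can be read as the share of human participants whose average funniness rating fell below ChatGPT's average. The Python sketch below shows that calculation under this assumption; the article does not state the exact method the authors used, the function name `share_outperformed` is hypothetical, and the ratings are invented toy values, not study data.

```python
from statistics import mean


def share_outperformed(ai_ratings, human_ratings_by_person):
    """Percentage of human participants whose mean rating is below the AI's mean rating.
    This is an assumed definition, not necessarily the one used in the paper."""
    ai_mean = mean(ai_ratings)
    person_means = [mean(r) for r in human_ratings_by_person.values()]
    beaten = sum(1 for m in person_means if m < ai_mean)
    return 100 * beaten / len(person_means)


# Toy ratings on a seven-point scale, invented purely for illustration.
toy_ai = [4, 5, 3, 4]                      # mean 4.0
toy_humans = {
    "p1": [2, 3],                          # mean 2.5
    "p2": [5, 5],                          # mean 5.0
    "p3": [3, 4],                          # mean 3.5
    "p4": [4, 3],                          # mean 3.5
}

print(share_outperformed(toy_ai, toy_humans))  # 75.0
```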

Demographic factors did not significantly influence preferences, though right-leaning participants produced slightly less funny jokes. Despite lacking emotions, ChatGPT excelled in humor production, especially in aggressive roast jokes, challenging expectations of AI limitations in generating potentially offensive content.

ChatGPT 3.5 and The Onion in Satirical Humor

In this study, the authors compared ChatGPT 3.5's ability to produce satirical news headlines with those of professional comedy writers from The Onion, focusing on the local news section to ensure timeless and comparable topics.

A total of 217 students from the University of Southern California participated in the study. Participants rated the funniness of 10 headlines, five from The Onion and five from ChatGPT, on a seven-point scale, without knowing the source of each headline to prevent bias. ChatGPT was prompted to generate 20 new headlines in the style of The Onion’s ‘Local’ section.

The study was pre-registered, and materials, including the collected data and pre-registration details, were available on the Open Science Framework. The ethics of the study were approved by the University of Southern California Institutional Review Board. This study aimed to benchmark ChatGPT’s humor against professional standards within the comedic industry, providing insights into the AI's capability to produce satirical content.

Comparative Analysis 

Participants' ratings showed no significant difference in funniness between ChatGPT 3.5's satirical headlines and those written by professional writers at The Onion. The top four headlines included two from each source, with ChatGPT producing the highest-rated one. The difference in the variance of funniness ratings between the two sources was not statistically significant.

Participants who sought out comedy and read satirical news rated headlines as funnier overall, regardless of the source. While 48.8% preferred The Onion’s headlines, 36.9% favored ChatGPT’s, and 14.3% showed no preference. No evidence indicated that ChatGPT reproduced existing headlines. This study highlighted that ChatGPT’s humor is comparable to professional standards, suggesting significant economic implications for comedy writing. Future research should explore the use of LLMs in other comedy formats like script writing and meme generation.


In conclusion, the researchers found that ChatGPT 3.5 produced humor that was as funny as or funnier than jokes from laypeople and professional comedy writers. This challenged the notion that emotional perception is necessary for humor creation. The findings highlighted ChatGPT's potential in the comedy industry and suggested future research on AI's understanding of humor and its practical applications in personal and professional contexts.

Journal reference:

Article Revisions

  • Jul 11 2024 - Featured image replaced with an image that was created with the assistance of DALL·E 3

Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2024, July 10). Is ChatGPT 3.5 Funnier than Humans?. AZoAi. Retrieved on July 17, 2024 from

  • MLA

    Nandi, Soham. "Is ChatGPT 3.5 Funnier than Humans?". AZoAi. 17 July 2024. <>.

  • Chicago

    Nandi, Soham. "Is ChatGPT 3.5 Funnier than Humans?". AZoAi. (accessed July 17, 2024).

  • Harvard

    Nandi, Soham. 2024. Is ChatGPT 3.5 Funnier than Humans?. AZoAi, viewed 17 July 2024,

