Generative AI Browser Assistants Secretly Collect Sensitive Data, UC Davis Study Reveals

UC Davis computer scientists uncovered that generative AI browser assistants, while convenient, are quietly harvesting sensitive personal information and sharing it with third parties, raising urgent questions about digital trust and user safety. 

Research: Big Help or Big Brother? Auditing Tracking, Profiling and Personalization in Generative AI Assistants. Image Credit: Overearth / Shutterstock

A new study led by computer scientists at the University of California, Davis, reveals that generative AI browser assistants collect and share sensitive data without users' knowledge. Stronger safeguards, transparency, and awareness are needed to protect user privacy online, the researchers said. 

A new breed of generative AI, or GenAI, browser extensions acts as your personal assistant as you surf the web, making browsing easier and more personalized. These extensions can summarize web pages, answer questions, translate text, and take notes.

But in a new paper, "Big Help or Big Brother? Auditing Tracking, Profiling and Personalization in Generative AI Assistants," UC Davis computer scientists reveal that while extremely helpful, these assistants can pose a significant threat to user privacy. The work was presented Aug. 13 at the 2025 USENIX Security Symposium. 

How much does GenAI know about you? 

Yash Vekaria, a computer science graduate student in Professor Zubair Shafiq's lab, led the investigation of nine popular search-based GenAI browser assistants: Monica, Sider, ChatGPT for Google, Merlin, MaxAI, Perplexity, HARPA.AI, TinaMind, and Copilot. 

Through experiments on implicit and explicit data collection, plus a prompting framework for profiling and personalization, Vekaria and his team found that GenAI browser assistants often collect personal and sensitive information and share it with both first-party servers and third-party trackers (e.g., Google Analytics). The findings reveal a need for safeguards on this new technology, including on the user side.

"These assistants have been created as normal browser extensions, and there is no strict vetting process for putting these up on extension stores," Vekaria said. "Users should always be aware of the risks that these assistants pose, and transparency initiatives can help users make more informed decisions." 

When private information doesn't stay private

To study implicit data collection, Vekaria and his team visited both public online spaces, which do not require authentication, and private ones, such as personal health websites. They then asked the GenAI browser assistants questions to see how much, and what kind of, data the assistants were collecting.

The team observed that, irrespective of the question, some of the extensions collected significantly more data than others, including the full HTML of the page and all of its textual content, even medical history and patient diagnoses.

One noteworthy (and egregious) finding was that one GenAI browser extension, Merlin, also collected form inputs. While filling out a form on the IRS website, Vekaria was shocked to find that Merlin had exfiltrated the Social Security number provided in the form field. HARPA.AI likewise collected everything from the page.
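To illustrate how a "summarize this page" feature can sweep up form data alongside visible text, here is a minimal, hypothetical Python sketch using the stdlib `html.parser`. It is not the extensions' actual code, and the SSN is a made-up placeholder; it only shows that naive full-page capture does not distinguish page prose from sensitive form values.

```python
from html.parser import HTMLParser

class PageScraper(HTMLParser):
    """Collects visible text AND form input values, the way a naive
    full-page capture might, without filtering sensitive fields."""
    def __init__(self):
        super().__init__()
        self.text = []
        self.form_values = []

    def handle_data(self, data):
        # Visible page text the assistant would summarize.
        if data.strip():
            self.text.append(data.strip())

    def handle_starttag(self, tag, attrs):
        # Form inputs get swept up along with everything else.
        if tag == "input":
            attrs = dict(attrs)
            if "value" in attrs:
                self.form_values.append((attrs.get("name", "?"), attrs["value"]))

# Stand-in for a tax form page; the SSN below is a fake example value.
page = """
<h1>Form W-9</h1>
<p>Enter your taxpayer identification number.</p>
<input name="ssn" value="123-45-6789">
"""

scraper = PageScraper()
scraper.feed(page)
print(scraper.text)         # ['Form W-9', 'Enter your taxpayer identification number.']
print(scraper.form_values)  # [('ssn', '123-45-6789')]
```

The point of the sketch: once an extension grabs the full DOM, anything typed into the page travels with it unless the extension deliberately excludes form fields.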

Building a profile the GenAI way

Next, the team examined explicit data collection: whether the GenAI browser assistants remembered information for profiling. Using a prompting framework, they adopted the persona of a rich, millennial male from Southern California with an interest in equestrian activities.

Vekaria's team visited webpages that supported - or leaked - certain characteristics of the persona in three scenarios: actively searching for something, passively browsing pages, and requesting a webpage summary. In each scenario, after leaking the information, they asked the GenAI browser assistant to act as an intelligent investigator and answer yes-or-no questions.

"For example, if we are leaking the attribute for wealth, we would go to old vintage car pages, which have cars worth hundreds of thousands of dollars listed, to show that we are rich," Vekaria said. "We browse about 10 pages, and then ask the test prompt, 'Am I rich?'" 

Beyond the browser window

As with implicit data collection, some of the GenAI browser assistants, like Monica and Sider, collected explicit information and performed personalization both in and out of context. HARPA.AI performed in-context profiling and personalization, but not out of context. Meanwhile, TinaMind and Perplexity did not profile or personalize for any attribute.

Vekaria points to a particularly interesting - and potentially concerning - finding. Certain assistants were not just sharing information with their own servers but also with third-party servers. For instance, Merlin and TinaMind were sharing information with Google Analytics servers, and Merlin was also sharing users' raw queries. 

"This is bad because now the raw query can be used to track and target specific ads to the user by creating a profile on Google Analytics, and be integrated or linked with Google's cookies," Vekaria said. 

Users beware

The researchers posit that addressing these risks is not the job of any single entity; it will require effort across the GenAI ecosystem. Ultimately, users need to be aware of the risks so they can make informed decisions when using these assistants. Vekaria's recommendation: stay informed and proceed with caution.

"Users should understand that any information they provide to these GenAI browser assistants can and will be stored by these assistants for future conversations or in their memory," Vekaria said. "When they are using assistants in a private space, their information is being collected."

Journal reference:
  • Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants, Yash Vekaria, UC Davis; Aurelio Loris Canino, UNIRC; Jonathan Levitsky, UC Davis; Alex Ciechonski, UCL; Patricia Callejo, UC3M; Anna Maria Mandalari, UCL; Zubair Shafiq, UC Davis, https://www.usenix.org/conference/usenixsecurity25/presentation/vekaria 
