AI and Personal Data: Does GPT-3 Know Anything About Me?

Written by hacker4446008 | Published 2023/03/09
Tech Story Tags: ai | personal-data | gpt-3 | artificial-intelligence | digital-identity | data-privacy | data-protection | technology

TL;DR: I built a site that lets you keep track of what different Large Language Models know about you. Are you important enough to have been encoded? Get an email when you’re added to AI models or let the Big Tech giants know you want to opt out. Try it now at https://haveibeenencoded.com

It all started with an innocent ChatGPT question about the company I co-founded.

Despite being butt-hurt about not being important enough to be encoded, I was intrigued by the nature of how Large Language Models encode information and generate output probabilistically.

I immediately wanted to know three things:

  1. Is my name encoded at all, and if not, when will it be?

  2. Do I actually want to be encoded?

  3. Can I somehow opt-out in case I don’t want to be encoded?

Am I encoded and will I ever be?

Although my name did not occur often enough in the training data for GPT-3 to answer a direct question about me, it can still output my name when prompted with the right questions. It’s obvious that we will all be encoded as model parameter counts grow, so for me the interesting question is WHEN it will be my turn.

I polled my immediate network to see if anyone else was vain enough to ask ChatGPT about themselves, and it turns out it’s becoming the new “googling yourself”: every fifth person who has tried ChatGPT had asked about themselves.

Since it wasn’t just me, I decided to build a service dedicated to the regular polling of the OpenAI GPT-3 API and make it available for anyone at haveibeenencoded.com. I’m sure many of you will recognize the inspiration for the name haveibeenpwned.com.
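To make the idea concrete, here is a minimal sketch of the kind of polling loop such a service could run against the (legacy) GPT-3 completions API. The prompt templates, model choice, and parameters below are my illustrative assumptions, not haveibeenencoded.com’s actual implementation:

```python
# Illustrative sketch only -- prompts and parameters are assumptions,
# not the real haveibeenencoded.com implementation.
import os

# A few prompt templates that nudge the model toward naming a person.
PROMPT_TEMPLATES = [
    "Who is {name}?",
    "What is {name} known for?",
    "Which companies has {name} founded or worked for?",
]


def build_prompts(name: str) -> list[str]:
    """Expand the templates for one person's name."""
    return [t.format(name=name) for t in PROMPT_TEMPLATES]


def poll_gpt3(name: str) -> list[str]:
    """Query the legacy GPT-3 completions endpoint once per prompt."""
    import openai  # pip install openai (pre-1.0 SDK, as of early 2023)

    openai.api_key = os.environ["OPENAI_API_KEY"]
    answers = []
    for prompt in build_prompts(name):
        resp = openai.Completion.create(
            model="text-davinci-003",  # a GPT-3 model available at the time
            prompt=prompt,
            max_tokens=128,
            temperature=0,  # conservative: take the model's most likely answer
        )
        answers.append(resp.choices[0].text.strip())
    return answers
```

Running `poll_gpt3` on a schedule (say, once per model release) and diffing the answers is all the “regular polling” amounts to; `temperature=0` keeps the outputs as deterministic as the API allows, so diffs mean the model actually changed.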

I also stumbled upon a site called haveibeentrained.com, which focuses on visual media and more specifically recent advances in stable diffusion. It allows artists to both search for their work being used in AI training data and to sign up to indicate that they do not consent.

Seeing that someone came up with a similar solution for artwork it became obvious that it was something I wanted to build.

I want to help individuals keep track of what AI models know about them (including Personally Identifiable Information) and support their efforts to request that Big Tech companies NOT include their data.

Why? Read on…

Do I want to be encoded?

Playing around with LLMs, you quickly learn how creative these models get. For example, ChatGPT knows that I am a person in the Estonian startup and technology sector, but it will attribute all kinds of companies to my name despite there being no truth behind it.

Since LLMs are being incorporated into search engines as we speak, things will stop being fun when people start taking some of these creative outputs seriously.

In fact, an LLM itself says it best:

The implications of googling yourself with language models like OpenAI’s GPT-3 can be significant. These models are incredibly powerful and can understand and generate human-like text, so when you google yourself with GPT-3, you may find information that appears to be written by a human but was actually generated by the model. This can include false or misleading information that could harm your reputation or cause confusion.

Another implication is that GPT-3 and other language models can generate information at a scale and speed that can be difficult for humans to keep up with, potentially leading to information overload and difficulties in determining what is accurate and what is not. Additionally, because language models can generate information on any topic, there is a risk of encountering content that is inappropriate or offensive, which could be harmful to your well-being.

In conclusion, while googling with language models like GPT-3 can be interesting and provide a lot of information, it is important to be cautious about the information you find and take steps to verify its accuracy.

I’m starting with OpenAI’s GPT-3 because it has an official API available and nicely illustrates how creative these models can get, but you’ll get an email when other models are added or their responses about you change.

Can I opt out?

At this point in time I’m not aware of any “easy” way to opt out, as this technology is new. In the European Union, for example, citizens have the right to be “forgotten” and can request that certain personally identifiable information (PII) be removed from search engines. It is currently unclear how these laws will apply to generative models, but we had better sort this out before things get out of control.

The number of LLMs popping up will be substantial, and it won’t make sense to ask each one of them separately to NOT include your data, so I’ll figure out the exact legal steps and automation required to make this super easy for anyone. I’ve already reached out to my contacts at Google, Stability.ai, and Microsoft to make sure we start laying the path.

If this sounds good to you, sign up below and make sure to indicate whether you just want to know when your name gets added to AI models or whether you actually want us to reach out to the Big Tech companies to request that your data be removed.

Disclaimer:

These are the early steps on my path, and HaveIBeenEncoded uses a set of rather conservative parameters to get more confident outputs from the model. Don’t be offended if you’re important and the models say they do not know you. It’s only a matter of time; they will know you eventually. If you want them to, that is.

Sign up to find out when it happens and try it now!


Originally published on my personal blog.


Written by hacker4446008 | 🚧 entrepreneurial coding monkey 🗨️ 1st researcher at @Skype, put neuralnets to 300M computers
Published by HackerNoon on 2023/03/09