Large-scale online deanonymization with LLMs

Beep@lemmus.org · 2 days ago

🌞 Alexander Daychilde 🌞@lemmy.world · 5 hours ago

Too late for me, I’ve been Daychilde since 1996, didn’t keep it separate from my real name, and I’m on Wikipedia, so it’s trivial to find me. lol.

The good news is that I can report it’s pretty safe to have an open identity. So far. heh

Silver Needle@lemmy.ca · 11 hours ago

I call BS. We’ll see false positives go through the roof. Just another tool to arbitrarily harass opponents.

Goodman@discuss.tchncs.de · 12 hours ago

Every day the internet gets a little worse. I hate it here in this technological hellscape. I have more to say, but this bullshit makes me so so tired. Goodnight.

🌞 Alexander Daychilde 🌞@lemmy.world · 5 hours ago

Shut up, Anthony.

(in case your name happens to actually be Anthony, I did pick it at pseudo-random just for a stupid joke!)

Goodman@discuss.tchncs.de · 2 hours ago

It took me an embarrassingly long time to get the joke. I forgot what the original thread was.

BlindFrog@lemmy.world · 3 hours ago

Lmao, you are now Peter

Silver Needle@lemmy.ca · 11 hours ago

Don’t hate the technology. It’s great. It’s just that how people organize themselves around technology is not up to date. Markets are not meant to coexist with an extremely fast global communication network that everyone can access. Why do you think economies restrict internet access?

Let the internet as a social activity die. It’s got to in order to be reborn haha

Goodman@discuss.tchncs.de · 2 hours ago

The internet can mostly die as far as I’m concerned. Just roll it back to file servers again, or something like gemspace. But being able to talk freely with people across cultures and borders is really important. It’s a tragedy that all these people will be hurt by the dystopification of the web. The new web needs a way to converse socially that is safe and easy enough for lay people to use. I have so much more to say on this, but real life is calling so I’ll leave it at this.

I don’t really get your point about markets though. I’m genuinely trying to understand, so bear with me. This is what I got from your post:

Our market has coexisted with an extremely fast global communication network for decades now. Given that the market feels like a quite organic thing, on what authority is the market not meant to coexist with the internet?

I think that internet access is restricted because of technological constraints, a technological lag in rolling out higher speed infrastructure, and a lack of demand for that access, which is driven by technological and practical constraints. Some complex function of those factors haha. Still, I don’t really know what you are trying to get across.

thedeadwalking4242@lemmy.world · 10 hours ago

There’s no quality of an LLM that would make this possible. It’s just more hallucinations and poor tool use.

thinkercharmercoderfarmer@slrpnk.net · edit-2 6 hours ago

Why not? If LLMs are good at predicting mean outcomes for the next symbol in a string, and humans have idiosyncrasies that deviate from that mean in a predictable way, I don’t see why you couldn’t detect and correlate certain language features that map to a specific user. You could use things like word choice, punctuation, slang, common misspellings, sentence structure… For example, I started with a contradicting question, I used “idiosyncrasies”, I wrote “LLMs” without an apostrophe, “language features” is a term of art, as is “map” as a verb, etc. None of these are indicative on their own, but unless people are taking exceptional care either to hyper-normalize their style or to explicitly spike their language with confounding elements, I don’t see why an LLM wouldn’t be useful for this kind of espionage.

I wonder if this will have a homogenizing effect on the anonymous web. It might become an accepted practice to communicate in a highly formalized style to make this kind of style fingerprinting harder.
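
To make the style-fingerprinting idea above a bit more concrete, here is a minimal sketch. To be clear about the assumptions: it does not use an LLM at all and is not the method from the linked paper. It swaps in plain character n-gram counting with cosine similarity, a long-standing stylometry baseline, and the function names (`char_ngrams`, `cosine`, `rank_candidates`) are made up for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, keeping case and punctuation."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys, so unseen n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(unknown: str, known: dict[str, str]) -> list[tuple[str, float]]:
    """Rank known authors by stylistic similarity to an unattributed text."""
    target = char_ngrams(unknown)
    scores = [(author, cosine(target, char_ngrams(sample)))
              for author, sample in known.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Punctuation habits and misspellings survive because nothing is normalized before counting. On short comments a score like this is noisy, so a high rank is a hint at best, not an identification.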

MonkderVierte@lemmy.zip · 17 hours ago

Good thing is, my writing style changes with my mood.

doug@lemmy.today · edit-2 2 days ago

I think it was a Reddit scraper years ago that taught me that I should probably lie more often on the internet about my work, friends, family details, etc.

Just like, little lies that don’t really matter in the comment, but would misdirect an AI or investigator into things that aren’t true.

It’s just so much woooooork to think about this shit. And to come up with different screen names everywhere? And to like, sub to a city I don’t live in and comment there about shit I know nothing about? Exhausting.

Thankfully my brothers and three uncles are here to support me. And my alligator.

Anarki_@lemmy.blahaj.zone · 9 hours ago

Oh hey my dearest friend. Say, did you end up moving to Perth or was that just a thought out loud? Well if you’re ever in the area let me know and we can meet up at that restaurant we enjoyed so much!

xoxo

SuspciousCarrot78@lemmy.world · 1 day ago

Oh - you mean Gustav, Bernhardt, Daffid and Chompy? How are things in Ulaanbaatar anyway?

(you’re welcome)

frongt@lemmy.zip · 2 days ago

Aha! By posting this comment, I know you don’t have an alligator!

P1nkman@lemmy.world · 2 days ago

But I do! I know they’re illegal in Denmark, but they seem to love the snow!

DrunkenPirate@feddit.org · 2 days ago

That’s funny, I do as well. Unfortunately, I flushed my alligator down my toilet into the harbor where I live. Now I’ve bought a green parrot. My three sisters love it.

FoxyFerengi@startrek.website · 1 day ago

I’ve heard they can at least survive a fall onto snow lol

Deacon@lemmy.world · 1 day ago

I call it salting and I do it religiously.

Or do I?

Jakeroxs@sh.itjust.works · 10 hours ago

Haha perfect username too

Deacon@lemmy.world · 10 hours ago

Ah my namesake and fellow gandy dancer.

surewhynotlem@lemmy.world · 1 day ago

The trick is to pick someone else’s identity and use that. I’m Dale from Ohio.

papertowels@mander.xyz · 1 day ago

Rusty Shackleford, checking in

MrQuallzin@lemmy.world · 1 day ago

Mom said it’s my turn to be Dale!

stickly@lemmy.world · edit-2 19 hours ago

The solution is simple, just launder each comment through an LLM to fudge the style and details a bit

Edit, tried it for fun:

lowkey just run every comment through an llm and let it switch up the words and details a bit so it dosnt sound like you wrote it

Insekticus@aussie.zone · 2 days ago

Yeah exactly, like if you’re 25, say you’re 27. Then in another post 24. You’re still around that age, but the exact age is muddied in the waters.

You can also use Americanized spelling in some sentences or, if you’re American, use British English and become Unamericanised. Say you’re a half-Brit half-American dual citizen even though you’re from South Africa or something.

MountingSuspicion@reddthat.com · 2 days ago

I feel like that may be worse. Kind of like how if you have certain security measures while browsing the web it’s almost easier to fingerprint you. It’ll get a good idea of your age and that’ll be enough rather than sticking to a specific lie. Just always be 3 years older with one additional sibling or a sibling of the opposite sex. If the sex of your sibling is relevant just describe them as a close family friend or close cousin in that instance. I can’t say for sure, but if I had to guess having a static lie is maybe more obfuscation than a variable one. Though even posting on this thread is bad opsec.

couldhavebeenyou@lemmy.zip · 2 days ago

Maybe get an AI agent to post misdirections

Bruncvik@lemmy.world · 1 day ago

I have an account where I only post after translating my writing through three different languages and back to English. The original input and the output convey the same message, but have very distinct styles. Randomizing the three languages in my translation sequence introduces enough variety that I doubt current LLMs can identify me. (Full disclosure: I don’t post any sensitive information under any account; I do it just for fun.)
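
The randomized translation chain described above could be sketched roughly like this. Everything here is an assumption for illustration: `launder` and the `Translator` signature are hypothetical, and no real translation service is called, so you would have to plug in whatever tool you actually use.

```python
import random
from typing import Callable, Optional

# Hypothetical signature: translate(text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def launder(text: str, translate: Translator, pool: list[str], hops: int = 3,
            rng: Optional[random.Random] = None) -> tuple[str, list[str]]:
    """Round-trip text through `hops` randomly chosen languages and back to English."""
    rng = rng or random.Random()
    # Pick distinct intermediate languages in random order; start and end in English.
    route = ["en", *rng.sample(pool, hops), "en"]
    for source, target in zip(route, route[1:]):
        text = translate(text, source, target)
    return text, route
```

Returning the route alongside the text makes it easy to check that two posts did not take the same path. Whether the output really hides the author depends entirely on the translator, which this sketch says nothing about.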

Old Jimmy Twodicks@sh.itjust.works · 2 days ago

XLE@piefed.social · 2 days ago

The doxxing efforts will be funded by venture capital.

What can LLM providers do? Refusal guardrails and usage monitoring can help, but both have significant limitations. Our deanonymization framework splits an attack into seemingly benign tasks – summarizing profiles, computing embeddings, ranking candidates – that individually look like normal usage, making misuse hard to detect. Refusals can be bypassed through task decomposition.

“Guardrails” are a joke and we all know Sam Altman and Elon Musk care about ethics as much as they care about not abusing their siblings or employees.

CerebralHawks@lemmy.dbzer0.com · 2 days ago

It is absolutely possible to identify users who post a lot on a public forum with a real name (e.g. Facebook or the like) as well as Reddit. So say you have some politician who claims to have X, Y, Z values and a Reddit user who has A, B, and C values that are antonymous to X, Y, and Z. By comparing common phrases, as well as by charting when the two seemingly separate users are online, you could say with reasonable certainty that the two people are one and the same, especially if you prompt them carefully to say the kinds of things they would say about neutral topics on both accounts. It would be hard to get 100% certainty, but you’d be close enough to imply it’s them.

AIs (LLMs) just make it faster.

Don’t post about controversial politics if you also post under your real name. It’s not a matter of “mask yourself better.” There will always be tells.
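
The "charting when the two users are online" part needs nothing fancy. A minimal sketch, assuming you already have timezone-aware post timestamps for each account; `hourly_profile` and `profile_similarity` are illustrative names, not from any tool mentioned in this thread:

```python
from datetime import datetime, timezone
from math import sqrt


def hourly_profile(timestamps: list[datetime]) -> list[float]:
    """Fraction of an account's posts falling in each UTC hour of the day.

    Assumes timezone-aware datetimes; naive ones would be read as local time.
    """
    counts = [0] * 24
    for ts in timestamps:
        counts[ts.astimezone(timezone.utc).hour] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 24


def profile_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two 24-bin activity profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A high score only says two accounts keep similar hours, which is true of everyone in the same time zone with a day job, so on its own this is weak evidence that would have to be combined with the phrase comparison.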

LwL@lemmy.world · 20 hours ago

I’ve always acted assuming this to be possible, but it used to require either an unhinged individual or some other reason for a very dedicated investigation. The barrier being potentially that much lower is scary, particularly for anyone with a bit of internet fame who would rather stay anonymous.

Supervisor194@lemmy.world · 1 day ago

I’ve never once posted on the Internet using a real name. I’ve never been a member of any social anything other than Reddit and Lemmy. I only even found Reddit because an IRC link aggregator I used to browse for news/memes went tits up.

Iconoclast@feddit.uk · 2 days ago

For the past 10 years or so I’ve pretty much lived under the assumption that at some point someone will figure out a system that digs through the entire internet, and everything anyone has ever posted gets linked back to them.

At the same time, it’s both great and absolutely horrifying.

What’s horrifying is that everything you’ve ever posted gets linked back to you.

What’s great is that none of it can really be used against you anymore - because we now know that absolutely everyone is a massive hypocrite and nobody is without sin.

Silver Needle@lemmy.ca · edit-2 11 hours ago

That’ll never work. The internet is messy like a jungle, I might find bird crap somewhere but it will not get me the bird. I might find a turned leaf, but what turned the leaf will never be known to me. All despite me being able to reason and investigate phenomena that occur.

I view all things like particle systems: There are general trends, sometimes we can observe how single particles travel, and we can derive rules from their behavior. Yet we are never able to see everything at full resolution, let alone know everyone in the way the “evil” “AI” thought experiments portray all-knowing bots. What people say about Palantir is very similar: it falls into the category of we-don’t-know-the-rest-of-it.

No use going paranoid over preliminary results from tools we readily use but don’t fully comprehend the limitations of (meaning: we don’t know how shitty and unreliable they actually are).

Scrollone@feddit.it · 1 day ago

I mean, there’s even a website (don’t remember the name) that lets you upload a photo of a person and it will show all pictures of that person that are on the web.

Like a Google search but for your face. Super creepy.

KnitWit@lemmy.world · 1 day ago

The Private Eye by Brian K. Vaughan used that as a premise (set in 2076) for a comic run about a decade ago.

Jrockwar@feddit.uk · 2 days ago

Some really good advice that someone gave me once is that the internet doesn’t exist.

Sure, it obviously does exist, but this was about communication style. When you send an email, you change codes and don’t write the same way as in a WhatsApp message - you can expand your points more… But you should never forget you’re talking to a person - just because it’s the internet, you shouldn’t talk any differently to them.

You shouldn’t assume that the message is anonymous just because it’s internet. You shouldn’t assume certain things are okay “just because it’s internet”.

I don’t think they were 100% right because they were disregarding that code changing between different mediums and audiences is normal (you don’t talk the same way to your boss and your partner, or in written form vs spoken), but I do stand by the point that you shouldn’t change code or make assumptions just because “internet”.

krashmo@lemmy.world · 1 day ago

Seems like we could all just mellow out a bit. You shouldn’t need to be afraid of saying stuff that isn’t perfectly PC now or in the past. Obviously there’s a difference between an off-color joke and shit you would find in the Epstein files but I’m not particularly concerned about anything I’ve posted coming back to me. I’ve had bad takes (I’m sure I still do) and said things in the past that I no longer agree with, but who cares? That’s what life is like. You change over time in more ways than one. If someone wants to judge me harshly for that then we probably weren’t going to hang out anyway so fuck em. Let them react how they want.

That being said, the implications of this kind of technology being used by corporations or the government are quite different. There may be value in what you’re saying from that perspective.

MalReynolds@slrpnk.net · 2 days ago

So, pretty much what Meta/Facebook (and the three-letter agencies / GovInt) have been doing with deterministic code for ages (like they’re not scraping Reddit et al., including Lemmy), but probabilistic, with more errors and new, improved hallucination.

Competition, filling in gaps or just looking to be bought out. Evil.