We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
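For readers unfamiliar with the metric, here is a minimal sketch of how a figure like "recall at 90% precision" could be computed. It is not the paper's evaluation code. The function name `recall_at_precision`, the `(confidence, is_correct)` input format, and the toy numbers are all assumptions made for illustration. The idea is that the attacker only commits to matches above some confidence cutoff, and we report the largest share of matchable users recovered while at least 90% of the committed matches are correct.

```python
def recall_at_precision(scored, total_positives, min_precision=0.9):
    """Best recall reachable while precision stays >= min_precision.

    scored: list of (confidence, is_correct) pairs, one per proposed match
            (hypothetical format, chosen for this sketch).
    total_positives: number of users that actually have a true match.
    """
    # Walk down the ranking: each prefix corresponds to one confidence cutoff.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    correct = 0
    for made, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        precision = correct / made
        if precision >= min_precision:
            best_recall = max(best_recall, correct / total_positives)
    return best_recall


# Toy data (made up): six proposed matches, five users with a true match.
toy = [(0.99, True), (0.95, True), (0.90, True),
       (0.80, False), (0.70, True), (0.60, False)]
result = recall_at_precision(toy, total_positives=5)
```

On this toy input only the three most confident matches can be kept before precision drops below 90%, so the recall is 3 out of 5. Ties in confidence are not handled specially in this sketch.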


The internet can mostly die as far as I’m concerned. Just roll it back to file servers again, or something like gemspace. But being able to talk freely with people across cultures and borders is really important. It’s a tragedy that all these people will be hurt by the dystopification of the web. The new web needs a way to converse socially that is safe and easy enough for lay people to use. I have so much more to say on this, but real life is calling so I’ll leave it at this.
I don’t really get your point about markets though. I’m genuinely trying to understand, so bear with me. This is what I got from your post:
Our market has coexisted with an extremely fast global communication network for decades now. Given that the market feels like quite an organic thing, on what authority is the market not meant to coexist with the internet?
I think that internet access is restricted because of technological constraints, a technological lag in rolling out higher-speed infrastructure, and a lack of demand for that access, which is itself driven by technological and practical constraints. Some complex function of those factors haha. Still, I don’t really know what you are trying to get across.
I’ll try to explain my thought.
The condition for markets to exist as self-reproducing and self-stabilizing objects is government, usually in the form of a state entity, which itself is an economic actor that exists in competition with other states and in cooperation within free trade zones. Important note: government forms from market activity, specifically from the control of estates. Taxation is a form of rent, for example. I am not putting the state before the market.
There is an interest for governments to:
Maximize economic output
Do so by cleverly tricking other economic actors outside of their own taxation system, e.g. through trade agreements with built-in asymmetries.
Minimize damage to domestic production. Outsourcing can lead to cornerstones of the economy eroding.
Throw in the internet. We can now communicate and exchange with actors that are not in the same tax system. First and foremost this leads to issues with intellectual property. I’d cite geolocked internet radio stations and piracy. Japan doesn’t care about its citizens pirating manhwa, and vice versa, Korea doesn’t care about anime piracy, and so on. Then there is the trade of physical objects. Say you need a laptop battery for your Linuxed MacBook M1 and a Chinese seller has batteries in stock that are cheaper and better than Apple’s own (happens rather frequently). Even with taxation at the border factored in, you are still getting the best deal. Some might find ways of circumventing customs, which sweetens the pot further. Obviously there are issues for the domestic economy that can arise from this.
Trade speeds up and global supply chains gain importance as cross-border communication speeds up. At the level of national governments there is a distinct threat presenting itself. There is less control over market activity, leading to a speedup of the self-polluting nature of trade. In other words, the boom and bust cycle shortens. As a national government you’d want to lengthen the boom and bust cycle, as crises are the natural killer of states, along with expansionist nations.
Everything you are seeing, from Chat Control to China’s firewall, is an attempt to stabilize economies. The internet enables one to build structures that are wholly outside of state control. The state fails to direct the economy as planning starts happening between turfs. The internet, due to its nation-decentralized function, can aid in forming structures that oppose the state, should it falter.
Let’s not forget one of the biggest threats to the economy: open source. Patents and DRM are threatened by the unstoppable pace of Blender, Open Office and co. It’s as if people said YOLO, let’s stop exchanging goods and services and at the same time solve very real and pressing issues, some of the biggest problems in fact. It works with much less friction than anything before, it exists as this hobbyist thing that we cannot call economic in any sense of the current understanding of the word, and it would not exist if it wasn’t for the internet.
India and China have smartphone ownership rates of over 85%. There are no significant technological constraints unless you are someone who needs exorbitant download and upload speeds and low latency. The Chinese have pretty decent internet speeds, faster than most European countries. I also do not at all believe that there is a lack of demand for practical access. The internet is generally a sensible thing to have access to, no matter who you are.