Yeah, it’s what they do. Generate convincing text. Calling it “errors” makes as much sense as claiming my dice “produced errors” when I lost at yahtzee.
"First, I’d created 2000 free-text responses and labelled them ‘UK’. Then I copied and pasted the exact same 2000 responses but labelled these ‘US’. Finally, I combined them to create a dataset of 4000 total responses, and jumbled them up.
Despite the responses being identical for the UK and US, Copilot produced a rich, detailed summary of how US and UK respondents differed."
Yeah, it’s what they do. Generate convincing text. Calling it “errors” makes as much sense as claiming my dice “produced errors” when I lost at yahtzee.
An illustrative example: https://kucharski.substack.com/p/real-signals-or-artificial-stereotypes
"First, I’d created 2000 free-text responses and labelled them ‘UK’. Then I copied and pasted the exact same 2000 responses but labelled these ‘US’. Finally, I combined them to create a dataset of 4000 total responses, and jumbled them up.
Despite the responses being identical for the UK and US, Copilot produced a rich, detailed summary of how US and UK respondents differed."