

  • It’s called decoding and encoding.

    But the big data centers doing all the video processing for the big video services (both permanent videos from a library and things like live streaming) encode the videos with settings that require less computational power to decode. The idea is that even old budget smartphones can still display the video with very low power requirements on the client device. There’s no universe where consumers decoding digital video will be a high-power computational task.

    Restaurants have sharp knives in the kitchen, but generally serve food that requires only minimal cutting effort from the table knives set out with the rest of the table settings. Dining will always be easier than cooking, by a margin that makes the difficulty of dining not worth mentioning, so it would be bizarre to criticize a knife as being only good for cooking and eating food, when plenty of dining tableware knives out there would be insufficient for kitchen work.

    You’ve made the mistake of lumping decoding and encoding together based on the algorithmic/mathematical similarity of those tasks, when everyone else is more inclined to discuss the very different end user use cases of those computing needs.


  • I don’t think government funding can actually offset a crash in which consumer and business demand is insufficient to cover the cost of the most expensive models on the most expensive GPUs. But if you look through my comment history, I’ve made the comparison to supersonic flight, because I genuinely believe there’s a possibility that governments fund the expensive branch of this technology for their own military, surveillance, or law enforcement purposes without the benefits necessarily spilling out into normal commercial applications.

    We’ve hit the point where training a model (both pre-training and post-training) isn’t the expensive part; the expensive part is actual inference, which makes it hard to scale the most expensive models to the point where they’re useful for a lot of people. So it might be that the companies and governments that can afford to operate an expensive model are the only ones that do. And they’ll be able to do so without the public necessarily having access to the same tech.











  • The only solution is to make sure they can’t read data you don’t want shared.

    Isn’t that the appropriate guardrail, then? LLM chats and agents and whatever need to be contained with external permissions settings that the LLMs simply do not and can never have the power to override.

    In a normal customer service setting with human agents, there are still plenty of things a human agent simply doesn’t have the power to do. Often, they’ll need to escalate to a manager to do things like process refunds, not just because they weren’t given social permission to do so, but because they weren’t given technical permissions to do so. LLM agents need to be contained in the same way. Any decent use of agents, human or software, requires carefully designed processes and permissions extrinsic to that agent’s own decision-making abilities, to make sure that agents don’t do something bad for the company.
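    To make the "technical permissions" point concrete, here is a minimal sketch of a permission check that lives outside the agent. The role names, action names, and the `execute` function are all hypothetical, made up for illustration; the point is only that the check runs in code the agent can’t talk its way around.

```python
from dataclasses import dataclass

# Hypothetical roles and actions, for illustration only.
# The table lives outside the agent: nothing the agent says can change it.
PERMISSIONS = {
    "llm_agent": {"lookup_order", "send_reply"},
    "human_agent": {"lookup_order", "send_reply"},
    "manager": {"lookup_order", "send_reply", "process_refund"},
}


class PermissionDenied(Exception):
    """Raised when a role requests an action it was never granted."""


@dataclass(frozen=True)
class ActionRequest:
    role: str      # who is asking (human or software agent)
    action: str    # what they want to do
    argument: str  # e.g. an order ID


def execute(request: ActionRequest) -> str:
    """Run an action only if the requester's role was granted it."""
    allowed = PERMISSIONS.get(request.role, set())
    if request.action not in allowed:
        # The agent cannot override this; the only path is escalation.
        raise PermissionDenied(
            f"{request.role} may not {request.action}; escalate to a manager"
        )
    return f"{request.action}({request.argument}) done"
```

    Under this sketch, an `llm_agent` asking for `process_refund` gets a `PermissionDenied` no matter what its prompt or output says, exactly like a human agent whose account lacks the refund button.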




  • Yeah, the smarter way to use LLM-based agents is carefully defined tasks. Mozilla describes their vulnerability assessment processes in this blog post.

    The process they’ve used: build a harness that instructs a model to find a specific category of vulnerability on a specific interface, and then write up its findings. The context is narrow enough that the model gets specific instructions and a simple definition of success, and the harness sets up many such tasks whose results can be fed into the existing process for verifying and triaging bugs. Note that the output of this LLM pipeline basically feeds into the same interface used for accepting bug reports from the public, or from their human contributors within the project.
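    A rough sketch of what a harness like that could look like, going only off the description above. This is not Mozilla’s code: `Task`, `BugReport`, `build_tasks`, `run_harness`, the prompt wording, and the "NONE" convention are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Task:
    """One narrow job: one vulnerability category on one interface."""
    category: str
    interface: str

    def prompt(self) -> str:
        # Specific instructions plus a simple definition of success.
        return (
            f"Look for {self.category} bugs in {self.interface}. "
            "Write up any findings, or reply NONE."
        )


@dataclass(frozen=True)
class BugReport:
    title: str
    body: str
    source: str


def build_tasks(categories: Iterable[str], interfaces: Iterable[str]) -> List[Task]:
    """Set up many small tasks: every category against every interface."""
    return [Task(c, i) for c, i in product(categories, interfaces)]


def run_harness(
    tasks: Iterable[Task],
    model: Callable[[str], str],          # any model: prompt in, text out
    submit: Callable[[BugReport], None],  # the existing bug intake
) -> int:
    """Run each task and file non-empty findings; return how many were filed."""
    filed = 0
    for task in tasks:
        findings = model(task.prompt()).strip()
        if findings and findings != "NONE":
            submit(BugReport(
                title=f"{task.category} in {task.interface}",
                body=findings,
                source="llm-harness",
            ))
            filed += 1
    return filed
```

    In this sketch `model` is just a function from prompt to text, and `submit` stands in for whatever intake humans already use for bug reports, so verification and triage stay with the existing human process.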

    There are a few takeaways here, too:

    • This pipeline is model agnostic. Mozilla set it up before Mythos was released, and its description of other models (Opus 4.7, Codex) confirms that Mythos is better but not a true game changer. The ability to swap out other models provides some assurance that the work done to develop the pipeline will be useful when cheaper or better models come along, or when a model becomes unavailable (like when a provider decides a particular model is too expensive to run, or a provider goes under).
    • The increase in automated output (and presumably automation-assisted contributions from the public) has given the humans more work to do. Automation in this context actually increases the demand for human labor.
    • Other projects will need to develop their own custom pipelines, specific to their project, to get good results from LLM-based agents.

    There are ways to use these tools, but none of it really seems like a truly revolutionary/disruptive change to how large projects are managed.