• brucethemoose@lemmy.world
    9 hours ago

    MTT is just a pipe dream, last I checked. But DeepSeek is actively being served, in mixed FP8/FP4, on racks of Huawei accelerators.

    I believe Baidu trained a model on them, too. But most training (like DeepSeek’s) is still done on CUDA.


    …Also, be careful equating this stuff with any kind of “consumer friendly” hardware you or I could buy. That’s less likely. The Huawei accelerators (and other local Chinese hardware experiments) are geared towards huge servers serving requests in parallel.