Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 2 days ago

Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

FireWire400@lemmy.world · edited 21 hours ago

Gemini 3 (Fast) got it right for me; it said that unless I wanna carry my car there it’s better to drive, and it suggested that I could use the car to carry cleaning supplies, too.

Edit: A locally run instance of Gemma 2 9B fails spectacularly; it completely disregards the first sentence and recommends that I walk.

Jolteon@lemmy.zip · 15 hours ago

You never know. The car wash may be out of order and you might need to wash your car by hand.

Saterz@lemmy.world · 19 hours ago

Well, it is a 9B model after all. Self-hosted models become minimally “intelligent” at around 16B parameters. For context, the models run on Google's servers are close to 300B parameters.

Appoxo@lemmy.dbzer0.com · 14 hours ago

Any source for that info? Seems important to know in order to assess the quality, no?

Saterz@lemmy.world · edited 2 hours ago

Here:

https://www.sitepoint.com/local-llms-complete-guide/

https://www.hardware-corner.net/running-llms-locally-introduction/

https://travis.media/blog/ai-model-parameters-explained/

https://claude.ai/public/artifacts/0ecdfb83-807b-4481-8456-8605d48a356c

https://labelyourdata.com/articles/llm-fine-tuning/llm-model-size

https://medium.com/@prashantramnyc/understanding-parameters-context-size-tokens-temperature-shots-cot-prompts-gsm8k-mmlu-4bafa9566652

Finding them only required a web search on DuckDuckGo using the query "local llm parameters and number of params of cloud models".

Edit: formatting

Appoxo@lemmy.dbzer0.com · 10 minutes ago

Appreciated. Very much appreciated!

Opper