Vision AI in real kitchens: what accuracy means (and what to ignore in vendor demos)

Hands, steam, and motion blur break lab demos. Here is how to evaluate vendors on confidence reporting, not slide decks.

Ask for event-level confidence states and example exports—not only top-1 accuracy on clean datasets.
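To make this concrete, here is a minimal sketch of what an event-level confidence export could look like. All names, states, and thresholds here are hypothetical illustrations, not any vendor's actual schema; the point is that each event carries an explicit state, not just a top-1 label.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical event-level confidence states a vendor export might carry.
CONFIDENT = "confident"
UNCERTAIN_VISUAL = "uncertain_visual"
WEIGHT_ONLY = "weight_only"

@dataclass
class RecognitionEvent:
    """One recognition event as it might appear in a vendor's export."""
    event_id: str
    label: str                              # top-1 class name
    frame_scores: List[float]               # per-frame scores for that label
    weight_grams: Optional[float] = None    # scale reading, if available

def confidence_state(event: RecognitionEvent,
                     high: float = 0.85, low: float = 0.5) -> str:
    """Map per-frame scores to one event-level state (illustrative thresholds)."""
    if not event.frame_scores:
        # No usable frames at all: fall back to the scale reading, if any.
        return WEIGHT_ONLY if event.weight_grams is not None else UNCERTAIN_VISUAL
    best = max(event.frame_scores)
    if best >= high:
        return CONFIDENT
    if best >= low:
        return UNCERTAIN_VISUAL
    return WEIGHT_ONLY if event.weight_grams is not None else UNCERTAIN_VISUAL

event = RecognitionEvent("ev-001", "grilled_salmon", [0.42, 0.91, 0.88],
                         weight_grams=180.0)
print(confidence_state(event))  # confident
```

An export shaped like this lets you audit how often the system actually reaches a confident state on your floor, rather than trusting a single aggregate accuracy figure.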

Require multi-frame capture and a documented path for weight-only or uncertain_visual events.
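Multi-frame capture can be sketched as a small buffer that keeps the last few frames per event and only trusts frames above a sharpness floor; when nothing usable survives, the caller routes the event to a fallback path instead of guessing. The buffer size, sharpness metric, and thresholds below are illustrative assumptions.

```python
from collections import deque

class FrameBuffer:
    """Keep the last N frames per event so one steamy or blurred frame
    does not decide the result. `sharpness` is assumed to be precomputed
    upstream (e.g. variance of the Laplacian)."""

    def __init__(self, size: int = 5):
        self.frames = deque(maxlen=size)

    def add(self, frame_id: str, sharpness: float, score: float) -> None:
        self.frames.append({"id": frame_id, "sharpness": sharpness, "score": score})

    def best_frame(self, min_sharpness: float = 100.0):
        """Return the highest-scoring usable frame, or None if every frame
        failed the sharpness floor (caller should take the fallback path)."""
        usable = [f for f in self.frames if f["sharpness"] >= min_sharpness]
        if not usable:
            return None
        return max(usable, key=lambda f: f["score"])

buf = FrameBuffer()
for i, (sharp, score) in enumerate([(40.0, 0.95), (160.0, 0.70), (220.0, 0.82)]):
    buf.add(f"f{i}", sharp, score)
print(buf.best_frame()["id"])  # f2
```

Note that the blurry frame with the highest raw score (f0) is rejected outright; a documented path means the `None` case is handled explicitly, not silently discarded.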

Test with your plates, gloves, and lighting; a generic benchmark rarely transfers.
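A simple way to run that test is to break accuracy out by capture condition instead of reporting one aggregate number. The conditions and sample data below are made up for illustration; the useful part is the shape of the report.

```python
from collections import defaultdict

def accuracy_by_condition(samples):
    """Per-condition accuracy. Each sample is (condition, predicted, truth)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for condition, predicted, truth in samples:
        totals[condition] += 1
        hits[condition] += int(predicted == truth)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical labeled captures from your own line, not a generic benchmark.
samples = [
    ("daylight", "salmon", "salmon"),
    ("daylight", "salmon", "salmon"),
    ("heat_lamp", "salmon", "chicken"),
    ("heat_lamp", "salmon", "salmon"),
    ("gloved_hand_occlusion", "salmon", "chicken"),
]
print(accuracy_by_condition(samples))
# {'daylight': 1.0, 'heat_lamp': 0.5, 'gloved_hand_occlusion': 0.0}
```

A vendor whose demo accuracy holds up under your heat lamps and gloves is a very different proposition from one whose number only holds in daylight.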

Prefer vendors who store failures for review over those who silently drop low-quality frames.
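Storing failures for review does not need heavy infrastructure; even an append-only review queue is enough to make silent drops visible. This is a minimal sketch under assumed file names and fields, not any vendor's actual pipeline.

```python
import json
import time
from pathlib import Path

def record_failure(event_id: str, reason: str, frame_path: str,
                   out_dir: str = "failed_events") -> dict:
    """Append a failed or uncertain event to a review queue (JSON Lines)
    instead of dropping it. Returns the entry that was written."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    entry = {
        "event_id": event_id,
        "reason": reason,        # e.g. "low_sharpness", "uncertain_visual"
        "frame": frame_path,     # pointer to the raw frame kept for review
        "ts": time.time(),
    }
    with open(Path(out_dir) / "review_queue.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

During a pilot, the size and contents of that queue tell you more about real-world behavior than any accuracy slide: you can see exactly which plates, in which light, the system could not handle.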