
Tags

Nvidia

Apr 2, 2026 10 min read
vlm inference apple-silicon

Best VLMs for 128GB on DGX Spark and M4 Mac

Qwen3-VL-32B is the best vision-language model for both the DGX Spark and the M4 Max at 128GB — it outperforms its 72B predecessor on all benchmarks while using roughly half the memory. The M4 Max's 2× memory-bandwidth advantage makes it surprisingly faster than the DGX Spark for interactive inference.