Molmo and PixMo
Description
🔓 Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
This research paper introduces Molmo, a new family of vision-language models (VLMs) that surpasses existing open-weight models in performance while maintaining open weights, data, and code. The key innovation is the collection of a large, detailed image caption dataset using speech-based descriptions, avoiding reliance on synthetic data generated by proprietary VLMs. Molmo is trained on this dataset, along with a diverse mixture of fine-tuning datasets, to achieve state-of-the-art performance on multiple academic benchmarks and human evaluation, even compared to proprietary systems like GPT-4o. The paper emphasizes the importance of open research and provides a comprehensive overview of the model architecture, data collection methods, training process, and evaluation results.
📎 Link to paper
🟣 Try their demo
Information
Author | Shahriar Shariati |
Organization | Shahriar Shariati |
Website | - |
Tags |
Copyright 2024 - Spreaker Inc. an iHeartMedia Company