Hugging Face Open-Sourced FineVision: A New Multimodal Dataset with 24 Million Samples for Training Vision-Language Models (VLMs)
Hugging Face has just released FineVision, an open multimodal dataset designed to set a new standard for Vision-Language Models (VLMs). With 17.3 million images, 24.3 million samples, 88.9 million question-answer turns, and nearly 10 billion answer tokens, FineVision positions itself as one of the largest and most structured publicly available VLM training datasets.