Meta releases diverse video dataset to improve inclusivity of AI models

Meta has released a new dataset of face-to-face video clips to help AI researchers make their models more inclusive. The Casual Conversations v2 dataset includes 26,467 video monologues from 5,567 paid participants across seven countries, with accompanying speech, visual and demographic attribute data. It features 11 self-provided and annotated categories that researchers can use to assess the fairness and robustness of AI systems. The dataset was collected with participants' consent, and its design was informed by a literature review and consultation with civil rights experts to make it as inclusive as possible. It is the first open-source dataset with videos collected from multiple countries using detailed demographic information to test AI models. The dataset is intended to help address concerns over language barriers and physical diversity that have caused problems for some AI systems.