Vision: A Culturally-Aware Multimodal AI

Vansh Kumar

This paper introduces Vision, a novel 175-billion parameter multimodal AI model. Vision is trained from scratch to natively understand text, images, video, and audio and to generate text and images, setting it apart from existing models. Developed with a focus on incorporating Indian context, values, and culture, Vision aims to empower users with a culturally relevant AI experience. A unique security feature allows generated images to be backtracked to Vision, mitigating concerns about potential misuse for misinformation. Evaluations on standard benchmarks demonstrate that Vision achieves state-of-the-art performance in a diverse range of tasks, including reasoning, solving mathematical problems, code generation, and image understanding. Furthermore, Vision exhibits remarkable proficiency in multilingual chat, supporting a wide array of global languages as well as regional Indian languages such as Hindi, Punjabi, and Marathi. We believe that Vision represents a significant step towards building more inclusive and culturally relevant AI systems, with the potential to positively impact various domains in India and beyond.

Comments: 16 Pages.

Submission history

[v1] 2024-06-01 18:57:25

Unique-IP document downloads: 357 times

Vixra.org is a pre-print repository rather than a journal. Articles hosted may not yet have been verified by peer-review and should be treated as preliminary. In particular, anything that appears to include financial or legal advice or proposed medical treatments should be treated with due caution. Vixra.org will not be responsible for any consequences of actions that result from any form of use of any documents on this website.

Artificial Intelligence
