Meta has announced ImageBind, a new open-source AI model that links multiple streams of data, including text, audio, visual data, temperature and motion readings. The model is only a research project at this point, but the examples Meta presents show how far artificial intelligence has come, and they are striking. ImageBind can create a photo of a bird from the sound of a bird, a photo of a train from the sound of a train, or a photo of a car parked by the sea from the sound of a car engine combined with a photo of the sea. These demos are only the tip of the iceberg of what Meta is aiming for.
Multimodal ImageBind, “very” different
Meta has open-sourced an artificial intelligence tool called ImageBind that predicts connections between data, similar to how people perceive or imagine an environment. While image generators like Midjourney, Stable Diffusion, and DALL-E 2 pair words with images, letting you create visual scenes from nothing but a text description, ImageBind casts a wider net. It links text, image/video, audio, 3D measurements (depth), temperature data (thermal), and motion data (from inertial measurement units), and it does so without having to train on every possible combination.
Human perception is imitated
You can view ImageBind as a tool that brings machine learning closer to human learning. For example, if you are standing in a stimulating environment such as a busy city street, your brain (largely unconsciously) absorbs sights, sounds, and other sensory experiences to extract information about passing cars and pedestrians, tall buildings, the weather, and much more. Humans and other animals evolved to process this data because it gives us a genetic advantage: it helps us survive and pass on our DNA. As computers come closer to mimicking animals' multi-sensory connections, they can use those connections to generate fully realized scenes from only limited pieces of data.
It doesn't take a photo, it aims to create the moment
So you could give Midjourney the prompt "a retriever wearing a Gandalf outfit while balancing on a beach ball" and get a relatively realistic photo of this bizarre scene, but a multimodal AI tool like ImageBind may eventually create a video of the dog with matching sounds, set in a detailed suburban living room, along with the temperature of the room and the precise positions of the dog and everyone else in the scene. In short, ImageBind does not take a photo of a moment; it aims to create that moment directly.
Targeting VR and the metaverse?
Meta does not hesitate to give examples of what could be done with this new toy, and it openly states its core goal: VR, mixed reality, and the metaverse. For example, imagine a future headset that can instantly render fully realized 3D scenes (with sound, motion, etc.). Or virtual game developers could use it to take much of the legwork out of their design process.
Similarly, creators could produce immersive videos with realistic soundscapes and motion based solely on text, image or audio input. It is also not hard to imagine a tool like ImageBind opening new doors in accessibility, generating real-time multimedia descriptions to help people with visual or hearing impairments better perceive their immediate surroundings.
“Typical AI systems have a specific embedding (i.e., vectors of numbers that can represent data and its relationships in machine learning) for each modality,” the Meta blog post states. According to the post, ImageBind shows that it is possible to create a shared embedding space across multiple modalities.
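To make the idea of a shared embedding space more concrete, here is a minimal sketch in Python. It is not Meta's code and does not use the ImageBind model: the three-dimensional vectors and the file names are made up by hand for illustration, whereas a real system would obtain much longer vectors from trained encoders. The point is only that once images and sounds live in the same space, a sound can be matched to an image by comparing vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical, hand-made embeddings (assumption: not produced by any model).
# Images and audio clips are described in the SAME 3-dimensional space.
image_embeddings = {
    "bird.jpg": [0.9, 0.1, 0.0],
    "train.jpg": [0.1, 0.9, 0.1],
    "sea.jpg": [0.0, 0.1, 0.9],
}
audio_embeddings = {
    "birdsong.wav": [0.8, 0.2, 0.1],
    "train_horn.wav": [0.2, 0.8, 0.0],
}

def nearest_image(query):
    """Return the name of the image whose embedding is closest to the query."""
    return max(image_embeddings,
               key=lambda name: cosine(query, image_embeddings[name]))

# Match every audio clip to its most similar image.
matches = {clip: nearest_image(vec) for clip, vec in audio_embeddings.items()}
print(matches)
```

With these toy values, the bird sound lands closest to the bird photo and the train horn closest to the train photo. With one embedding per modality and no shared space, such a direct comparison would not be meaningful.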
Meta does not intend to stop here, either. The model currently combines six modalities, but in the future Meta aims to add new ones that connect as many senses as possible, such as touch, speech, smell and brain fMRI signals.