There’s nothing more advanced! Meta’s ImageBind AI mimics human perception

Meta has announced ImageBind, a new open-source AI model that connects multiple data streams, including text, audio, visual data, temperature, and motion readings. Although the model is only a research project at this point, the examples shown so far make clear how far artificial intelligence has come, and they are striking. ImageBind can, for example, tie the sound of a bird to a photo of a bird, the sound of a train to a photo of a train, or the sounds of a car engine and the sea to an image of a car parked by the sea. And what it can do today is only the tip of what Meta is aiming for.

Multimodal ImageBind is “very” different

Meta has open-sourced an artificial intelligence tool called ImageBind that predicts connections between different kinds of data, similar to how people perceive or imagine an environment. While image generators like Midjourney, Stable Diffusion, and DALL-E 2 let you create visual scenes from nothing but a text description by matching words to images, ImageBind weaves a wider web. It combines text, image/video, audio, 3D measurements (depth), temperature data (thermal), and motion data (from inertial measurement units), and it does so without having to train on every possible combination.
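Since the model is open source, curious readers can try the cross-modal matching themselves. The sketch below follows the example usage published in the facebookresearch/ImageBind repository: it loads Meta’s pretrained “huge” checkpoint, embeds a few texts, images, and audio clips into the shared space, and compares vision and audio embeddings with dot products. The file paths are placeholders, and exact import paths may vary between repository versions, so treat it as an illustrative sketch rather than a definitive recipe.

```python
import torch

# Imports follow the layout of the facebookresearch/ImageBind repository;
# depending on the version installed, they may sit under an `imagebind` package.
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Placeholder inputs: matching text, image, and audio examples for three concepts.
text_list = ["A dog.", "A car.", "A bird."]
image_paths = ["assets/dog_image.jpg", "assets/car_image.jpg", "assets/bird_image.jpg"]
audio_paths = ["assets/dog_audio.wav", "assets/car_audio.wav", "assets/bird_audio.wav"]

# Load the pretrained ImageBind model released by Meta.
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# Preprocess each modality and embed everything into the joint space in one pass.
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}
with torch.no_grad():
    embeddings = model(inputs)

# Cross-modal similarity: rows are images, columns are audio clips. High values
# on the diagonal mean the picture and the sound of the same concept end up
# close together in the joint embedding space.
print(torch.softmax(
    embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, dim=-1
))
```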

Human perception is imitated

You can view ImageBind as a tool that brings machine learning closer to human learning. For example, if you are standing in a stimulating environment such as a busy city street, your brain (largely unconsciously) absorbs sights, sounds, and other sensory experiences to extract information about passing cars and pedestrians, tall buildings, the weather, and much more. Humans and other animals evolved to process this data in order to survive and pass on their genes. As computers get closer to mimicking these multi-sensory connections, they can use them to conjure complete, fabricated scenes from only limited pieces of data.

It doesn’t take the photo, it aims to create that moment

So you could prompt Midjourney with “a retriever wearing a Gandalf outfit while balancing on a beach ball” and get a relatively realistic photo of this bizarre scene, but a multimodal AI tool like ImageBind could eventually produce a video of the dog in a detailed suburban living room with the accompanying sounds, the temperature of the room, and the precise positions of the dog and everyone else in the scene. In short, ImageBind does not take a photo of a moment; it aims to create that moment directly.

Targeting VR and the metaverse?

Meta is not shy about giving examples of what could be done with this new toy; in fact, it openly states its core goal: VR, mixed reality, and the metaverse. Imagine, for example, a future headset that can render fully realized 3D scenes (with sound, motion, and so on) on the fly, or virtual-world game developers using it to take much of the legwork out of their design process.

Similarly, content creators could make immersive videos with realistic soundscapes and motion based on nothing more than text, image, or audio input. It is also not hard to imagine a tool like ImageBind opening new doors in accessibility, generating real-time multimedia descriptions that help people with visual or hearing impairments better perceive their immediate surroundings.

“Typical AI systems have a specific embedding (that is, vectors of numbers that can represent data and their relationships in machine learning) for each respective modality,” Meta’s blog post states. “ImageBind shows that it is possible to create a joint embedding space across multiple modalities without needing to train on data with every different combination of modalities.”
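To make the idea of a joint embedding space more concrete, here is a minimal conceptual sketch. The encoders below (TextEncoder, AudioEncoder) are hypothetical stand-ins invented for illustration, not ImageBind’s actual architecture; the point is only that once every modality is projected into the same unit-length vector space, comparing a sound with a sentence reduces to a simple dot product.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 1024  # hypothetical size of the shared embedding space


class TextEncoder(nn.Module):
    """Hypothetical stand-in that maps tokenized text into the shared space."""
    def __init__(self, vocab_size=30000):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, EMBED_DIM)

    def forward(self, token_ids):
        return F.normalize(self.embed(token_ids), dim=-1)


class AudioEncoder(nn.Module):
    """Hypothetical stand-in that maps a spectrogram into the shared space."""
    def __init__(self, n_mels=128, n_frames=256):
        super().__init__()
        self.proj = nn.Linear(n_mels * n_frames, EMBED_DIM)

    def forward(self, spectrogram):
        return F.normalize(self.proj(spectrogram.flatten(start_dim=1)), dim=-1)


# Because both encoders emit unit vectors in the same space, cross-modal
# similarity is just a matrix of dot products (cosine similarities).
text_enc, audio_enc = TextEncoder(), AudioEncoder()
texts = torch.randint(0, 30000, (3, 16))  # 3 dummy tokenized "sentences"
audio = torch.randn(3, 128, 256)          # 3 dummy spectrograms
similarity = text_enc(texts) @ audio_enc(audio).T
print(similarity.shape)  # (3, 3): each text scored against each audio clip
```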

Meta also doesn’t want ImageBind to stop here. Yes, the model already combines six different senses or modalities, but Meta aims to add new modalities in the future that connect as many senses as possible, such as touch, speech, smell, and brain fMRI signals.
