
Leap in computer vision - from assistive aids to generative models

February 8, 2022
#AI22 - No. 3 of 10

#AI22 is a series of articles highlighting what we believe to be 10 developments that will shape AI this year.
This series is co-written by Dr. Johannes Otterbach, Dr. Rasmus Rothe and Henry Schröder.

---

In 2020 and 2021 we saw a surge of academic breakthroughs in computer vision (DALL-E, NÜWA, diffusion models, ViT, etc.), comparable to the NLP breakthroughs of 2017 to 2020. We can therefore expect the field to move toward more industrial applications. Innovation in computer vision, video and AR/VR will accelerate sharply over the coming years, and we expect a shift from assistive models to creative, generative models.

By allowing computers to automate and augment human sight, computer vision offers a huge array of use cases. Its development is following a progression in complexity similar to the one NLP models have gone through and are still going through: from understanding, to processing, and finally to generating. High-quality image generation is currently on the verge of moving from academia into business, and this transition will accelerate rapidly over the next year. Image recognition is already becoming well integrated into industry applications, with well-known examples such as self-driving cars and checkout-free shopping, and image generation will follow the same path in the near future.

A key ingredient in generating creative output is multimodality: models that combine different types of input data (e.g. text and images, or audio and video). A prime example is OpenAI's GLIDE, which generates and edits images from a descriptive text prompt at impressive quality and across a wide variety of layouts, demonstrating that a model can produce genuinely creative output. This progress will allow marketing experts, designers and entertainment executives to produce far more personalized content, far more efficiently.

As AI enables online content generation, its application in augmented reality (AR) and virtual reality (VR) will grow as well. Photorealistic rendering and computer vision together open up a wide range of AR applications, for example in shopping, gaming or education. In gaming, we will see further progress in machine-vision surface detection, which helps a game understand the player's physical environment even better, as well as technologies such as procedural content generation, which adapts the gameplay to the player's physical surroundings. However, it is not only entertainment companies that recognize the potential of AI in AR/VR; education, construction and healthcare companies do too. The academic breakthroughs of recent years will begin to materialize as generative applications across every area of our economy and our personal lives.
