First Look: GPT-4

March 15, 2023
GPT-4 has launched; here's our first take on the exciting things ahead.

Yesterday was a big day for AI: three large foundation models launched on the same day. OpenAI competitor Anthropic published its API for Claude, a chatbot to rival OpenAI's ChatGPT. Google made PaLM, its own large language model, available to developers. The most long-awaited release, though, was GPT-4, the successor to GPT-3, whose descendants power ChatGPT. GPT-4 sets the state of the art on most benchmarks and brings us one step closer to artificial general intelligence.

Time To Rethink How We Work

These are remarkable times for AI, with new models transforming the field. In April 2022, OpenAI unveiled DALL·E 2, an advanced text-to-image model that has been widely adopted. More recently, OpenAI introduced Whisper, an automatic speech recognition model that approaches human-level robustness and accuracy. These models excel at translating one modality into another: text-to-text for GPT-3, text-to-image for DALL·E 2, and speech-to-text for Whisper. GPT-4, however, is OpenAI's first model to tackle true multimodality, accepting both images and text as inputs and generating text as output.


OpenAI's Multi-Modal Breakthrough

While the multimodal GPT-4 may seem like an incremental step forward compared to GPT-3, its significance is hard to overstate. Multimodal comprehension is a complex challenge that requires advanced joint-modeling techniques to master. The heterogeneity of data across modalities, the scarcity of annotated data, and the immense computational resources required make training multimodal models incredibly challenging. With extensive compute (via Microsoft Azure) and an innovative approach to its model-training architecture, OpenAI has overcome these obstacles and made the model accessible via an API endpoint less than three years after the launch of GPT-3.

Paving the Way for General Purpose AI

The multimodality of GPT-4 opens up a plethora of new opportunities and use cases, and its capabilities increasingly resemble what we humans can do. Humans are inherently multimodal: our brains process sight, sound, taste, smell, and touch simultaneously. AI models are still far from comprehending all of these modalities at once, but with GPT-4 we are approaching a "general-purpose AI" that is powerful independent of the specific use case and modality. For instance, when humans follow a presentation, they see a slide, quickly process the visual information, and relate it to what the speaker is saying. By combining a transcription tool like Whisper with GPT-4, machines can approach a similar level of comprehension and real-time reasoning. This is just one of the many possibilities GPT-4's multimodality unlocks for businesses and researchers, paving the way for more advanced and intuitive AI applications that will supercharge us humans.
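The slide-plus-speech scenario above can be sketched as a simple pipeline: a Whisper transcript and an image of the current slide are packed into one chat request. Below is a minimal Python sketch; the multimodal message layout mirrors OpenAI's later vision-style chat payloads, and the function name, system prompt, and URL are illustrative assumptions, not official API code.

```python
def build_slide_qa_request(transcript: str, slide_image_url: str,
                           question: str) -> dict:
    """Pack a Whisper transcript and a slide image into one chat request.

    Hypothetical sketch: the image-content message structure mimics
    OpenAI's image-input chat format; names here are illustrative.
    """
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system",
             "content": "You are following a live presentation."},
            {"role": "user",
             "content": [
                 {"type": "text",
                  "text": f"Speaker said:\n{transcript}\n\n{question}"},
                 {"type": "image_url",
                  "image_url": {"url": slide_image_url}},
             ]},
        ],
    }

# The resulting dict would be sent to a chat-completion endpoint.
request = build_slide_qa_request(
    transcript="Revenue grew 40% year over year.",
    slide_image_url="https://example.com/slide-07.png",
    question="Does the chart on the slide support that claim?",
)
```

The point of the design is that the speech channel (via Whisper) and the visual channel (the slide) land in the same context window, so the model can reason over both at once, much as a human listener does.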

Novel and Incredible Use Cases

GPT-4 demonstrates remarkable capabilities across various domains, including documents that mix text, photographs, diagrams, and screenshots.

During the GPT-4 launch live stream on March 14th, one impressive demonstration had the model, based on a prompt, generate working HTML for a website from nothing more than a hand-drawn sketch on a piece of paper.
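As a rough illustration of how such a demo could be driven programmatically, the sketch below base64-encodes a photo of the drawing and asks for HTML. This is a hedged sketch: the image-as-data-URL message format mirrors OpenAI's later vision-style chat payloads, and the function name and prompt wording are assumptions of ours, not code from the live stream.

```python
import base64

def build_sketch_to_html_request(sketch_png: bytes) -> dict:
    """Ask the model to turn a photographed hand-drawn mockup into HTML.

    Hypothetical sketch: the data-URL image message below follows the
    shape of OpenAI's later vision chat format; names are illustrative.
    """
    encoded = base64.b64encode(sketch_png).decode("ascii")
    return {
        "model": "gpt-4",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Write a single HTML file implementing the "
                         "website shown in this sketch."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encoded}"}},
            ],
        }],
    }

# With real bytes from e.g. open("sketch.png", "rb").read():
req = build_sketch_to_html_request(b"\x89PNG...fake bytes...")
```

Embedding the image as a data URL keeps the request self-contained, so no separate file upload step is needed before the model can see the sketch.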


Further, GPT-4 performs impressively at captioning, describing, and interpreting images, as demonstrated in the GPT-4 technical report.


More Accurate, More Reliable, But Still Hallucinating

Still, GPT-4 struggles with "hallucination": the model generates false information because it lacks a grounded sense of what is true and what is not. It still "makes up" facts and confidently presents them as true. Like GPT-3, it was trained on public internet data that contains plenty of misinformation, and the model cannot "listen only to" reliable sources. Nonetheless, GPT-4's overall accuracy and correctness have improved significantly over GPT-3, as evidenced by its performance on standardized tests and its deeper expertise across a wider range of topics. GPT-4 is also better equipped to handle nuanced instructions, and it performs exceptionally well in different languages, making it highly usable for non-English speakers.

Excitement For What’s Ahead

All in all, while GPT-3 already demonstrated impressive language processing capabilities, GPT-4's multimodal approach lets it reason over both text and images, leading to more accurate and nuanced responses. This opens up new possibilities for businesses and consumers, such as more intuitive, interactive, and personalized experiences across industries. Further, GPT-4's improved accuracy and its ability to reason and problem-solve could help address more complex challenges in areas like healthcare, law, and scientific research.
