🎨 ChatGPT's New Interaction Paradigm + 🚨 AI Privacy Nightmare
Plus, Nvidia's open-source answer to GPT-4, Meta's AI for video generation, and more!
Welcome to this week in AI.
This week's news: OpenAI transforms ChatGPT with Canvas, Nvidia releases an open-source model to rival GPT-4, and a Harvard experiment raises privacy alarms.
Plus, explore advancements in AI-powered video generation, cancer research, and efficient model deployment.
Let’s get caught up!
Don’t feel like reading? I’ve turned this newsletter into a podcast.
I’ve had very positive feedback on these synthetic podcasts; create your own here.
Don't Miss Out
Join me at the QLD AI Meetup on October 30th in Brisbane to discuss redefining UX for AI, plus get my upcoming Ultimate Guide For Designing AI Systems.
OpenAI Reimagines ChatGPT with Canvas
OpenAI has unveiled Canvas, an upgrade to the ChatGPT interface that reimagines human-AI interaction, offering a more fluid and collaborative workspace.
By moving beyond the limitations of a simple chatbox, Canvas creates a dedicated space for collaborative writing and coding projects.
This new interface features inline feedback, targeted editing, and intuitive shortcuts, making it easier for users to refine their work.
Early tests have shown promising results, with GPT-4o exhibiting a 30% accuracy boost and a 16% quality improvement when used with Canvas.
Canvas is currently being rolled out in beta to Plus and Team users, with a broader release anticipated in the near future.
Why it Matters
Canvas shows OpenAI’s willingness to keep innovating on ChatGPT’s user experience, and it echoes Anthropic's Artifacts.
The new interface enhances collaboration with the AI and provides users with greater control over their projects.
Editing tools, shortcuts, and added context redefine how we can work with AI, moving beyond simple chat interactions.
This trend towards more interactive, collaborative interfaces suggests a new interaction paradigm is becoming the industry standard.
I-XRAY: A Privacy Nightmare
Two Harvard students have developed I-XRAY, a system using AI-powered glasses that can identify strangers and reveal their personal information just by looking at them.
This proof-of-concept combines readily available technologies like Meta's Ray-Ban smart glasses, facial recognition software, reverse image search, and LLMs.
The students successfully tested I-XRAY on Harvard's campus, demonstrating its ability to accurately identify individuals and access details like their home address, phone number, and even family members.
Why it Matters
I-XRAY provides a stark illustration of the evolving privacy landscape in the age of AI.
While the students have no intention of releasing this technology, its very existence raises crucial questions about the potential for misuse and the urgent need for safeguards.
Here's why this matters to you:
AI-powered tools like I-XRAY could drastically erode personal privacy, making it increasingly difficult to maintain anonymity in public spaces.
This technology could be exploited for malicious purposes, such as stalking, harassment, or identity theft.
I-XRAY highlights the growing potential for AI-driven surveillance, raising concerns about who has access to such technology and how it might be used.
The I-XRAY project serves as a wake-up call, urging a broader conversation about the ethical implications of AI and the importance of protecting privacy in an increasingly interconnected world.
NVLM: Nvidia's Open Source Answer to GPT-4
Nvidia, a company renowned for providing the picks and shovels to almost the entire AI industry, has made a foray into the world of LLMs with the release of NVLM 1.0.
This open-source family of models has impressive capabilities, with the flagship NVLM-D-72B performing on par with industry giants like GPT-4 and Claude 3.5 across various vision and language tasks.
Unlike many of its counterparts, NVLM 1.0 is fully open source, granting researchers and developers unprecedented access to its inner workings.
Why it Matters
NVLM 1.0's open-source nature, like Llama's, democratises AI, fostering innovation and new applications.
By making its model weights and training code freely available, Nvidia removes barriers for researchers and developers, enabling them to experiment with and build upon the technology.
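If you want to poke at the weights yourself, here's a minimal loading sketch using Hugging Face transformers. The repo name and interface are my assumptions from Nvidia's release, so check the model card before running, and remember that a 72B model wants multiple GPUs.

```python
# A minimal, untested sketch of loading the open NVLM-D-72B weights.
# "nvidia/NVLM-D-72B" is the assumed Hugging Face repo name; the exact
# inference interface is model-specific, so consult the model card.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nvidia/NVLM-D-72B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut memory in half
    device_map="auto",           # shard the weights across available GPUs
    trust_remote_code=True,      # the repo ships custom multimodal code
).eval()
# From here, inference goes through the model's own chat-style helper,
# which accepts interleaved text and image inputs (see the model card).
```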
Moreover, NVLM-D-72B's exceptional performance, particularly in integrating visual and textual information, opens up new opportunities for AI in various fields.
Its ability to understand and process complex multimedia content unlocks diverse applications, from analysing scientific papers and generating creative content to assisting in medical diagnoses.
The Race to Reasoning: Google is Building the Foundations for Agentic AI
Google is intensifying its competition with OpenAI by developing AI software that mimics human-like reasoning, similar to OpenAI's o1 model.
This push signifies a new battleground in the AI race, with both companies striving to create more sophisticated AI capable of solving complex, multi-step problems.
Google's approach involves utilising "chain-of-thought prompting," a technique its researchers pioneered, in which the model works through a series of intermediate reasoning steps before formulating a response, much like a human would reason through a problem.
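To make that concrete, here's a toy chain-of-thought prompt. The worked exemplar shows its reasoning, which nudges the model to reason step by step on the new question instead of guessing; the llm client at the end is a placeholder for any completions API.

```python
# A toy chain-of-thought prompt: the first Q/A pair demonstrates explicit
# reasoning, so the model imitates that style on the unanswered question.
cot_prompt = """\
Q: A pencil costs $0.30 and an eraser costs half as much. What do they cost together?
A: The eraser costs 0.30 / 2 = $0.15. Together: 0.30 + 0.15 = $0.45.

Q: A train departs at 9:40 and arrives at 12:05. How long is the trip?
A: Let's think step by step."""

# answer = llm.generate(cot_prompt)  # hypothetical client; any LLM API works
```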
Google has recently demonstrated progress in AI reasoning with AlphaProof and AlphaGeometry 2, excelling in mathematical problems.
Why it Matters
Google's advancements in AI reasoning could be a crucial step towards developing agentic AI, which refers to AI systems that can act independently and pursue goals. Here's how:
It enables AI to set goals and plan steps to achieve them.
Reasoning helps AI develop strategies to overcome challenges and achieve its objectives without human intervention.
The AI can learn from new information and adapt to changing circumstances.
It will help AI weigh options, evaluate consequences, and make informed decisions.
These advancements could lead to AI systems that are more autonomous, creative, and capable of solving real-world problems.
Meta Movie Gen: AI Video Made Easy
Meta's Movie Gen is a new AI system that simplifies video and audio creation.
It utilises advanced models to generate high-quality videos and audio from text prompts. Key capabilities:
Video generation: creates videos up to 16 seconds long (at 16 frames per second) from text descriptions, handling complex scenes with object motion, interactions, and camera movements.
Personalised videos: incorporates a user's image into AI-generated videos based on text prompts, preserving realistic human identity and motion.
Precise editing: edits existing videos using text instructions, including adding, removing, or replacing elements, or changing the background or style.
Audio generation: generates up to 45 seconds of synchronised, high-fidelity audio for videos, including ambient sound, sound effects, and instrumental music; an audio extension technique allows audio generation for videos of any length.
Why it Matters
Movie Gen is a step towards more accessible and user-friendly AI tools for media production. It empowers individuals and creators to produce high-quality video and audio content with increased efficiency and creative control.
📝 Meta's Movie Gen landing page
Big Tech Joins Cancer Fight
In the race against cancer, a new alliance has formed.
Top cancer research centres and tech companies are uniting as the Cancer AI Alliance (CAIA) to accelerate cancer breakthroughs with AI.
With $40 million in funding, the alliance aims to use AI and cloud computing to analyse large volumes of cancer data, including medical images, genomic sequences, and patient records.
This technology will allow researchers to identify patterns and insights that may be missed with traditional analysis methods.
Why it Matters
This initiative stands out due to its application of advanced AI techniques like federated learning, which ensures that valuable medical data remains secure while still producing actionable insights.
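For the curious, here's a minimal sketch of federated averaging, the textbook pattern behind federated learning. This isn't the alliance's actual implementation; it just shows the key property that each site trains on its own data, and only model weights, never patient records, leave the building.

```python
# Federated averaging (FedAvg) in miniature: each site runs local training
# on private data, and a central server only ever sees the weight updates.
import numpy as np

def local_update(weights, X, y, lr=0.01, epochs=5):
    """One site's private step: plain SGD on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
global_w = np.zeros(3)
# Three hospitals, each holding private (features, labels) that never move.
sites = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]

for _ in range(10):  # each round: local training, then server-side averaging
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    global_w = np.mean(local_ws, axis=0)

print("aggregated model weights:", global_w)
```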
By harnessing massive computational resources, CAIA can analyse complex data at high speed, offering breakthroughs in understanding tumour biology, treatment resistance, and new therapeutic targets.
The alliance represents a promising leap in cancer research, with the potential to provide faster, more effective treatments by the end of 2025.
📰 Article on Fred Hutch Cancer Center
Shrinking AI: Making Powerful Language Models More Accessible
Imagine trying to fit a massive textbook into your pocket – that's essentially the challenge of deploying LLMs on devices with limited resources.
VPTQ, or Vector Post-Training Quantization, is a new technique that helps solve this problem.
Think of it as a super-efficient compression algorithm specifically designed for LLMs. Instead of quantising each weight on its own, as conventional methods do, VPTQ uses a method called vector quantization, compressing small groups of weights together.
This allows it to shrink massive LLMs, even ones with hundreds of billions of parameters, down to an incredibly small size without needing to retrain them.
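Here's a toy demonstration of the vector-quantization idea (not the VPTQ library itself): group the weights into short vectors, learn a small codebook of representative vectors, then store one compact index per group instead of the raw floats.

```python
# Vector quantization in miniature: 1024 weight vectors of length 8 are
# replaced by 8-bit indices into a 256-entry codebook, cutting storage from
# 32 bytes per vector (8 x float32) to 1 byte plus the shared codebook.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
weights = rng.normal(size=(1024, 8))  # stand-in for a slice of LLM weights

codebook = KMeans(n_clusters=256, n_init=3, random_state=0).fit(weights)
indices = codebook.predict(weights)   # one byte per 8-weight group

reconstructed = codebook.cluster_centers_[indices]
print("mean reconstruction error:", np.abs(weights - reconstructed).mean())
```

VPTQ adds considerably more machinery on top of this to preserve accuracy at extreme compression, but the core storage trick is the same.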
Why it Matters
VPTQ has the potential to make powerful AI more accessible to everyone. Here's how:
Smaller models, bigger impact: By shrinking the size of LLMs, VPTQ allows them to run on devices like phones and laptops, not just large servers. This means you could have advanced AI assistants and tools on your phone.
Faster and cheaper: Smaller models require less processing power and memory, which translates to faster performance and lower costs. This can make AI technology more affordable and sustainable.
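Back-of-the-envelope: a 70-billion-parameter model stored as 16-bit floats needs about 70B × 2 bytes ≈ 140 GB of memory, while the extreme 2-bit settings VPTQ targets bring that down to roughly 17.5 GB, within reach of a single high-end GPU.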
VPTQ is a significant advancement in AI technology. By making LLMs smaller and more accessible, VPTQ paves the way for a future where AI is more integrated into our daily lives.
This is particularly important for privacy and security, as running AI models locally on devices can reduce the need to send data to remote servers, minimising the risk of data breaches and ensuring that personal information remains private.
Flux 1.1 Pro: Faster, Better AI Images
Black Forest Labs has upgraded its text-to-image AI model with the release of Flux 1.1 Pro.
This new iteration boasts a six-fold increase in speed compared to its predecessor, Flux 1 Pro, while simultaneously enhancing image quality and prompt adherence.
In a field where rapid advancements are the norm, Flux 1.1 Pro stands out, topping the Artificial Analysis image arena leaderboard against formidable competitors like Midjourney, Ideogram, and DALL-E.
Black Forest Labs has also introduced a new API that allows developers to seamlessly integrate the model into their applications.
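For developers, here's a hedged sketch of calling Flux 1.1 Pro through that API. The endpoint and field names follow BFL's launch documentation as I understand it; treat them as illustrative and check the official docs for the current contract.

```python
# Sketch of an asynchronous BFL API call: submit a prompt, then poll for
# the finished image. Endpoint paths and field names are illustrative.
import os
import time
import requests

headers = {"x-key": os.environ["BFL_API_KEY"]}  # your API key
task = requests.post(
    "https://api.bfl.ml/v1/flux-pro-1.1",
    headers=headers,
    json={"prompt": "a lighthouse at dusk, oil painting", "width": 1024, "height": 768},
).json()

while True:  # generation is asynchronous, so poll until the image is ready
    result = requests.get(
        "https://api.bfl.ml/v1/get_result",
        headers=headers,
        params={"id": task["id"]},
    ).json()
    if result["status"] == "Ready":
        print("image URL:", result["result"]["sample"])
        break
    time.sleep(1)
```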
Why it Matters
Flux 1.1 Pro offers a boost in speed and quality for text-to-image AI generation.
This empowers users to create stunning visuals more efficiently, benefiting industries like graphic design, advertising, and film.
The BFL API further expands the reach of Flux 1.1 Pro by enabling developers to integrate the model into their own applications.
This increased accessibility fosters innovation and the development of new tools and services that leverage AI-powered image generation.
📝 Black Forest Labs launch post