OpenAI's AGI Breakthrough? 🤯, $100B AGI Deal 🤝, China's AI Surge 🚀
PLUS: 6G Powered by AI 📡, OpenAI's Profit Pivot 💸, and the Looming AI Data Drought 🌵
🎉 Welcome in the New Year with This Week in AI.
🎵 Don't feel like reading? Listen to two synthetic podcast hosts talk about it instead.
📰 Latest news
OpenAI's o3: A Costly Step Towards Artificial General Intelligence
OpenAI's o3 model is the closest thing we've seen to artificial general intelligence (AGI), particularly in its performance on the ARC-AGI benchmark, a test designed to gauge "general intelligence."
The model achieved an incredible score of 85%, equalling the average human score and significantly outperforming the previous AI best of 55%.
This leap is attributed to a technique called test-time compute, allowing o3 to deliberate longer and explore various possibilities before answering, resulting in more accurate responses to complex prompts.
The best analogy I've heard to explain this is that you wouldn't give a human the same amount of time to answer a difficult problem as you would an easy one, yet this is exactly what we've been doing with AI until now.
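To make that concrete, below is a minimal sketch of one common test-time-compute strategy: best-of-N sampling with majority voting. The `generate_answer` function is a hypothetical stand-in for a stochastic model call; OpenAI has not published o3's actual mechanism, so this is purely illustrative.

```python
import random
from collections import Counter

def generate_answer(prompt: str) -> str:
    """Hypothetical stand-in for a single (stochastic) model call.
    A real implementation would query an LLM API here."""
    return random.choice(["A", "B", "B", "C"])

def best_of_n(prompt: str, n: int) -> str:
    """Spend more inference compute by sampling n candidate answers
    and returning the most common one (majority voting)."""
    candidates = [generate_answer(prompt) for _ in range(n)]
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer

# Easy question: a single sample may suffice (n=1).
# Hard question: deliberate longer by sampling many reasoning paths.
print(best_of_n("What is the pattern in this ARC grid?", n=64))
```

The knob is `n`: easy prompts can get a single sample, while hard prompts get many, which is the sense in which the model "deliberates longer."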
Even in low-compute mode, o3 scored 76%, still regarded as a breakthrough-worthy result. However, the advancement has been met with considerable scrutiny regarding the testing methodology.
Critics, including researchers from various institutions, point out that o3 was pre-trained on hundreds of public examples from the test, in stark contrast to how humans approach the assessment (though most other models are also trained on these examples).
This pre-training casts doubt on the direct comparability of o3's performance to that of humans. Additionally, o3 was trained specifically for the ARC-AGI test, unlike previous models, which were more general-purpose.
The lack of transparency in OpenAI's presentation and the omission of crucial data, such as the model's performance without pre-training, have further fuelled concerns about the true extent of o3's capabilities.
What is irrefutable, however, is that o3 is remarkable. For example, it achieves an Elo rating of 2727 on Codeforces, placing it 175th in the global rankings and outperforming approximately 99.9% of users on the site, who are already among the most elite coders in the world.
Why it Matters
The advent of o3 marks an important evolution in the field of AI. Its ability to perform at human-level standards on a complex test like ARC-AGI indicates the potential for AI to tackle increasingly sophisticated tasks.
However, the associated costs are staggering. The high-compute mode of o3, which achieved the 85% score, incurred costs exceeding $1,000 per task, while even the low-compute mode cost approximately $20 per task.
In contrast, the previous o1 model cost less than $4 per task, and human performance is estimated at a mere $5 per task.
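For scale, the per-task figures can be lined up in a quick back-of-envelope comparison. The costs below are the approximate numbers reported above; the 400-task run size is a hypothetical of mine, chosen only to show how fast the totals diverge:

```python
# Approximate reported cost per ARC-AGI task (USD).
cost_per_task = {
    "o3 (high compute)": 1000,
    "o3 (low compute)": 20,
    "o1": 4,
    "human": 5,
}

tasks = 400  # hypothetical benchmark-sized run, for scale only
for name, cost in cost_per_task.items():
    total = cost * tasks
    print(f"{name:>18}: ${cost:>5,}/task  ->  ${total:>8,} total")
```

At high compute, a single run on this hypothetical scale costs around $400,000, versus roughly $2,000 for a human.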
The high cost of running models like o3 is driving a shift in AI chip architecture. While GPUs excel at training, they are expensive for everyday inference. This has generated a demand for specialised chips that make AI deployment more affordable and accessible.
The hope is that cost-performance will improve dramatically over the next few months and years. The balance between performance gains and economic feasibility will be a crucial factor on the road to AGI.
One thing is certain: we are on the path to AGI, and the vast majority of people have no idea of the very real impact it will have on our economy and society.
📝 Blog by Gary Marcus on the ARC-AGI test scores
📰 Article by Ars Technica on o3
AGI Defined: Microsoft and OpenAI's $100 Billion Agreement
Microsoft and OpenAI have established a distinctive, financially driven definition of AGI, according to a report from The Information.
Unlike traditional technical definitions, their agreement stipulates that OpenAI achieves AGI only when it generates $100 billion in profits.
This year, the startup is projected to lose billions of dollars and anticipates reaching profitability only by 2029. This profit-centric approach to defining AGI sets a unique precedent in the field of AI development.
Why It Matters
This agreement between Microsoft and OpenAI has several implications.
Firstly, it provides a clear, albeit unconventional, roadmap for OpenAI's development towards AGI.
Secondly, it potentially grants Microsoft access to OpenAI's technology for a decade or more, given the projected timeline for profitability.
This extended access could offer Microsoft a strong advantage as AI technology continues to advance.
Chinese AI Firms Thrive with Open-Source Models Despite Chip Restrictions
Chinese AI companies are making notable strides in AI, releasing open-source models that rival, and in some cases surpass, their Western counterparts.
DeepSeek, an AI startup, recently launched DeepSeek-V3, an ultra-large language model with 671 billion parameters. Despite its size, the model employs a mixture-of-experts architecture, activating only 37 billion parameters per token, resulting in efficient training and inference.
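As a rough illustration of why a mixture-of-experts design keeps inference cheap, here is a toy routing step in that spirit; the sizes, the router, and the top-k choice are illustrative assumptions, not DeepSeek-V3's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts, d_model, top_k = 8, 16, 2  # toy sizes; DeepSeek-V3 is vastly larger

# One tiny feed-forward "expert" per slot, plus a router that scores them.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route a token to its top-k experts; the others stay inactive,
    so only a fraction of total parameters is used per token."""
    logits = token @ router            # score each expert for this token
    top = np.argsort(logits)[-top_k:]  # keep only the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the chosen experts
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_layer(rng.standard_normal(d_model))
print(out.shape)  # (16,): computed with only 2 of 8 experts active
```

Scaling the same idea up, activating 37 billion of 671 billion parameters means each token pays for roughly 5-6% of the network's weights, which is where the training and inference savings come from.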
DeepSeek-V3 has demonstrated exceptional performance, outperforming leading open-source models like Llama 3.1-405B and Qwen 2.5-72B, and closely competing with closed-source models from Anthropic and OpenAI on various benchmarks.
DeepSeek-V3 was trained on 14.8 trillion tokens, and the company claims to have completed training in about 2.788 million H800 GPU hours, at a cost of approximately $5.57 million.
Similarly, Alibaba's Qwen team introduced QVQ-72B-Preview, an experimental open-source AI model excelling in visual reasoning, particularly in mathematics and physics.
It achieved a score of 70.3 on the MMMU benchmark, approaching the performance of top closed-source models.
These advancements are occurring amid US restrictions on the export of advanced AI chips to China.
ByteDance, the parent company of TikTok and Doubao (China's popular AI chatbot with 51 million active users), exemplifies the strategies employed to navigate these restrictions.
Reports suggest ByteDance plans to spend $7 billion on Nvidia chips in 2025, potentially becoming one of the world's top owners of these chips. To comply with US restrictions, ByteDance is reportedly storing these chips in data centres outside of China, such as in Southeast Asia.
Why it Matters
The progress made by Chinese AI firms, particularly with open-source models, demonstrates a significant evolution in the global AI landscape.
Despite facing limitations on acquiring cutting-edge hardware, companies like DeepSeek and Alibaba are leveraging innovative architectures and training techniques to develop highly competitive AI models.
This not only narrows the performance gap between open-source and closed-source models but also fosters a more diverse and accessible AI ecosystem.
The ability of these companies to achieve state-of-the-art results under resource constraints highlights their ingenuity and could accelerate the democratisation of AI technologies.
Moreover, the strategies employed by companies like ByteDance to secure necessary hardware, while adhering to international regulations, underscore the intense global competition in AI and the lengths to which firms will go to maintain a competitive edge.
📰 Article by VentureBeat on DeepSeek-V3
📰 Article by TechCrunch on skirting US restrictions
📝 Release paper by Qwen on QVQ
The Future is 6G: AI, Open-Source, and a United Front Between the EU and US
The EU and U.S. are jointly advancing 6G technology, integrating AI deeply to enhance network capabilities.
A key initiative is the 6G Trans-Continental Edge Learning (6G-XCEL) project, part of the EU's seven-year Horizon Europe program, which is developing a decentralised AI platform (DMMAI) for seamless operation across various networks.
This is supported by the $42 million U.S. ACCoRD program, focused on creating testing facilities for next-gen wireless systems. Open-source projects like COSMOS provide crucial real-world testbeds.
AI's role is central, enabling flexible network architectures through open radio access networks (ORANs) and introducing functionalities like AI-as-a-Service (AIaaS).
Why It Matters
AI integration transforms 6G beyond just faster data, enabling connected intelligence for advanced applications such as localisation and sensing.
This shift treats efficient information processing as a core network capability, enhancing computing and communication systems alike.
AIaaS promises automated service management and network operations, meeting the demands of a data-driven world with increased efficiency.
The EU-U.S. collaboration ensures a unified approach to standards, fostering global innovation and practical applications in the next generation of wireless networks. This joint effort aims to create an adaptable framework, rather than just a research project, ensuring widespread usability and adoption.
OpenAI's Bold Transition: Funding the Future of AI
OpenAI is undergoing a transformation, planning to restructure its for-profit arm as a Delaware-based Public Benefit Corporation (PBC). This move is driven by the immense financial demands of developing AGI.
Initially established as a non-profit research lab in 2015, OpenAI found that achieving its ambitious goals required far more capital than donations could provide. The company estimates that hundreds of billions of dollars will be necessary to advance its AGI research.
The new PBC structure is designed to attract significant investment by offering conventional equity to investors, a departure from its previous capped-profit model.
Under the new structure, the original non-profit will receive a substantial equity stake. This strategic shift allows OpenAI to compete with other major players in the AI field, such as Anthropic and xAI, which have adopted similar corporate structures.
Currently, over 300 million people use OpenAI's ChatGPT each week, demonstrating the widespread adoption and impact of its technology.
The latest funding round of $6.6 billion, valuing the company at $157 billion, underscores the scale of investment required and the confidence of backers like Microsoft, which holds a 49% stake in OpenAI.
Why It Matters
OpenAI's restructuring signals a maturation in the field of AI development, highlighting the shift from theoretical research to large-scale, capital-intensive projects.
By embracing a for-profit model, OpenAI is positioning itself to secure the funding needed to build the advanced AI systems of the future.
This transition reflects the growing realisation that achieving AGI requires not only technological innovation but also substantial financial resources to develop the required infrastructure.
📝 Blog by OpenAI on the transition
AI's Growing Pains: Navigating the Data Drought
The rapid advancement of AI, particularly in large language models (LLMs) like those behind ChatGPT, is facing a looming challenge: a shortage of training data.
A study projects that by around 2028, the size of data sets used to train AI models will match the total estimated stock of public online text, effectively exhausting available data.
This issue is compounded by content providers, such as newspapers, implementing stricter controls on how their data is used, with restrictions rising from less than 3% in 2023 to 20-33% in 2024.
To combat this, AI developers are exploring innovative solutions. One approach involves using synthetic data, with companies like OpenAI generating a staggering 100 billion words per day.
However, synthetic data has limitations, as it can lead to the entrenchment of errors and a decline in learning quality. Another strategy is the development of smaller, more specialised models that require refined data and better training techniques.
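The error-entrenchment risk of synthetic data can be seen in a toy simulation: repeatedly refitting a distribution to its own finite samples tends to collapse diversity. This is a deliberately simplified stand-in for what the literature calls model collapse; real LLM training dynamics are far more complex.

```python
import numpy as np

rng = np.random.default_rng(1)

n_types = 50  # token types in the "true" data distribution
dist = np.full(n_types, 1 / n_types)

for generation in range(8):
    # Each generation trains only on samples drawn from the previous model.
    samples = rng.choice(n_types, size=100, p=dist)
    counts = np.bincount(samples, minlength=n_types)
    dist = counts / counts.sum()  # the next "model" is fit to synthetic data
    print(f"gen {generation}: surviving token types = {(dist > 0).sum()}")
```

Once a token type's probability hits zero it can never reappear, which is the toy analogue of errors and blind spots becoming entrenched across generations.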
Additionally, there's potential for AI to learn from non-text data, such as video and sensory experience; by one estimate, a four-year-old child has absorbed around 50 times more information through observation than current LLMs have from text.
Why It Matters
The data shortage is a turning point for AI development. It highlights the limits of relying solely on vast quantities of text data and pushes the industry towards more sustainable and efficient practices.
The exploration of synthetic data and specialised models indicates a shift from the "bigger is better" paradigm to a more nuanced approach, focusing on quality over quantity.
This evolution could lead to AI systems that are not only more powerful but also more adaptable and resource-efficient.
Moreover, the increasing restrictions on data access raise important questions about data ownership and the balance between innovation and compensation for content creators.
As AI continues to evolve, these developments will be crucial in shaping a future where AI can continue to advance while addressing ethical considerations and resource limitations.
The ability of AI to learn from diverse data types and through methods like self-reflection opens up exciting possibilities for the next stage of AI development, potentially leading to more human-like learning capabilities.
📰 Paper in Nature
Sriram Krishnan: Trump's AI Advisor and What it Means for the Industry
Sriram Krishnan, a former tech executive at Microsoft, Twitter, Facebook, and Snap, and a general partner at Andreessen Horowitz (a16z), has been appointed as President-elect Donald Trump's senior policy advisor for AI.
He'll work with Trump's crypto and AI "czar," David Sacks.
Krishnan, who also has close ties to Elon Musk, led a16z's London office until November 2023.
Why It Matters
Krishnan's appointment suggests a pro-innovation stance on AI policy from the incoming Trump administration.
His background in big tech and venture capital, combined with his connections to figures like Musk, points to a focus on fostering industry growth.
His prior call for new ways for websites to interact with AI chatbots hints at policies that could balance innovation with data access concerns.
This appointment, alongside Sacks' role, may signal an administration that prioritises AI industry expansion, potentially through deregulation or research incentives.
This could benefit companies developing and using AI technologies, shaping the trajectory of the sector's growth.