Open Source AI Defined

The Open Source Initiative (OSI) has released a new definition of open source AI, aiming to bring clarity to a rapidly evolving field. According to reports from MIT Technology Review, the definition specifies that an open source AI system can be freely used, studied, modified, and shared for any purpose without requiring permission.

OSI’s Role in Defining Open Source AI

The Open Source Initiative (OSI) has taken a leading role in defining open source AI, leveraging its 25-year history as a steward of open source software principles. After a two-year collaborative process involving researchers, lawyers, policymakers, and tech industry representatives, the OSI released a draft definition intended to bring clarity and rigor to the open source AI discussion.

This definition emphasizes four key freedoms for open source AI systems:

  • The ability to freely use the system
  • The right to study how the system works
  • The freedom to modify the system for any purpose
  • The ability to share the system with others, with or without modifications

The OSI’s definition also addresses the complex issue of training data transparency, acknowledging the challenges of sharing full datasets while promoting norms that support wider availability of AI training data. By establishing this definition, the OSI aims to combat “openwashing” practices and provide a valuable reference point for developers, advocates, and regulators in the rapidly evolving field of AI.

Impact of Open Source AI on Regulation

Open source AI is significantly influencing regulatory approaches worldwide. EU lawmakers initially considered exempting open source AI from the AI Act entirely, but the final text includes only partial exemptions for models below certain computational thresholds, a compromise intended to balance innovation with safety concerns. Meanwhile, California’s AI safety bill (SB 1047) has sparked debate over potential liabilities for open source AI developers.

Key regulatory considerations for open source AI include:

  • Defining what qualifies as “open source AI” for regulatory purposes
  • Balancing transparency requirements with potential security risks
  • Addressing liability concerns for developers of open models
  • Ensuring regulations don’t inadvertently concentrate AI development among large tech companies
  • Developing flexible frameworks that can adapt to rapid technological advancements

The Challenge of Openwashing in AI

The practice of “openwashing” in AI poses significant challenges to transparency and innovation in the field. Companies like Meta, Microsoft, and Mistral have been accused of strategically co-opting terms like “open” and “open source” while shielding their models from scientific and regulatory scrutiny. This trend threatens to undermine the core principles of openness in AI development, and several factors compound the problem:

  • Resource intensity: Building and running AI models requires substantial computing power and financial backing, limiting widespread participation
  • Selective disclosure: Some companies release only partial components of their AI systems, such as model weights, while keeping critical elements like training data and processes proprietary
  • Regulatory loopholes: The EU AI Act’s special exemptions for “open source” models may inadvertently incentivize openwashing practices

These challenges highlight the need for clearer definitions and standards of openness in AI to foster genuine transparency and innovation in the field.

OSAID 1.0 Release

The Open Source AI Definition (OSAID) version 1.0, released by the Open Source Initiative (OSI) on October 28, 2024, marks a significant milestone in establishing clear guidelines for open source AI systems. OSAID 1.0 outlines specific requirements for AI systems to be considered open source:

  • Data Information: Detailed information about training data must be provided, including descriptions of data sources, processing methods, and availability
  • Code: Complete source code used for training and running the system must be made available under OSI-approved licenses
  • Parameters: Model parameters, such as weights and configuration settings, must be accessible under OSI-approved terms

The definition emphasizes transparency and reproducibility, requiring that a skilled person be able to recreate a substantially equivalent system using the provided information. While OSAID 1.0 does not mandate full disclosure of training datasets, it aims to strike a balance between openness and practicality in the rapidly evolving AI landscape.
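
To make the three component categories concrete, the sketch below models them as a simple release checklist. This is purely illustrative: OSAID 1.0 specifies no schema, manifest format, or tooling, and the ReleaseManifest class, its field names, and the hard-coded license subset are assumptions made for this example only.

```python
from dataclasses import dataclass

# Hypothetical manifest loosely mirroring the three OSAID 1.0 component
# categories (data information, code, parameters). OSAID itself defines no
# schema or tooling; everything named here is illustrative.

# Small illustrative subset; a real check would consult the full OSI license list.
OSI_APPROVED_LICENSES = {"Apache-2.0", "MIT", "GPL-3.0-only", "BSD-3-Clause"}

@dataclass
class ReleaseManifest:
    data_information: str       # description of training data sources, processing, availability ("" if absent)
    code_license: str           # license covering the training and inference code
    parameters_available: bool  # weights and configuration shared under OSI-approved terms

def missing_osaid_components(m: ReleaseManifest) -> list[str]:
    """Return the OSAID component categories this release appears to lack."""
    gaps = []
    if not m.data_information:
        gaps.append("data information (sources, processing methods, availability)")
    if m.code_license not in OSI_APPROVED_LICENSES:
        gaps.append("complete training/inference code under an OSI-approved license")
    if not m.parameters_available:
        gaps.append("model parameters (weights, configuration) under OSI-approved terms")
    return gaps

if __name__ == "__main__":
    # Example: a weights-only release with no data documentation and proprietary code.
    release = ReleaseManifest(data_information="", code_license="proprietary",
                              parameters_available=True)
    for gap in missing_osaid_components(release):
        print("missing:", gap)
```

Run against a hypothetical weights-only release, the check flags the missing data documentation and the non-open code license, the same “selective disclosure” pattern described in the openwashing section above.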

Source: Perplexity