SFC leader wants to repeal the OSI's “Open Source AI” definition

OSI Open Source AI Definition

A few days ago, the Open Source Initiative (OSI) made official its new definition of “Open Source Artificial Intelligence”, and Bradley M. Kuhn, leader of the Software Freedom Conservancy (SFC), has expressed concern about it.

Kuhn says he intends to have the definition repealed because the current “open AI” criteria may have serious consequences, diluting the value of the term “open source” and dividing the community. One of the most contentious points is that the definition does not require publishing the data used to train AI models.

According to the OSI, including this requirement would make it impossible for most current language models to be considered open, since their training data is mostly private.

Kuhn considers that this definition was approved hastily, without the same exhaustive and lengthy process that produced the traditional “open source” definition. In his view, the OSI should have labeled this new guidance a “recommendation” rather than a “definition,” given that AI systems are still in their early stages of development.

The bottom line here, in my view, is simple: OSAID does not require public reproducibility of the scientific process of building these systems, because it does not set sufficient requirements on licensing and public disclosure of training sets for so-called “open source” systems. The OSI declined to add this requirement because of a fundamental flaw in their process; they decided that “it made no sense to publish a definition that no existing AI system could currently meet.”

Meanwhile, the OSI argues that the definition will help avoid ambiguous use of the term “open” in the context of AI, since many vendors label their models as open only because they allow access to certain components, such as the model weights, while restricting their use or withholding implementation details.

The OSI has established that an open AI system only needs to provide detailed information about the data used in its training, without requiring the data itself to be public. Kuhn, however, believes that this limitation prevents AI models from achieving the reproducibility expected of open source software, where full access to data and code is essential.

In his critique, Kuhn argues that, by not requiring access to the training data, the OSI has reduced the definition to a purely technological approach that fails to recognize AI as a complete, reproducible system, which he says conflicts with open source principles.

I don't really know for sure (yet) whether the only way to respect user rights in a LLM-backed generative AI system is to only use training sets that are publicly available and licensed under free software licenses. I think that's the ideal and preferred way for modification of such systems.

The definition of “open AI system” approved by the OSI has generated controversy because it guarantees only two of the four fundamental freedoms of open source software: the freedoms to use and to distribute. The freedoms to study and modify the model are not fully guaranteed, mainly because the training data is not accessible. This omission also makes it harder to detect possible backdoors inserted in AI models.

From the OSI's point of view, the restriction on publishing data is understandable, as it is often due to factors outside the developers' control, such as the protection of confidential data, copyright, or licensing agreements with third parties. However, critics, including Bradley Kuhn and members of the Debian community, argue that these challenges do not justify a definition that weakens open source principles. The absence of access to training data, they argue, diminishes the value of open AI and threatens to dilute the meaning and integrity of the open source movement.

Kuhn says he plans to run in the upcoming OSI leadership elections to try to overturn this definition and have it reclassified as a recommendation rather than a standard. In addition, other bodies such as the Free Software Foundation (FSF) are developing their own definition of free AI, which will include a requirement for data availability while recognizing ethical exceptions for certain types of data, such as medical or personal data.

Finally, if you are interested in learning more, you can check the details at the following link.

