State of Open (Source?!) and Free AI – a FOSDEM recap

February 15, 2024

Table of Contents

FOSDEM
What is Open (Source) AI?
Why Free and Open?
What are the components of an AI system?
State of “Open”-ness in AI systems
What is AI system Specification?
TLDR;
References

Disclaimer: This article covers what I learned and observed while spending the day in the AI and Machine Learning Developer Room at FOSDEM 24. Opinions and statements are my own and have nothing to do with my employer. This article might raise more questions than it answers, but in my opinion we all need more awareness of this topic and need to get familiar with the (right) questions that are to be answered.

FOSDEM

FOSDEM (Free and Open source Software Developers' European Meeting) is a community-organised event that is free and non-commercial. Its aim is to provide a venue for free and open-source software developers and communities to:

connect with other developers and projects.
learn about the newest trends in the free software world.
learn about the newest trends in the open-source world.
listen to interesting talks and presentations on diverse topics by project leaders and committers.
promote the development and benefits of free software and open-source solutions.

There were 35 devrooms covering Java, Containers, Go, Rust, Network, Community, and various other topics. Although I am a huge fan of Java and the OSS ecosystem around it, I went to FOSDEM this year specifically to understand and discuss the state and direction of AI in the Free and/or Open-Source world. That is what this article is about.

“An AI system is a machine-based system that can, for a given set of human-defined objectives, make predictions, recommendations, or decisions influencing real or virtual environments. AI systems are designed to operate with varying levels of autonomy.” – Open-Source Initiative, AI definition

What is Open (Source) AI?

To be Open Source, an AI system needs to make its components available under licenses that individually grant the freedoms to:

Study how the system works and inspect its components.
Use the system for any purpose and without having to ask for permission.
Modify the system to change its recommendations, predictions, or decisions to adapt to your needs.
Share the system with or without modifications, for any purpose.

The Golden Rule “also” applies to AI: if I like an AI system, I must be free to share it with other people. (Reference #4)

Why Free and Open?

The term ‘open source’ refers to software released under an open-source licence that lets anyone see the source code (the code that humans can read) and allows anyone using the code under that licence to keep and change it. They can do this by themselves or with a skilled third party of their choice. Open-source licences must be approved by the Open Source Initiative. (Reference #1, #2)

"Free software" is a different term. In everyday use it often means any piece of software that doesn't cost anything, but there is a difference between free-of-charge and open-source software: open-source software is not only free in terms of money; "free" also refers to the freedom it gives its users by being easy to modify and more transparent. (Reference #2, #3)

There is a general emphasis on ethics and morals in the open-source community in how developers treat their users. While it's not a sure thing, this can help to make sure you're getting the best experience possible without being exploited for private data. And because the source code is public, it is easier for knowledgeable users to find out if the developers are doing something untrustworthy. (Reference #2, #3)

The supply-side value of widely used Open-Source Software (OSS) is $4.15 billion, while the demand-side value is much larger at $8.8 trillion. (Reference #5) To put this in perspective, that amount is 30% more than the total federal budget of the USA in 2023. (Reference #6)

What are the components of an AI system?

For traditional software, categorizing the software or the code behind it was easy. It had its complications, but the definition of its components is straightforward. It becomes far more complicated when we try to define the same for an AI system.

The components of an AI system identified so far (a current, possible list): (Reference #7)

1. Data
a. The data on which it is trained.
b. Description of the data.
c. Collection methodologies.
d. Hosting options and costs.
e. Transparency of data quality.
f. Ability to opt out.
2. Code
a. Data cleaning/processing code.
b. Actual training code.
c. Assumptions/prerequisites related to the implementation.
3. External
a. Specification of the hardware on which it is trained.
b. Time spent on training.
c. Configurations.
d. Definition of correctness.
4. Output
a. The model it produces.
b. The binary data it comprises.
c. The tasks or results it generates.

This also implies that the definition of FREE and OPEN might differ for each component, or for a subset of a component. For example, a model that identifies early-stage cancer from X-ray or MRI images might need to shield the data it is trained on due to privacy regulations, while keeping the rest of its components FREE and/or OPEN. Modification of this model by the community would then be defined differently.
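To make the per-component idea concrete, here is a minimal Python sketch of the medical-imaging example above. Everything in it is an assumption made for illustration: the `Openness` levels, the `AISystem` class, the `component/sub-item` key format, and the system name are hypothetical and are not taken from any OSI draft or from the talks.

```python
from dataclasses import dataclass, field
from enum import Enum


class Openness(Enum):
    """Hypothetical openness levels, invented for this sketch."""
    OPEN = "open"              # available under a free/open license
    RESTRICTED = "restricted"  # shielded, e.g. because of privacy regulations
    UNKNOWN = "unknown"        # not documented


@dataclass
class AISystem:
    name: str
    # Maps a "component/sub-item" key to its openness level.
    components: dict[str, Openness] = field(default_factory=dict)

    def not_open(self) -> list[str]:
        """Return the keys of all components that are not OPEN, sorted."""
        return sorted(
            key for key, level in self.components.items()
            if level is not Openness.OPEN
        )

    def fully_open(self) -> bool:
        """True only when every listed component is OPEN."""
        return not self.not_open()


# Hypothetical model: training data shielded, everything else open.
scanner = AISystem(
    name="early-stage-cancer-detector",
    components={
        "data/training-set": Openness.RESTRICTED,
        "data/description": Openness.OPEN,
        "code/training": Openness.OPEN,
        "external/configurations": Openness.OPEN,
        "output/model": Openness.OPEN,
    },
)

print(scanner.fully_open())  # False
print(scanner.not_open())    # ['data/training-set']
```

A single yes/no "is it open?" flag would hide exactly the distinction this section is about, which is why the sketch tracks openness per component instead.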

State of “Open”-ness in AI systems

Currently there is no proper definition of open-ness for AI systems, and they fall across a wide spectrum. (Reference #8)
A definitive guide is needed, mainly for reasons of ethical consideration and to clarify how to engage with the whole or parts of an AI system.

At present, access to and usage of an AI system is mostly managed through individual or additional license restrictions.

But this imposes barriers to use, makes adoption and improvement difficult, causes problems of control over the technology, and weakens oversight and transparency.

What we need is:

Open-ness in AI.
Interoperable licenses with possibilities of making it free.
Accessibility, Reusability and Sustainability of AI systems.
Ethical compliance that falls under the purview of regulations, not software licenses.

What is AI system Specification?

Open-Source shows that when you eliminate the obstacles to learning, using, sharing and enhancing software systems, everyone benefits. These benefits come from using licenses that follow the Open-Source Definition. The benefits can be expressed as autonomy, transparency, and cooperative improvement. They are necessary for everyone in AI. We need basic freedoms to help users create and use AI systems that are trustworthy and clear.(Reference #4)

The current draft version is "The Open Source AI Definition – draft v. 0.0.5" (Open Source Initiative), which follows the definition of an AI system adopted by the Organisation for Economic Co-operation and Development (OECD).
For each AI system (such as Pythia, Llama, BLOOM, Mistral, Phi2, Olmo, etc.), the Specification aims to define:

What do you need to give an input and get an output?
What do you need to give an input and get a different output?
What do you need to understand why given an input, you get that output?
What do you need to let others give an input and get an output?
What’s the preferred form to make modifications to an AI system?
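As a rough illustration, the five questions above can be treated as a checklist to fill in for a given AI system. The Python sketch below is only an assumption about how one might track them: the draft does not prescribe any such format, and the `open_questions` helper and the sample answers are hypothetical.

```python
# The five questions, copied verbatim from the list above.
SPEC_QUESTIONS = [
    "What do you need to give an input and get an output?",
    "What do you need to give an input and get a different output?",
    "What do you need to understand why given an input, you get that output?",
    "What do you need to let others give an input and get an output?",
    "What’s the preferred form to make modifications to an AI system?",
]


def open_questions(answers: dict[str, str]) -> list[str]:
    """Return the questions that have no (non-blank) answer yet."""
    return [q for q in SPEC_QUESTIONS if not answers.get(q, "").strip()]


# Hypothetical, partially completed assessment of some AI system.
answers = {
    SPEC_QUESTIONS[0]: "Model weights and inference code",
    SPEC_QUESTIONS[4]: "Training code plus configuration",
}

for question in open_questions(answers):
    print("Unanswered:", question)
```

With these sample answers, the second, third, and fourth questions are reported as unanswered.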

The Open Source Initiative plans to have a release candidate (RC) of this spec at the end of October 2024.

The stakeholders engaged in this range from system and license creators and regulators to end users and subjects.

Ongoing and upcoming tasks for the Open Source Initiative on this spec are:

more publicity for the process
- public discussion forum https://discuss.opensource.org
- bi-weekly townhalls
- more opportunities to volunteer.
reach out to more stakeholders.
raise funds for 2024 meetings.
set up the board for review and approval of v. 1.0.

The drafts can be found at: Drafts of the Open Source AI Definition – Open Source Initiative

TLDR;

What is Open-Source AI and why it matters: Open-Source AI is an AI system that allows anyone to study, use, modify, and share its components under licenses that follow the Open-Source Definition. Open-Source AI matters because it offers benefits such as autonomy, transparency, and cooperative improvement, and it helps to create and use AI systems that are trustworthy and clear.

What are the components of an AI system and how to define their openness: An AI system is composed of data, code, external factors, and output, which can have different levels of openness depending on the licenses and specifications that apply to them. The openness of an AI system can be defined by the freedoms that it grants to its users and the transparency that it provides about its functioning and outcomes.

What are the challenges and barriers for Open-Source AI: Open-Source AI faces challenges and barriers such as privacy, quality, interoperability, and ethical compliance of its components, especially data and output. Moreover, Open-Source AI may be difficult to adopt and improve due to individual or additional license restrictions, lack of control over the technology, and weak oversight and transparency.

What is the Open-Source AI Definition and its goals: The Open-Source AI Definition is a draft specification by the Open-Source Initiative that aims to provide a clear and consistent way to assess the openness of an AI system and its components. The goals of the specification are to encourage the development and benefits of Open-Source AI, and to ensure that AI systems respect the basic freedoms of their users.

What is the Open-Source AI Specification and how to use it: The Open-Source AI Specification is a set of questions that help to evaluate the openness of an AI system and its components, based on the freedoms to study, use, modify, and share them. The specification can be used by system and license creators, regulators, end users, and subjects to understand and engage with different aspects of an AI system.

References

Soham Dasgupta

Author
