How Important Is Open Source to AI Adoption?

Open source projects are critical to developers of AI and ML projects; we examine usage statistics from a number of different studies.
Aug 23rd, 2023 7:36am
Featured image via Unsplash.

How important is open source to the future of AI and LLMs? The answer depends on how you define open source in the age of AI.

Open source is viewed by 40% of respondents as “the solution” to concerns about AI ownership, while only 15% disagree with this assessment, according to the 224 UK residents surveyed for State of Open: The UK in 2023, Phase Two. In this regard, many respondents were referring to who should own the large datasets being generated by large language models (LLMs).

Indeed, Predibase’s just-published “Beyond the Buzz: A Look at Large Language Models in Production” found a reluctance to rely on commercial LLMs in production. Based on a survey of 150 people conducted from May through July 2023, 13% of respondents said their enterprise has at least one LLM in production. Another 44% said their organization has so far only used LLMs for experimentation purposes.

Among the whopping 85% of survey respondents who are using or planning to use LLMs, only 27% actually expect a commercial version to be used in production. Almost half (47%) of those with no plans to use a commercial LLM cited a desire not to share proprietary information with vendors. In comparison, only 17% said the reason is that commercial LLMs are too expensive to scale.

Stagnating Growth in Open Source AI

Despite all the noise surrounding the subject, the growth in new traditional AI projects continues to slow down. According to the OECD AI Policy Observatory, GitHub projects associated with AI grew 6% to 348,934 from 2020 to 2022. In comparison, from 2016 to 2018, the number of projects grew 203% to 194,268. Furthermore, the number of contributions to these projects actually peaked in 2020 and dropped 7% by 2022.

With the boom in LLMs and the applications that take advantage of them, it is likely that the number of AI projects is being undercounted because the AI-related concepts used to identify them have changed over time. Indeed, Ashley Wolf, Open Source Program Office Director at GitHub, told The New Stack that “it is possible some projects may have transitioned to using new terminology that isn’t currently reflected. Additionally, there might be a trend where people are focusing on highly successful projects, resulting in less churn. Both are worth investigating.”

Contributions to Open Source AI

Open source projects are obviously critical to developers of artificial intelligence and machine learning projects. Eighty-nine percent of developers involved with AI/ML have contributed to an AI project, according to the AI & Machine Learning Survey Report published by Evans Data in Q2 2023. That figure is comparable to results published by SlashData in Q3 2022, in which 73% of all developers contributed to a “vendor-owned” open source community.

Both of these stats likely vastly overstate contribution, conflating using an open source project with actually contributing to one, according to the JetBrains State of Developer Ecosystem 2022. That study found that of 424 developers involved with machine learning activities, only 54% have contributed, and almost half (45%) of those contributors have only contributed a few times in their careers. It is likely that some of the people who claim to be contributors in the other studies are users of projects, not contributors.

There continues to be confusion about what exactly makes a community vendor-dominated. One approach uses the Elephant Factor, which measures vendor control by how few companies account for at least half of all contributions, as sketched below. A project can also be considered vendor-owned if it is located in a repository controlled by a corporate organization (i.e., anything hosted in https://github.com/openai). Another approach is to look at the governing structure of a project.
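To make the Elephant Factor concrete, here is a minimal sketch of the calculation; the company names and contribution counts are hypothetical and used only for illustration:

```python
# Minimal sketch of an Elephant Factor calculation: the smallest number of
# companies whose combined contributions reach at least half of the total.
# Company names and counts are hypothetical, for illustration only.

def elephant_factor(contributions_by_company: dict) -> int:
    total = sum(contributions_by_company.values())
    running, companies = 0, 0
    for count in sorted(contributions_by_company.values(), reverse=True):
        running += count
        companies += 1
        if running * 2 >= total:  # reached at least 50% of all contributions
            return companies
    return companies

sample = {"VendorA": 600, "VendorB": 250, "Independents": 150}
print(elephant_factor(sample))  # 1: a single vendor dominates this project
```

An Elephant Factor of 1 or 2 suggests a vendor-dominated community; a higher number indicates contributions are spread across many organizations.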

Per Evans Data, the Python and Apache communities associated with AI are the most likely to receive contributions from AI/ML developers. PyTorch reportedly receives contributions from 35% of AI/ML developers, with TensorFlow close behind at 34%. While the Meta-donated PyTorch is now controlled by an independent foundation, Google still manages TensorFlow's development. Even excluding these types of communities, 22% of AI/ML developers contribute to vendor-run communities.

Adoption of AI Frameworks

When looking at usage instead of project involvement, PyTorch comes out ahead of TensorFlow. Based on 1,510 data science and ML specialists surveyed in Stack Overflow's 2023 Developer Survey, 54% use a variant of the PyTorch project, while 48% use TensorFlow. While not exactly comparable, 71% use Scikit-Learn, a Python machine learning library that builds on several other popular Python projects. The use of these frameworks is not widespread among all professional developers, however, with adoption rates between 9% and 10%.
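As a rough illustration of why Scikit-Learn adoption pulls in other popular Python projects, the following minimal sketch (with toy data) fits a model on NumPy arrays, which Scikit-Learn expects as input:

```python
# Minimal sketch: Scikit-Learn operates on NumPy arrays (and builds on SciPy),
# so adopting it typically means adopting those projects too. Toy data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # toy feature matrix
y = np.array([0, 0, 1, 1])                  # toy binary labels
model = LogisticRegression().fit(X, y)
print(model.predict(np.array([[1.5]])))     # predict the class of a new point
```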

A quarter (25%) of all AI/ML developers in the Stack Overflow survey have used Hugging Face Transformers (first released in 2019) extensively in the last year. Used by 21% of AI/ML developers, Nvidia's CUDA is the only non-open-source framework on this list; CUDA allows software to use certain types of graphics processing units (GPUs).
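For context, the snippet below is a minimal sketch of the kind of usage these surveys are counting: the Hugging Face Transformers pipeline API running on PyTorch, with CUDA used only if a GPU is available (the example text is arbitrary):

```python
# Minimal sketch: Hugging Face Transformers on top of PyTorch, falling back
# to CPU when no CUDA-capable GPU is present. Requires the `torch` and
# `transformers` packages; downloads a default model on first run.
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # -1 means CPU in this API
classifier = pipeline("sentiment-analysis", device=device)
print(classifier("Open source frameworks dominate AI development."))
```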

GPUs and the Edge

According to the aforementioned Evans Data report, 57% of AI/ML developers prefer using GPUs “that are dedicated to my individual development work,” compared to 42% who would rather share GPUs across multiple developers or workloads. The preference for dedicated GPUs may increase costs, though it may partly reflect developers focusing on their own projects.

CUDA's prominence is one reason why Nvidia is expected to benefit from the increased demand for computing power generated by the proliferation of LLMs. Overall, 55% of AI/ML developers are running inference models at the edge, but these are most often (73% of the time) deployed to run on PCs. As developers scale up their use of these models, they are expected to rely more heavily on specialized chips, which may often be installed by data center operators or cloud providers.

More from the Predibase Study

  • Giving up access to proprietary data was cited by 33% of respondents as the top challenge preventing them from using LLMs in production. Customization and fine-tuning was the second biggest inhibitor to LLM use, cited by 30%.
  • Digging into the challenge of fine-tuning an LLM, only 22% of respondents have had success doing so. The top reason for not fine-tuning is that they lack the requisite knowledge to handle this complex task. In response to a different question, 45% said they do not have the data needed to fine-tune an LLM.
Chart: What is your top challenge preventing you from using LLMs in production? 33% say giving up access to proprietary data and 30% say customization and fine-tuning.

Source: Predibase's "Beyond the Buzz: A Look at Large Language Models in Production"

Charts: Have you successfully fine-tuned an LLM? Do you have the requisite data for fine-tuning LLMs?

Source: Predibase's "Beyond the Buzz: A Look at Large Language Models in Production"
