
Free GPUs and AI Chips Are Available to Run AI

Users can fire up a Jupyter notebook, load the models, pull down code from GitHub repositories, power up the runtime, and let GPUs from a number of cloud providers do the heavy lifting to produce the output, all for free.
Sep 8th, 2023 6:00am by
Feature Image by Erika Wittlieb from Pixabay.

Free is great, especially for developers looking to run AI models on GPUs just hanging out in data centers waiting to be exploited at zero expense.

The free GPUs available in the cloud today are aging chips on their last legs, offered by Google and other cloud providers. The providers have faster GPUs, but they keep the older chips from sitting idle by donating time on them to AI enthusiasts and researchers with the technical chops to run Python scripts.

Users can fire up a Jupyter Notebook, load the models, pull down code from GitHub repositories, power up the runtime, and let GPUs do the heavy lifting to produce the output.

Unfortunately, running tweaked AI models is not as easy as just firing up your laptop and double-clicking an icon. It may get there at some point, but until then, it still requires command-line tech savviness.

This is different from general-purpose chatbot tools provided by OpenAI or Google, which include a user interface to make AI accessible to the masses.

Friendly user interfaces generally do not exist for open source models like Llama 2, which was recently released by Meta, though there are exceptions such as Hugging Face's HuggingChat, which runs on Llama 2.

Llama 2 is like AI raw material that developers can take and customize to their own requirements, and in most cases that will require a GPU available on cloud services, or graphics cards on a local PC.

Google Cloud is one of the few places on the Internet where you can find free GPUs and TPUs. The Google Colab website, which is primarily for researchers, has a free tier on its Jupyter Notebook where developers can choose one GPU — the T4 — on which to run inferencing.

T4 is one of the earliest Nvidia chips optimized for artificial intelligence computing, but it is slow. Google previously provided the V100, an upgrade over the T4, under the free tier. But the V100 is no longer free and is now offered under the paid tier, which starts at $9.99 a month for 100 compute units. Colab also offers paid access to Nvidia’s A100 GPU, which is faster and was used to train OpenAI’s GPT models as well as Google’s PaLM and PaLM 2.

Google Colab’s free tier involves putting up a script in the Colab notebook, which pulls the model and code from GitHub and other websites and is tuned to run on a GPU. Users can select the Nvidia T4 GPU in the notebook settings and run the script. The task is placed in a queue until a physical GPU becomes available in the Google Cloud.
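Before the script runs, it is worth confirming that the notebook actually landed on a GPU-backed runtime. A minimal first cell, sketched here with only the standard library (the helper name is illustrative, not part of Colab's API), can do that check:

```python
import shutil
import subprocess

def gpu_runtime_available():
    """Return True if an Nvidia GPU is visible to this runtime via nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return False  # CPU-only runtime, or a GPU has not been assigned yet
    return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0

# In Colab: Runtime > Change runtime type > T4 GPU, then rerun this cell.
print("GPU runtime:", gpu_runtime_available())
```

If this prints False inside Colab, the runtime type is still set to CPU, or the queue has not assigned a physical GPU yet.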

Colab provisions a virtual machine on Google’s infrastructure to run the inferencing. Users need to make sure plenty of space is available on their Google Drive to store temporary code as the model executes. If the model is large, users will need to buy extra Google Drive storage.
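A back-of-the-envelope calculation shows why storage fills up fast. The figures below are rough assumptions (half-precision weights at 2 bytes per parameter), not exact checkpoint sizes:

```python
def checkpoint_gib(n_params, bytes_per_param=2):
    """Rough on-disk size of a model checkpoint in GiB.

    Assumes half-precision (fp16) weights at 2 bytes per parameter;
    real checkpoints add some overhead for metadata.
    """
    return n_params * bytes_per_param / 2**30

# Llama 2's 7-billion-parameter variant in fp16: roughly 13 GiB,
# which already crowds a free 15 GB Google Drive.
print(f"{checkpoint_gib(7e9):.1f} GiB")
```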

Google Colab’s free tier utilizes older hardware when available. It also provides an option to run inferencing on CPUs, which will be much slower, or Google’s own TPUs. Google’s TPUs can be powerful, but the code needs to be specifically tuned to exploit TPU acceleration. TPUs are ASICs (application-specific integrated circuits) with fixed functionality.

By comparison, GPUs and CPUs can take on generic code, but the GPU delivers faster results. Nonetheless, the Nvidia T4 GPU will take a long time to run, so feel free to go out and get a meal or a pint, especially if the AI compute request is large in scope.
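To gauge whether it is a coffee wait or a pint wait, a rough estimate helps. The throughput figure below is an illustrative assumption (a T4 generating on the order of ten tokens per second for a 7B-parameter model), not a benchmark:

```python
def eta_minutes(total_tokens, tokens_per_second):
    """Minutes needed to generate total_tokens at a steady tokens_per_second."""
    return total_tokens / tokens_per_second / 60

# Assumed workload: a batch of prompts totaling ~50,000 generated
# tokens at ~10 tokens/s on a T4 -- well over an hour.
print(f"{eta_minutes(50_000, 10):.1f} minutes")
```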

Google offers many AI models in its Vertex AI offering, where developers do not need to worry about the hardware. Users can build their own models, run models already available in Google Cloud, or simply submit prompts and get responses. Vertex is a one-stop shop for AI that automatically assigns hardware in the infrastructure, so users do not have to worry about runtimes or coding to specific hardware.

Nvidia’s stranglehold on AI has left many AI chip companies in the dust, and developers are emerging as winners in this battle. Graphcore has opened its AI chips for developers to try models that include the most recent Llama-2 model with 7 billion parameters.

Developers can fire up Jupyter notebooks, load the model from Hugging Face and execute it on Graphcore’s AI chips. However, developers need the technical knowledge to run the code. Graphcore is providing access to its chips in the cloud for free to prove the real-world functionality of the latest large language models. The chips also run other open models, including Databricks’ Dolly 2.0.

Paperspace hosts Graphcore’s AI chips and provides access to free GPUs. The free GPU option is only the Quadro M4000, a workstation graphics chip that is eight years old and was not designed for AI. But beggars can’t be choosers — it is a GPU, it is free, and it is better than a CPU.

Cerebras, another AI chipmaker, is offering free access to its AI chips in its data centers. The company’s WSE-2, which is the size of a wafer and the largest chip in the world, is exclusively for training and is expensive to make, so accessing the chip involves jumping through a few hoops.

Cerebras has programs for developers, graduate students and faculty to access its chips, and those interested can get in touch with the company.

“We are constantly putting models in the open-source community for them to use. Currently, the top performing 3B parameter model BTLM-3B, with more than 1 million downloads on HuggingFace, was developed by Cerebras,” a company spokeswoman said. Cerebras’ servers are programmed via PyTorch, the open source machine learning framework.

The cheapest form of AI remains running code locally on desktops with powerful GPUs or CPUs. It involves using tools like oobabooga’s text-generation-webui, which automates setting up ChatGPT-style chatbots on local PCs.

Users can use the tool to download and load chatbots built on existing open models such as Llama 2 or MPT. The process can be complex for nontechnical users, as it involves installing Python and Nvidia developer tools such as cuDNN, and downloading tuned models to run locally.

The tool can run chatbots on CPUs, but it is best to have a gaming PC or laptop with one of Nvidia’s recent RTX 3090 or 4090 GPUs, which cost thousands of dollars. That doesn’t add up to running AI models for free, but it is a great alternative if you already have a graphics card in your PC.

AUTOMATIC1111’s Stable Diffusion web UI is a similar tool for loading and running text-to-image models locally on PCs.
