
Decoding Amazon’s Generative AI Strategy

Amazon unveiled its generative AI strategy at AWS re:Invent 2023. AWS now has everything it needs to train and deploy foundation models.
Dec 6th, 2023 5:00am by

Amazon unveiled its comprehensive generative AI strategy at AWS re:Invent 2023. AWS now has everything it needs to train, adapt and deploy foundation models, from purpose-built chips to a specialized chatbot.

Before delving into the details of the announcements and how they align with the overall strategy of AWS, let’s take a look at the big picture of the generative AI stack. Its layers are compute, foundation models, vector storage, retrieval, orchestration, the AI platform and AI applications.

AWS has invested in almost every layer of this stack. Let us take a look at each of the layers that power AWS’s generative AI strategy.


Amazon has a wide range of compute offerings tailored to train and deploy generative AI foundation models.

Apart from Intel and AMD CPUs, Amazon has invested in home-grown ARM-based CPUs branded as Graviton. At re:Invent 2023, AWS announced Graviton4, the latest iteration of the CPU, which provides up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than the previous generation Graviton3 processor. Graviton4-based instances are expected to deliver up to 40% better price performance for compute-intensive applications compared to current-generation C6i instances.

AWS Trainium is a purpose-built chip for training AI models. At AWS re:Invent 2023, Amazon announced Trainium2, the second generation of the chip. Trainium2 is expected to deliver up to 4x the performance and 2x the energy efficiency of the first-generation Trainium. It will be available in Amazon EC2 Trn2 instances, which house 16 Trainium chips in a single instance.

These instances are designed to be used for large-scale model development and training. Customers of AWS, such as Anthropic, are using these chips to train LLMs.

While Trainium targets model training, Inferentia is a chip designed for inference. Amazon announced Inferentia2, the second generation of the chip, at AWS re:Invent 2023. It is designed for deep learning (DL) inference applications. Inferentia2 delivers up to 4x higher throughput and up to 10x lower latency than the first-generation Inferentia. Inferentia2-powered Inf2 instances come in four sizes and offer up to 2.3 petaFLOPS of combined compute at the BF16 or FP16 data types. They also have an ultra-fast NeuronLink connection between chips, which lets large models be spread across many Inferentia2 chips without chip-to-chip communication becoming a bottleneck, and which in turn speeds up inference.

NVIDIA and AWS announced an expansion of their strategic partnership to provide new supercomputing infrastructure, software and services for generative AI. AWS will be the first to bring the NVIDIA GH200 Grace Hopper Superchip to the cloud as part of this collaboration. AWS will also provide cloud infrastructure for NVIDIA’s Project Ceiba, which aims to build the world’s fastest AI supercomputer.

Foundation Models

Amazon Bedrock is the runtime that delivers various foundation models to AWS customers. At re:Invent 2023, AWS announced the addition of Anthropic’s Claude 2.1, Meta Llama 2 70B, Amazon Titan Image Generator, and Amazon Titan Multimodal Embeddings models. Bedrock already supports Cohere’s Command, AI21’s Jurassic, and Stable Diffusion XL.

Amazon Bedrock offers a choice of first-party foundation models from Amazon under the Titan family, open source models such as Meta’s Llama 2, and commercial models including Claude 2 and Jurassic-2.
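As a rough illustration of what calling a Bedrock-hosted model can look like from code, here is a minimal sketch using the boto3 `bedrock-runtime` client. The model ID `anthropic.claude-v2:1`, the region, and the request and response field names reflect the Claude 2.x text-completion format as I understand it. Treat them as assumptions and check them against the current Bedrock documentation. The network call needs boto3, AWS credentials and model access, so only the request builder runs here.

```python
import json


def build_claude_request(user_text, max_tokens=300):
    """Build a JSON body in the text-completion format assumed for Claude 2.x."""
    return json.dumps({
        "prompt": f"\n\nHuman: {user_text}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })


def invoke(user_text, model_id="anthropic.claude-v2:1", region="us-east-1"):
    """Send the request to Bedrock (model ID and region are assumptions)."""
    import boto3  # imported lazily so the rest of the sketch runs without the SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=model_id,
        body=build_claude_request(user_text),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; the generated text is assumed to be
    # under the "completion" key for this model family.
    return json.loads(response["body"].read())["completion"]


# Building the request body works offline; invoke() is not called here.
body = build_claude_request("Summarize the AWS generative AI stack in one sentence.")
print(body)
```

Because every model family on Bedrock defines its own request body, swapping in a Titan or Llama 2 model would mean changing both the model ID and the body builder.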

Vector Storage

Vector databases are essential for managing and querying high-dimensional data in machine learning applications such as generative AI and large language models (LLMs). They store the vector embeddings that an embedding model generates from source content. At runtime, the user input is converted to an embedding in the same way and matched against the stored embeddings using a nearest neighbor search algorithm.
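To make the nearest neighbor idea concrete, here is a minimal sketch of an exact (brute-force) search using cosine similarity. The three-dimensional vectors and their labels are made up for illustration. Production vector databases typically use approximate indexes rather than scanning every vector like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, store, k=1):
    """Return the k (key, score) pairs most similar to the query vector."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; real ones have hundreds of dimensions or more.
store = {
    "graviton": [0.9, 0.1, 0.0],
    "trainium": [0.1, 0.9, 0.1],
    "bedrock": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]

top = nearest_neighbors(query, store, k=2)
print(top)  # "trainium" ranks first, followed by "graviton"
```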

At AWS re:Invent 2023, Amazon announced the addition of vector search and vector embedding capabilities to more of its database services. This includes Amazon MemoryDB for Redis, Amazon DocumentDB, and Amazon DynamoDB.

AWS also announced the general availability of the previously announced vector engine for Amazon OpenSearch Serverless and introduced Neptune Analytics, a new service that combines vector search with the ability to analyze massive amounts of graph data in seconds. With this service, customers can use vector search to find key insights in existing Neptune graph data or in data lakes on top of S3 storage.


The retrieval service enables developers to bring disparate data sources into a single context to build retrieval augmented generation (RAG) pipelines. These pipelines provide additional context to the LLMs to reduce hallucinations and increase the precision of the responses.

Knowledge Bases for Amazon Bedrock is a fully managed service that connects foundation models to data sources for RAG, making the models more knowledgeable about specific domains and organizations. The knowledge base enables a RAG workflow that combines private data with large language models (LLMs) to create contextual apps. Creating a knowledge base takes three steps: specify a data source such as Amazon S3 for data ingestion, select an embeddings foundation model such as Amazon Titan Embeddings to convert the data to vector format, and choose a destination vector database such as Amazon OpenSearch Serverless, Pinecone, or Redis Enterprise Cloud to store the vector data.
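The ingest, embed, store and retrieve flow described above can be sketched in a few lines of plain Python. This is a toy model of the idea, not the Bedrock API: the bag-of-words `embed` function stands in for a real embeddings model, an in-memory list stands in for the vector database, and the documents are invented.

```python
import math
from collections import Counter


def chunk(text, size=12):
    """Split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy stand-in for an embeddings model: a bag-of-words count vector."""
    return Counter(word.strip(".,?").lower() for word in text.split())


def similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_knowledge_base(documents):
    """Ingest: chunk each document, embed each chunk, store (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(kb, question, k=1):
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(kb, key=lambda item: similarity(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def augment(question, context):
    """Prepend the retrieved context to the question to form the LLM prompt."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


# Hypothetical private documents standing in for files in a data source.
documents = [
    "Expense reports are due on the fifth business day of each month.",
    "The VPN client must be updated before connecting from a new laptop.",
]
kb = build_knowledge_base(documents)
question = "When are expense reports due?"
prompt = augment(question, retrieve(kb, question))
print(prompt)
```

The augmented prompt, rather than the bare question, is what gets sent to the foundation model, which is how private data reaches the model without retraining it.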


To deliver an accurate result, the orchestration component integrates the RAG context, external data, and LLMs. It may recursively involve the LLM in identifying the appropriate APIs and tools to use in order to provide real-time and factually correct data.
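A minimal sketch of such an orchestration loop follows. Everything in it is hypothetical: `fake_llm` is a scripted stand-in for a real model call, `get_instance_price` is an invented tool with a made-up price table, and the decision format is an assumption chosen for illustration.

```python
def get_instance_price(instance_type):
    """Hypothetical tool: look up an hourly price from a made-up table."""
    prices = {"example.large": 0.10, "example.xlarge": 0.20}
    return prices[instance_type]


TOOLS = {"get_instance_price": get_instance_price}


def fake_llm(question, observations):
    """Stand-in for a model call: requests a tool first, then answers from its result."""
    if not observations:
        return {"action": "call_tool", "tool": "get_instance_price", "argument": "example.xlarge"}
    return {"action": "answer", "text": f"The hourly price is ${observations[-1]:.2f}."}


def orchestrate(question, llm, tools, max_steps=5):
    """Ask the model repeatedly what to do next until it produces an answer."""
    observations = []
    for _ in range(max_steps):
        decision = llm(question, observations)
        if decision["action"] == "answer":
            return decision["text"]
        # The model asked for a tool: run it and feed the result back in.
        tool = tools[decision["tool"]]
        observations.append(tool(decision["argument"]))
    raise RuntimeError("no answer within the step limit")


answer = orchestrate("How much does example.xlarge cost per hour?", fake_llm, TOOLS)
print(answer)  # The hourly price is $0.20.
```

The step limit is a design choice worth keeping in any real version, since a model that keeps requesting tools would otherwise loop indefinitely.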

Amazon Bedrock agents use knowledge bases to identify appropriate data sources, retrieve relevant information based on user input, and provide more accurate responses. The knowledge base can be managed through the Amazon Bedrock console. The service was announced in preview in September 2023 and made generally available in November 2023.

AI Platform

The AI platform enables ML researchers and developers to manage the lifecycle of foundation models. It exposes the APIs, tools and environment to evaluate, test, fine-tune and deploy models.

Amazon Bedrock and Amazon SageMaker Studio Canvas deliver the required capabilities to manage the foundation models. Bedrock provides serverless APIs to perform fine-tuning and inference of the models, while SageMaker Studio acts as a low-code or no-code tool to customize the models. Foundation models that are not available within the Bedrock runtime can be accessed via SageMaker Studio, which provides additional features for developers and researchers with varied experience. The integration of Amazon Bedrock with AWS Step Functions makes it easy to build generative AI applications without writing code.

Amazon announced several new features for SageMaker Studio, its integrated development environment (IDE) for machine learning, at AWS re:Invent 2023. SageMaker Studio now includes a suite of IDEs: a Code Editor based on Code-OSS (Visual Studio Code Open Source), an improved and faster JupyterLab, and RStudio. ML practitioners can select their preferred IDE to accelerate ML development. Furthermore, SageMaker Studio includes an improved JumpStart experience that makes it easier to discover, import, fine-tune and deploy foundation models with just a few clicks.

AI Applications

This layer consists of AI assistants from the platform provider as well as custom applications developed and deployed by developers.

AWS announced two AI assistants at re:Invent 2023: Amazon Q for Builders and Amazon Q for Business.

Amazon Q for Builders is an AI-powered assistant designed to aid developers and IT professionals in their work. It is based on over 17 years of AWS knowledge and best practices, and provides assistance at every stage of application development — from researching best practices to resolving errors and coding new features. Amazon Q can answer questions about the software development process, explain program logic in natural language, identify and fix bugs, and even implement complete features with test cases. It also has features like code transformation and troubleshooting console errors. Amazon Q is available across AWS, including the AWS Management Console, documentation, website, IDEs with Amazon CodeWhisperer, team chat apps like Slack or Microsoft Teams with AWS Chatbot, Amazon CodeCatalyst, and (soon) the AWS Console Mobile Application.

Amazon Q for Business is a generative AI-powered assistant that can be tailored to a customer’s specific needs. It provides employees with timely, relevant information and advice to help them streamline tasks, speed up decision-making and problem-solving, and spark creativity and innovation at work. Based on data and information available in customers’ systems, it can answer questions, provide summaries, generate content, and complete tasks. S3, Salesforce, Google Drive, Microsoft 365, ServiceNow, Gmail, Slack, Atlassian and Zendesk are just a few of the popular enterprise applications and document repositories it can connect to. It honors existing access control based on user permissions and provides responses with references and citations for easy fact-checking.


At AWS re:Invent 2023, Amazon announced major advancements across its AI and compute offerings. For compute, AWS introduced the latest Graviton CPUs, Trainium chips for training, and Inferentia chips for inference, which offer significant performance improvements over previous generations. Amazon also expanded its collaboration with NVIDIA, bringing new GPUs to the cloud.

On the foundation model front, Amazon Bedrock added models like Anthropic’s Claude 2.1 and Meta’s Llama 2 70B. For vector storage, Amazon enabled new vector capabilities in databases like DynamoDB and OpenSearch to efficiently manage machine learning data. The company also launched services to enable retrieval augmented generation (RAG), which provides additional context to large language models to increase precision. Amazon announced new orchestration features in the form of Bedrock agents to integrate external data sources with models to deliver accurate results. The SageMaker platform provides tools to manage the foundation model lifecycle, including fine-tuning and deployment.

Finally, Amazon launched AI assistants like Amazon Q for Builders and Amazon Q for Business to aid developers and employees with timely, relevant information to enhance productivity and innovation.

Amazon’s generative AI stack is comprehensive, with support for cutting-edge commercial and open models. The combination of Bedrock and SageMaker Studio provides developers and ML engineers with the appropriate level of choice based on their skills.

I will publish a detailed review and analysis of various generative AI services announced by AWS at re:Invent. Stay tuned.
