AI Infrastructure
MLOps, training pipelines, deployment, and scaling
40 entities in this channel
Meta + HuggingFace (Llama)
by Meta AI
Official Meta Llama model weights distributed through the HuggingFace Hub under Meta's community license. Covers Llama 3.1, 3.2, and 3.3 variants from 1B to 405B parameters with full transformers, TGI, and vLLM compatibility. HuggingFace serves as the primary public distribution channel for Meta's open-weight releases.
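A minimal sketch of loading one of these checkpoints with transformers; the model ID and generation settings are illustrative, and gated repos require accepting Meta's license and authenticating with huggingface_hub first.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # example gated repo on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

inputs = tokenizer("Summarize PagedAttention in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```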
Amazon Web Services AI
by Amazon
Amazon Web Services is the world's largest cloud provider and offers the most comprehensive set of AI and machine learning services, including Amazon Bedrock for managed foundation model APIs, SageMaker for MLOps, Rekognition for computer vision, and Lex for conversational voice and chat interfaces. AWS Bedrock gives enterprises access to models from Anthropic, Meta, Mistral, Cohere, and others through a unified API.
Microsoft Azure AI
by Microsoft
Microsoft Azure AI is the AI services division of Microsoft's cloud platform, uniquely positioned as the exclusive cloud partner of OpenAI. Through Azure OpenAI Service, enterprises access GPT-4, DALL-E, and Whisper with enterprise-grade compliance and data residency guarantees. Microsoft has deeply integrated AI across its product suite including Copilot for Microsoft 365, GitHub Copilot, and Azure AI Foundry.
Pinecone + OpenAI Embeddings
by Pinecone
Direct integration pairing Pinecone's managed vector database with OpenAI's text-embedding-3 models. Commonly used pattern for production RAG systems where OpenAI generates dense vectors and Pinecone handles ANN retrieval at scale. Supports serverless and pod-based indexes with metadata filtering.
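A sketch of the embed-and-retrieve pattern described above, using the current Pinecone and OpenAI Python SDKs; the index name, dimension, and document contents are assumptions.

```python
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()                          # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="PINECONE_API_KEY")
index = pc.Index("docs")                # hypothetical 1536-dim serverless index

def embed(text: str) -> list[float]:
    resp = oai.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

# Upsert (id, vector, metadata) tuples, then retrieve by dense similarity
index.upsert(vectors=[("doc-1", embed("Pinecone handles ANN retrieval."), {"source": "notes"})])
hits = index.query(vector=embed("What does Pinecone do?"), top_k=3, include_metadata=True)
```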
NVIDIA RTX 4090
by NVIDIA
NVIDIA's flagship consumer GPU, built on the Ada Lovelace architecture. It has become popular for local LLM inference and fine-tuning thanks to its 24GB of GDDR6X memory and high performance per dollar, enabling on-premise AI workloads without data center costs.
vLLM + NVIDIA
by vLLM Project
vLLM's NVIDIA backend leverages CUDA kernels, FlashAttention-2, and PagedAttention to deliver state-of-the-art throughput for LLM inference on NVIDIA A100, H100, and H200 GPUs. The integration supports tensor and pipeline parallelism across multiple GPUs, FP8 quantization alongside FP16/BF16 precision, and CUDA graph capture for minimal per-token latency.
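A minimal offline-inference sketch with vLLM's Python API; the model ID and tensor_parallel_size are placeholders for an actual multi-GPU NVIDIA setup.

```python
from vllm import LLM, SamplingParams

# PagedAttention and continuous batching are applied automatically by the engine
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=2)
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain KV-cache paging briefly."], params)
print(outputs[0].outputs[0].text)
```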
OpenAI + Azure OpenAI Service
by Microsoft Azure
Microsoft Azure's managed deployment of OpenAI models including GPT-4o, o1, and DALL-E 3 with enterprise SLAs, private networking, and regional data residency. Provides the same OpenAI API surface with additional Azure IAM, VNet integration, content filtering, and Azure Monitor observability.
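A sketch of calling an Azure OpenAI deployment with the openai SDK's Azure client; the endpoint, API version, and deployment name below are placeholders.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # hypothetical resource
    api_key="AZURE_OPENAI_API_KEY",
    api_version="2024-06-01",
)

# Azure routes by deployment name rather than raw model name
resp = client.chat.completions.create(
    model="gpt-4o-deployment",  # your deployment's name
    messages=[{"role": "user", "content": "Hello from Azure."}],
)
print(resp.choices[0].message.content)
```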
Google Cloud AI
by Google
Google Cloud AI provides enterprise access to Google DeepMind's Gemini models and a comprehensive suite of managed AI services via Vertex AI. As the creator of the Transformer architecture and TensorFlow, Google Cloud offers unmatched AI infrastructure including custom TPUs, a full MLOps platform, and pre-built APIs for vision, speech, and natural language processing.
Pinecone Systems
by Pinecone
Pinecone is the leading managed vector database, purpose-built for AI applications requiring similarity search at scale. It powers retrieval-augmented generation, semantic search, and recommendation systems for thousands of enterprises. Pinecone's serverless architecture eliminates infrastructure management while delivering consistently low query latencies.
Anthropic + AWS Bedrock
by Amazon Web Services
Anthropic's Claude model family available through Amazon Bedrock's fully managed foundation model service. Provides serverless inference with pay-per-token pricing, AWS IAM authentication, VPC endpoint support, and model evaluation tools. Claude 3.5 Sonnet, Claude 3.5 Haiku, and Claude 3 Opus are all available through the Bedrock API.
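A sketch of invoking Claude through Bedrock's Converse API with boto3; the region and model ID are illustrative, and credentials come from the standard AWS credential chain.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

resp = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Ping?"}]}],
)
print(resp["output"]["message"]["content"][0]["text"])
```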
TGI + Hugging Face Hub
by Hugging Face
Text Generation Inference (TGI) by Hugging Face is a production-grade inference server that directly loads models from the Hugging Face Hub via model IDs, handling shard downloading, quantization, and OpenAI-compatible endpoint serving in a single Docker command. It implements continuous batching, speculative decoding, and FlashAttention for optimal throughput on Ampere and Hopper GPUs.
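Once TGI is serving a Hub model (for example via its Docker image), it can be queried with huggingface_hub's InferenceClient; the local URL below is an assumption.

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # TGI's default serving port
text = client.text_generation("The capital of France is", max_new_tokens=20)
print(text)
```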
NVIDIA DGX H100
by NVIDIA
The NVIDIA DGX H100 is a purpose-built AI supercomputer, serving as the foundational building block for large-scale AI infrastructure. It integrates eight H100 Tensor Core GPUs with high-speed NVLink interconnects, providing a turnkey solution for the most demanding AI training, inference, and data analytics workloads.
NVIDIA B100
by NVIDIA
The NVIDIA B100 is a data center GPU based on the Blackwell architecture, succeeding the H100. It offers substantial performance improvements for AI training and inference, featuring a second-generation Transformer Engine with FP4 precision, and a fifth-generation NVLink interconnect for massive multi-GPU scaling.
NVIDIA Jetson AGX Orin
by NVIDIA
The NVIDIA Jetson AGX Orin is a high-performance System-on-Module (SoM) designed for edge AI and autonomous machines. It delivers up to 275 TOPS of AI performance, integrating an NVIDIA Ampere architecture GPU with Arm CPUs and deep learning accelerators for server-class computing in a power-efficient package.
PagerDuty AI
by PagerDuty
PagerDuty AI is an AIOps agent for incident management that automates triage and response. It intelligently groups related alerts to reduce noise, correlates events to identify root causes, and suggests or executes automated remediation runbooks. This helps teams minimize downtime and streamline their on-call processes.
NVIDIA A10G
by NVIDIA
NVIDIA Ampere GPU optimized for graphics and inference workloads. Commonly deployed in AWS G5 instances, offering a cost-effective option for inference, graphics rendering, and video processing at cloud scale.
NVIDIA V100
by NVIDIA
NVIDIA Volta architecture GPU that introduced Tensor Cores to the data center, providing the first dedicated matrix-multiply hardware for AI. It powered the first wave of transformer model training, including BERT and GPT-2, and was the dominant AI training platform from 2017 to 2020.
LoRA Library
by Hugging Face
The LoRA Library, integrated within Hugging Face's PEFT (Parameter-Efficient Fine-Tuning) package, provides tools to create, share, and use LoRA adapters. It allows for the efficient customization of large pre-trained models by training only a small number of new weights, drastically reducing computational costs and storage requirements compared to full fine-tuning.
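A minimal sketch of attaching LoRA adapters via PEFT; target modules vary by architecture, so the projection names below are typical for Llama-style models rather than universal.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # example base model
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
```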
TGI
by Hugging Face
A production-ready inference server for large language models, developed by Hugging Face with a Rust-based router and Python model workers. It enables high-performance LLM serving through features like tensor parallelism, continuous batching, and quantization, making it ideal for deploying demanding models at scale with low latency.
GitLab Duo Agent
by GitLab
GitLab Duo is an AI-powered assistant integrated into the GitLab DevSecOps platform. It enhances developer productivity across the software development lifecycle by offering code suggestions, summarizing issues, explaining vulnerabilities, and generating tests, all within the native GitLab environment.
Model Fine-Tuning (LoRA)
by AaaS
This script automates the process of fine-tuning large language models using Low-Rank Adaptation (LoRA). It provides an end-to-end workflow, from preparing custom datasets to training lightweight adapters and merging them into a base model for efficient deployment. This enables domain-specific model specialization with significantly reduced computational costs.
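A sketch of the final merge step such a workflow describes: folding trained LoRA weights back into the base model for adapter-free deployment. The model ID and adapter path are hypothetical.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = PeftModel.from_pretrained(base, "./lora-adapter")  # trained adapter directory
merged = model.merge_and_unload()   # bakes the LoRA deltas into the base weights
merged.save_pretrained("./merged-model")
```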
Groq
by Groq
Groq is a semiconductor company that developed the Language Processing Unit (LPU), a custom chip for ultra-fast AI inference. Their managed API provides some of the fastest publicly available LLM inference speeds, often exceeding 800 tokens/second, making it ideal for latency-sensitive applications.
Data Quality Checker
by Great Expectations
Automates data quality testing for tabular data using the Great Expectations library. This script profiles datasets to generate and validate 'Expectations' covering schema, statistical properties, and referential integrity. It produces a comprehensive HTML report (Data Docs) and can be integrated into CI/CD pipelines as a quality gate to prevent bad data from entering production systems.
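A sketch using the classic (pre-1.0) Great Expectations Pandas API; newer releases use a different fluent API, so treat this as illustrative only. The dataset and column names are assumptions.

```python
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(pd.read_csv("orders.csv"))  # hypothetical dataset
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000)

results = df.validate()   # suitable as a CI/CD quality gate
print(results.success)
```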
Streaming Responses
by AaaS
This skill involves implementing real-time, token-by-token data delivery from Large Language Models to end-users. It utilizes protocols like Server-Sent Events (SSE) or WebSockets to create interactive and responsive applications, such as chatbots or code assistants, by progressively displaying content as it's generated.
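A minimal token-streaming sketch with the OpenAI SDK; the same consume-deltas-as-they-arrive pattern maps onto SSE endpoints in a web framework. The model name is a placeholder.

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Stream a haiku."}],
    stream=True,  # server sends incremental deltas instead of one final response
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # render tokens as they arrive
```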
PII Redaction Pipeline
by Microsoft
An automated pipeline that leverages Microsoft Presidio to identify and remove personally identifiable information (PII) from text and structured data. It supports configurable entity recognizers for GDPR and HIPAA compliance and features a reversible pseudonymization capability with a secure vault for authorized re-identification.
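A sketch of the analyze-then-anonymize flow with Microsoft Presidio; the entity selection and replacement policy shown are illustrative, not this pipeline's exact configuration.

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact Jane Doe at jane.doe@example.com"

findings = AnalyzerEngine().analyze(text=text, language="en")       # detect PII spans
redacted = AnonymizerEngine().anonymize(text=text, analyzer_results=findings)
print(redacted.text)  # e.g. "Contact <PERSON> at <EMAIL_ADDRESS>"
```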
Basic RAG Pipeline
by AaaS
This script provides a foundational Retrieval-Augmented Generation (RAG) pipeline. It handles core tasks like loading documents, splitting text into chunks, generating embeddings, and indexing them into a vector store. It includes a basic query interface, making it ideal for learning the RAG workflow and prototyping simple applications.
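A toy end-to-end sketch of that workflow with an in-memory index; the hashing "embedding" below is a stand-in so the example runs standalone, and a real embedding model (OpenAI, sentence-transformers) would replace it.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy hashing embedding; swap in a real embedding model for actual use
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def chunk(doc: str, size: int = 500) -> list[str]:
    return [doc[i:i + size] for i in range(0, len(doc), size)]

docs = ["Pinecone is a vector database.", "LoRA trains low-rank adapters."]
chunks = [c for d in docs for c in chunk(d)]
index = np.stack([embed(c) for c in chunks])  # naive dense index

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q) + 1e-9)
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

print(retrieve("what is a vector database?"))
```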
Weaviate
by Weaviate
Weaviate is an open-source vector database designed for AI-native applications. It enables flexible hybrid search, combining vector and keyword methods, and uniquely supports multi-modal data like text, images, and audio. Weaviate offers both self-hosting for maximum control and a managed cloud service for ease of use.
Context Window Optimization
by AaaS
A set of techniques for managing the limited memory (context window) of Large Language Models. It involves strategically structuring prompts, summarizing or pruning conversation history, and selectively including relevant information to ensure efficient, cost-effective, and coherent long-form interactions with an AI.
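One concrete pruning strategy as a sketch: keep the system prompt and drop the oldest turns until the conversation fits a token budget. Using tiktoken's cl100k_base encoding is an assumption about the target model's tokenizer.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def prune_history(messages: list[dict], budget: int = 4000) -> list[dict]:
    # Assumes messages[0] is the system prompt, which is always kept
    system, turns = messages[:1], messages[1:]

    def tokens(msgs: list[dict]) -> int:
        return sum(len(enc.encode(m["content"])) for m in msgs)

    while turns and tokens(system + turns) > budget:
        turns.pop(0)  # discard the oldest exchange first
    return system + turns
```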
Triton Inference Server
by NVIDIA
Triton is an open-source inference server from NVIDIA designed for high-performance, production-ready AI. It supports deploying models from virtually any framework, such as TensorFlow, PyTorch, and ONNX, on both GPUs and CPUs. Key features include dynamic batching, concurrent model execution, and model ensembling to maximize throughput and resource utilization.
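A sketch of a Triton HTTP client call; the model name, input/output tensor names, and shape must match the deployed model's config.pbtxt, so the values below are placeholders.

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example image batch

inp = httpclient.InferInput("input__0", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
result = client.infer(model_name="resnet50", inputs=[inp])
print(result.as_numpy("output__0").shape)
```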
Harness AI
by Harness
Harness AI is an intelligent software delivery agent that automates CI/CD pipelines. It leverages machine learning to verify deployments, detect anomalies in real-time, and automate rollback decisions to ensure service health. This helps reduce mean time to recovery (MTTR) and optimize pipeline execution in complex environments.
Serverless Model Deploy
by Community
Packages a trained ML model into a serverless function on AWS Lambda, Modal, or Google Cloud Run, handling cold-start optimization, dependency layering, and auto-scaling configuration. Includes health-check endpoints, structured logging, and a GitHub Actions workflow for automated rollout.
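A sketch of the Lambda-style handler such a package would generate; loading the model at module scope amortizes cold starts across warm invocations. The load_model helper is hypothetical.

```python
import json

model = None

def load_model():
    # Hypothetical: deserialize your trained model from the function bundle or S3
    raise NotImplementedError

def handler(event, context):
    global model
    if model is None:          # runs on the first (cold) invocation only
        model = load_model()
    features = json.loads(event["body"])["features"]
    prediction = model.predict([features]).tolist()
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```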
Edge Model Optimization
by Community
Optimizes PyTorch and TensorFlow models for edge hardware by applying INT8/FP16 quantization and converting them to ONNX or TFLite formats. This script provides platform-specific tuning for ARM and NPU targets, benchmarking latency and memory usage while generating a report on accuracy trade-offs.
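A sketch of the PyTorch-to-ONNX conversion step described above; the model and input shape are placeholders, and INT8 calibration is left to the target runtime.

```python
import torch
import torchvision

# Example edge-friendly model; weights=None skips downloading pretrained weights
model = torchvision.models.mobilenet_v3_small(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # tracing input matching the deployment shape

torch.onnx.export(
    model, dummy, "mobilenet_v3.onnx",
    input_names=["input"], output_names=["logits"],
    opset_version=17,
)
```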
K8sGPT
by K8sGPT
AI-powered Kubernetes diagnostics agent that scans clusters for issues and provides plain-language explanations with remediation suggestions. Integrates with multiple LLM backends to analyze pod failures, misconfigurations, and performance bottlenecks in real time.
Kubeflow
by Google
Open-source ML platform for Kubernetes providing end-to-end ML workflow orchestration. Includes pipeline authoring, distributed training, hyperparameter tuning, and model serving on Kubernetes clusters.
Model Serving
by AaaS
Deploys and serves language models in production environments with high availability and low latency. Covers framework selection (vLLM, TGI, Triton), batching strategies, GPU memory management, and auto-scaling configurations for different workload profiles.
Great Expectations
by Superconductive
Open-source data quality platform for validating, profiling, and documenting data pipelines. Provides expectation-based testing for data quality with automated documentation and alerting capabilities.
DVC
by Iterative
Open-source version control system for machine learning projects with Git-like data management. Provides data versioning, experiment tracking, and ML pipeline management alongside your existing Git workflow.
Model Quantization
by AaaS
Reduces model size and inference cost by converting weights from higher to lower precision (FP16 to INT8/INT4). Covers GPTQ, AWQ, GGUF, and bitsandbytes quantization methods with quality-preservation techniques that minimize accuracy degradation.
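A sketch of 4-bit NF4 loading with bitsandbytes via transformers; the model ID is a placeholder, and GPTQ, AWQ, and GGUF each use their own loaders instead.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # normal-float 4-bit weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for dequantized matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", quantization_config=bnb, device_map="auto"
)
```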
Batch Inference
by AaaS
Processes large volumes of LLM inference requests efficiently through batched execution. Implements request queuing, dynamic batching, rate limit management, and result aggregation for high-throughput offline processing workloads.
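A sketch of client-side batched execution with bounded concurrency; the call_llm coroutine stands in for any async LLM client and is an assumption.

```python
import asyncio

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.1)          # placeholder for a real async API call
    return f"response to: {prompt}"

async def run_batch(prompts: list[str], max_concurrency: int = 8) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)   # keeps the client under rate limits

    async def bounded(p: str) -> str:
        async with sem:
            return await call_llm(p)

    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_batch([f"item {i}" for i in range(100)]))
```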
Model Caching
by AaaS
Implements intelligent caching layers for LLM responses to reduce latency and API costs. Covers semantic caching (matching similar queries), exact-match caching, TTL-based invalidation, and cache warming strategies for predictable workloads.
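A toy semantic-cache sketch: reuse a stored response when a new query's embedding is close enough to a cached one. The injected embed function and the similarity threshold are assumptions to tune per workload.

```python
import numpy as np

class SemanticCache:
    def __init__(self, embed, threshold: float = 0.92):
        self.embed, self.threshold = embed, threshold
        self.keys: list[np.ndarray] = []    # cached query embeddings
        self.values: list[str] = []         # cached responses

    def get(self, query: str) -> str | None:
        if not self.keys:
            return None
        q = self.embed(query)
        sims = [float(q @ k / (np.linalg.norm(q) * np.linalg.norm(k) + 1e-9))
                for k in self.keys]
        best = int(np.argmax(sims))
        # Hit only if the closest cached query is similar enough
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self.keys.append(self.embed(query))
        self.values.append(response)
```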