
AdaptCore™ Middleware

AdaptCore™ is Adaptensor's proprietary TPU middleware that enables dynamic-shape PyTorch and JAX models to run efficiently on Google's Tensor Processing Units (TPUs).

The Problem: Static Shape Barrier

TPUs require static tensor shapes for optimal performance. The XLA compiler needs to know exact dimensions at compile time. But most real-world AI workloads are dynamic:

  • Variable-length text (tweets vs. legal documents)
  • Changing batch sizes (traffic spikes, streaming data)
  • Conditional execution (early stopping, branching logic)

When shapes change, XLA must recompile—causing latency spikes and wasted compute.
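A minimal illustration of that recompilation cost in JAX: jax.jit caches one compiled program per input shape, so any unseen shape forces a fresh trace and XLA compile (the function below is illustrative, not part of AdaptCore):

import jax
import jax.numpy as jnp

@jax.jit
def forward(x):
    # Stand-in for a real model's forward pass
    return jnp.tanh(x @ x.T)

forward(jnp.ones((8, 128)))  # first call: trace + XLA compile for shape (8, 128)
forward(jnp.ones((8, 131)))  # unseen shape: full recompilation, a latency spike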

The Solution: AdaptCore™

AdaptCore inserts a translation layer between your model and the TPU:

┌─────────────────┐
│  Your Model     │  (Dynamic shapes, any framework)
│  PyTorch / JAX  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  AdaptCore™     │  ← Shape normalization
│  Middleware     │  ← Adapter injection
│                 │  ← Early exit logic
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Google TPU     │  (Static shapes, XLA-compiled)
│  v2/v4/v5       │
└─────────────────┘

Three Pillars of AdaptCore™

1. Adaptive Bucketing & Padding

Instead of sending variable-length inputs directly to the TPU, AdaptCore:

  1. Inspects incoming batch shapes
  2. Assigns each batch to a bucket (128, 256, 512, 1024, or 2048 tokens)
  3. Pads inputs to the bucket size
  4. Masks padded positions in attention and loss computation
# Example: Input of 347 tokens → Bucket 512
# XLA compiles once for 512, then reuses that program for every 257-512 token input
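A minimal sketch of steps 1-4, assuming the bucket sizes from the spec table below (pad_and_mask is an illustrative helper, not the actual AdaptCore API):

import jax.numpy as jnp

BUCKETS = (128, 256, 512, 1024, 2048)

def pad_and_mask(tokens):
    length = tokens.shape[0]
    bucket = next(b for b in BUCKETS if b >= length)  # smallest bucket that fits
    padded = jnp.pad(tokens, (0, bucket - length))    # zero-pad up to the bucket size
    mask = jnp.arange(bucket) < length                # True only at real token positions
    return padded, mask

The mask is what step 4 consumes: attention applies a large negative bias at padded key positions, and the loss zeroes them out.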

Result: XLA compiles one variant per bucket, a handful instead of thousands. Throughput stays consistent.

2. Tensor Adapters (LoRA Hot-Swap)

Large models (7B+ parameters) are expensive to reload. AdaptCore keeps a frozen backbone in TPU memory and injects small adapter matrices:

output = (W @ x) + (A @ B @ x) * scale

Where:
- W = frozen backbone weight (7B params, never changes)
- A, B = adapter matrices (1-10M params per user/task)
- scale = normalization factor
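In code, the adapter path is a low-rank bypass around the frozen matmul; a minimal sketch of the formula above (in the usual LoRA convention, scale is alpha / rank):

import jax.numpy as jnp

def adapted_forward(W, A, B, x, scale):
    # W: (d_out, d_in), frozen. A: (d_out, r), B: (r, d_in), r small.
    # The backbone product never changes; only the A/B bypass is per-user.
    return W @ x + (A @ (B @ x)) * scale

Hot-swapping a user or task replaces only A and B (megabytes), so the multi-gigabyte W stays resident in TPU memory.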

Benefits:

Metric              Without Adapters   With AdaptCore™
Model switch time   30-60 seconds      < 100ms
Memory per user     14GB+              ~50MB
Users per TPU       1                  100+

3. Entropy-Based Early Exit

Not every query needs all model layers. AdaptCore measures prediction confidence after each layer:

import jax.numpy as jnp
from jax.nn import softmax

def entropy(logits):
    probs = softmax(logits)
    return -jnp.sum(probs * jnp.log(probs + 1e-9))  # epsilon guards against log(0)

# Low entropy  = high confidence = exit early
# High entropy = uncertain       = continue to next layer
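For intuition, a peaked distribution scores near zero while a uniform one scores at the maximum:

entropy(jnp.array([10.0, 0.0, 0.0]))  # ≈ 0.001: confident, exit early
entropy(jnp.array([1.0, 1.0, 1.0]))   # ≈ 1.099 (log 3): uncertain, keep going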

XLA-Compatible Implementation:

import jax.numpy as jnp

# Fixed loop bound + numerical masking: the Python loop unrolls at trace
# time, and jnp.where stands in for data-dependent branching, so the
# compiled graph keeps one static structure.
def forward_with_early_exit(x, layers, threshold):
    finished = jnp.array(False)
    final_output = x
    for i in range(MAX_LAYERS):
        x = layers[i](x)
        confidence = 1.0 - entropy(x) / MAX_ENTROPY
        exit_now = (confidence > threshold) & ~finished
        final_output = jnp.where(exit_now, x, final_output)  # latch first confident output
        finished = finished | exit_now
    return final_output

This gives dynamic depth behavior while maintaining static graph structure for XLA.

Performance Impact:

Query Type                       Layers Used   Speedup
"What is 2+2?"                   3/32          10x
"Explain quantum entanglement"   28/32         1.1x
Average workload                 12/32         2.7x

Technical Specifications

Specification          Value
Supported TPUs         v2-8, v4-8, v5p (pods)
Supported Frameworks   JAX, PyTorch (via PyTorch/XLA)
Bucket Sizes           128, 256, 512, 1024, 2048
Adapter Rank           8-64 (configurable)
Early Exit Threshold   0.7-0.95 (configurable)
Compilation Overhead   ~30s first request, <1ms subsequent
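One way to hide the ~30s first-request cost is to warm every bucket at startup; a hypothetical sketch (warm_up and model_forward are illustrative names, not AdaptCore API):

import jax.numpy as jnp

def warm_up(model_forward):
    # One dummy call per bucket triggers XLA compilation up front,
    # so live traffic only ever hits the <1ms cached path.
    for bucket in (128, 256, 512, 1024, 2048):
        model_forward(jnp.zeros((1, bucket), dtype=jnp.int32))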

Using AdaptCore™

AdaptCore is automatically used when you interact with Adaptensor's APIs. You don't need to configure anything—just send your documents and queries.

For advanced users who want direct TPU access:

from adaptensor import AdaptensorClient

client = AdaptensorClient(api_key="sk_live_...")

# AdaptCore handles bucketing, adapters, and early exit automatically
results = client.query(
    index_name="my-documents",
    query="What are the key findings?",
    options={
        "early_exit_threshold": 0.85,  # Optional tuning
        "adapter": "legal-v2"          # Optional custom adapter
    }
)
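Because the model exits as soon as confidence exceeds the threshold, lower early_exit_threshold values trade accuracy for speed, while values near 0.95 keep more layers in play.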

Comparison: With vs Without AdaptCore™

Metric                  Raw TPU               With AdaptCore™
Dynamic shape support   ❌ Manual bucketing    ✅ Automatic
Multi-tenant            ❌ One user per TPU    ✅ 100+ users
Framework support       TensorFlow only       PyTorch, JAX, TF
Compilation stalls      Frequent              Rare
Cost efficiency         Baseline              2-3x better

Next Steps