AdaptCore™ Middleware¶
AdaptCore™ is Adaptensor's proprietary TPU middleware that lets PyTorch and JAX models with dynamic shapes run efficiently on Google's Tensor Processing Units (TPUs).
The Problem: Static Shape Barrier¶
TPUs require static tensor shapes for optimal performance. The XLA compiler needs to know exact dimensions at compile time. But most real-world AI workloads are dynamic:
- Variable-length text (tweets vs. legal documents)
- Changing batch sizes (traffic spikes, streaming data)
- Conditional execution (early stopping, branching logic)
When shapes change, XLA must recompile—causing latency spikes and wasted compute.
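For example, under JAX every previously unseen input shape triggers a fresh trace and XLA compile. A toy function, just to show where the stalls come from:
import jax
import jax.numpy as jnp

@jax.jit
def score(x):
    return jnp.tanh(x).sum()

score(jnp.ones((8, 128)))   # first call: traces and XLA-compiles for shape (8, 128)
score(jnp.ones((8, 131)))   # new shape: recompiles, another latency spike
score(jnp.ones((8, 128)))   # shape already seen: reuses the cached executable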
The Solution: AdaptCore™¶
AdaptCore inserts a translation layer between your model and the TPU:
┌─────────────────┐
│ Your Model │ (Dynamic shapes, any framework)
│ PyTorch / JAX │
└────────┬────────┘
│
▼
┌─────────────────┐
│ AdaptCore™ │ ← Shape normalization
│ Middleware │ ← Adapter injection
│ │ ← Early exit logic
└────────┬────────┘
│
▼
┌─────────────────┐
│ Google TPU │ (Static shapes, XLA-compiled)
│ v2/v4/v5 │
└─────────────────┘
Three Pillars of AdaptCore™¶
1. Adaptive Bucketing & Padding¶
Instead of sending variable-length inputs directly to the TPU, AdaptCore:
- Inspects incoming batch shapes
- Assigns each batch to a bucket (e.g., 128, 256, 512, 1024, or 2048 tokens)
- Pads inputs to the bucket size
- Masks padded positions in attention and loss computation
# Example: Input of 347 tokens → Bucket 512
# XLA compiles once for 512, reuses it for all 257-512 token inputs
Result: XLA compiles 3-4 variants instead of thousands. Throughput stays consistent.
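A minimal sketch of the bucketing and padding step, assuming the bucket sizes above (the helper names are illustrative, not the AdaptCore API):
import jax.numpy as jnp

BUCKETS = (128, 256, 512, 1024, 2048)

def pad_to_bucket(token_ids):
    """Pad a 1-D token sequence up to the smallest bucket that fits it."""
    length = token_ids.shape[0]
    bucket = next(b for b in BUCKETS if b >= length)               # 347 tokens -> bucket 512
    padded = jnp.zeros(bucket, dtype=token_ids.dtype).at[:length].set(token_ids)
    mask = jnp.arange(bucket) < length                             # True for real tokens, False for padding
    return padded, mask                                            # the mask feeds attention/loss masking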
2. Tensor Adapters (LoRA Hot-Swap)¶
Large models (7B+ parameters) are expensive to reload. AdaptCore keeps a frozen backbone in TPU memory and injects small adapter matrices:
output = (W @ x) + (A @ B @ x) * scale
Where:
- W = frozen backbone weight (7B params, never changes)
- A, B = adapter matrices (1-10M params per user/task)
- scale = normalization factor
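In code, the injection is one extra low-rank matmul on top of the frozen path. A sketch with illustrative shapes (hidden size d, adapter rank r; the 1/r scale is a placeholder for the normalization factor):
import jax.numpy as jnp

d, r = 4096, 16                            # hidden size, adapter rank (8-64 in practice)
W = jnp.zeros((d, d))                      # frozen backbone weight, stays resident on the TPU
A = jnp.zeros((d, r))                      # per-user/task adapter matrices (placeholder values)
B = jnp.zeros((r, d))
scale = 1.0 / r

def adapted_linear(x):
    return (W @ x) + (A @ (B @ x)) * scale   # same formula as above, low-rank path evaluated first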
Benefits:
| Metric | Without Adapters | With AdaptCore™ |
|---|---|---|
| Model switch time | 30-60 seconds | < 100ms |
| Memory per user | 14GB+ | ~50MB |
| Users per TPU | 1 | 100+ |
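Because only the small (A, B) pair changes between users or tasks, switching is a lookup rather than a model reload. A sketch reusing the names above; the adapter registry is hypothetical:
adapters = {
    "legal-v2":   (jnp.zeros((d, r)), jnp.zeros((r, d))),   # ~1-10M params per pair
    "medical-v1": (jnp.zeros((d, r)), jnp.zeros((r, d))),
}

def adapted_linear_for(x, task):
    A, B = adapters[task]                  # hot-swap: a dict lookup, not a 14GB+ backbone reload
    return (W @ x) + (A @ (B @ x)) * scale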
3. Entropy-Based Early Exit¶
Not every query needs all model layers. AdaptCore measures prediction confidence after each layer:
import jax.numpy as jnp
from jax.nn import softmax

def entropy(logits):
    probs = softmax(logits)
    return -jnp.sum(probs * jnp.log(probs))

# Low entropy = high confidence = exit early
# High entropy = uncertain = continue to next layer
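For intuition, with 4-class toy logits a peaked distribution gives near-zero entropy and a uniform one gives the maximum, ln 4:
confident = jnp.array([10.0, 0.0, 0.0, 0.0])   # model is sure of class 0
uncertain = jnp.array([1.0, 1.0, 1.0, 1.0])    # model has no idea

entropy(confident)   # ≈ 0.0015 nats -> exit early
entropy(uncertain)   # ≈ 1.386 nats (= ln 4) -> keep going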
XLA-Compatible Implementation:
# Fixed loop bound + numerical masking: every layer runs, so the graph stays static for XLA
final_output, finished = x, jnp.array(False)
for i in range(MAX_LAYERS):                     # MAX_LAYERS is a Python int: depth is fixed at trace time
    x = layers[i](x)
    confidence = 1.0 - entropy(x) / MAX_ENTROPY
    exit_now = (confidence > threshold) & ~finished
    final_output = jnp.where(exit_now, x, final_output)   # freeze the output at the exit layer
    finished = finished | exit_now
This gives dynamic depth behavior while maintaining static graph structure for XLA.
Performance Impact:
| Query Type | Layers Used | Speedup |
|---|---|---|
| "What is 2+2?" | 3/32 | 10x |
| "Explain quantum entanglement" | 28/32 | 1.1x |
| Average workload | 12/32 | 2.7x |
Technical Specifications¶
| Specification | Value |
|---|---|
| Supported TPUs | v2-8, v4-8, v5p (pods) |
| Supported Frameworks | JAX, PyTorch (via PyTorch/XLA) |
| Bucket Sizes | 128, 256, 512, 1024, 2048 |
| Adapter Rank | 8-64 (configurable) |
| Early Exit Threshold | 0.7-0.95 (configurable) |
| Compilation Overhead | ~30s first request, <1ms subsequent |
Using AdaptCore™¶
AdaptCore is automatically used when you interact with Adaptensor's APIs. You don't need to configure anything—just send your documents and queries.
For advanced users who want direct TPU access:
from adaptensor import AdaptensorClient
client = AdaptensorClient(api_key="sk_live_...")
# AdaptCore handles bucketing, adapters, and early exit automatically
results = client.query(
    index_name="my-documents",
    query="What are the key findings?",
    options={
        "early_exit_threshold": 0.85,  # Optional tuning
        "adapter": "legal-v2"          # Optional custom adapter
    }
)
Comparison: With vs Without AdaptCore™¶
| Metric | Raw TPU | With AdaptCore™ |
|---|---|---|
| Dynamic shape support | ❌ Manual bucketing | ✅ Automatic |
| Multi-tenant | ❌ One user per TPU | ✅ 100+ users |
| Framework support | TensorFlow only | PyTorch, JAX, TF |
| Compilation stalls | Frequent | Rare |
| Cost efficiency | Baseline | 2-3x better |
Next Steps¶
- AdaptHex™ Compression - How we reduce storage costs
- AdaptLLM™ Inference - TPU-accelerated LLM processing
- Architecture Overview - Full system design