This document describes the Adaptensor platform architecture: system components, per-user data storage, security boundaries, request flows, and scaling strategy.
System Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ ADAPTENSOR PLATFORM │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Web UI │ │ Python SDK │ │ REST API │ │ Webhooks │ │
│ │ (React) │ │ (PyPI) │ │ (Direct) │ │ (Callbacks) │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │ │
│ └──────────────────┼──────────────────┼──────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ API Gateway (Cloud Run) │ │
│ │ • Authentication (Firebase) │ │
│ │ • Rate Limiting │ │
│ │ • Request Routing │ │
│ └──────────────────┬──────────────────┘ │
│ │ │
│ ┌─────────────────────────────┼─────────────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Document │ │ Search │ │ Billing │ │
│ │ Service │ │ Service │ │ Service │ │
│ │ │ │ │ │ │ │
│ │ • Upload │ │ • Query │ │ • Credits │ │
│ │ • Index │ │ • Retrieve │ │ • Metering │ │
│ │ • Chunk │ │ • Rank │ │ • Stripe │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────────────────┼────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ AdaptCore™ + AdaptLLM™ │ │
│ │ TPU Middleware & Inference Engine │ │
│ └──────────────────┬──────────────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ TPU v2-8 │ │ TPU v2-8 │ │ TPU v4-8 │ │
│ │ (Shared) │ │ (Shared) │ │ (On-Demand)│ │
│ └───────────┘ └───────────┘ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Data Storage Architecture
Per-User Isolation
Each user's data is isolated by user_id: storage objects live under a per-user prefix, and Firestore records carry a userId field.
Google Cloud Storage
├── gs://adaptensor-uploads/
│ ├── {user_id_1}/
│ │ ├── document_abc.pdf
│ │ ├── document_def.docx
│ │ └── ...
│ ├── {user_id_2}/
│ │ └── ...
│ └── {user_id_N}/
│
├── gs://adaptensor-indexes/
│ ├── {user_id_1}/
│ │ ├── index_aviation/
│ │ │ ├── chunks.json
│ │ │ └── vectors.adapthex
│ │ └── index_legal/
│ ├── {user_id_2}/
│ │ └── ...
│ └── {user_id_N}/
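The bucket layout above implies that every object name begins with the owner's user_id. The following is a minimal sketch of path construction under that assumption. The helper names `upload_path` and `index_path` and the character whitelist are hypothetical illustrations, not part of the Adaptensor SDK.

```python
import re

# Bucket names taken from the layout above.
UPLOAD_BUCKET = "adaptensor-uploads"
INDEX_BUCKET = "adaptensor-indexes"

# Hypothetical rule: allow only characters that cannot escape the user's prefix.
_SAFE_SEGMENT = re.compile(r"[A-Za-z0-9_.-]+")


def _check(segment: str) -> str:
    """Reject empty segments, path separators, and '.'/'..' traversal."""
    if segment in {".", ".."} or not _SAFE_SEGMENT.fullmatch(segment):
        raise ValueError(f"unsafe path segment: {segment!r}")
    return segment


def upload_path(user_id: str, filename: str) -> str:
    """Return the gs:// URI for an uploaded file under the user's prefix."""
    return f"gs://{UPLOAD_BUCKET}/{_check(user_id)}/{_check(filename)}"


def index_path(user_id: str, index_name: str, artifact: str) -> str:
    """Return the gs:// URI for an index artifact such as chunks.json."""
    return (
        f"gs://{INDEX_BUCKET}/{_check(user_id)}/"
        f"{_check(index_name)}/{_check(artifact)}"
    )
```

Validating each segment before it is joined keeps a crafted filename such as `../other_user/x` from resolving outside the caller's prefix.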
Firestore Database
├── users/
│ ├── {user_id_1}/
│ │ ├── credits: 45.50
│ │ ├── createdAt: ...
│ │ └── settings: {...}
│ └── ...
│
├── documents/
│ ├── {doc_id}/
│ │ ├── userId: {user_id} ← Isolation key
│ │ ├── filename: "..."
│ │ ├── chunks: 150
│ │ └── ...
│ └── ...
│
├── transactions/
│ ├── {tx_id}/
│ │ ├── userId: {user_id}
│ │ ├── type: "query"
│ │ ├── amount: -0.0001
│ │ └── timestamp: ...
│ └── ...
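The `users` and `transactions` records above suggest a simple ledger: a charge lowers `credits` and writes a transaction with a negative `amount`. The sketch below illustrates that bookkeeping in memory. The `Transaction` class and `charge` function are hypothetical names, and the use of `Decimal` (to avoid float rounding on amounts like 0.0001) is an assumption rather than a description of how Adaptensor stores balances.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal

# Unit price quoted in this document for one query or one indexed chunk.
UNIT_PRICE = Decimal("0.0001")


@dataclass(frozen=True)
class Transaction:
    user_id: str
    type: str
    amount: Decimal        # negative for charges, as in the layout above
    timestamp: datetime


def charge(credits: Decimal, user_id: str, tx_type: str, cost: Decimal):
    """Return (new_balance, Transaction); raise if the balance is too low."""
    if cost < 0:
        raise ValueError("cost must be non-negative")
    if credits < cost:
        raise ValueError("insufficient credits")
    tx = Transaction(user_id, tx_type, -cost, datetime.now(timezone.utc))
    return credits - cost, tx


# One query charged against the example balance of 45.50.
balance, tx = charge(Decimal("45.50"), "user_1", "query", UNIT_PRICE)
print(balance, tx.amount)  # 45.4999 -0.0001
```

In a real deployment the read of the balance and the write of the new balance would need to happen atomically so that concurrent requests cannot both spend the same credits.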
Security Boundaries
┌─────────────────────────────────────────────────────────────┐
│ Security Perimeter │
├─────────────────────────────────────────────────────────────┤
│ │
│ Authentication Layer (Firebase Auth) │
│ ├── Google OAuth │
│ ├── Email/Password │
│ └── API Keys (SHA-256 hashed) │
│ │
│ Authorization Layer │
│ ├── Every request validates: token → user_id │
│ ├── Every data access filters by: user_id │
│ └── No cross-tenant queries possible │
│ │
│ Data Layer │
│ ├── GCS: IAM policies restrict access to service account │
│ ├── Firestore: Security rules enforce user_id matching │
│ └── TPU: No persistent storage, process only │
│ │
│ Network Layer │
│ ├── HTTPS everywhere (TLS 1.3) │
│ ├── Cloud Run: Private ingress option │
│ └── VPC Service Controls (Enterprise) │
│ │
└─────────────────────────────────────────────────────────────┘
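The authentication layer lists API keys as SHA-256 hashed, which means only a digest is stored and a presented key is hashed again for comparison. A minimal sketch of that pattern using the Python standard library follows. The function names and the `ak_` key prefix are hypothetical, and the source does not say whether a salt or keyed hash is also used.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> str:
    """Create a random key; the "ak_" prefix is a hypothetical convention."""
    return "ak_" + secrets.token_urlsafe(32)


def hash_api_key(api_key: str) -> str:
    """Return the SHA-256 hex digest that would be stored instead of the key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare digests in constant time."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)


key = generate_api_key()          # shown to the user once
stored = hash_api_key(key)        # persisted server-side
print(verify_api_key(key, stored))
```

Because the key itself is high-entropy random data, a plain SHA-256 digest is a common choice here; low-entropy secrets such as passwords would call for a slow password hash instead.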
Request Flow: Document Upload
1. User uploads file via dashboard or SDK
│
▼
2. API Gateway receives request
├── Validates Firebase token
├── Extracts user_id
└── Checks credit balance
│
▼
3. Document Service
├── Generates unique doc_id
├── Uploads to gs://adaptensor-uploads/{user_id}/{doc_id}
└── Creates Firestore document record
│
▼
4. Chunking Engine
├── Downloads file from GCS
├── Parses (PDF, DOCX, etc.)
├── Splits into semantic chunks
└── Returns chunk array
│
▼
5. Embedding Service (TPU)
├── Batches chunks (AdaptCore bucketing)
├── Generates 384-dim embeddings
└── Returns vectors
│
▼
6. Compression (AdaptHex™)
├── Quantizes float32 → hex8/hex4
├── 4-8x size reduction
└── Returns compressed vectors
│
▼
7. Storage
├── Saves chunks + vectors to gs://adaptensor-indexes/{user_id}/
└── Updates Firestore with chunk count, status
│
▼
8. Billing
├── Calculates cost: chunks × $0.0001
├── Deducts from user credits
└── Logs transaction
│
▼
9. Response to user with document_id, chunk_count, cost
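Step 6 describes quantizing float32 values to 8-bit or 4-bit codes. The source does not specify the AdaptHex format, so the sketch below shows a generic uniform scalar quantizer as an illustration only: each value is mapped to an integer code between the vector's minimum and maximum and written as hex digits. The function names and the per-vector min/scale scheme are assumptions.

```python
def quantize(vector, bits):
    """Map floats to unsigned `bits`-wide codes using the vector's own min/max.

    Returns (hex_string, lo, scale); lo and scale are needed to decode.
    """
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")
    levels = (1 << bits) - 1
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for constant vectors
    width = bits // 4                   # hex digits per value: 2 or 1
    codes = (min(levels, max(0, round((x - lo) / scale))) for x in vector)
    return "".join(f"{c:0{width}x}" for c in codes), lo, scale


def dequantize(hex_codes, bits, lo, scale):
    """Invert quantize() up to rounding error (about scale / 2 per value)."""
    width = bits // 4
    return [
        lo + int(hex_codes[i:i + width], 16) * scale
        for i in range(0, len(hex_codes), width)
    ]


vec = [-1.0, -0.5, 0.0, 0.5, 1.0]
packed, lo, scale = quantize(vec, 8)
restored = dequantize(packed, 8, lo, scale)
print(packed, [round(x, 3) for x in restored])
```

The stated 4-8x reduction follows from the bit widths: a 384-dimensional float32 vector occupies 384 × 4 = 1536 bytes, while 8-bit codes need 384 bytes (4x smaller) and 4-bit codes need 192 bytes (8x smaller), plus a few bytes for the decoding parameters in a scheme like this one.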
Request Flow: Semantic Query
1. User sends query via API/SDK
│
▼
2. API Gateway
├── Validates authentication
├── Extracts user_id
└── Checks credits (≥ $0.0001)
│
▼
3. Query Service
├── Embeds query text (same model as indexing)
└── Returns 384-dim vector
│
▼
4. Search Engine
├── Loads user's index from gs://adaptensor-indexes/{user_id}/
├── Decompresses AdaptHex vectors (on-demand)
├── Computes cosine similarity
└── Returns top-k matches
│
▼
5. (Optional) Reranking
├── Cross-encoder scoring
└── Reorders by relevance
│
▼
6. Billing
├── Deducts $0.0001 (query cost)
└── Logs transaction
│
▼
7. Response with results, scores, latency
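Step 4 of the query flow scores stored vectors against the query embedding by cosine similarity and keeps the top-k. A minimal pure-Python sketch of that step is shown below. It assumes a brute-force scan over an in-memory mapping of chunk ids to decompressed vectors; the source does not say whether the production search engine scans exhaustively or uses an approximate index, and the names `cosine_similarity` and `top_k` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=3):
    """Return the k (chunk_id, score) pairs with the highest similarity.

    `index` maps a chunk id to its decompressed embedding.
    """
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for the 384-dimensional embeddings.
index = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [1.0, 1.0, 0.0],
}
for chunk_id, score in top_k([1.0, 0.0, 0.0], index, k=2):
    print(chunk_id, round(score, 4))
```

The optional reranking in step 5 would take this shortlist and reorder it with a more expensive scorer, which is why the first pass only needs to be cheap and roughly right.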
Scaling Strategy
Current: Small Scale (1-100 users)
┌─────────────────┐ ┌─────────────────┐
│ Cloud Run API │────▶│ TPU v2-8 │
│ (Auto-scaling) │ │ (On-demand) │
└─────────────────┘ └─────────────────┘
- Single API instance handles all requests
- TPU spun up per job, destroyed after
- Cost: ~$0/month when idle
Medium Scale (100-1000 users)
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Cloud Run API │ │ Job Queue │ │ TPU Pool │
│ (3-10 instances)│────▶│ (Cloud Tasks) │────▶│ (2-4 v2-8) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
- API auto-scales with traffic
- Job queue buffers bursty workloads
- Small TPU pool stays warm
- Cost: ~$500-2000/month base
Large Scale (1000+ users)
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Load Balancer │ │ Regional API │ │ TPU Pods │
│ (Global) │────▶│ Clusters │────▶│ (v5p) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
┌───────┴───────┐
▼ ▼
┌───────────┐ ┌───────────┐
│ Shared │ │ Dedicated │
│ Pool │ │ Customers │
└───────────┘ └───────────┘
- Multi-region deployment
- TPU pods for massive throughput
- Dedicated TPU options for enterprise
- Cost: Usage-based, margins improve at scale
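The three tiers above can be summarized as a lookup from user count to deployment shape. The sketch below is illustrative only: the `Tier` class and `tier_for` function are hypothetical, and because the stated ranges share the boundary values 100 and 1000, it assumes each boundary belongs to the smaller tier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_users: Optional[int]   # None means no upper bound
    api: str
    compute: str


# Descriptions copied from the tier diagrams above.
TIERS = [
    Tier("small", 100, "Cloud Run API (auto-scaling)", "TPU v2-8 (on-demand)"),
    Tier("medium", 1000, "Cloud Run API (3-10 instances) + Cloud Tasks queue",
         "TPU pool (2-4 v2-8)"),
    Tier("large", None, "Global load balancer + regional API clusters",
         "TPU pods (v5p)"),
]


def tier_for(users: int) -> Tier:
    """Pick the first tier whose upper bound covers the user count."""
    if users < 1:
        raise ValueError("users must be at least 1")
    for tier in TIERS:
        if tier.max_users is None or users <= tier.max_users:
            return tier
    raise AssertionError("unreachable: last tier has no upper bound")


print(tier_for(250).name, "->", tier_for(250).compute)
```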
Technology Stack
| Layer | Technology | Purpose |
|-------|------------|---------|
| Frontend | React + Vite + Tailwind | Dashboard UI |
| Hosting | Firebase Hosting | Static assets, CDN |
| Auth | Firebase Authentication | User identity |
| API | Cloud Run (Python/Flask) | Request handling |
| Database | Firestore | User data, metadata |
| Storage | Google Cloud Storage | Documents, indexes |
| Queue | Cloud Tasks | Job scheduling |
| Compute | TPU v2-8 / v4-8 / v5p | AI inference |
| Billing | Stripe | Payment processing |
| Docs | MkDocs + Netlify | Documentation |
| SDK | Python (PyPI) | Developer access |
Compliance & Certifications
| Standard | Status | Notes |
|----------|--------|-------|
| SOC 2 Type II | Ready | GCP infrastructure compliant |
| HIPAA | Ready | BAA available through GCP |
| GDPR | Compliant | Data residency options |
| ISO 27001 | Inherited | Via GCP certification |
| FedRAMP | Roadmap | Enterprise feature |
Disaster Recovery
| Component | RPO | RTO | Strategy |
|-----------|-----|-----|----------|
| User Data | 0 | < 1 hr | GCS multi-region |
| Firestore | 0 | < 5 min | Auto-failover |
| API | N/A | < 5 min | Cloud Run auto-recovery |
| TPU | N/A | < 30 min | Zone failover |
Next Steps