Software Development Engineer II · Amazon, Bedrock Infrastructure (Mar 2025 – Present)
Technologies: Rust, DynamoDB, ECS, CDK, Tokio, AppConfig, ALB
Inference Scheduling Service (Rudi)
- Co-architected and developed Rudi, a Rust-based Bedrock inference scheduling service handling 1,000+ TPS that governs the timing and ordering of inference requests from FES to foundation models; implemented concurrency-per-variant enforcement, priority-based queuing with weighted round-robin scheduling, request throttling and load shedding, and deferred capacity allocation, enabling Provisioned Throughput V2 without hard-allocating capacity
- Built the service on a single-threaded Tokio scheduling engine with MPSC/oneshot channel patterns for microsecond-level scheduling decisions; deployed across all Bedrock regions
Priority Queue Migration
- Implemented a two-phase, zero-downtime migration from the legacy 4-value scheme (0, 1, 2, 3) to a standardized 7-tier priority system (P5–P100), with bidirectional session token normalization handling values 0–150 for rollback compatibility
- Built configuration-driven priority weights, AppConfig/Rust fallback configs, and capacity allocation constraints; deployed across Beta, PreProd, and Prod with zero incidents over a 6-week rollout, enabling 2 new revenue tiers (Premium OD, Best Effort OD)
Quality of Service (QoS)
- Implemented QoS tier mapping (RudiPriority → QoS 0/1/10) in schedule responses and forwarding logic for Anthropic models (Claude Sonnet, Haiku, Opus); added a preemptable flag to capacity constraints and AppConfig-based model QoS detection
- Built unified priority assignment that avoids separate QoS and non-QoS code paths; updated the capacity allocator to deduct preemptable consumption only for QoS-supported models
RAMA Classification Consolidation
- Led a 4-team consolidation expanding from a 2-value to a 10-classification system (including PTv2, Priority Access, OnDemand, Flex, Batch); implemented fail-closed handling in Rudi and FES that throws an exception on unknown classifications and raises SEV2 alarms
- Built a thick client with AppConfig-based cached overrides (50+ account-level imports), a shadow mode validation framework, and a session token migration that carries the RAMA classification for request lifecycle consistency
ServiceTier API
- Implemented ServiceTier-to-RudiPriority mapping with a configuration-driven rules engine (service_tier.json, 15+ rules) supporting FLEX/PRIORITY tiers and RAMA BEST_EFFORT integration; built a dual-operation mode with RequestType fallback and feature flags for gradual rollout
Leader-Follower & Game Day
- Implemented a leader-follower architecture with DynamoDB coordination, request forwarding, and broadcast mechanisms; achieved automatic failover recovery in under 10 seconds
- Built and executed 27 Game Day failure scenarios (routing, leader election, capacity tracking, peer communication); identified and fixed 5 critical edge cases including session token priority mismatches, TTL recovery, and ALB cookie validation
Operations & Full-CD
- Resolved 800+ tickets; created 30+ composite alarms reducing false positives by 70%; reduced MTTR by 60% through variant-level monitoring, priority fairness alarms, and cache error metrics
- Enabled Full-CD for AppConfig, Canary, and Service pipelines with 20+ rollback alarms and ECS circuit breakers; drove region expansion and 10+ model launches (Claude Sonnet, Haiku, and Opus; Amazon Nova family), resolving GMDS fallback, health check, and AZ blockers