PROJ Service Migration in Cellular Networks

This project has three focus areas: the Cellular Networks Testbed, LLMs, and Optimization Algorithms.

Cellular Networks Testbed

Image from: Real-Time Service Migration in Edge Networks: A Survey

The testbed architecture spans four tiers: Central Cloud, Regional MEC, Aggregation MEC, and Local MEC. The primary focus is on the Regional MEC and Aggregation MEC tiers. The Regional, Aggregation, and Local MEC tiers will be deployed on my PVE testbed.

Each MEC site (or each metro area or PoP, Point of Presence) needs its own control plane because:

  • Survivability: If the WAN/backhaul link drops, the site keeps running. A single cluster stretched across sites would lose control-plane access at the cut-off site and become unstable.
  • Latency/etcd constraints: The Kubernetes control plane (etcd) is sensitive to WAN latency and packet loss; as a rule of thumb, cross-site RTTs above roughly 5–10 ms, combined with jitter, can trigger leader elections and outages.
  • Blast radius & upgrades: Failures and rollouts stay local, enabling per-site upgrades.
  • Regulatory / tenancy: Site-level isolation simplifies policy and compliance.
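The latency rule of thumb above can be turned into a quick check on measured RTT samples between two sites. This is a minimal sketch, not an etcd requirement: the 10 ms mean-RTT limit is taken from the upper end of the range in these notes, the 2 ms jitter limit is a hypothetical placeholder, and the function name is made up for illustration.

```python
from statistics import mean, pstdev

# Assumed limits: 10 ms is the upper end of the rule of thumb in these notes;
# the jitter limit is a hypothetical placeholder, not an etcd specification.
MAX_MEAN_RTT_MS = 10.0
MAX_JITTER_MS = 2.0


def can_share_control_plane(rtt_samples_ms,
                            max_mean_rtt_ms=MAX_MEAN_RTT_MS,
                            max_jitter_ms=MAX_JITTER_MS):
    """Return True if two sites look close enough to share one control plane.

    rtt_samples_ms: measured round-trip times between the sites, in ms.
    Jitter is approximated as the population standard deviation of the samples.
    """
    if not rtt_samples_ms:
        raise ValueError("need at least one RTT sample")
    avg = mean(rtt_samples_ms)
    jitter = pstdev(rtt_samples_ms)
    return avg <= max_mean_rtt_ms and jitter <= max_jitter_ms


# Same-rack nodes pass; a site-to-site link in the 15-30 ms range does not.
print(can_share_control_plane([1.0, 1.2, 0.9]))     # True
print(can_share_control_plane([18.0, 22.0, 25.0]))  # False
```

A failed check is an argument for giving each site its own cluster rather than stretching one across the WAN.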

TODO Central Cloud

>200 km coverage | ≥50 ms RTT

Global control

Cloud (Azure)

High-level design: Azure Virtual WAN (Standard) with four hubs in a full inter-hub mesh. Regional spokes (AKS VNets) attach to their nearest hub; inter-hub routing provides global any-to-any connectivity.

Regions (paired for HA/DR):

  • East US 2 (VA): primary (closest to UVA); paired with Central US
  • Central US (IA): DR for East US 2
  • West US 3 (AZ): west capacity/DR; paired with East US
  • East US (VA): additional east capacity and the formal pair for West US/West US 3

(All selected regions provide Availability Zones.)
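The pairing list above can be kept as a small lookup table so that a DR target can be checked against the four selected hubs. This is a sketch only: the keys are Azure's short region names, the pair map just restates the list above, and the helper name is hypothetical.

```python
# Region pairs as listed above, keyed by Azure short region name.
# West US 3 -> East US is one-directional; East US itself pairs with West US.
REGION_PAIR = {
    "eastus2": "centralus",
    "centralus": "eastus2",
    "westus3": "eastus",
    "eastus": "westus",
}

# The four Virtual WAN hub regions chosen for this design.
SELECTED_HUBS = {"eastus2", "centralus", "westus3", "eastus"}


def dr_target(region):
    """Return (paired_region, has_hub) for one of the selected regions.

    has_hub is False when the formal pair is not one of the four hubs,
    which means DR for that region needs a separate decision.
    """
    pair = REGION_PAIR[region]
    return pair, pair in SELECTED_HUBS


for hub in sorted(SELECTED_HUBS):
    print(hub, "->", dr_target(hub))
```

Under these assumptions the lookup shows that East US is the only hub whose formal pair (West US) is not itself one of the four hubs.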

TODO Regional MEC (e.g., Richmond PoP)

50–200 km coverage | RTT to Aggregation: 15–30 ms

Use cases: smart city, cloud gaming, content delivery

Components: SMF/AMF/PCF (control plane) + Regional UPF

TODO Aggregation MEC (Hub PoP)

10–50 km coverage | RTT to Local: 10–20 ms

Use cases: campus control, local CDN

OKD (3 master nodes)

Components: SMF/AMF/PCF (control plane) + optional Aggregation UPF

TODO Local MEC

1–5 km coverage | UE↔UPF target: ≤5–10 ms

Use cases: autonomous driving, industrial robots, real-time AR/XR

Components: Local UPF

Deploy two OKD SNOs or MicroShift clusters (MEC-1: Campus South; MEC-2: Campus North)

  • N3 (gNB ⇄ UPF @ MEC): VLAN/VRF local to the site, low jitter
  • N6 (UPF ⇄ campus/ISP): routed toward the PoP
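The per-tier RTT figures quoted in the tier sections can be combined into a rough placement rule: given a service's latency budget, find the most centralized tier that still fits. The sketch below rests on strong assumptions that are not in these notes: hop RTTs add linearly, the upper bound of each quoted range is used, and the Central Cloud's 50 ms figure is treated as one more added hop. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    hop_rtt_ms: float  # assumed worst-case RTT added by reaching this tier


# Ordered from the UE outward. Values are the upper bounds quoted per tier;
# treating them as additive hops is an assumption made for illustration.
TIERS = [
    Tier("Local MEC", 10.0),        # UE<->UPF target
    Tier("Aggregation MEC", 20.0),  # RTT to Local
    Tier("Regional MEC", 30.0),     # RTT to Aggregation
    Tier("Central Cloud", 50.0),    # quoted as a lower bound (>=50 ms)
]


def cumulative_rtt(tiers=TIERS):
    """Return [(tier name, estimated UE-to-tier RTT in ms)] from the UE outward."""
    total = 0.0
    result = []
    for tier in tiers:
        total += tier.hop_rtt_ms
        result.append((tier.name, total))
    return result


def farthest_tier_within(budget_ms, tiers=TIERS):
    """Return the most centralized tier whose estimated RTT fits the budget.

    Returns None when even the Local MEC estimate exceeds the budget.
    """
    chosen = None
    for name, rtt in cumulative_rtt(tiers):
        if rtt <= budget_ms:
            chosen = name
    return chosen


print(cumulative_rtt())
print(farthest_tier_within(35.0))  # Aggregation MEC
```

Replacing the assumed hop values with RTTs measured on the testbed would make the same function usable as an input to migration planning.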

TODO gNodeB (UEs)

LLMs

Focus on multi-agent workflow design and LLM fine-tuning.

TODO LLM corpus

  • ETSI = European Telecommunications Standards Institute

    Famous work includes ETSI MEC (edge computing) and the original ETSI NFV effort.

TODO LLM Serving

Deploy three LLMs to the Regional MEC or Aggregation MEC:

https://ai-on-openshift.io/generative-ai/llm-serving/

TODO Retrieval Augmented Generation (RAG)

Org-roam MCP Server

TODO The Multi-Agent Roster & Roles

Mobility Predictor Agent (MPA)

  • Location in testbed: Aggregation MEC / Local MEC (Near-RT RIC/O-RAN Layer)
  • Primary role (Context Generation): Provides real-time prediction of UE handover and mobility patterns to anticipate service relocation.
  • Key inputs: Real-time Radio KPIs (RSRP, RSRQ), Handover/Xn/N2 events, UE location/velocity.
  • Research motivation (Proactive Migration): Essential for timely initiation of migration at the lowest latency tiers, ensuring QoE under high mobility.

MEC Resource Agent (RCA)

  • Location in testbed: All Managed MEC Sites (Local, Aggregation, Regional)
  • Primary role (Local State Reporting): Monitors the instantaneous resource utilization and available capacity of its local compute cluster (OKD/MicroShift).
  • Key inputs: CPU/Memory/GPU load, Available network bandwidth, K8s/OKD/MicroShift node metrics.
  • Research motivation (Survivability and Autonomy): Guarantees that every control-plane instance has local resource awareness, upholding isolation and independence.

Migration Planner Agent (PLA)

  • Location in testbed: Regional MEC and Aggregation MEC (Control Plane)
  • Primary role (Decision-Making): Determines the optimal migration target, timing, and method based on its scope (Local → Local vs. Regional → Regional).
  • Key inputs: Aggregated Predictions (MPA data), Resource Availability (RCA reports), Service SLOs, Migration Cost Model.
  • Research motivation (Hierarchical/Decentralized Decision): Enables ultra-low-latency decision-making for local PoP movements and wide-area optimization, avoiding high Central Cloud RTT.

State/Traffic Steering Agent (TSA)

  • Location in testbed: Co-located with SMF/UPF
  • Primary role (Execution & Cutover): Executes the migration by coordinating state transfer and updating the 5G Core traffic rules.
  • Key inputs: PLA’s Decision (Target MEC ID), State Transfer Status, 5G Core N4/N11 APIs (for UPF/SMF control plane updates).
  • Research motivation (Critical Service Continuity): Directly implements the necessary 5G Core control procedures (PSA Relocation/UL-CL) at all anchor points to shift traffic seamlessly.

Policy Enforcement Agent (PEA)

  • Location in testbed: Central Cloud (Azure)
  • Primary role (Global Policy Management): Distributes high-level, long-term policies, cost objectives, and optimization models across all PLA instances.
  • Key inputs: Long-term historical data, Global business objectives, Failure tolerance settings, Regulatory/Tenancy policies.
  • Research motivation (Global Governance): Provides the top-level goals and learning feedback to the decentralized PLA instances, ensuring consistency and alignment with global business objectives.
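To make the data flow in the roster concrete, the PLA's decision step can be sketched as a filter-then-rank over candidate sites. This is a hypothetical illustration, not the project's planner: the field names, the single-number cost model, and the tie-breaking rule are all assumptions. It takes MPA-style RTT predictions, RCA-style capacity reports, a service SLO, and a migration cost per site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    site: str                # e.g. "MEC-1"
    predicted_rtt_ms: float  # MPA-style prediction of UE-to-site RTT
    free_cpu_cores: float    # RCA-style capacity report
    migration_cost: float    # cost model output, arbitrary units (0 = stay put)


def plan_migration(candidates, slo_rtt_ms, required_cpu_cores, current_site):
    """Return ("stay", site), ("migrate", site) or ("no_feasible_target", None).

    Step 1: keep only sites that meet the RTT SLO and have enough free CPU.
    Step 2: pick the cheapest feasible site, breaking ties by lower RTT.
    """
    feasible = [
        c for c in candidates
        if c.predicted_rtt_ms <= slo_rtt_ms
        and c.free_cpu_cores >= required_cpu_cores
    ]
    if not feasible:
        return ("no_feasible_target", None)
    best = min(feasible, key=lambda c: (c.migration_cost, c.predicted_rtt_ms))
    action = "stay" if best.site == current_site else "migrate"
    return (action, best.site)


# Illustrative numbers only: the UE is moving away from MEC-1 toward MEC-2.
sites = [
    Candidate("MEC-1", predicted_rtt_ms=12.0, free_cpu_cores=4.0, migration_cost=0.0),
    Candidate("MEC-2", predicted_rtt_ms=6.0, free_cpu_cores=8.0, migration_cost=3.0),
    Candidate("Aggregation", predicted_rtt_ms=18.0, free_cpu_cores=32.0, migration_cost=5.0),
]
print(plan_migration(sites, slo_rtt_ms=10.0, required_cpu_cores=2.0, current_site="MEC-1"))
```

In this sketch the TSA would act only on a "migrate" result, and the PEA's policies would enter as the SLO and cost-model parameters.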

TODO Optimization Algorithms

Prioritize time series forecasting and decision making (TBD).