Service Migration in Cellular Networks

PROJ Service Migration in Cellular Networks

This project has three focus parts: Cellular Networks Testbed, LLMs, and optimization algorithms.

Cellular Networks Testbed

Image from: Real-Time Service Migration in Edge Networks: A Survey

The testbed architecture spans four tiers: Central Cloud, Regional MEC, Aggregation MEC, and Local MEC. Primary focus: Regional MEC and Aggregation MEC. Regional MEC, Aggregation MEC, and Local MEC should be deployed to my PVE Testbed.

Each MEC site (or per Metro (metropolitan area) / PoP (Point of Presence)) needs its own control plane because:

Survivability: If WAN/backhaul drops, the site keeps running. A single, stretched cluster loses control-plane access and flakes.
Latency/etcd constraints: Kubernetes control-plane (etcd) hates WAN latency/packet-loss; cross-site RTTs >~5–10 ms and jitter cause elections and outages.
Blast radius & upgrades: Failures and rollouts stay local, enabling per-site upgrades.
Regulatory / tenancy: Site-level isolation simplifies policy and compliance.

TODO Central Cloud

>200 km coverage | ≥50 ms RTT Global control

Cloud (Azure) High-level design: Azure Virtual WAN (Standard) with four hubs in a full inter-hub mesh. Regional spokes (AKS VNets) attach to their nearest hub; inter-hub routing provides global any-to-any.

Regions (paired for HA/DR):

East US 2 (VA) — primary; paired with Central US (closest to UVA)
Central US (IA) — DR for East US 2
West US 3 (AZ) — west capacity/DR; paired with East US
East US (VA) — additional east capacity and the formal pair for West US/West US 3

(All selected regions provide Availability Zones.)

TODO Regional MEC (e.g., Richmond PoP)

50–200 km coverage | RTT to Aggregation 15–30 ms Use cases: smart city, cloud gaming, content delivery Components: SMF/AMF/PCF (control plane) + Regional UPF

TODO Aggregation MEC (Hub PoP)

10–50 km coverage | RTT to Local: 10–20 ms Use cases: campus control, local CDN

OKD (3 master nodes) Components: SMF/AMF/PCF (control plane) + optional Aggregation UPF

TODO Local MEC

1–5 km coverage | UE↔UPF target: ≤5–10 ms Use cases: autonomous driving, industrial robots, real-time AR/XR

Components: Local UPF Deploy two OKD SNOs or MicroShift clusters (MEC-1: Campus South; MEC-2: Campus North)

N3 (gNB ⇄ UPF @ MEC): VLAN/VRF local to the site, low jitter N6 (UPF ⇄ campus/ISP): routed toward the PoP

TODO gNodeB (UEs)

Digital Twin

Must implement N2, N3, and optionally Xn. Focus on mmWave.
- NVIDIA AI Aerial
- ns-O-RAN

Physical RAN

CBRS band only
- mosoLab (all-in-one)
- baicells (7.2 split)
- USRP (8 split)
- Lite-On (7.2 split)

LLMs

Focus on multi-agent workflow design and LLM fine-tuning.

TODO LLM corpus

3GPP Standards

https://www.3gpp.org/specifications-technologies

ETSI = European Telecommunications Standards Institute

Famous work includes ETSI MEC (edge computing) and the original ETSI NFV effort.

Kubernetes documentation

https://kubernetes.io/docs/home/

OKD documentation

https://docs.okd.io/

Cilium

https://docs.cilium.io/en/stable/network/kubernetes/index.html

TODO LLM Serving

Deploy Three LLMs to Regional MEC or Aggregation MEC:

Custom LLM (based on Google Gemma) fine-tuned on UVA CS Slurm cluster.
Codex CLI / Gemini CLI / Droid CLI
Time-Series LLM (TBD)

https://ai-on-openshift.io/generative-ai/llm-serving/

TODO Retrieval Augmented Generation (RAG)

Org-roam MCP Server

TODO The Multi-Agent Roster & Roles

Agent	Location in Testbed	Primary Role	Key Inputs	Research Motivation
Mobility Predictor Agent (MPA)	Aggregation MEC / Local MEC (Near-RT RIC/O-RAN Layer)	Context Generation: Provides real-time prediction of UE handover and mobility patterns to anticipate service relocation.	Real-time Radio KPIs (RSRP, RSRQ), Handover/Xn/N2 events, UE location/velocity.	Proactive Migration: Essential for timely initiation of migration at the lowest latency tiers, ensuring QoE under high mobility.
MEC Resource Agent (RCA)	All Managed MEC Sites (Local, Aggregation, Regional)	Local State Reporting: Monitors the instantaneous resource utilization and available capacity of its local compute cluster (OKD/MicroShift)	CPU/Memory/GPU load, Available network bandwidth, K8s/OKD/MicroShift node metrics.	Survivability and Autonomy: Guarantees that every control-plane instance has local resource awareness, upholding isolation and independence
Migration Planner Agent (PLA)	Regional MEC and Aggregation MEC (Control Plane)	Decision-Making: Determines the optimal migration target, timing, and method based on its scope (Local → Local vs. Regional → Regional).	Aggregated Predictions (MPA data), Resource Availability (RCA reports), Service SLOs, Migration Cost Model.	Hierarchical/Decentralized Decision: Enables ultra-low-latency decision-making for local PoP movements and wide-area optimization, avoiding high Central Cloud RTT
State/Traffic Steering Agent (TSA)	Co-located with SMF/UPF	Execution & Cutover: Executes the migration by coordinating state transfer and updating the 5G Core traffic rules	PLA’s Decision (Target MEC ID), State Transfer Status, 5G Core N4/N11 APIs (for UPF/SMF control plane updates)	Critical Service Continuity: Directly implements the necessary 5G Core control procedures (PSA Relocation/UL-CL) at all anchor points to shift traffic seamlessly
Policy Enforcement Agent (PEA)	Central Cloud (Azure)	Global Policy Management: Distributes high-level, long-term policies, cost objectives, and optimization models across all PLA instances	Long-term historical data, Global business objectives, Failure tolerance settings, Regulatory/Tenancy policies.	Global Governance: Provides the top-level goals and learning feedback to the decentralized PLA instances, ensuring consistency and alignment with global business objectives.

TODO Optimization Algorithms

Prioritize time series forecasting and decision making (TBD).

Boyang Yan

Explorer

Service Migration in Cellular Networks — To-do List