Boyang Yan


Areas of Interest

  • Computer Networks
  • Networked Systems
  • Anomaly Detection
  • Network Measurement
  • Cloud Computing
  • Time Series Analysis

Summary

I have a broad range of research interests, spanning computer networks and distributed systems, with a focus on network measurement, anomaly detection, and cloud computing. My work aims to enhance the observability, performance, and reliability of modern networked systems. I design and implement tools for real-world network analysis, troubleshooting, and optimization. My recent research involves programmable networking, workload prediction in P4 switches, and scalable network monitoring architectures.

I hold a Master’s degree in Statistics and Operations Research from the Royal Melbourne Institute of Technology (RMIT), Australia, and a Bachelor’s degree in Computer Science from the University of Wollongong, Australia. I have a strong background in network systems engineering, reinforced by hands-on experience in cloud infrastructure, time-series analysis, and software-defined networking. Before joining academia, I worked in industry, contributing to cloud platforms and network function virtualization.


Education

North Carolina State University, Raleigh, NC, USA

Ph.D. in Computer Science (Non-degree) (Full-time)
Aug, 2023 - Nov, 2024

  • Coursework: Cloud Computing, Software Testing, Automated Software Engineering

M.S. in Statistics (Exchange Student) (Full-time)
Aug, 2022 - Dec, 2022

  • Coursework: Fundamentals of Inference I, Advanced NextG Network Design, Cloud Computing

Royal Melbourne Institute of Technology (RMIT), Melbourne, VIC, Australia

M.S. in Statistics and Operations Research (Full-time)
Mar, 2021 - Dec, 2022

  • Coursework:
    • Optimization for Decision Making (linear optimization)
    • Multivariate Analysis
    • Data Visualization
    • Data Cleaning (Data Wrangling)
    • Time Series Analysis
    • Machine Learning
    • Applied Analytics
    • Statistical Computing

University of Illinois Urbana-Champaign, Champaign County, Illinois, USA

NetMath (Non-degree Courses) (Part-time)
Mar, 2018 - Jan, 2020

  • Coursework: Algebra, Preparation for Calculus

University of Wollongong, Wollongong, NSW, Australia

B.S. in Computer Science (Software Engineering) (Full-time)
Mar, 2014 - Dec, 2017

  • Coursework:
    • Object-Oriented Programming in C++
    • Algorithms and Data Structures
    • Database Systems
    • Systems Development
    • Software Development Methods & Tools

Honors & Awards

  • 2022: Graduated with Distinction, RMIT University, Australia
  • 2022: SAS Advanced Analytics Certificate, SAS, Australia
  • 2020: Microsoft Hackathon Prize, Microsoft, China
  • 2018: Best Undergraduate Final Project Prize, University of Wollongong, Australia
  • 2017: Amateur Radio Operator’s Certificate of Proficiency (Standard), The Wireless Institute of Australia (WIA), Australia
  • 2014, 2015: Undergraduate Excellence Scholarship, University of Wollongong, Australia

Employment

NC State University | Raleigh, NC, USA

Teaching Assistant - Software Architectures for the Cloud (CSC495) (Part-time) | Aug, 2024 - Nov, 2024

  • Guided 21 undergraduate students and 2 master’s students to independently complete projects on their chosen topics.
  • Projects were divided into four parts: system design, implementation, security improvements, and deployment plans.
  • Provided detailed feedback and assigned grades for each component of the projects.
  • Achieved a high success rate, with 80% of students earning an A grade.

Research Assistant - Data Science (Part-time) | Aug, 2023 - Aug, 2024

  • Developed statistical models and sampling techniques to improve data efficiency in resource-constrained environments.
  • Enhanced classifier accuracy through oversampling algorithms (SMOTE, ADASYN, Borderline-SMOTE, SVM-SMOTE, and SMOTUNED) for optimizing distributed systems and large-scale networked applications.
  • Explored recursive projection techniques for clustering high-dimensional network data, with applications in anomaly detection and traffic classification.
  • Implemented advanced data processing pipelines to analyze network workload patterns and optimize performance.

Research Assistant - Wireless Networks - J1 academic training (Full-time) | Dec, 2022 - Aug, 2023

  • Contributed to the AERPAW O-RAN testbed, focusing on scalable network deployment.
  • Integrated and maintained OpenAirInterface (OAI) 7.2 RAN with Benetel 550 (RU) and OAI’s CU/DU.
  • Developed and tested Aether Cloud Native 5G Core to improve reliability and performance in distributed environments.
  • Designed orchestration and automation strategies to streamline deployment across diverse network topologies.

Microsoft | Shanghai, China

Support Engineer - Cloud Solutions (Full-time) | Mar, 2020 - Jun, 2021

  • Resolved complex network infrastructure challenges for enterprise customers using cloud-based solutions.
  • Led troubleshooting and performance optimization efforts for cloud networking and large-scale distributed systems.
  • Designed deployment strategies for cloud-native applications, ensuring robust, scalable architectures.
  • Provided in-depth technical support for enterprise networking, including transport protocols, hybrid cloud integration, and containerized workloads.
  • Manager: Kenji Hamada (Data & AI Pod)

Peng Cheng Laboratory | Shenzhen, China

Algorithm Engineer - Computer Networks (Full-time) | Dec, 2018 - Mar, 2020

  • Designed a high-performance networking simulator to optimize resource allocation in distributed environments.
  • Implemented and maintained Network Function Virtualization (NFV) infrastructure to enhance network efficiency.
  • Developed a robust configuration management framework, improving operational reliability and reducing downtime. Outcome: one patent.
  • Designed a communication framework for wireless networks, optimizing routing and information exchange in dynamic environments.
  • Manager: Jingpu Duan (Network Communications Research Center)

University of Wollongong | Wollongong, NSW, Australia

Research Assistant - Software Engineering (Full-time) | Jan, 2018 - Nov, 2018

  • Researched Natural Language Processing (NLP) with a focus on developing innovative failure testing methodologies for Machine Translation systems, utilizing Metamorphic Testing principles to enhance automation and failure detection capabilities.
  • Designed and implemented a comprehensive data processing pipeline, significantly improving the efficiency and traceability of the experimental research process.
  • Developed a Big Data analysis engine tailored for the in-depth analysis of experimental results.
  • Advisor: Dr. George Zhou (Software Testing Lab)

Skills

Programming Languages

  • C/C++
  • Rust
  • Python
  • R
  • SAS
  • Lisp
  • Bash Scripting

Databases

  • Relational Databases: PostgreSQL, MySQL, Oracle
  • NoSQL: MongoDB
  • Key-Value Stores: Redis

Operating Systems

  • Arch Linux
  • Ubuntu
  • Debian

Computer Networks/Cloud

  • Cloud Platforms: Azure, AWS
  • OpenShift (Kubernetes)
  • Open vSwitch (OVS)
  • P4 (Programming Protocol-Independent Packet Processors)
  • Protocols and APIs: TCP/IP, gRPC, REST APIs

Tools/Frameworks

  • Network Simulation: ns-3
  • Version Control: Git
  • Configuration Management: Ansible
  • Infrastructure as Code (IaC): Terraform

Publications

Workload Prediction in P4 Programmable Switches

arXiv preprint, 2024

  • Adapted the decision tree (DT) into Recursive Random Projection Regression, my own binary decision tree variant, which trains faster, generates fewer rules, and better satisfies switch constraints.
  • Compared six hyperparameter search methods (Random Search, Grid Search, Bayesian Optimization, Genetic Algorithms using TPOT, SHERPA, and Optuna) to tune the proposed regression method.
  • Benchmarked Recursive Random Projection Regression against seven baselines (Vector Autoregression, Support Vector Machines, Random Forest Regressor, Gradient Boosting Machine, Large Bayesian Vector Autoregression, Explainable Boosted Linear Regression, and Distribution-Enhanced Linear Regression). The proposed method demonstrated superior performance, particularly in scenarios involving P4 switches.

MLOps in a Multi-cloud Environment: Typical Network Topology

arXiv preprint, 2024

  • Designed and implemented an MLOps pipeline that integrates across multiple cloud platforms, including Azure, Google Cloud, AWS, and CoreWeave. The pipeline supported secure, efficient deployment and operation of machine learning models, with consistent behavior across diverse cloud infrastructures.
  • Architected an infrastructure that supports the entire lifecycle of machine learning models, from data collection to real-time model inference. It includes data management strategies that optimize the storage, retrieval, and processing of large datasets, which is crucial for maintaining the accuracy and efficiency of deployed models.
  • Tackled critical challenges in maintaining high availability and scalability of machine learning services in a multi-cloud environment. Implemented strategies to ensure that models remained available and performant under varying load conditions.
  • Conducted an in-depth analysis of both business and technical requirements to inform the selection of cloud service providers. The result was a strategically optimized deployment environment that balanced cost, performance, and security considerations.

MetaDetect: Metamorphic Testing-Based Anomaly Detection for Multi-UAV Wireless Networks

arXiv preprint, 2023

  • Developed advanced methods for predicting anomalies in wireless network data, utilizing time-series analysis. This approach was applied to nine distinct wireless network metrics and various UAV sensor data, enabling precise detection and characterization of anomalies within wireless networks.
  • Main contributions:
    • Ensured that the anomaly detection models provided clear and understandable explanations for their predictions, enhancing the interpretability of the results.
    • Designed the system to operate without relying on physical environmental data, making it robust and adaptable to various operational contexts.
    • Implemented automated systems capable of identifying incident and accident events in real time. This feature improved the responsiveness and efficiency of monitoring systems, enabling quick action in response to detected anomalies.

Metamorphic Relations for Data Validation: A Case Study of Translated Text Messages

The 41st International Conference on Software Engineering (ICSE MET Workshop), 2019

  • Expansion of Metamorphic Testing for Data Validation: Advanced the use of metamorphic testing techniques to assess the quality and integrity of input data in software programs. This approach was particularly focused on scenarios where source or follow-up inputs might be improperly generated, ensuring the robustness of data processing systems.
  • Conducted an in-depth case study within the NLP domain, specifically addressing the unique challenges associated with the machine translation of text messages. The study focused on translations between Chinese and English, highlighting common pitfalls and inconsistencies in automated translation outputs.
  • Demonstrated the utility of metamorphic relations (MR) in automatically detecting poor-quality translations. This method provided a reliable means of identifying issues in translated texts without relying on reference translations, thus streamlining the validation process.
  • Engineered a sophisticated crawler system to automate the collection of test datasets from various sources, including Wikipedia Context, Google Translate, Microsoft Translator, and Youdao Translate. This system facilitated the creation of a comprehensive set of test cases, crucial for evaluating and improving the quality of machine translation systems.

Patents

Automated text difference analysis and verification system/method, 2020

  • China Patent Office ID: CN202010489931.8
  • Area: Network Configuration File Update
    • Conceptualized and spearheaded the development of an automated system for text difference analysis and verification, specifically targeting the complexities associated with network configuration file updates.
    • Designed a method that automatically generates test cases from the internal attribute relationships within the data. This improved the efficiency and precision of differential analysis tools, enabling more accurate detection and verification of changes and more reliable network configuration management.

Projects

Meta Scientific Linux | Personal Project (2022 – Ongoing)

  • Created an Arch Linux-based distribution tailored for researchers.

Highway Traffic Anomaly Analysis | University Research Project (2022)

  • Developed visualization techniques for highway traffic data.

Custom Ergonomic Keyboard | Personal Project (2024 – Ongoing)

  • Designed a split ergonomic keyboard with voice input support.