Boyang Yan


Areas of Interest

  • Networked Systems
  • Anomaly Detection
  • Network Measurement
  • Cloud Computing
  • Time Series Analysis

Summary

My research interests span distributed networked systems, cloud computing, anomaly detection, and machine learning algorithms. My recent work focuses on building and optimizing cellular networks with practical machine learning (ML) techniques, including time-series analysis, hyperparameter tuning, and simulation.

I hold a Master’s degree in Statistics and Operations Research from the Royal Melbourne Institute of Technology (RMIT) and a Bachelor’s degree in Computer Science from the University of Wollongong. Before coming to the U.S., I spent four years in industry as an Algorithm Engineer at Peng Cheng Laboratory and a Cloud Solutions Engineer at Microsoft, working on data center networks and ML systems (Azure Machine Learning Service).

I am deeply passionate about exploring new technologies and methodologies, always eager to tackle challenges and venture into uncharted territories.


Education

North Carolina State University, Raleigh, NC, USA

Ph.D. in Computer Science (Non-degree) (Full-time)
Aug, 2023 - Nov, 2024

  • GPA: 3.583 / 4.0 (combined with exchange study)
  • Coursework: Cloud Computing, Software Testing, Automated Software Engineering, Artificial Intelligence

M.S. in Statistics (Exchange Student) (Full-time)
Aug, 2022 - Dec, 2022

  • GPA: 3.583 / 4.0 (combined with Ph.D. study)
  • Coursework: Fundamentals of Inference I, Advanced NextG Network Design, Cloud Computing

Royal Melbourne Institute of Technology (RMIT), Melbourne, VIC, Australia

M.S. in Statistics and Operations Research (Full-time)
Mar, 2021 - Dec, 2022

  • GPA: Distinction / High Distinction
  • Coursework:
    • Optimization for Decision Making (linear optimization)
    • Multivariate Analysis
    • Data Visualization
    • Data Cleaning (Data Wrangling)
    • Time Series Analysis
    • Machine Learning
    • Applied Analytics
    • Statistical Computing

University of Illinois Urbana-Champaign, Champaign County, Illinois, USA

NetMath (Non-degree Courses) (Part-time)
Mar, 2018 - Jan, 2020

  • GPA: 4.0 / 4.0
  • Completed these courses while working full-time at Peng Cheng Laboratory, focusing on enhancing mathematical foundations relevant to professional work.
  • Coursework: Algebra, Preparation for Calculus

University of Wollongong, Wollongong, NSW, Australia

B.S. in Computer Science (Software Engineering) (Full-time)
Mar, 2014 - Dec, 2017

  • GPA: Credit / High Distinction
  • Coursework:
    • Object-Oriented Programming in C++
    • Algorithms and Data Structures
    • Database Systems
    • Systems Development
    • System Security
    • Software Development Methods & Tools

Honors & Awards

  • 2022: Graduated with Distinction, RMIT University, Australia
  • 2022: SAS Advanced Analytics Certificate, SAS, Australia
  • 2020: Microsoft Hackathon Prize, Microsoft, China
  • 2018: Best Undergraduate Final Project Prize, University of Wollongong, Australia
  • 2017: Amateur Radio Operator’s Certificate of Proficiency (Standard), The Wireless Institute of Australia (WIA), Australia
  • 2014, 2015: Undergraduate Excellence Scholarship, University of Wollongong, Australia

Employment

NC State University | Raleigh, NC, USA

Teaching Assistant - Software Architectures for the Cloud (Part-time) | Aug, 2024 - Nov, 2024

  • Guided 21 undergraduate students and 2 master’s students to independently complete projects on their chosen topics.
  • Projects were divided into four parts: system design, implementation, security improvements, and deployment plans.
  • Provided detailed feedback and assigned grades for each component of the projects.
  • Achieved a high success rate, with 80% of students earning an A grade.

Data Science Research Assistant (Part-time) | Aug, 2023 - Aug, 2024

  • Conducted research on AI simplification for complex systems, exploring structural and relational dynamics.
  • Authored “MLOps in a Multi-cloud Environment: Typical Network Topology” and worked on “Workload Prediction in P4 Programmable Switches.”
  • Enhanced classifier accuracy through oversampling algorithms (e.g., SMOTE, ADASYN, Borderline-SMOTE, SVM-SMOTE, and SMOTUNED).
  • Researched Recursive Random Projection (Tree) for clustering high-dimensional data, applicable to regression, localization, anomaly detection, and compression.
  • Collaborated with a multidisciplinary team on innovative ML algorithms.
  • Advisor: Dr. Timothy Menzies (AI for Software Engineering Lab)

Wireless Networks Research Assistant - J1 Academic Training (Full-time) | Dec, 2022 - Aug, 2023

  • Worked on the NCSU AERPAW O-RAN testbed.
  • Integrated and managed the OpenAirInterface (OAI) 7.2 RAN with a Benetel 550 radio unit (RU) and OAI’s CU/DU.
  • Developed and tested the Aether Cloud Native 5G Core for scalable and efficient 5G network deployment.
  • Focused on Aether’s orchestration and automation capabilities for seamless integration with the O-RAN testbed.

Microsoft | Shanghai, China

Cloud Solutions Engineer (Full-time) | Mar, 2020 - Jun, 2021

  • Spearheaded the resolution of complex machine learning algorithm challenges for Microsoft’s top Fortune 500 customers.
  • Developed and implemented MLOps solutions using the Azure Machine Learning SDK, enhancing operational efficiency.
  • Troubleshot Azure Networking issues, ensuring optimal performance and reliability.
  • Led projects leveraging Azure technologies, including Azure Machine Learning, Data Lake, Batch, Kubernetes, Edge, and Networking.
  • Manager: Kenji Hamada (Data & AI Pod)

Peng Cheng Laboratory | Shenzhen, China

Algorithm Engineer - Networking (Full-time) | Dec, 2018 - Mar, 2020

  • Engineered a networking simulator for complex network environments, reducing computational resource requirements.
  • Established and maintained the NFV infrastructure for the national laboratory’s supercomputer system.
  • Designed a network configuration update strategy, minimizing system failures and improving reliability.
  • Developed a robust flight control system for fixed-wing UAVs operating in a mesh network.
  • Manager: Jingpu Duan (Network Communications Research Center)

University of Wollongong | Wollongong, NSW, Australia

Software Engineering Research Assistant (Full-time) | Jan, 2018 - Nov, 2018

  • Researched Natural Language Processing (NLP), focusing on failure testing methodologies for Machine Translation systems.
  • Designed and implemented a data processing pipeline, improving research efficiency and traceability.
  • Developed a Big Data analysis engine for experimental result analysis.
  • Advisor: Dr. George Zhou (Software Testing Lab)

Skills

Programming Languages

  • C/C++
  • Rust
  • Python
  • R
  • SAS
  • Lisp
  • Bash Scripting

Databases

  • Relational Databases: PostgreSQL, MySQL, Oracle
  • NoSQL: MongoDB
  • Key-Value Stores: Redis

Operating Systems

  • Arch Linux
  • Ubuntu
  • Debian

Computer Networks/Cloud

  • Cloud Platforms: Azure, AWS
  • OpenShift (Kubernetes)
  • Open vSwitch (OVS)
  • P4 (Programming Protocol-Independent Packet Processors)
  • Protocols and APIs: TCP/IP, REST, gRPC

Tools/Frameworks

  • Network Simulation: ns-3
  • Version Control: Git
  • Configuration Management: Ansible
  • Infrastructure as Code (IaC): Terraform

Machine Learning Frameworks and Data Science

  • TensorFlow
  • PyTorch
  • Keras
  • NumPy
  • SciPy
  • Scikit-learn
  • Optuna
  • MLflow

Publications

Workload Prediction in P4 Programmable Switches

arXiv preprint, 2024

  • Adapted the decision tree (DT) into Recursive Random Projection Regression (our own method), enabling faster training, fewer rules, and better compliance with switch constraints.
  • Conducted comprehensive comparisons of six hyperparameter search methods (Random Search, Grid Search, Bayesian Optimization, Genetic Algorithms with TPOT, SHERPA, and Optuna) to optimize the regression method.
  • Benchmarked the Recursive Random Projection Regression against seven baselines, demonstrating superior performance in scenarios involving P4 switches.

MLOps in a Multi-cloud Environment: Typical Network Topology

arXiv preprint, 2024

  • Designed and implemented a robust MLOps pipeline integrating Azure, Google Cloud, AWS, and CoreWeave for seamless machine learning model deployment and operation.
  • Developed a comprehensive infrastructure supporting the full lifecycle of machine learning models, from data collection to real-time inference.
  • Tackled challenges in maintaining high availability and scalability of machine learning services in a multi-cloud environment, optimizing deployment strategies to balance cost, performance, and security.

MetaDetect: Metamorphic Testing-Based Anomaly Detection for Multi-UAV Wireless Networks

arXiv preprint, 2023

  • Developed methods for predicting anomalies in wireless network data using time-series analysis, enabling precise detection in multi-UAV networks.
  • Ensured anomaly detection models were interpretable and adaptable without relying on environmental data.
  • Implemented real-time systems capable of identifying incidents, improving monitoring efficiency and responsiveness.

Metamorphic Relations for Data Validation: A Case Study of Translated Text Messages

The 41st International Conference on Software Engineering (ICSE MET Workshop), 2019

  • Expanded metamorphic testing techniques to assess input data quality in software programs.
  • Conducted a case study within the NLP domain, addressing challenges in automated translation between Chinese and English.
  • Developed a crawler system to automate the collection of test datasets from various sources, creating comprehensive test cases for evaluating machine translation systems.

Projects

Meta Scientific Linux | Personal Project (2022 - Ongoing)

  • Created a Linux distribution tailored for researchers using Arch Linux.

Highway Traffic Anomaly Analysis | University Research Project (2022)

  • Developed visualization techniques for highway traffic data.

Custom Ergonomic Keyboard | Personal Project (2024 - Ongoing)

  • Designed a split ergonomic keyboard with voice input support.