service mesh

The Definition of Service Mesh

Though controversial, we believe that the concept of Service Mesh was first articulated by William Morgan from industry. A detailed definition [19] of service mesh can be presented as the following:

A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It’s responsible for the reliable delivery of requests through the complex topology of services that comprise a modern, cloud native application. In practice, the service mesh is typically implemented as an array of lightweight network proxies that are deployed alongside application code, without the application needing to be aware.

The importance of Service Mesh

While the technology development towards micro service can significantly improve the speed and agility of software service delivery, it also raises the operational complexity associated with modern applications. This has led to the emergence of Service Mesh, a promising approach to mitigate this situation by introducing a dedicated infrastructure layer over microservices without imposing modification on the service implementations.

Current problems

a cloud-native application might consist of a large number of microservices which (I.) might be implemented using different languages, (II.) might belong to different tenants, and (III.) might have thousands of service instances with constantly changing states that are caused by the orchestration platform scheduling.

In such dynamic environments, debugging microservices applications turns out to be a challenging task [13] due to the complexity of service dependencies and the fact that any service can become temporarily inaccessible to its consumers [14]. Additionally, as the application behavior depends on the flow of the traffic among microservices, traffic control becomes a must (but currently missing) for runtime operation.

Service meshes aim to be a full networking solution for microservices; however, they also introduce overhead into a system - this can be significant for low-powered edge devices, as service mesh proxies work in user space and are responsible for processing the incoming and outgoing traffic of each service. To mitigate performance issues caused by these proxies, the industry is pushing the boundaries of monitoring and security to kernel space by employing eBPF for faster and more efficient responses.

Emergence of Service Mesh

This has led to the emergence of Service Mesh, a promising approach to mitigate this situation by introducing a dedicated infrastructure layer over microservices without imposing modification on the service implementations. Serving as a fully manageable service-to-service communication platform, a service mesh is designed to standardize the runtime operations of applications. As part of the microservices ecosystem, the service mesh technologies hold the promise in addressing communication-related concerns [15], e.g., interoperability [16], traffic segmentation [17], dependency control [18], runtime enforcement, etc.

service mesh challenges

fundamental features

In general, a service mesh is designed to provide a set of fundamental features, as follows:

Load balancing: The load-balancer in a service mesh allows traffic to be routed across the network. Modern service mesh load-balancing routing can now consider latency and the state (e.g., health status and current variable load) of the backend services, as opposed to simple routing mechanisms (e.g., round robin and random routing).
Service discovery: As the number of service instances, as well as their state and placement, change dynamically over time. As a result, there should be a service discovery mechanism that allows service consumers to discover the placement and make requests to a dynamic set of ephemeral service instances. Typically, service instances are discovered by searching a registry, which keeps track of new and removed instances from the network. This is critical feature of a service mesh.
Fault tolerance: A service mesh is a layer of abstraction above TCP/IP in the networking model. Given that the network underlying the L3/L4 network, like every other aspect of the environment, is unreliable, the service mesh must be able to handle failures. This is typically achieved by redirecting consumer requests to a healthy service instance.
Traffic monitoring: All communication among microservices are captured, enabling reporting of requests volume per target, latency metrics, success and error rates, etc.
Circuit breaking: Circuit breaking rejects incoming requests to protect latency at the expense of availability, enabling fast reactions to overloads.
Authentication and access control: Through policy enforcement from the control plane, a service mesh can define which services are allowed to be accessed by which services, and what type of traffic is unauthorized and should be denied. With these fundamental features, complex functionalities can A e built, e.g., through authentication and access control, an identity-based routing policy can be implemented and enforced.

Disadvantage

However, some of these features can have a huge overhead when they are enforced on the sidecar proxies in a service mesh. To alleviate such overhead, kernel features - notably the extended Berkeley Packet Filter (eBPF) - are being exploited for improved performance and efficiency.

Enablement Technologies

research opportunities

Reference List

Buoyant Inc., “A Service Mesh for Kubernetes,” last accessed in Dec,2018. [Online]. Available: https://cdn2.hubspot.net/hubfs/2818724/A%20Service%20Mesh%20for%20KubernetesFinal.pdf
Sedghpour, M. R. S., & Townend, P. (2022, August). Service Mesh and eBPF-Powered Microservices: A Survey and Future Directions. In 2022 IEEE International Conference on Service-Oriented System Engineering (SOSE) (pp. 176-184). IEEE.

Boyang Yan

Explorer