Hierarchical segment routing solution of CAN

ZTE Corporation, Nanjing, PRC, +86 13770311052, huang.guangping@zte.com.cn
China Mobile, Beijing, PRC, +86 13811071289, duzongpeng@chinamobile.com
Purple Mountain Laboratory, Nanjing, PRC, +86 15300249211, zhangchen@pmlabs.com.cn
Routing Area
RTGWG

Abstract

CAN (Computing Aware Network) is designed to enable the routing network to be aware of computing status and to deliver service flows accordingly. Nevertheless, computing and networking are quite different in terms of resource granularity as well as status stability. Significant benefits could be gained by accommodating the computing status to that of networking through a hierarchical computing routing segment scheme. The network-accommodated computing status could be maintained at remote CAN nodes while the rest resides at local CAN nodes. By enabling the network to schedule and route computing services in a way compatible with the current IP routing network, CAN would benefit the industry both by efficiently pooling computing resources and by rendering services from the perspective of converged networking and computing.

Introduction

Computing-related services have been provided in such a way that computing resources either are confined within isolated
sites (data centers, MECs, etc.) without coordination among multiple sites, or they are coordinated and managed within specific and closed service systems without fine-grained networking facilitation, even as the industry enters an era in which computing resources migrate from centralized data centers to distributed edge nodes. Substantial benefits in terms of both cost and efficiency, resulting from economies of scale, could therefore be brought to multiple industries by intelligently and dynamically connecting the distributed computing resources and rendering them as a unified, virtual resource pool. On top of the cost and efficiency gains, applications and services could be served in a more sophisticated way, in which computing and networking resources are aligned more efficiently and agilely than in the conventional approach where the two are delivered by separate systems.

Drafts such as the Dyncast use cases and architecture documents analyze the benefits of routing-based solutions, and give a reference architecture and preliminary test results. End applications could then be served not only by fine-grained computing services but also by fine-grained networking services, rather than the best-effort networking service that would otherwise apply without routing network involvement. The cost is the burden of maintaining and sensing computing resource status in the networking layer. The proposal is designed to be as smoothly compatible with the ongoing routing architecture as possible.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Terminology
CAN Remote Node (CAN-R): a routing node maintaining computing resource and service status from remote cloud sites, and executing cross-site routing policies in terms of that status as well as the identification of the computing service. A CAN-R usually resides at the network edge and works as the ingress of the end-to-end computing service flow.

CAN Local Node (CAN-L): a routing node maintaining computing resource and service status from the geographically local cloud sites, responsible for the last hop of the service flow towards the computing service instance in the specific cloud site. A CAN-L usually resides at the network edge and works as the egress of the end-to-end computing service flow.

CAN Mid Node (CAN-M): a routing node unaware of computing resource and service status and disregarding the encapsulated identification of the computing service. A CAN-M usually resides between CAN-R and CAN-L and works as an ordinary routing node.

Global Computing Resource and Service Status (GCRS): the general cloud-site status of computing resources and services, consisting of the overall resource occupation and the types of computing service (algorithms, functions, etc.) a specific cloud site provides. GCRS is maintained at the CAN-R and is expected to remain relatively stable, changing at a low frequency.

Local Computing Resource and Service Status (LCRS): the fine-grained cloud-site status of computing resources and services, consisting of the status of each active computing service instance as well as the parameters that determine how an instance is selected and visited by the CAN-L. LCRS is maintained at the CAN-L and is expected to stay quite active, changing at a high frequency.

Computing Service Identification (CSI): a globally unique identification of a computing service with optional parameters; it could be an IPv6-like address or a specifically designed identification structure.

Instantiated Computing Service (ICS): an active instance of a computing service identification, residing on a host, typically a server, container or virtual machine.
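As an illustration only, the relationship among these terms can be sketched with hypothetical data structures. None of the field names or types below are defined by this draft; they are assumptions chosen to make the granularity split between GCRS and LCRS concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CSI:
    # Computing Service Identification: a globally unique service
    # identifier, modeled here as an opaque string; the draft allows
    # an IPv6-like address or a dedicated structure instead.
    service_id: str

@dataclass
class GCRSEntry:
    # Coarse-grained, slowly changing view of one remote cloud site,
    # maintained at a CAN-R.
    egress: str           # the CAN-L fronting the site
    services: frozenset   # CSI service_ids the site offers
    overall_load: float   # overall resource occupation, 0.0-1.0

@dataclass
class LCRSEntry:
    # Fine-grained, fast-changing status of one Instantiated Computing
    # Service (ICS) in the local site, maintained at a CAN-L.
    instance_addr: str    # host (server/container/VM) of the ICS
    csi: CSI              # the service type this instance serves
    busy: bool            # busy/idle cycle of the instance
    capacity: int         # remaining capacity units
```

The split mirrors the draft's stability argument: a GCRSEntry carries only aggregate, per-site fields that change slowly, while an LCRSEntry tracks per-instance fields that change rapidly and stay local to the CAN-L.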
Two-segment CAN routing solution

The routing network is enabled to sense computing resource and service status from the cloud sites and to route the service flow according to both network and computing status, as illustrated in Figure 1. The proposed solution is a horizontal convergence of cloud and network, in which the network maintains the converged resource status and is thus able to apply an end-to-end routing and forwarding policy from the perspective of both cloud and network resources. PE1 maintains GCRS with a whole picture of the multiple cloud sites, and executes the routing policy for the network segment between PE1 and PE2 or PE3, namely between CAN-R and CAN-L, while PE2 maintains LCRS with a focused picture of the cloud site where S1 resides, and establishes a connection towards S1. S1 is an active instance of a specific computing service type (CSI).
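A minimal sketch of the two lookups in this example, under assumed table layouts that the draft does not define: the CAN-R (PE1) consults GCRS to pick the least-loaded site advertising the requested CSI, and the CAN-L (PE2) consults LCRS to pick an idle instance. The selection criteria (load, capacity) and the entry "Sx" are hypothetical.

```python
def select_egress(gcrs, csi):
    # First segment (CAN-R, e.g. PE1): choose an egress CAN-L whose
    # site advertises the requested CSI, preferring the lowest load.
    sites = [s for s in gcrs if csi in s["services"]]
    return min(sites, key=lambda s: s["overall_load"])["egress"] if sites else None

def select_instance(lcrs, csi):
    # Second segment (CAN-L, e.g. PE2): choose an idle instance of the
    # CSI, preferring the highest remaining capacity.
    idle = [i for i in lcrs if i["csi"] == csi and not i["busy"]]
    return max(idle, key=lambda i: i["capacity"])["addr"] if idle else None

gcrs_pe1 = [
    {"egress": "PE2", "services": {"video-ai"}, "overall_load": 0.3},
    {"egress": "PE3", "services": {"video-ai"}, "overall_load": 0.7},
]
lcrs_pe2 = [
    {"addr": "S1", "csi": "video-ai", "busy": False, "capacity": 8},
    {"addr": "Sx", "csi": "video-ai", "busy": True,  "capacity": 2},  # hypothetical busy peer
]

print(select_egress(gcrs_pe1, "video-ai"))    # PE2
print(select_instance(lcrs_pe2, "video-ai"))  # S1
```

Traffic affinity would then pin the flow's identification (e.g. 5-tuple) to the chosen egress and instance so that subsequent packets of the same flow follow the same bindings.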
On top of the role of CAN-L maintaining LCRS, PE2 and PE3 also fulfill the role of CAN-R, maintaining GCRS from neighboring cloud sites. P provides traditional routing and forwarding functionality for the computing service flow, and remains unaware of any computing-related status as well as of CSI encapsulations.

Hierarchical granularity routing scheme

Status updates of computing resource and service in the cloud sites extend over quite a broad range, from relatively
stable service types and overall resource occupation to extremely dynamic capacity changes as well as busy and idle cycles of service instances. It would be a disaster to build all of these status updates into the network layer: routing tables would become overburdened and volatile, and their stability would be ruined. It is reasonable, instead, to divide the wide range of computing resources and services into different categories with differentiated characteristics from a routing perspective. GCRS and LCRS correspond to the cross-site domain and the local site domain respectively: GCRS aggregates computing resource and service status with low update frequency from multiple cloud sites, while LCRS focuses only on the high-frequency status of the local site. Under this two-granularity scheme, the computing-related routing table of GCRS at the CAN-R remains roughly as stable as the traditional routing table, while the LCRS at the CAN-L maintains a near-synchronized state table of the highly dynamic updates of computing service instances in the local cloud site. An LCRS focusing on a single, local cloud site is the normal case, while one spanning multiple sites should be the exception, if not impossible.

Two-segment routing and forwarding

When it comes to end-to-end service flow routing and forwarding, there is a status information gap between GCRS and LCRS,
therefore a two-segment mechanism has to be in place, in line with the two-granularity routing scheme described in 3.1. As illustrated in Figure 2, R1 as the ingress determines the specific service flow's egress, which turns out to be R2, according to policy calculation from GCRS. In particular, the CSI, obtained either in-band (user plane) or out-of-band (control plane), is the only index R1 uses to calculate and determine the egress; it is quite possible to make this egress calculation in terms of both networking requirements (bandwidth, latency, etc.) and the computing Service Level Agreement (SLA). Nevertheless, the two SLA routing optimizations could be decoupled to such a degree that the traditional routing algorithms remain as they are. The convergence of the SLA policies, as well as the methods to make CAN-R aware of the two SLAs, is out of scope of this proposal.

The service flow then arrives at R2, which terminates the GCRS routing segment and determines S1, the service instance selected according to the LCRS maintained at R2. Again, CSI is the only index for the LCRS segment routing process.

Cross-domain computing routing and forwarding

Coordinated computing resource scheduling among multiple regions, which are usually connected by multiple network domains,
as illustrated in section 1, is an important part of the intended scenarios and a key reason why computing-based scheduling and routing is proposed in the first place. The two-segment routing and forwarding scheme illustrated in 3.2 is a typical use case of cross-domain computing routing and forwarding and a good building block for the full-domain solution. Computing status information is brought into the network domain to enable the latter to schedule routing policies beyond the network. However, a particular scheme has to be put in place to ensure mild and acceptable impacts upon the ongoing IP routing scheme. The enhanced full-domain routing and forwarding solution is a consistent CSI across terminal, network (multiple domains) and cloud, along with hierarchical CSI-associated computing resource and service status corresponding to the different network domains. Each domain maintains the corresponding computing resource and service status at its edge node and makes the computing-based routing decision for its own segment, which is then connected with the neighboring segments.

CSI routing

CSI encapsulated in packet headers and maintained in LCRS and GCRS indicates an abstract service type rather than a geographically
explicit destination label. The routing scheme based upon CSI is therefore a two-part, two-layer process: CSI only indicates the routing intention, namely the user's requested computing service type, and does not materialize in the forwarding plane; the explicit routing destination is determined by GCRS and LCRS. The actual routing thus falls within the traditional routing scheme, which remains intact.

Apart from indicating the computing service routing intention, CSI could also indicate specific network service requirements by being associated with a networking service policy indexed by the routing table of the CAN control plane, which would then schedule network resources such as an SR tunnel, guaranteed bandwidth, etc.

GCRS and LCRS in the control plane, along with CSI encapsulation in the user plane, therefore enable a logical computing routing sub-layer which is aware of the computing status from cloud sites and forwards the service flow in terms of both computing and networking resources. Nevertheless, this logical sub-layer is relevant only at CAN-R and CAN-L and concerns computing node selection rather than executing the actual forwarding and routing actions.

Traffic affinity

Since CSI carries only the semantics of a service type, which could be deployed as multiple instances within a specific cloud site or across
multiple cloud sites, CSI is not explicit enough for all packets of a service flow to be forwarded to one specific destination. Traffic affinity therefore has to be guaranteed at both CAN-R and CAN-L. Once the egress is determined at CAN-R, the binding between the egress and the service flow's unique identification (5-tuple or other specifically designed labels) is maintained, and subsequent packets of the flow are forwarded according to this binding table. Likewise, CAN-L maintains the binding between the service flow identification and the selected service instance. Traffic affinity could also be guaranteed by mechanisms beyond the routing layer, but these are out of the scope of this proposal.

Hierarchical CAN computing status update work flow

Computing resource and service update work flow

The full range of computing resource and service status from a specific cloud site is registered at the CAN-L, which maintains LCRS itself
and advertises the GCRS part to the remote CAN-R, where GCRS is maintained and updated. As illustrated in Figure 3, the GCRS at R1 for site 1 and site 2 is updated by R2 and R3, while the LCRS of site 1 at R2 is updated by S1 and the LCRS of site 2 at R3 is updated by S2. R2 and R3 update each other's GCRS. The edge routers associated with local cloud sites establish a mesh fabric to update the corresponding GCRS across the whole network domain; the computing resources and services in distributed cloud sites are thus connected and can be utilized by applications as a single pool rather than as isolated islands.

Service flow routing and forwarding work flow

From the perspective of the service work flow, most details have already been demonstrated in 3.2 and 3.3. Rather than the traditional
destination-oriented routing mechanism and segment routing, in which the ingress router is explicitly aware of a specific destination, CSI works as the destination required by the user for the intended computing service: an abstract label without the semantics of a physical address. The service flow therefore has to be routed and forwarded segment by segment, with the two segment destinations determined by GCRS and LCRS respectively.

Control plane

Centralized control plane

Since LCRS's volatility makes it infeasible to maintain and control in a centralized entity, GCRS is the chief computing resource and
service status information to be collected and managed in the controller in a centralized control plane. Routing and forwarding policies calculated from GCRS in the centralized controller, as demonstrated in 3.2, apply only to the segment from CAN-R to CAN-L, while the second-segment routing policy, from CAN-L to the selected service instance in the cloud site, is determined by LCRS at the egress.

A hierarchically centralized control plane architecture would be strongly recommended for nationwide network and cloud management.

Distributed control plane

GCRS is updated among the edge routers, which are connected in a mesh such that each pair of edge routers can exchange GCRS with each other, while
LCRS is updated unidirectionally from the cloud site to the associated CAN-L, where LCRS is maintained and the update process terminates.

The protocols by which GCRS and LCRS are updated are out of the scope of this proposal and will be illustrated in future drafts.

Hybrid control plane

In a limited network and cloud domain, updating GCRS in a distributed way should be more efficient than in a centralized way in terms of routing request and response, while the opposite holds in a nationwide deployment. A hybrid control plane could thus be deployed in such a scheme that overall optimization is achieved.

Data plane

CSI encapsulation

The computing service identification is the predominant index across the entire computing delivery in the routing network architecture, under which a new virtual
routing sub-layer is employed with CSI working as the virtual destination. The data plane determines the routing and forwarding direction from the CSI by querying GCRS and LCRS at CAN-R and CAN-L respectively. CSI encapsulation could be achieved either by extending an existing packet header or by designing a dedicated shim layer; both options, along with the specific structure of CSI, are out of the scope of this proposal and will be illustrated in future drafts.

CSI for CAN-R, CAN-M and CAN-L

CAN-R encapsulates CSI in a designated header format, acting as a proxy that translates the user-originated CSI format; it makes the first-segment routing policy and starts routing and forwarding the service traffic. CAN-M ignores CSI and simply forwards the traffic as usual. CAN-L decapsulates CSI, makes the second-segment routing policy, and completes the last-hop routing and forwarding.

Summary

It would significantly benefit the industry to connect and coordinate the distributed computing resources and services, and even more so to further converge
networking and computing. Uncertainty and the potential impact on the ongoing network architecture are the main reasons for the community to think twice. By dividing the end-to-end routing and forwarding path into two segments, the impact of computing status is reduced to a degree at which it is as acceptable as networking status. In particular, the employment of CSI enables a new service routing solution compatible with the ongoing routing architecture.

Acknowledgements

To be added upon contributions, comments and suggestions.

IANA Considerations

This memo includes no request to IANA.

Security Considerations

Since GCRS and LCRS originate from a third party (the cloud sites) and would be frequently updated in the network domain, both security threats against the routing mechanisms and the credibility and security of the computing services themselves should be taken into account in the architecture design. Detailed analysis and solution considerations will be proposed in an updated version of the draft.

Informative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. Defines how the capitalized requirement words are to be interpreted in IETF documents.

[Dyncast-PS]  "Dynamic-Anycast (Dyncast) Use Cases and Problem Statement", Work in Progress. Provides an overview of edge computing scenarios (5G MEC, virtualized central office, and others) in which services should be dispatched using service-specific metrics rather than statically, e.g., to the geographically closest edge, and identifies key architecture and protocol areas requiring further investigation to balance computing and networking resource utilization among edges.

[Dyncast-Arch]  "Dynamic-Anycast Architecture", Work in Progress. Describes a proposed Dyncast architecture, including an overview, the main components, and the workflow, with an example focused on load-balancing a multi-edge service in terms of both computing and networking resources.