| Internet-Draft | PCE for Network HA | September 2022 | 
| Chen, et al. | Expires 26 March 2023 | [Page] | 
This document describes extensions to the Path Computation Element Communication Protocol (PCEP) that improve the reliability and availability of a network controlled by a controller cluster.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 26 March 2023.
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
More and more networks are controlled by central controllers or controller clusters. A controller cluster normally consists of two or more controllers that work together internally and appear externally as a single controller controlling the network, i.e., every network element (NE) in the network. The reliability and availability of a network therefore depend heavily on its controller cluster: issues or failures in the cluster may greatly impact the reliability or availability of the network.
For a controller cluster comprising two or more controllers (i.e., primary controller, secondary controller, and so on), failures in the cluster may split it into several separated controller groups. These groups are unaware of each other and may be out of synchronization. Two or more groups may each be elected as the primary group, so that more than one group tries to control the network at the same time.
This document proposes procedures and PCEP extensions that let the separated controllers or controller groups learn about each other, so that exactly one new primary controller or controller group is elected when failures split the cluster.
The following terminologies are used in this document.
This section outlines the mechanism for controller cluster reliability and availability using PCEP, and illustrates some details through a simple example.
When a cluster of controllers is split into several separated groups because of failures in the cluster, the live controllers are still connected to the network (i.e., to network elements). Through some of these connections, each group can obtain information about the other groups. Based on this information, a new primary controller or controller group is elected to control the network.
Each controller has a PCEP session with each NE in a given set of NEs, and every controller uses the same set. Each session is established and maintained over an IP path between the controller and the NE, and runs PCEP with the extensions defined in this document.
In one configuration, the given set consists of a single NE: the one with the highest node ID. Suppose that node PE2 has the highest ID. The primary controller (e.g., A) has a session of PCEP with extensions to PE2, and each of the non-primary controllers (e.g., B, C, ...) also creates and maintains a PCEP session with PE2.
In normal operations, all controllers in the cluster are connected. They are the primary controller, which controls the network, the secondary controller, and so on, with current positions 1, 2, and so on respectively. The primary controller advertises the information about the controllers via its PCEP sessions to the given set of NEs.
For example, it sends the information in a PCEP message to the NE (e.g., PE2), which forwards the information to each of the other controllers over its PCEP sessions with them.
When the cluster is split into several separated groups of controllers, each group elects an intent primary controller, an intent secondary controller, and so on from its members, with intent positions 1, 2, and so on respectively. The intent primary controller in each group advertises the information about the controllers in its group.
The information advertised by the (intent) primary controller includes its current (intent) position, its old position, its priority to become a primary controller, the number of controllers in its group or cluster, and the IDs of the controllers, ordered according to their (intent) positions. In addition, a flag C indicates whether the advertising controller is Controlling the network: C = 1 for the current primary controller, and C = 0 for an intent primary controller whose group has not yet been elected as the primary group.
Figure 1 shows a controller cluster comprising two controllers: the primary controller and the secondary controller. Each controller has a PCEP session with the same NE, which is NE4.
   +---------------------------------------------------+
   | Controller Cluster                                |
   |                                                   |
   |    +------------+               +------------+    |
   |    |Controller A|  Synchronize  |Controller B|    |
   |    |(Primary)   +---------------+(Secondary) |    |
   |    +------------+               +-----------++    |
   |           ^                                 |     |
   |           |_______________                  |     |
   |                          |                  |     |
   |                          v                  |     |
   +-----------------Channels to Network---------|-----+
                         /       \               |
    PCEP session---->   /         \____          |
    between            /           \   \____     | <--PCEP session
    A and NEi         /\  .---. .---+       \    |    between
    (i=1,2,..)       |  \(     '    |'.---. |    |    B and NE4
                     |---\  Network |      '+.   |
                    (o NE1\         |       | ) /
                     (     |        |       o) /
                      (    |        |       ) NE4
                       (   o NE2    o NE3.-'
                        '               )
                         '---._.-.     )
                                  '---'
The primary controller (i.e., A) has a PCEP session with each NE in the network, including NE4. The secondary controller (i.e., B) has a PCEP session with the same NE, NE4, which is established and maintained over an IP path between B and NE4.
In normal operations, controller A (primary) sends NE4 a PCEP message containing the information about the controllers connected to it. NE4 forwards the information to controller B (secondary). The information includes:
C = 1, A's current Position = 1, A's OldPosition = 1, A's Priority, NoControllers = 2, A's ID, B's ID
When failures happen in the cluster, the live controllers act as follows:
For the primary controller (e.g., A), if it is alive, it continues to be the primary controller.
For the secondary controller (e.g., B), if it is alive and the primary controller is dead, it promotes itself to be the new primary controller; if the primary controller is alive but separated from the secondary controller, the secondary controller does not promote itself.
With the extensions to PCEP, the secondary controller can determine the status of the primary controller from the information it receives about the primary controller. The situation in which the primary controller is alive but separated from the secondary controller combines two conditions: (a) the connection between the primary controller and the secondary controller in the cluster has failed, and (b) both controllers are alive. The secondary controller determines them as follows:
For condition a, when the heartbeat from the primary controller stops, the secondary controller knows that its connection to the primary controller has failed.
For condition b, it checks whether the information about the primary controller has been updated within a given time. If so, the primary controller is alive; otherwise, it is dead.
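The two checks can be combined into a small decision function. The following Python sketch is illustrative only: the function names, the status names, and both timeout parameters are assumptions introduced here, not part of the protocol.

```python
from enum import Enum


class PrimaryStatus(Enum):
    CONNECTED = "connected"  # heartbeat inside the cluster still arrives
    SEPARATED = "separated"  # condition a and condition b both hold
    DEAD = "dead"            # condition a holds, condition b does not


def classify_primary(now, last_heartbeat, last_info_update,
                     heartbeat_timeout, info_timeout):
    """Classify the primary controller as seen by the secondary.

    All arguments are times or durations in seconds. Both timeouts are
    hypothetical local settings.
    """
    # Condition a: the intra-cluster heartbeat from the primary has stopped.
    heartbeat_lost = (now - last_heartbeat) > heartbeat_timeout
    if not heartbeat_lost:
        return PrimaryStatus.CONNECTED
    # Condition b: the information about the primary, relayed by the NE,
    # has been updated within the given time, so the primary is alive.
    info_fresh = (now - last_info_update) <= info_timeout
    return PrimaryStatus.SEPARATED if info_fresh else PrimaryStatus.DEAD


def should_promote(status):
    """The secondary promotes itself only when the primary is dead."""
    return status is PrimaryStatus.DEAD
```

With this structure, a secondary controller that has lost the heartbeat but still sees fresh information about the primary via NE4 stays secondary.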
This section describes extensions to PCEP.
During PCEP session establishment, PCEP speakers (PCE or PCC) advertise their support for the PCEP extensions for network reliability, in particular for the High Availability of Controller cluster (HAC). A new Controller HA Support Capability TLV is defined for HAC below. A PCEP speaker that supports HAC indicates this by including the TLV in the OPEN object of its Open message.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type (TBD1)           |         Length (4)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Flags                            |C|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
When two PCEP speakers establish a PCEP session, each speaker that supports HAC includes a Controller HA Support Capability TLV in the OPEN object of its Open message.
For a PCEP speaker that supports HAC, if it receives the Controller HA Support Capability TLV in the Open message from the other PCEP speaker on the session, it records that the remote end of the session supports HAC; otherwise, it records that the remote end does not. Thus, for each of its PCEP sessions, it knows whether the remote PCEP speaker supports HAC. If the C-bit in the TLV is set to one, the sending PCEP speaker is a controller; otherwise, it is an NE.
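As a rough illustration of the capability exchange, the sketch below encodes and decodes the 8-byte TLV in Python. The type value is an arbitrary placeholder because the real code point is still TBD1, and the function names are hypothetical. The C-bit is taken to be the rightmost (least significant) bit of the 32-bit Flags field.

```python
import struct

HAC_CAP_TLV_TYPE = 0xFFE0  # placeholder only; the real code point is TBD1
C_BIT = 0x00000001         # rightmost bit of the 32-bit Flags field


def encode_hac_capability(is_controller):
    """Build the TLV: 16-bit Type, 16-bit Length (4), 32-bit Flags."""
    flags = C_BIT if is_controller else 0
    return struct.pack("!HHI", HAC_CAP_TLV_TYPE, 4, flags)


def decode_hac_capability(data):
    """Return 'controller' or 'NE' for a HAC capability TLV, else None.

    None means the bytes are not a well-formed HAC capability TLV, in
    which case the receiver records that the peer does not support HAC.
    """
    if len(data) < 8:
        return None
    tlv_type, length, flags = struct.unpack("!HHI", data[:8])
    if tlv_type != HAC_CAP_TLV_TYPE or length != 4:
        return None
    return "controller" if flags & C_BIT else "NE"
```

A receiver would call `decode_hac_capability` on each TLV found in the peer's OPEN object and store the result per session.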
A PCE acting as a controller that supports HAC handles the information about the controllers in its cluster or group as follows:
It sends the information in a PCEP message to each NE in a given set of NEs that run PCEP with HAC support whenever the information changes. The given set of NEs may consist of the single NE with the highest ID.
It adjusts the positions of the controllers accordingly whenever the information about the controllers received from an NE supporting HAC changes.
An NE running PCEP with HAC support receives the information about the controllers from a PCE acting as a controller that supports HAC, and forwards it to every other HAC-capable controller with which it has a PCEP session, i.e., to all such controllers except the one from which the information was received.
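The forwarding rule at the NE amounts to a simple filter over its PCEP sessions. A minimal Python sketch follows; the session table layout and the function name are assumptions made for illustration.

```python
def relay_targets(sessions, sender):
    """Return the peers to which an NE forwards controller information.

    sessions maps a peer name to a dict with two booleans recorded at
    session establishment: 'hac' (the peer sent the capability TLV) and
    'controller' (the C-bit was set in that TLV). This table layout is
    hypothetical.
    """
    return sorted(
        peer
        for peer, caps in sessions.items()
        if peer != sender and caps["hac"] and caps["controller"]
    )
```

For example, an NE with HAC-capable controller sessions to A, B, and C that receives the information from A would forward it to B and C only.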
A new object, called Controllers Object, is defined to contain the information about controllers. A controller in a cluster may advertise the information in a PCEP Report message containing a Controllers Object of the following format.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Object-Class  |   OT  |Res|P|I|   Object Length (bytes)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                             TLVs                              +
   |                  (including Controllers TLV)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Under the Controllers Object, a new TLV, called Controllers TLV, is defined to contain the information about controllers. It has the following format.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type (TBD2)           |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Flags    |C|   Position    |  OldPosition  |   Priority    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Reserved                    | NoControllers |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Connected Controller 1 ID                   |
   :                                                               :
   |                   Connected Controller n ID                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
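To make the layout concrete, here is a hedged Python sketch that serializes a Controllers TLV. The field widths are assumptions made for illustration, since the text does not state them: 7 flag bits plus the C bit, 8 bits each for Position, OldPosition, and Priority, a 24-bit Reserved field, an 8-bit NoControllers, and 32-bit controller IDs. The type value is a placeholder for TBD2 and the function name is hypothetical.

```python
import struct

CONTROLLERS_TLV_TYPE = 0xFFE1  # placeholder only; the real code point is TBD2


def encode_controllers_tlv(c_flag, position, old_position, priority,
                           controller_ids):
    """Serialize a Controllers TLV under the assumed field widths.

    controller_ids must already be ordered by (intent) position, with the
    advertising (intent) primary controller first.
    """
    count = len(controller_ids)
    if not 0 < count <= 255:
        raise ValueError("NoControllers must fit in the assumed 8-bit field")
    flags = 0x01 if c_flag else 0x00  # C is the low-order bit of Flags
    value = struct.pack("!BBBB", flags, position, old_position, priority)
    # Assumed 24-bit Reserved (zero) followed by 8-bit NoControllers.
    value += struct.pack("!I", count)
    for controller_id in controller_ids:
        value += struct.pack("!I", controller_id)  # assumed 32-bit IDs
    # Length covers the value only, as for other PCEP TLVs.
    return struct.pack("!HH", CONTROLLERS_TLV_TYPE, len(value)) + value
```

Under these assumptions, an advertisement with two controllers occupies 20 bytes: 4 for the TLV header and 16 for the value.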
This section describes the recovery procedure for a controller cluster of n (n > 2) controllers, which are the primary controller A, the secondary controller B, ..., the n-th controller N.
When failures happen in the cluster, it may be split into several separated groups of controllers. Under one policy, the group with the largest number of controllers becomes the primary group of the cluster and is responsible for controlling the network; the new primary controller, secondary controller, and so on are elected within that group.
For each separated group of controllers, the intent primary controller, secondary controller, and so on are elected. The intent primary controller of the group advertises the information about its group. The information includes its intent position, its old position, its priority to become a primary controller, the number of controllers in the group, and the identifiers of the controllers in the group. The identifiers of the controllers are ordered according to their positions: the identifier of the intent primary controller, which has position 1, is the first one; the identifier of the intent secondary controller, which has position 2, is the second one; and so on. Thus every separated group has the information about the other groups and can determine which group has the largest number of controllers.
In the case of a tie (i.e., two or more groups have the same largest number of controllers), one policy is that the group whose advertised old position is the highest wins, where position 1 (the old primary controller) is the highest. Under another policy, the group with the highest priority controller wins.
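The selection of the primary group, including both tie-break policies, can be sketched in Python as follows. The class and function names are hypothetical, and the sketch assumes that a numerically larger Priority value means a higher priority, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupAdvert:
    """Information advertised by one group's intent primary controller."""
    old_position: int          # old position of the intent primary
    priority: int              # assumed: larger value = higher priority
    controller_ids: List[str]  # ordered by intent position


def elect_primary_group(adverts, tie_break="old_position"):
    """Pick the primary group: largest group first, then the tie-break."""
    if tie_break not in ("old_position", "priority"):
        raise ValueError("unknown tie-break policy: " + tie_break)

    def rank(advert):
        size = len(advert.controller_ids)
        if tie_break == "old_position":
            # Position 1 is the highest position, so negate it for max().
            return (size, -advert.old_position)
        return (size, advert.priority)

    # If a complete tie remains, max() keeps the first advert seen; a real
    # implementation would need a further deterministic rule.
    return max(adverts, key=rank)
```

Because every group evaluates the same advertisements with the same policy, each group reaches the same result independently, provided all groups are configured with the same policy.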
Some details of the recovery procedure in the current or intent primary controller of a controller cluster or group are as follows.
In normal operations, it advertises the information about controllers containing:
C = 1, Position = 1, Old Position = 1, Primary Controller's priority, NoControllers = n, Primary Controller's ID, secondary controller's ID, ..., and n-th Controller's ID.
When failures split the cluster, it advertises the information about controllers containing:
C = 0, Position = 1, Old Position = 1, Intent Primary Controller's priority, NoControllers = m (m is the number of controllers in the group to which the intent primary controller belongs after the failures), Intent Primary Controller's ID, IDs of the other controllers connected.
Then, after a given time, it checks whether its group has been elected as the primary group. If so, it advertises the information about controllers containing:
C = 1, Position = 1, Old Position = 1, its Priority, NoControllers = m, the IDs of the controllers in the group.
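The three advertisements differ only in the C flag and in the list of controllers. A small Python sketch of the old primary controller's view follows; the phase names, the dictionary keys, and the function name are introduced here for illustration only.

```python
def build_advertisement(phase, priority, group_ids, old_position=1):
    """Build the information advertised by the (intent) primary controller.

    phase is one of the hypothetical labels 'normal', 'split' (intent
    primary, election pending), or 'elected' (group won the election).
    group_ids lists the controllers in the cluster or group, ordered by
    position, with the advertising controller first.
    """
    if phase not in ("normal", "split", "elected"):
        raise ValueError("unknown phase: " + phase)
    return {
        # C = 0 only while the election result is pending.
        "C": 0 if phase == "split" else 1,
        "Position": 1,
        "OldPosition": old_position,
        "Priority": priority,
        "NoControllers": len(group_ids),
        "ControllerIDs": list(group_ids),
    }
```

A controller would re-advertise whenever the phase or the group membership changes, which matches the rule that information is sent whenever it changes.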
One example is that failures split the cluster into two separated groups: group 1, comprising A and C, and group 2, comprising B and N. Each group elects its intent primary controller, secondary controller, and so on. Suppose that controllers A and C are elected as the intent primary and secondary controllers respectively in group 1, and controllers B and N are elected as the intent primary and secondary controllers respectively in group 2.
Each of the intent primary controllers A and B advertises the information about the controllers in its group. The information advertised by A includes:
C = 0, Position = 1, OldPosition = 1, A's Priority, NoControllers = 2, A's ID, C's ID.
The information advertised by B includes:
C = 0, Position = 1, OldPosition = 2, B's Priority, NoControllers = 2, B's ID, N's ID.
Groups 1 and 2 have the same number of controllers, which is 2. Under the old-position policy, the OldPosition advertised by group 1 (1) ranks higher than that advertised by group 2 (2). Group 1 is therefore elected as the primary group, and its intent primary controller A becomes the current primary controller. After this determination, the information about the controllers in group 1 (i.e., the primary group) changes. The updated information advertised by A includes:
C = 1, Position = 1, OldPosition = 1, A's Priority, NoControllers = 2, A's ID, C's ID.
TBD