How the orchestration engine works

This section explains in technical detail how the orchestration engine works.

OpenStack Heat vs. Cloud Topology Designer

To understand the CTD orchestration engine, we compare it with the OpenStack Heat orchestration engine, as shown in Figure 1:

Fig. 1: Cloud Topology Designer vs. OpenStack Heat

Active connection is not required

The Heat orchestration (Figure 1, on the left) requires an agent to be pre-installed on the VMs. After the VMs start, the agent runs on them as a daemon process. Both the agent and the Heat orchestrator connect to the same message queue (i.e., RabbitMQ). The Heat orchestrator pushes tasks to the queue. The agent pulls tasks from it and sets up the VMs accordingly. This design requires the AMQP connection to remain active at all times.
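The push/pull model described above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the message broker, a thread stands in for the agent daemon, and the task names are hypothetical. It is not how Heat itself is implemented.

```python
import queue
import threading

# Assumption: an in-process queue stands in for the message broker.
task_queue = queue.Queue()
applied = []

def agent():
    """Stand-in for the agent daemon on a VM: it blocks on the queue,
    which is why the connection has to stay open the whole time."""
    while True:
        task = task_queue.get()
        if task is None:          # sentinel used only to end this demo
            break
        applied.append(task)      # "set up the VM" according to the task

worker = threading.Thread(target=agent, daemon=True)
worker.start()

# The orchestrator side pushes tasks (hypothetical names) to the queue.
for task in ["install package", "write config"]:
    task_queue.put(task)
task_queue.put(None)
worker.join(timeout=5)
```

The point of the sketch is the blocking `get()` call: the agent can only receive work while it stays connected to the queue.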

Unlike the Heat orchestration, the CTD orchestrator does not require an agent that is pre-installed and permanently running on the VMs. Instead, it pushes Ansible modules and code to the VMs through a bastion host and executes them on the target VMs over remote SSH connections. Before the deployment, the CTD orchestrator creates a security group on the bastion host that allows the orchestrator to access it. After the deployment, the security group rule is deleted automatically.

Figure 2 shows an example. At the beginning of the workflow, the CTD orchestrator creates a security group on the bastion host (step 1) and enables TCP forwarding on the bastion host (step 2). After the deployment completes, it automatically deletes the security group rule (step 3).

Fig. 2: Deployment workflow
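The three steps of this workflow map naturally onto a "set up, use, always clean up" pattern. The following is a minimal sketch, not the CTD implementation: `FakeBastion` is a hypothetical in-memory stand-in, the IP address and port 22 are illustrative assumptions, and removing the rule when the deployment fails is also an assumption, since the text only describes the successful case.

```python
from contextlib import contextmanager

class FakeBastion:
    """Hypothetical in-memory stand-in for the bastion host."""
    def __init__(self):
        self.rules = set()
        self.tcp_forwarding = False

@contextmanager
def temporary_access(bastion, orchestrator_ip, port=22):
    rule = (orchestrator_ip, port)
    bastion.rules.add(rule)          # step 1: allow the orchestrator in
    bastion.tcp_forwarding = True    # step 2: enable TCP forwarding
    try:
        yield bastion                # the deployment runs here
    finally:
        bastion.rules.discard(rule)  # step 3: remove the rule again

bastion = FakeBastion()
with temporary_access(bastion, "203.0.113.10") as b:
    rule_present_during_deploy = ("203.0.113.10", 22) in b.rules
```

Because the cleanup sits in a `finally` block, the access rule exists only for the duration of the deployment.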

note

In contrast to the Heat orchestration, the CTD orchestrator does not require a connection that remains active at all times.

Fully isolated workflow

For each workflow step in the orchestration process, the CTD orchestrator starts a Sandbox container and executes the workflow step inside the Sandbox. After a workflow step completes or fails, the orchestrator terminates the Sandbox container. As a result, workflow steps from multiple tenants are fully isolated from each other.

Fig. CTD orchestrator starts a sandbox container
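The per-step lifecycle can be pictured as follows. This is a sketch under assumptions: no real container runtime is involved, `active_sandboxes` merely tracks which simulated sandboxes are alive, and the tenant names and helper functions are hypothetical.

```python
import uuid
from contextlib import contextmanager

# Assumption: a set of IDs simulates the running Sandbox containers.
active_sandboxes = set()

@contextmanager
def sandbox(tenant):
    sandbox_id = f"{tenant}-{uuid.uuid4().hex[:8]}"
    active_sandboxes.add(sandbox_id)          # start a fresh sandbox
    try:
        yield sandbox_id
    finally:
        active_sandboxes.discard(sandbox_id)  # terminate on success or failure

def run_step(tenant, step):
    """Run one workflow step inside its own short-lived sandbox."""
    with sandbox(tenant) as sandbox_id:
        return step(sandbox_id)

def failing_step(sandbox_id):
    raise RuntimeError("step failed")

result = run_step("tenant-a", lambda sid: f"ok in {sid}")
try:
    run_step("tenant-b", failing_step)
except RuntimeError:
    pass  # the sandbox of the failed step is gone as well
```

Each step gets a new sandbox and no sandbox outlives its step, so nothing is shared between steps or tenants.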

Key Management System supported

When users deploy their applications, the orchestration engine uses the user authentication token (i.e., the OpenStack token) to act on their behalf and provision resources on Open Telekom Cloud. This means that the orchestration engine itself cannot make any changes to the user tenant without the user's authentication. This is similar to the OpenStack Heat orchestration engine, which also uses OpenStack tokens to provision resources on behalf of authenticated users.

To strengthen security even further, the CTD encrypts the authentication token in a Key Management System. After the deployment completes (or fails), the authentication token is deleted from the Key Management System immediately.

note

The OpenStack Heat orchestration engine does not support a secret management system that protects user secrets. In comparison to OpenStack Heat, the CTD orchestrator supports a Key Management System to resolve user secrets during the deployment.

The CTD orchestrator uses Ansible to zip the software components and copy them to the compute nodes. For this purpose, the orchestration engine auto-generates an SSH key for each deployment. The public part of the SSH key is installed on the compute nodes (via cloud-init). The private part is encrypted in the Key Management System using the user authentication token. This means that the orchestration engine itself (and the system administrators of the CTD) cannot decrypt user secrets.

tip

The CTD supports multi-tenancy by isolating each workflow step in a sandbox container. If you do not want to "share" the orchestration engine with other tenants, you can host and fully operate your own CTD instance. Please contact us for further information.

Error handling

Terraform error

  • Occurs when creating or deleting an OpenStack compute, router, floating IP, network, port, security group, or security group rule.
  • Retry: 1

HTTP error: The backend server is unreachable

  • The backend server of Open Telekom Cloud is temporarily unavailable and the REST API responds with the HTTP error message "The backend server is unreachable".
  • Retry: 5, Waiting time between retries: 10s.
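The retry policy above can be sketched as a small helper. This is an illustration only: `BackendUnreachable` and `call_with_retries` are hypothetical names, and the sketch reads "Retry: 5" as up to five retries after the first attempt. The `sleep` parameter is injectable so that the demo does not actually wait.

```python
import time

class BackendUnreachable(Exception):
    """Hypothetical error raised when the API reports
    'The backend server is unreachable'."""

def call_with_retries(operation, retries=5, wait_seconds=10, sleep=time.sleep):
    # Assumption: one initial attempt plus up to `retries` retries.
    for attempt in range(retries + 1):
        try:
            return operation()
        except BackendUnreachable:
            if attempt == retries:
                raise             # retries exhausted: report the error
            sleep(wait_seconds)   # wait before the next attempt

# Demo: the backend recovers on the third call.
calls = []
waits = []

def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise BackendUnreachable("The backend server is unreachable")
    return "ok"

outcome = call_with_retries(flaky, sleep=waits.append)
```

In the demo the helper waits twice for 10 seconds (recorded in `waits` instead of sleeping) and succeeds on the third call.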

Error during VM bootstrap

  • The VM is created successfully, but the orchestrator still cannot verify the connection to the VM after 3 minutes.

  • The orchestrator fails immediately. Users should log in to the VM and check why cloud-init failed.

Fig. Orchestration engine deploys the HelloWorld Python script via SSH

Ansible execution

  • Ansible runs on a compute node but terminates with exit code 4 (i.e., unreachable host).
  • Retry: 1
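Unlike the HTTP case, this rule is driven by a process exit code. A minimal sketch, assuming a hypothetical `run_playbook` callable that returns the exit code of one Ansible run (in practice it might wrap a subprocess call), could look like this:

```python
UNREACHABLE = 4  # exit code for an unreachable host, as stated above

def run_with_unreachable_retry(run_playbook, retries=1):
    """Re-run only when the exit code signals an unreachable host.
    Returns the final exit code and the number of attempts made."""
    code = run_playbook()
    attempts = 1
    while code == UNREACHABLE and attempts <= retries:
        code = run_playbook()
        attempts += 1
    return code, attempts

# Demo with canned exit codes: unreachable first, then success.
exit_codes = iter([4, 0])
code, attempts = run_with_unreachable_retry(lambda: next(exit_codes))
```

Any other non-zero exit code is returned immediately, because only the unreachable-host case is retried.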

Workflow step error

  • One workflow step fails.
  • The orchestrator waits 2 minutes for the other concurrent steps to terminate gracefully, then sets them to error.
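The behavior for concurrent steps can be sketched with a thread pool and a shared stop signal. This is one possible reading, not the CTD implementation: `run_steps` is a hypothetical helper, each step is assumed to accept a `threading.Event` and stop when it is set, and steps still running when another step fails are marked as error after the grace period (120 seconds by default, shortened in the demo).

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION

def run_steps(steps, grace_seconds=120):
    """Run steps concurrently. `steps` maps a name to a callable that
    takes a stop event. Returns a dict mapping name to 'done' or 'error'."""
    stop = threading.Event()
    pool = ThreadPoolExecutor(max_workers=max(1, len(steps)))
    futures = {pool.submit(fn, stop): name for name, fn in steps.items()}

    # Returns when every step has finished or as soon as one step raises.
    done, pending = wait(futures, return_when=FIRST_EXCEPTION)
    status = {futures[f]: "error" if f.exception() else "done" for f in done}

    if pending:                                # a step failed, others still run
        stop.set()                             # ask them to terminate gracefully
        wait(pending, timeout=grace_seconds)   # grace period
        for f in pending:
            status[futures[f]] = "error"       # then set them to error
    pool.shutdown(wait=False)
    return status

def failing(stop):
    raise RuntimeError("step failed")

def cooperative(stop):
    stop.wait(timeout=5)   # returns as soon as the stop event is set

status = run_steps({"a": failing, "b": cooperative}, grace_seconds=2)
ok_status = run_steps({"x": lambda stop: 1, "y": lambda stop: 2})
```

A step that ignores the stop event would keep its thread alive beyond the grace period. A real orchestrator would need a harder termination mechanism for that case, which this sketch leaves out.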