The following section explains in detail how the orchestration engine works.
OpenStack Heat vs. Cloud Topology Designer
To understand the orchestration engine, we compare it with the OpenStack Heat orchestration engine, as shown in Figure 1:
Active connection is not required
The Heat orchestrator (Figure 1, on the left) requires an agent to be pre-installed on the VMs. After the VMs start, the agent runs as a daemon process on each VM. Both the agent and the Heat orchestrator connect to the same message queue (i.e., RabbitMQ). The Heat orchestrator pushes tasks to the queue; the agent pulls tasks from it and sets up the VMs accordingly. This type of orchestration requires an active AMQP connection at all times.
Unlike Heat, our orchestrator does not require an agent that is pre-installed and permanently running on the VMs. Instead, it uses Ansible to deploy a service catalog on the target VMs over a bastion host. Before the deployment, the orchestrator creates a security group on the bastion host with a rule that lets the orchestrator reach it (i.e., the rule allows incoming TCP traffic on port 22 from the remote IP of the orchestrator). After the deployment completes, this security group rule is deleted automatically.
Figure 2 shows an example: at the beginning of the workflow, the orchestrator creates a security group on the bastion host (step 1), then enables TCP forwarding on the bastion host (step 2), and automatically deletes the security group rule after the deployment completes (step 3).
In contrast to Heat, this orchestrator does not require a permanently active AMQP connection on the VMs.
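The open-then-close pattern of steps 1 and 3 can be sketched as a context manager that guarantees the rule is removed even when the deployment fails. This is only an illustrative sketch: `InMemorySecurityGroups`, `create_rule`, `delete_rule`, and `temporary_ssh_access` are hypothetical names and do not represent the orchestrator's code or any real cloud SDK.

```python
from contextlib import contextmanager


class InMemorySecurityGroups:
    """Hypothetical stand-in for a networking API (not a real SDK)."""

    def __init__(self):
        self.rules = {}
        self._next_id = 1

    def create_rule(self, protocol, port, remote_ip):
        rule_id = self._next_id
        self._next_id += 1
        self.rules[rule_id] = (protocol, port, remote_ip)
        return rule_id

    def delete_rule(self, rule_id):
        self.rules.pop(rule_id, None)


@contextmanager
def temporary_ssh_access(api, orchestrator_ip):
    """Allow inbound TCP/22 from the orchestrator only while deploying."""
    rule_id = api.create_rule("tcp", 22, orchestrator_ip)  # step 1
    try:
        yield rule_id  # the deployment over the bastion host runs here
    finally:
        api.delete_rule(rule_id)  # step 3, also runs if deployment fails


if __name__ == "__main__":
    api = InMemorySecurityGroups()
    with temporary_ssh_access(api, "203.0.113.10"):
        print("rules during deployment:", api.rules)
    print("rules after deployment:", api.rules)
```

Placing the deletion in a `finally` block is one way to make sure the SSH port does not stay open after a failed deployment.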
Fully isolated workflow
For each Ansible execution in the workflow, the orchestrator starts a new Ansible controller container and runs Ansible from inside that container. After a workflow step completes or fails, the orchestrator terminates the container. As a result, workflow executions from different tenants are fully isolated from each other.
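A minimal sketch of the one-container-per-execution idea follows. It assumes the Docker CLI as the container runtime, which the text above does not specify. The image name `ansible-controller:latest`, the function `controller_command`, and the naming scheme are hypothetical.

```python
import uuid


def controller_command(tenant_id, playbook, image="ansible-controller:latest"):
    """Build the command line for one throwaway Ansible controller container.

    Assumption: Docker is the runtime. `--rm` removes the container as soon
    as the playbook run exits, whether it succeeded or failed.
    """
    # A unique name per execution, so runs never share a container.
    name = f"ansible-{tenant_id}-{uuid.uuid4().hex[:8]}"
    return [
        "docker", "run", "--rm", "--name", name,
        image,
        "ansible-playbook", playbook,
    ]


if __name__ == "__main__":
    print(controller_command("tenant-a", "site.yml"))
```

The returned list could be passed to `subprocess.run`. Because every call produces a fresh container name, two tenants never execute inside the same controller.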
Key Management System supported
When users deploy their applications, the orchestration engine uses the user's OpenStack token to act on their behalf and provision resources on Open Telekom Cloud. This means that the orchestration engine itself cannot make any changes to the tenant without the user's authentication. This is similar to the OpenStack Heat orchestration engine, which also uses the OpenStack token to provision resources on behalf of authenticated users.
Recall that the orchestrator uses Ansible to deploy service catalogs on the compute instances via SSH. For this purpose, the orchestrator automatically generates an SSH key pair for each deployment. The public key is installed on the VMs (via cloud-init). The private key is encrypted in the Key Management System using the user's OpenStack token. Without the user's OpenStack token, the orchestrator itself cannot decrypt this private key.
Error handling of the orchestration engine
The orchestrator retries a workflow step when the following errors occur:
- The orchestrator fails to apply Terraform while creating (or deleting) an OpenStack compute, router, floating IP, network, port, security group, or security group rule.
- Retry: 1 time.
REST API error
- The backend server of Open Telekom Cloud is temporarily unavailable and the REST API responds with the HTTP error message "The backend server is unreachable."
- Retry: 5 times.
Error during VM bootstrap
- The VM is created successfully, but the orchestrator still cannot connect to the VM after 3 minutes.
- Retry: none. The deployment fails immediately. Users should log in to the VM and check why cloud-init failed.
Ansible execution connection error
- Ansible is executed on a VM but terminates with exit code 4 (i.e., the host is unreachable).
- Retry: 1 time.
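The retry limits listed so far can be summarised in a small lookup table and a retry loop. The sketch below is illustrative only: the error-kind names, `StepError`, and `run_with_retries` are hypothetical. It also assumes that each limit counts retries of a single workflow step, which the text does not state explicitly.

```python
# Retries allowed per error kind, taken from the list above.
RETRY_LIMITS = {
    "terraform_apply": 1,
    "rest_api_unreachable": 5,
    "vm_bootstrap": 0,  # the deployment fails immediately
    "ansible_unreachable": 1,  # Ansible exit code 4
}


class StepError(Exception):
    """Hypothetical error type carrying the kind of failure."""

    def __init__(self, kind, message=""):
        super().__init__(message or kind)
        self.kind = kind


def run_with_retries(step, limits=RETRY_LIMITS):
    """Run `step`, retrying according to the limit for the error kind."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return step()
        except StepError as err:
            allowed = limits.get(err.kind, 0)  # unknown kinds: no retry
            if attempts > allowed:
                raise  # retries for this error kind are exhausted


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_api_call():
        calls["n"] += 1
        if calls["n"] <= 2:
            raise StepError("rest_api_unreachable")
        return "created"

    print(run_with_retries(flaky_api_call), "after", calls["n"], "attempts")
```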
Concurrent workflow steps error
- When one step in the workflow fails, the orchestrator waits 2 minutes for the other concurrent steps in the workflow to complete. If they have not finished after 2 minutes, the orchestrator marks them as failed as well.
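The grace period for concurrent steps can be sketched with Python's `concurrent.futures.wait` and a timeout. This is a sketch under assumptions, not the orchestrator's implementation: `settle_after_failure` and the status strings are hypothetical, and the sibling steps are modelled as futures.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def settle_after_failure(pending, grace_seconds=120.0):
    """After one step failed, give sibling steps a grace period to finish.

    Returns a dict mapping each future to "completed" or "failed".
    """
    done, not_done = wait(pending, timeout=grace_seconds)
    status = {}
    for fut in done:
        broken = fut.cancelled() or fut.exception() is not None
        status[fut] = "failed" if broken else "completed"
    for fut in not_done:
        fut.cancel()  # only prevents steps that have not started yet
        status[fut] = "failed"  # still running after the grace period
    return status


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        fast = pool.submit(lambda: "ok")
        slow = pool.submit(time.sleep, 0.5)
        # A short grace period keeps the demo quick; the text uses 2 minutes.
        result = settle_after_failure([fast, slow], grace_seconds=0.1)
        print(result[fast], result[slow])
```

Note that `Future.cancel()` cannot interrupt a step that is already running, so a real orchestrator would need its own way to stop such a step; the sketch only records it as failed.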