Ansible tasks

MetalSoft can execute Ansible playbooks via the site controller at defined callback stages, for example after a server is registered. This is done via the Ansible task type (taskType: ExtensionTaskAnsible), attached to a workflow or another extension type.

Warning

The Ansible Runner capability must be enabled on the site controller in order for this task type to be supported. See Enabling the Ansible Runner Capability for more details.

The user provides the Ansible playbook and associated roles, and marks which callback stages the playbook should be attached to. At runtime, MetalSoft generates a variables.json file that the playbook can use to reference elements from the execution context, such as details about the server being registered.

Execution process:

  1. Depending on the stage, a Job Graph is updated with several tasks that prepare and execute the Ansible playbook on the site controller.

  2. The global controller downloads the Ansible bundle specified in the extension's assets[*].url field and sends it to the site controller. In the example below, this is https://repo.metalsoft.io/.extensions_ms/workflows/power_dns.zip.

  3. The site controller unzips the bundle and runs Ansible against the specified playbook (deploy_dns_flexible.yaml in the example below) with the generated variables.json, which is described in more detail below. The playbook is executed like this:

ansible-playbook -i /opt/metalsoft/ansible-jobs/c8bed144-bd0a-41ce-9ef5-cd9feddf9be2/ansible/inventory.yml /opt/metalsoft/ansible-jobs/c8bed144-bd0a-41ce-9ef5-cd9feddf9be2/ansible/job.yml -e @/opt/metalsoft/ansible-jobs/c8bed144-bd0a-41ce-9ef5-cd9feddf9be2/ansible/variables.json
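
The invocation above follows a fixed layout under the job directory. As a rough illustration only, the following Python sketch assembles the same argument list for a given job ID. The function name build_ansible_command is hypothetical and not part of MetalSoft; the paths are taken from the example command above.

```python
from pathlib import PurePosixPath

# Base directory taken from the example command above.
JOBS_ROOT = PurePosixPath("/opt/metalsoft/ansible-jobs")


def build_ansible_command(job_id: str) -> list:
    """Return the ansible-playbook argument list for one job directory."""
    workdir = JOBS_ROOT / job_id / "ansible"
    return [
        "ansible-playbook",
        "-i", str(workdir / "inventory.yml"),
        str(workdir / "job.yml"),
        # The leading "@" tells Ansible to load extra vars from a file.
        "-e", "@" + str(workdir / "variables.json"),
    ]
```

The point of the sketch is that variables.json is passed as extra vars (-e @file), so its top-level keys are visible to the playbook as variables.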

Developing the Ansible playbook

The Ansible files that need to be created depend on the task at hand. Develop and test the playbook locally before registering the extension. To test locally, we recommend using a sample variables.json file based on the examples below.
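
For local testing, a sample variables.json has to be created by hand, since MetalSoft only generates it at runtime. A minimal Python sketch that writes a trimmed sample, assuming the serverRegistered stage: the field values are copied from the server example further down this page, the real file contains many more fields, and the helper name write_sample_variables is hypothetical.

```python
import json
from pathlib import Path

# Trimmed sample modelled on the serverRegistered payload documented below.
# Only a handful of fields are kept; add the ones your playbook reads.
SAMPLE_VARIABLES = {
    "server": {
        "serverId": 27,
        "vendor": "Dell",
        "model": "PowerEdge R450",
        "serialNumber": "6SBT0R3",
        "managementAddress": "172.18.33.189",
    }
}


def write_sample_variables(directory: str) -> Path:
    """Write variables.json into `directory` and return its path."""
    target = Path(directory) / "variables.json"
    target.write_text(json.dumps(SAMPLE_VARIABLES, indent=2), encoding="utf-8")
    return target
```

Calling write_sample_variables(".") next to the playbook produces a file that can be passed with -e @variables.json, mirroring the flag used by the site controller.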

Creating and registering a simple Ansible extension

The following steps show the process of creating a simple ansible playbook that prints the model and the serial number of a server being registered:

  1. Create a file called test-playbook.yaml.

---
- name: Print server model and serial number of server being registered
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    variables_file: "{{ playbook_dir }}/variables.json"
  tasks:
    - name: Load JSON configuration
      ansible.builtin.include_vars:
        file: "{{ variables_file }}"
        name: task_vars
  
    - name: Print the server model
      debug:
        msg: "Model: {{ task_vars.server.model }}"

    - name: Print the server serial Number
      debug:
        msg: "Serial Number: {{ task_vars.server.serialNumber }}"

Note that the variables.json is created automatically when the task is executed as per the description below, based on the server being registered.

  2. Create a zip file with the Ansible files. We refer to this file as an “ansible bundle”.

Note that the playbook should be in the root directory of the zip file.

zip ansible.zip test-playbook.yaml

  3. Upload the zip file to an HTTP repository reachable by the global controller, so that the file is accessible via a URL such as: https://repo.metalsoft.io/.extensions_ms/workflows/ansible.zip

  4. Create a file called ansible-extension.json:

{
  "kind": "ExtensionDefinition",
  "schemaVersion": "1.1",
  "name": "powerdns-automation",
  "label": "powerdnsautomation",
  "extensionType": "workflow",
  "vendor": "MetalSoft",
  "extensionVersion": "1.0.0",
  "description": "Manages DNS records via PowerDNS API during server lifecycle",
  "icon": "dns",
  "dependencies": {
    "controllerVersion": "string"
  },
  "inputs": [],
  "outputs": [],
  "assets": [
    {
      "label": "test_bundle",
      "name": "test ansible bundle",
      "assetType": "Bundle",
      "url": "https://repo.metalsoft.io/.extensions_ms/workflows/ansible.zip"
    }
  ],
  "onAssetChange": [
    {
      "stage": "serverRegistered",
      "tasks": [
        {
          "label": "test workflow",
          "taskType": "ExtensionTaskAnsible",
          "options": {
            "asset": "test_bundle",
            "playbook": "test-playbook.yaml"
          }
        }
      ]
    }
  ]
}

  5. Ensure that the following are correct in this file:

  • The URL of the ansible bundle (http://…ansible.zip) is accessible from the Global Controller

  • The name of the playbook (onAssetChange.tasks.options.playbook field) matches the name of the file in the root directory of the ansible bundle

  • The label of the asset (onAssetChange.tasks.options.asset field) referenced by the task matches the provided bundle (assets.label field)

  6. Register and publish the extension:

metalcloud-cli extension create test-workflow workflow "test workflow" --definition-source ansible-extension.json --format json
metalcloud-cli extension publish 1
metalcloud-cli extension make-public 1

  7. Ensure that the Ansible Runner capability is enabled on the site controller.

You are now ready to test. Register a server: a series of workflow-related tasks will be inserted in the graph queue.
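
The consistency checks listed in the steps above (asset label and playbook name) are easy to get wrong by hand. A minimal Python sketch of those checks, operating on the parsed extension definition: the function name check_definition and its bundle_root_files parameter are hypothetical, and the sketch only looks at the fields shown in the example definition.

```python
def check_definition(definition, bundle_root_files=None):
    """Return a list of problems; an empty list means all checks passed.

    bundle_root_files: optional collection of file names found in the root
    of the zip bundle. When omitted, playbook names are not cross-checked.
    """
    problems = []
    asset_labels = {asset.get("label") for asset in definition.get("assets", [])}

    for entry in definition.get("onAssetChange", []):
        stage = entry.get("stage", "<no stage>")
        for task in entry.get("tasks", []):
            if task.get("taskType") != "ExtensionTaskAnsible":
                continue  # only Ansible tasks are checked here
            options = task.get("options", {})
            asset = options.get("asset")
            playbook = options.get("playbook")
            if asset not in asset_labels:
                problems.append(f"{stage}: asset {asset!r} not found in assets[].label")
            if not playbook:
                problems.append(f"{stage}: options.playbook is missing")
            elif bundle_root_files is not None and playbook not in bundle_root_files:
                problems.append(f"{stage}: playbook {playbook!r} not in bundle root")
    return problems
```

Loading ansible-extension.json with json.load and passing the result, together with the set {"test-playbook.yaml"}, would report a mismatch between the playbook name in the definition and the file in the bundle before the extension is registered.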

Callback Stages supported

The following are the callback stages to which tasks (such as the Ansible task) can be attached:

  • serverRegistered - Executed after a server is registered

  • serverDecommissioned - Executed after a server is decommissioned or deleted

  • switchRegistered - Executed after a switch is registered

  • switchDecommissioned - Executed after a switch is decommissioned or deleted

  • serverInstanceUpdate - Executed during an instance deployment

  • serverInstanceGroupCreateDNS - Executed when DNS entries are created for server instance groups

  • serverInstanceGroupUpdateDNS - Executed when DNS entries are updated for server instance groups

  • serverInstanceGroupDeleteDNS - Executed when DNS entries are deleted for server instance groups

  • serverInstanceUpdateDNS - Executed when DNS entries are created and updated for server instances

  • serverInstanceDeleteDNS - Executed when DNS entries are deleted for server instances

  • serverCreateDNS - Executed when DNS entries are created for a server's BMC

  • serverDeleteDNS - Executed when DNS entries are deleted for a server's BMC

  • switchCreateDNS - Executed when DNS entries are created for a switch's management interface

  • switchDeleteDNS - Executed when DNS entries are deleted for a switch's management interface

Task Object Schema

A workflow can have one or more tasks of type ansible which will be executed in order. The following is an example task definition for the ExtensionTaskAnsible task type.

{
    "label": "create-or-update-dns-and-ptr-records-for-instance",
    "taskType": "ExtensionTaskAnsible",
    "options": {
        "asset": "power-dns-configuration",
        "playbook": "deploy_dns_flexible.yaml"
    }
}

Options

  • asset - The label of the asset (bundle) to use, as declared in the extension's assets section

  • playbook - The playbook to execute that must exist within the asset bundle.

  • executionTimeout - Timeout for the execution

  • executionTimeoutTick - How often to retry in case of an error
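
A task object with these options can be sketched in Python as follows. The helper name ansible_task is hypothetical. The sketch assumes the two timeout options are optional (the example above omits them) and passes their values through unchanged, because their units and defaults are not stated on this page.

```python
def ansible_task(label, asset, playbook,
                 execution_timeout=None, execution_timeout_tick=None):
    """Build an ExtensionTaskAnsible task object as a plain dict."""
    options = {"asset": asset, "playbook": playbook}
    # Timeout keys are only emitted when a value is supplied; units are
    # not documented here, so no default is assumed.
    if execution_timeout is not None:
        options["executionTimeout"] = execution_timeout
    if execution_timeout_tick is not None:
        options["executionTimeoutTick"] = execution_timeout_tick
    return {
        "label": label,
        "taskType": "ExtensionTaskAnsible",
        "options": options,
    }
```

Serializing the result with json.dumps gives an object of the same shape as the example task definition.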

variables.json

When the Ansible bundle is executed a file called variables.json will be generated by the system and will be provided as a parameter to the ansible playbook. The content will depend on the execution stage:

  • For serverInstanceUpdate, variables.json contains:

{
    "serverInstanceRecordSet": {
        "deployStatus": "ongoing",
        "deployType": "create",
        "deploymentId": 5388,
        "instanceIpv4IpRanges": [],
        "instanceIpv4Ips": [
            {
                "cidr": "10.0.0.4/24",
                "gateway": "10.0.0.1",
                "ip": "10.0.0.4",
                "logicalNetworkId": 1214,
                "maskBits": 24,
                "netmask": "255.255.255.0",
                "networkAddress": "10.0.0.0",
                "status": "allocated"
            }
        ],
        "instanceIpv6IpRanges": [],
        "instanceIpv6Ips": [],
        "logicalNetworks": [
            {
                "interfaces": [
                    {
                        "macAddress": "8c:84:74:0e:6c:34",
                        "redundancyIndex": null,
                        "serverInterfaceId": 688,
                        "tagged": false
                    }
                ],
                "ipv4Subnets": [
                    {
                        "gateway": "10.0.0.1",
                        "gatewayPlacement": "default",
                        "id": 237,
                        "networkAddress": "10.0.0.0",
                        "prefixLength": 24,
                        "scope": {
                            "kind": "fabric",
                            "resourceId": 1931
                        },
                        "status": "allocated"
                    }
                ],
                "logicalNetworkId": 1214,
                "logicalNetworkLabel": "alex-private-net",
                "logicalNetworkName": "alex-private-net",
                "vlans": [
                    {
                        "id": 314,
                        "scope": {
                            "kind": "fabric",
                            "resourceId": 1931
                        },
                        "status": "allocated",
                        "vlanId": 826
                    }
                ]
            }
        ],
        "serverId": 204,
        "serverInstanceId": 4434,
        "serviceStatus": "ordered",
        "siteLabel": "dc-eveng-qa02"
    }
}

  • For serverRegistered and serverDecommissioned, the server object is present in variables.json, for example:

{
  "server": {
    "administrationState": "managed",
    "bdkDebug": 0,
    "biosInfo": {
      "vendor": "Dell Inc.",
      "version": "1.17.2"
    },
    "bmcMacAddress": "08:92:04:a4:d9:98",
    "bootLastUpdateTimestamp": "0000-00-00T00:00:00Z",
    "bootingCustomIsoInProgress": 0,
    "chassisRackId": null,
    "customInfo": null,
    "datacenterName": "sonic-qts",
    "diskCount": 4,
    "disks": [],
    "extensionInfo": null,
    "gpuCount": 0,
    "gpuInfo": [],
    "instanceCustomInfo": null,
    "interfaces": [],
    "inventoryId": null,
    "ipmiCredentialsNeedUpdate": 0,
    "ipmiVersion": "2",
    "isBasicCampusEndpoint": 0,
    "jobInfo": {
      "jobGroupId": null,
      "jobId": null
    },
    "links": [],
    "managementAddress": "172.18.33.189",
    "mgmtSnmpPasswordEncrypted": "sonic-qts_r1|aes-cbc|GsO+10B80u+bZZ9ppepYrV/tA16siQw4Arb6FYI3feJT5TQTkzXqlmbCGGuY6sAZ",
    "mgmtSnmpPort": 161,
    "model": "PowerEdge R450",
    "passwordEncrypted": "....",
    "powerStatus": "off",
    "powerStatusLastUpdateTimestamp": "2025-09-16T18:48:50Z",
    "processorCoreCount": 12,
    "processorCoreMhz": 4000,
    "processorCount": 1,
    "processorCpuMark": null,
    "processorName": "Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz",
    "processorThreads": 24,
    "rackName": null,
    "rackPositionLowerUnit": null,
    "rackPositionUpperUnit": null,
    "ramGbytes": 32,
    "registeredTimestamp": "2025-09-16T18:48:50Z",
    "requiresManualCleaning": 0,
    "requiresReRegister": 0,
    "resourcePoolId": null,
    "revision": 16,
    "serialNumber": "6SBT0R3",
    "serverAllocationTimestamp": null,
    "serverCapacityMbps": 20000,
    "serverClass": "bigdata",
    "serverCleanupPolicyId": null,
    "serverComments": null,
    "serverCreatedTimestamp": "2025-09-16T18:13:26Z",
    "serverDhcpStatus": "deny_requests",
    "serverDiskCount": 4,
    "serverDiskWipe": 1,
    "serverId": 27,
    "serverIsProduction": 0,
    "serverMetricsMetadata": {
      "fans": [],
      "temperatures": [
        {
          "Label": "temperature.cpu.1",
          "Name": "CPU1 Temp",
          "Number": 1,
          "Oem": null,
          "PhysicalContext": "CPU",
          "Units": "Celsius",
          "UpperThresholdCritical": null,
          "UpperThresholdFatal": null
        },
        {
          "Label": "temperature.systemboard.5",
          "Name": "System Board Inlet Temp",
          "Number": 5,
          "Oem": null,
          "PhysicalContext": "SystemBoard",
          "Units": "Celsius",
          "UpperThresholdCritical": 47,
          "UpperThresholdFatal": null
        },
        {
          "Label": "temperature.systemboard.6",
          "Name": "System Board Exhaust Temp",
          "Number": 6,
          "Oem": null,
          "PhysicalContext": "SystemBoard",
          "Units": "Celsius",
          "UpperThresholdCritical": null,
          "UpperThresholdFatal": null
        }
      ]
    },
    "serverStatus": "deleting",
    "serverSupportsOobProvisioning": 1,
    "serverSupportsSol": 1,
    "serverSupportsVirtualMedia": 1,
    "serverTypeId": 9,
    "serverUUID": "44454c4c-5300-1042-8054-b6c04f305233",
    "siteId": 1,
    "storageControllers": [
      {
        "description": "Embedded AHCI 1",
        "id": 114,
        "label": "C620 Series Chipset Family SSATA Controller [AHCI mode]",
        "mode": "HBA",
        "name": "AHCI.Embedded.1-1",
        "options": {
          "controllerModesSupported": [],
          "raidTypesSupported": []
        },
        "serverId": 27
      },
      {
        "description": "Embedded AHCI 2",
        "id": 115,
        "label": "C620 Series Chipset Family SATA Controller [AHCI mode]",
        "mode": "HBA",
        "name": "AHCI.Embedded.2-1",
        "options": {
          "controllerModesSupported": [],
          "raidTypesSupported": []
        },
        "serverId": 27
      },
      {
        "description": "AHCI controller in slot 1",
        "id": 113,
        "label": "BOSS-S1",
        "mode": "RAID",
        "name": "AHCI.Slot.1-1",
        "options": {
          "controllerModesSupported": [
            "RAID"
          ],
          "raidTypesSupported": [
            "RAID1"
          ]
        },
        "serverId": 27
      },
      {
        "description": "RAID Controller in SL 3",
        "id": 112,
        "label": "PERC H745 Front",
        "mode": "RAID",
        "name": "RAID.SL.3-1",
        "options": {
          "controllerModesSupported": [
            "RAID",
            "HBA"
          ],
          "raidTypesSupported": [
            "RAID0",
            "RAID1",
            "RAID5",
            "RAID6",
            "RAID10",
            "RAID50",
            "RAID60"
          ]
        },
        "serverId": 27
      }
    ],
    "submodel": null,
    "supportsFcProvisioning": 0,
    "tags": null,
    "username": "root",
    "vendor": "Dell",
    "vendorInfo": {
      "management": "iDRAC",
      "version": "iDRAC9"
    },
    "vendorSkuId": "PowerEdge R450",
    "vncPasswordEncrypted": "rqi|aes-cbc|1so23myI+2ymfZzcacFnk1EN7Yx726lGH0/jTI5pLc/nFhAJZYPSfRiiEO0PY6ak",
    "vncPort": 5901
  }
}

  • For switchRegistered and switchDecommissioned, the network device object is present in variables.json under the networkDevice key, for example:

{
  "networkDevice":{
    "id": "ND-001",
    "revision": 2,
    "status": "active",
    "siteId": 101,
    "identifierString": "switch-01",
    "description": "Core switch in datacenter rack 5",
    "chassisIdentifier": "CHS-12345",
    "country": "USA",
    "city": "San Francisco",
    "datacenterMeta": "DC-West",
    "datacenterRoom": "Room A",
    "datacenterRack": "Rack 5",
    "rackPositionUpperUnit": 42,
    "rackPositionLowerUnit": 37,
    "managementAddress": "192.168.1.10",
    "managementAddressPrefixLength": 24,
    "managementAddressGateway": "192.168.1.1",
    "managementPort": 22,
    "syslogEnabled": 1,
    "username": "admin",
    "managementPassword": "password",
    "managementMacAddress": "00:1A:2B:3C:4D:5E",
    "serialNumber": "SN-987654321",
    "driver": {
      "name": "sonic_enterprise"
    },
    "position": {
      "role": "leaf"
    },
    "orderIndex": 1,
    "tags": ["production", "core", "leaf"],
    "readyForInitialConfiguration": 1,
    "bootstrapReadinessCheckInProgress": 0,
    "subnetOobId": 2001,
    "subnetOobIndex": 1,
    "requiresOsInstall": true,
    "bootstrapSkipInitialConfiguration": 0,
    "bootstrapExpectedPartnerHostname": "switch-02",
    "loopbackAddressIpv4": "10.0.0.1",
    "loopbackAddressIpv6": "fe80::1",
    "asn": 65001,
    "vtepAddressIpv4": "10.1.1.1",
    "vtepAddressIpv6": "fe80::2",
    "mlagSystemMac": "00:1A:2B:3C:4D:5F",
    "mlagDomainId": 10,
    "quarantineVlan": 999,
    "variablesMaterializedForOSAssets": {
      "osVersion": "1.2.3"
    },
    "secretsMaterializedForOSAssets": {
      "apiKey": "secret-key"
    },
    "bootstrapReadinessCheckResult": {
      "status": "ready"
    },
    "isGateway": false,
    "extensionInfo": {
      "lastRun": "2025-09-17T12:00:00Z"
    },
    "links": [
      {
        "rel": "self",
        "href": "/networkdevices/ND-001"
      }
    ]
  }
}

  • For serverInstanceGroupCreateDNS, serverInstanceGroupUpdateDNS, serverInstanceGroupDeleteDNS, serverInstanceUpdateDNS and serverInstanceDeleteDNS, a DNS record set object similar to the following is provided (see the RecordSet object in the API documentation):

{
  "serverInstanceGroupDNSRecordSet": {
    "zone": {
      "zoneName": "eveng-qa02.metalcloud.io",
      "soaEmail": "admin.eveng-qa02.metalcloud.io",
      "nameServers": [
        "ns1.evenq-qa02.metalcloud.io"
      ],
      "ttl": 3600,
      "isDefault": true
    },
    "infrastructureId": 3870,
    "serverInstanceGroup": {
      "label": "instance-array-3386"
    },
    "hostname": "lambda",
    "fqdn": "lambda.eveng-qa02.metalcloud.io",
    "ips": [
      {
        "status": "allocated",
        "ip": "10.20.50.36"
      }
    ]
  }
}

  • For serverCreateDNS and serverDeleteDNS, an object similar to the following is provided in variables.json:

{
  "serverDNSRecordSet": {
    "zone": {
      "zoneName": "us08.metalsoft.io",
      "soaEmail": "admin.us08.metalsoft.io",
      "nameServers": ["n1.metalsoft.io"],
      "ttl": 3600,
      "isDefault": true
    },
    "serverId": 10,
    "serialNumber": "serial-number",
    "managementAddress": "192.168.100.100",
    "hostname": "server-10",
    "fqdn": "server-10.us08.metalsoft.io",
    "ip": {
      "status": "allocated",
      "ip": "192.168.100.100"
    },
    "operation": "create"
  }
}

  • For switchCreateDNS and switchDeleteDNS, the following payload is provided:

{
  "switchDNSRecordSet": {
    "zone": {
      "zoneName": "us08.metalsoft.io",
      "soaEmail": "admin.us08.metalsoft.io",
      "nameServers": ["n1.metalsoft.io"],
      "ttl": 3600,
      "isDefault": true
    },
    "switchId": 10,
    "managementAddress": "192.168.100.100",
    "hostname": "switch-10",
    "fqdn": "switch-10.us08.metalsoft.io",
    "ip": {
      "status": "allocated",
      "ip": "192.168.100.100"
    },
    "operation": "create"
  }
}
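
Because the content of variables.json differs per stage, a script that consumes it usually starts by finding out which payload it received. A minimal Python sketch, assuming only the top-level keys that appear in the samples above (other stages may use keys not listed here; the function name find_payload is hypothetical):

```python
import json

# Top-level keys as they appear in the sample payloads above.
PAYLOAD_KEYS = (
    "serverInstanceRecordSet",
    "server",
    "networkDevice",
    "serverInstanceGroupDNSRecordSet",
    "serverDNSRecordSet",
    "switchDNSRecordSet",
)


def find_payload(variables: dict):
    """Return (key, payload) for the first known top-level key.

    Returns (None, None) when none of the known keys is present.
    """
    for key in PAYLOAD_KEYS:
        if key in variables:
            return key, variables[key]
    return None, None


def load_payload(path: str):
    """Read a variables.json file and identify its payload."""
    with open(path, encoding="utf-8") as handle:
        return find_payload(json.load(handle))
```

The same idea applies inside a playbook: test which top-level variable is defined before dereferencing fields that only exist for one stage.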

Extension Example

{
  "kind": "ExtensionDefinition",
  "schemaVersion": "1.1",
  "name": "powerdns-automation",
  "label": "powerdnsautomation",
  "extensionType": "workflow",
  "vendor": "MetalSoft",
  "extensionVersion": "1.0.0",
  "description": "Manages DNS records via PowerDNS API during server lifecycle",
  "icon": "dns",
  "dependencies": {
    "controllerVersion": "string"
  },
  "inputs": [],
  "outputs": [],
  "assets": [
    {
      "label": "power-dns-configuration",
      "name": "power-dns-configuration",
      "assetType": "Bundle",
      "url": "https://repo.metalsoft.io/.extensions_ms/workflows/power_dns.zip"
    }
  ],
  "onAssetChange": [
    {
      "stage": "serverInstanceGroupCreateDNS",
      "tasks": [
        {
          "label": "create-dns-records-for-instance-group",
          "taskType": "ExtensionTaskAnsible",
          "options": {
            "asset": "power-dns-configuration",
            "playbook": "deploy_dns_flexible.yaml"
          }
        }
      ]
    }
  ]
}
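
The example above attaches the playbook to a single stage. When the same playbook should run for several stages, the onAssetChange entries can be generated rather than written by hand. A minimal Python sketch, assuming one playbook can handle all of the listed stages (that depends entirely on the playbook); the function name on_asset_change and the generated labels are hypothetical:

```python
def on_asset_change(stages, asset, playbook, label_prefix="dns"):
    """Build one onAssetChange entry per stage, each with one Ansible task."""
    return [
        {
            "stage": stage,
            "tasks": [
                {
                    # Hypothetical label scheme: prefix plus stage name.
                    "label": f"{label_prefix}-{stage}",
                    "taskType": "ExtensionTaskAnsible",
                    "options": {"asset": asset, "playbook": playbook},
                }
            ],
        }
        for stage in stages
    ]


# Stage names taken from the list of supported callback stages.
entries = on_asset_change(
    ["serverInstanceGroupCreateDNS", "serverInstanceGroupUpdateDNS", "serverInstanceGroupDeleteDNS"],
    asset="power-dns-configuration",
    playbook="deploy_dns_flexible.yaml",
)
```

The resulting list can be placed under the onAssetChange key of the extension definition and serialized with json.dumps, which also avoids hand-editing mistakes such as trailing commas.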

Other examples

Other examples are available on GitHub.

Accessing the ansible logs

The logs, as well as the extracted Ansible archive, are available inside the <task_uuid>/logs directory in the volume attached to the ansible-runner Docker container. An example directory path:

/opt/metalsoft/ansible-jobs/5ee17203-13ba-40c9-9ed3-db12d111ee5e/ansible/
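
A small Python sketch for locating the files of one job, assuming the layout described above (an ansible directory holding the extracted bundle and variables.json, and a logs directory, both under the task UUID). The function names are hypothetical, and the jobs root must be the path at which the volume is mounted wherever the script runs.

```python
from pathlib import Path


def job_paths(jobs_root: str, task_uuid: str) -> dict:
    """Return the expected locations for one Ansible job."""
    job_dir = Path(jobs_root) / task_uuid
    return {
        "bundle": job_dir / "ansible",
        "variables": job_dir / "ansible" / "variables.json",
        "logs": job_dir / "logs",
    }


def list_log_files(jobs_root: str, task_uuid: str) -> list:
    """Return the sorted file names in the job's logs directory."""
    logs = job_paths(jobs_root, task_uuid)["logs"]
    if not logs.is_dir():
        return []  # job unknown or logs not written yet
    return sorted(entry.name for entry in logs.iterdir())
```

For example, list_log_files("/opt/metalsoft/ansible-jobs", "5ee17203-13ba-40c9-9ed3-db12d111ee5e") would list the log files for the job shown above, if they exist.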

Troubleshooting

  1. playbook not found: This is often caused by a zip file that has the playbook inside a folder rather than in its root directory. Re-zip the bundle and upload it to the repository. Also check the playbook file name in the extension definition.

  2. variable not found or undefined: The variable name may not match what is provided in variables.json. Extract the variables.json file to view its contents. It is available in the ansible-jobs directory inside the ansible-runner Docker container.

  3. Other Ansible-related errors: To access the logs of the Ansible execution, see the logs directory described under Accessing the ansible logs.
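
For the first issue, the bundle can be inspected before uploading. A minimal Python sketch using the standard zipfile module; the function names are hypothetical:

```python
import zipfile


def playbook_in_bundle_root(bundle_path: str, playbook: str) -> bool:
    """Return True if `playbook` is stored in the root of the zip file."""
    with zipfile.ZipFile(bundle_path) as bundle:
        # Root entries have no directory prefix in their archive name.
        return playbook in bundle.namelist()


def find_playbook(bundle_path: str, playbook: str) -> list:
    """Return every archive entry whose file name is `playbook`."""
    with zipfile.ZipFile(bundle_path) as bundle:
        return [
            name for name in bundle.namelist()
            if name == playbook or name.endswith("/" + playbook)
        ]
```

If playbook_in_bundle_root returns False but find_playbook returns an entry such as a path with a folder prefix, the bundle was zipped with an enclosing folder and needs to be rebuilt from inside that folder.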

Known issues

  • In some cases, when there is an issue with the Ansible execution, the MetalSoft task in the deployment graph might hang for a long time (approx. 1 hour) before the error is shown in the MetalSoft UI. WORKAROUND: Use the logs in the ansible-jobs directory to diagnose the issue, then kill the task in the graph so that it can be retried or skipped.

  • In some cases, killing the task does not kill the Ansible processes, leaving running (and retrying) Ansible processes in the Ansible runner. WORKAROUND: Delete all files in the ansible-jobs directory in the ansible-runner Docker container on the site controller.

  • Simply retrying the workflow task does not re-download an updated Ansible bundle. WORKAROUND: Retry the tasks above the workflow task to force the re-download.