Ansible tasks¶
MetalSoft can execute Ansible playbooks via the site controller at defined callback stages (see Callback Stages supported below). This is done via the Ansible task type (taskType: ExtensionTaskAnsible) attached to a workflow or other extension types.
Warning
The Ansible Runner capability must be enabled on the site controller in order for this task type to be supported. See Enabling the Ansible Runner Capability for more details.
The user is expected to provide the ansible playbook and associated roles and mark which callback points the ansible playbook should be attached to. At runtime MetalSoft generates a variables.json
file that can be used inside the ansible playbook to reference elements from the execution context such as details about the server being registered.
Execution process:¶
Depending on the stage, a Job Graph will be updated with several tasks that will prepare and execute the Ansible playbook on the site controller.
The global controller downloads the Ansible bundle specified in the extension's assets[*].url section and sends it to the site controller. For example, this is https://repo.metalsoft.io/.extensions_ms/workflows/power_dns.zip in the example below.
The site controller then unzips it and executes Ansible against the specified playbook, such as deploy_dns_flexible in the example below, with the provided variables.json (see below for more details). The ansible-playbook command will be executed like this:
ansible-playbook -i /opt/metalsoft/ansible-jobs/c8bed144-bd0a-41ce-9ef5-cd9feddf9be2/ansible/inventory.yml /opt/metalsoft/ansible-jobs/c8bed144-bd0a-41ce-9ef5-cd9feddf9be2/ansible/job.yml -e @/opt/metalsoft/ansible-jobs/c8bed144-bd0a-41ce-9ef5-cd9feddf9be2/ansible/variables.json
Developing the Ansible playbook¶
The Ansible files that need to be created of course depend on the task at hand. One should first develop and test the ansible playbook locally before registering the extension. To test locally we recommend using a sample variables.json
file from the examples below.
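As a minimal sketch of such a local test setup, the helper below writes a sample variables.json next to a playbook and builds an ansible-playbook command that loads it with -e @file, mirroring the extra-vars part of the site controller invocation shown above. The helper names, the trimmed-down sample payload, and the playbook name my-playbook.yaml are assumptions made for illustration; no inventory is passed, on the assumption that the playbook targets localhost.

```python
import json
import tempfile
from pathlib import Path


def write_sample_variables(bundle_dir, payload):
    """Write a sample variables.json into the bundle directory and return its path."""
    path = Path(bundle_dir) / "variables.json"
    path.write_text(json.dumps(payload, indent=2))
    return path


def local_test_command(playbook, variables_path):
    """Build an ansible-playbook command that loads the sample file as extra vars."""
    return ["ansible-playbook", str(playbook), "-e", f"@{variables_path}"]


# Trimmed-down sample (assumption): only the fields a playbook under test reads.
sample = {"server": {"model": "PowerEdge R450", "serialNumber": "6SBT0R3"}}

with tempfile.TemporaryDirectory() as bundle_dir:
    variables_path = write_sample_variables(bundle_dir, sample)
    # "my-playbook.yaml" is a hypothetical playbook name.
    command = local_test_command(Path(bundle_dir) / "my-playbook.yaml", variables_path)
    written = json.loads(variables_path.read_text())
    print(" ".join(command))
```

Running the printed command from a shell executes the playbook against the sample data without involving the site controller.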
Creating and registering a simple Ansible extension¶
The following steps show the process of creating a simple ansible playbook that prints the model and the serial number of a server being registered:
Create a file called test-playbook.yaml.
---
- name: Print server model and serial number of server being registered
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    variables_file: "{{ playbook_dir }}/variables.json"
  tasks:
    - name: Load JSON configuration
      ansible.builtin.include_vars:
        file: "{{ variables_file }}"
        name: task_vars
    - name: Print the server model
      debug:
        msg: "Model: {{ task_vars.server.model }}"
    - name: Print the server serial Number
      debug:
        msg: "Serial Number: {{ task_vars.server.serialNumber }}"
Note that variables.json is created automatically when the task is executed, as described below, based on the server being registered.
Create a zip file with the Ansible files. We refer to this file as an “ansible bundle”.
Note that the playbook must be in the root directory of the zip file.
zip ansible.zip test-playbook.yaml
Upload the zip file to an HTTP repository reachable by the global controller so that the file is accessible via a URL such as:
https://repo.metalsoft.io/.extensions_ms/workflows/ansible.zip
Create a file called ansible-extension.json:
{
  "kind": "ExtensionDefinition",
  "schemaVersion": "1.1",
  "name": "powerdns-automation",
  "label": "powerdnsautomation",
  "extensionType": "workflow",
  "vendor": "MetalSoft",
  "extensionVersion": "1.0.0",
  "description": "Manages DNS records via PowerDNS API during server lifecycle",
  "icon": "dns",
  "dependencies": {
    "controllerVersion": "string"
  },
  "inputs": [],
  "outputs": [],
  "assets": [
    {
      "label": "test_bundle",
      "name": "test ansible bundle",
      "assetType": "Bundle",
      "url": "https://repo.metalsoft.io/.extensions_ms/workflows/ansible.zip"
    }
  ],
  "onAssetChange": [
    {
      "stage": "serverRegistered",
      "tasks": [
        {
          "label": "test workflow",
          "taskType": "ExtensionTaskAnsible",
          "options": {
            "asset": "test_bundle",
            "playbook": "test-playbook.yaml"
          }
        }
      ]
    }
  ]
}
Ensure that the following are correct in this file:
The URL of the Ansible bundle (http://…ansible.zip) is accessible from the Global Controller.
The playbook name in the onAssetChange.tasks.options.playbook field matches the playbook file in the root directory of the Ansible bundle.
The asset label referenced by the task (onAssetChange.tasks.options.asset field) matches the label of the provided bundle (assets.label field).
Register & publish the extension
metalcloud-cli extension create test-workflow workflow "test workflow" --definition-source ansible-extension.json --format json
metalcloud-cli extension publish 1
metalcloud-cli extension make-public 1
Ensure that the Ansible Runner Capability is enabled on the Site Controller.
You are now ready to test. Try to register the server. A series of workflow related tasks will be inserted in the graph queue.
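Before registering, the consistency checks listed above (asset label and playbook name) can be automated. The following is a rough sketch, not part of the MetalSoft tooling: the function name check_extension and its input format are assumptions, and the demo deliberately uses a playbook name that does not match the file in the bundle to show what a reported problem looks like.

```python
import io
import zipfile


def check_extension(definition, bundle_names_by_label):
    """Return a list of problems found in a parsed extension definition.

    definition: the parsed extension definition (a dict).
    bundle_names_by_label: asset label -> list of file names inside that zip.
    """
    problems = []
    asset_labels = {asset["label"] for asset in definition.get("assets", [])}
    for stage in definition.get("onAssetChange", []):
        for task in stage.get("tasks", []):
            if task.get("taskType") != "ExtensionTaskAnsible":
                continue
            options = task.get("options", {})
            asset = options.get("asset")
            playbook = options.get("playbook")
            if asset not in asset_labels:
                problems.append(f"{stage['stage']}: unknown asset {asset!r}")
                continue
            # A zip entry without a directory prefix sits in the bundle root.
            if playbook not in bundle_names_by_label.get(asset, []):
                problems.append(
                    f"{stage['stage']}: {playbook!r} not in root of {asset!r}"
                )
    return problems


# Build a tiny in-memory bundle with the playbook in the zip root.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("test-playbook.yaml", "---\n")
names = zipfile.ZipFile(buffer).namelist()


def make_definition(playbook):
    """Minimal definition containing only the fields the check reads."""
    return {
        "assets": [{"label": "test_bundle"}],
        "onAssetChange": [{
            "stage": "serverRegistered",
            "tasks": [{
                "taskType": "ExtensionTaskAnsible",
                "options": {"asset": "test_bundle", "playbook": playbook},
            }],
        }],
    }


# Underscore instead of hyphen: a deliberate mismatch with the bundled file.
problems = check_extension(make_definition("test_playbook.yaml"), {"test_bundle": names})
no_problems = check_extension(make_definition("test-playbook.yaml"), {"test_bundle": names})
print(problems)
```

For a real bundle, the name list would come from zipfile.ZipFile("ansible.zip").namelist() and the definition from json.load on the definition file.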
Callback Stages supported¶
The following are the callback stages to which tasks (such as the Ansible task) can be attached:
serverRegistered - Executed after a server is registered
serverDecommissioned - Executed after a server is decommissioned or deleted
switchRegistered - Executed after a switch is registered
switchDecommissioned - Executed after a switch is decommissioned or deleted
serverInstanceUpdate - Executed during an instance deployment
serverInstanceGroupCreateDNS - Executed when DNS entries are created for server instance groups
serverInstanceGroupUpdateDNS - Executed when DNS entries are updated for server instance groups
serverInstanceGroupDeleteDNS - Executed when DNS entries are deleted for server instance groups
serverInstanceUpdateDNS - Executed when DNS entries are created or updated for server instances
serverInstanceDeleteDNS - Executed when DNS entries are deleted for server instances
serverCreateDNS - Executed when DNS entries are created for a server's BMC
serverDeleteDNS - Executed when DNS entries are deleted for a server's BMC
switchCreateDNS - Executed when DNS entries are created for a switch's management interface
switchDeleteDNS - Executed when DNS entries are deleted for a switch's management interface
Task Object Schema¶
A workflow can have one or more tasks of type Ansible, which are executed in order. The following is an example task definition for the ExtensionTaskAnsible task type.
{
"label": "create-or-update-dns-and-ptr-records-for-instance",
"taskType": "ExtensionTaskAnsible",
"options": {
"asset": "power-dns-configuration",
"playbook": "deploy_dns_flexible.yaml"
}
}
Options¶
asset - The label of the asset (Ansible bundle) to use
playbook - The playbook to execute; it must exist within the asset bundle
executionTimeout - Timeout for the execution
executionTimeoutTick - How often to retry in case of an error
variables.json¶
When the Ansible bundle is executed, a file called variables.json is generated by the system and provided as a parameter to the Ansible playbook. Its content depends on the execution stage.
For serverInstanceUpdate, variables.json receives:
{
"serverInstanceRecordSet": {
"deployStatus": "ongoing",
"deployType": "create",
"deploymentId": 5388,
"instanceIpv4IpRanges": [],
"instanceIpv4Ips": [
{
"cidr": "10.0.0.4/24",
"gateway": "10.0.0.1",
"ip": "10.0.0.4",
"logicalNetworkId": 1214,
"maskBits": 24,
"netmask": "255.255.255.0",
"networkAddress": "10.0.0.0",
"status": "allocated"
}
],
"instanceIpv6IpRanges": [],
"instanceIpv6Ips": [],
"logicalNetworks": [
{
"interfaces": [
{
"macAddress": "8c:84:74:0e:6c:34",
"redundancyIndex": null,
"serverInterfaceId": 688,
"tagged": false
}
],
"ipv4Subnets": [
{
"gateway": "10.0.0.1",
"gatewayPlacement": "default",
"id": 237,
"networkAddress": "10.0.0.0",
"prefixLength": 24,
"scope": {
"kind": "fabric",
"resourceId": 1931
},
"status": "allocated"
}
],
"logicalNetworkId": 1214,
"logicalNetworkLabel": "alex-private-net",
"logicalNetworkName": "alex-private-net",
"vlans": [
{
"id": 314,
"scope": {
"kind": "fabric",
"resourceId": 1931
},
"status": "allocated",
"vlanId": 826
}
]
}
],
"serverId": 204,
"serverInstanceId": 4434,
"serviceStatus": "ordered",
"siteLabel": "dc-eveng-qa02"
}
}
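To illustrate how a script or custom module might consume this payload, the sketch below pairs each instance IPv4 address with the label of the logical network it belongs to, joining instanceIpv4Ips and logicalNetworks on logicalNetworkId. The function name is hypothetical and the input is a trimmed copy of the example above.

```python
def instance_ipv4_by_network(variables):
    """Return (ip, logical network label) pairs from a serverInstanceUpdate payload."""
    record_set = variables["serverInstanceRecordSet"]
    # Index the logical networks by id so each IP can be matched to its label.
    labels = {
        network["logicalNetworkId"]: network["logicalNetworkLabel"]
        for network in record_set["logicalNetworks"]
    }
    return [
        (entry["ip"], labels.get(entry["logicalNetworkId"]))
        for entry in record_set["instanceIpv4Ips"]
    ]


# Trimmed copy of the example payload (only the fields read above).
variables = {
    "serverInstanceRecordSet": {
        "instanceIpv4Ips": [{"ip": "10.0.0.4", "logicalNetworkId": 1214}],
        "logicalNetworks": [
            {"logicalNetworkId": 1214, "logicalNetworkLabel": "alex-private-net"}
        ],
    }
}

pairs = instance_ipv4_by_network(variables)
print(pairs)
```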
For serverRegistered and serverDecommissioned, the Server object is present in variables.json, for example:
{
"server": {
"administrationState": "managed",
"bdkDebug": 0,
"biosInfo": {
"vendor": "Dell Inc.",
"version": "1.17.2"
},
"bmcMacAddress": "08:92:04:a4:d9:98",
"bootLastUpdateTimestamp": "0000-00-00T00:00:00Z",
"bootingCustomIsoInProgress": 0,
"chassisRackId": null,
"customInfo": null,
"datacenterName": "sonic-qts",
"diskCount": 4,
"disks": [],
"extensionInfo": null,
"gpuCount": 0,
"gpuInfo": [],
"instanceCustomInfo": null,
"interfaces": [],
"inventoryId": null,
"ipmiCredentialsNeedUpdate": 0,
"ipmiVersion": "2",
"isBasicCampusEndpoint": 0,
"jobInfo": {
"jobGroupId": null,
"jobId": null
},
"links": [],
"managementAddress": "172.18.33.189",
"mgmtSnmpPasswordEncrypted": "sonic-qts_r1|aes-cbc|GsO+10B80u+bZZ9ppepYrV/tA16siQw4Arb6FYI3feJT5TQTkzXqlmbCGGuY6sAZ",
"mgmtSnmpPort": 161,
"model": "PowerEdge R450",
"passwordEncrypted": "....",
"powerStatus": "off",
"powerStatusLastUpdateTimestamp": "2025-09-16T18:48:50Z",
"processorCoreCount": 12,
"processorCoreMhz": 4000,
"processorCount": 1,
"processorCpuMark": null,
"processorName": "Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz",
"processorThreads": 24,
"rackName": null,
"rackPositionLowerUnit": null,
"rackPositionUpperUnit": null,
"ramGbytes": 32,
"registeredTimestamp": "2025-09-16T18:48:50Z",
"requiresManualCleaning": 0,
"requiresReRegister": 0,
"resourcePoolId": null,
"revision": 16,
"serialNumber": "6SBT0R3",
"serverAllocationTimestamp": null,
"serverCapacityMbps": 20000,
"serverClass": "bigdata",
"serverCleanupPolicyId": null,
"serverComments": null,
"serverCreatedTimestamp": "2025-09-16T18:13:26Z",
"serverDhcpStatus": "deny_requests",
"serverDiskCount": 4,
"serverDiskWipe": 1,
"serverId": 27,
"serverIsProduction": 0,
"serverMetricsMetadata": {
"fans": [],
"temperatures": [
{
"Label": "temperature.cpu.1",
"Name": "CPU1 Temp",
"Number": 1,
"Oem": null,
"PhysicalContext": "CPU",
"Units": "Celsius",
"UpperThresholdCritical": null,
"UpperThresholdFatal": null
},
{
"Label": "temperature.systemboard.5",
"Name": "System Board Inlet Temp",
"Number": 5,
"Oem": null,
"PhysicalContext": "SystemBoard",
"Units": "Celsius",
"UpperThresholdCritical": 47,
"UpperThresholdFatal": null
},
{
"Label": "temperature.systemboard.6",
"Name": "System Board Exhaust Temp",
"Number": 6,
"Oem": null,
"PhysicalContext": "SystemBoard",
"Units": "Celsius",
"UpperThresholdCritical": null,
"UpperThresholdFatal": null
}
]
},
"serverStatus": "deleting",
"serverSupportsOobProvisioning": 1,
"serverSupportsSol": 1,
"serverSupportsVirtualMedia": 1,
"serverTypeId": 9,
"serverUUID": "44454c4c-5300-1042-8054-b6c04f305233",
"siteId": 1,
"storageControllers": [
{
"description": "Embedded AHCI 1",
"id": 114,
"label": "C620 Series Chipset Family SSATA Controller [AHCI mode]",
"mode": "HBA",
"name": "AHCI.Embedded.1-1",
"options": {
"controllerModesSupported": [],
"raidTypesSupported": []
},
"serverId": 27
},
{
"description": "Embedded AHCI 2",
"id": 115,
"label": "C620 Series Chipset Family SATA Controller [AHCI mode]",
"mode": "HBA",
"name": "AHCI.Embedded.2-1",
"options": {
"controllerModesSupported": [],
"raidTypesSupported": []
},
"serverId": 27
},
{
"description": "AHCI controller in slot 1",
"id": 113,
"label": "BOSS-S1",
"mode": "RAID",
"name": "AHCI.Slot.1-1",
"options": {
"controllerModesSupported": [
"RAID"
],
"raidTypesSupported": [
"RAID1"
]
},
"serverId": 27
},
{
"description": "RAID Controller in SL 3",
"id": 112,
"label": "PERC H745 Front",
"mode": "RAID",
"name": "RAID.SL.3-1",
"options": {
"controllerModesSupported": [
"RAID",
"HBA"
],
"raidTypesSupported": [
"RAID0",
"RAID1",
"RAID5",
"RAID6",
"RAID10",
"RAID50",
"RAID60"
]
},
"serverId": 27
}
],
"submodel": null,
"supportsFcProvisioning": 0,
"tags": null,
"username": "root",
"vendor": "Dell",
"vendorInfo": {
"management": "iDRAC",
"version": "iDRAC9"
},
"vendorSkuId": "PowerEdge R450",
"vncPasswordEncrypted": "rqi|aes-cbc|1so23myI+2ymfZzcacFnk1EN7Yx726lGH0/jTI5pLc/nFhAJZYPSfRiiEO0PY6ak",
"vncPort": 5901
}
}
For switchRegistered and switchDecommissioned, the Network object is available:
{
"networkDevice":{
"id": "ND-001",
"revision": 2,
"status": "active",
"siteId": 101,
"identifierString": "switch-01",
"description": "Core switch in datacenter rack 5",
"chassisIdentifier": "CHS-12345",
"country": "USA",
"city": "San Francisco",
"datacenterMeta": "DC-West",
"datacenterRoom": "Room A",
"datacenterRack": "Rack 5",
"rackPositionUpperUnit": 42,
"rackPositionLowerUnit": 37,
"managementAddress": "192.168.1.10",
"managementAddressPrefixLength": 24,
"managementAddressGateway": "192.168.1.1",
"managementPort": 22,
"syslogEnabled": 1,
"username": "admin",
"managementPassword": "password",
"managementMacAddress": "00:1A:2B:3C:4D:5E",
"serialNumber": "SN-987654321",
"driver": {
"name": "sonic_enterprise"
},
"position": {
"role": "leaf"
},
"orderIndex": 1,
"tags": ["production", "core", "leaf"],
"readyForInitialConfiguration": 1,
"bootstrapReadinessCheckInProgress": 0,
"subnetOobId": 2001,
"subnetOobIndex": 1,
"requiresOsInstall": true,
"bootstrapSkipInitialConfiguration": 0,
"bootstrapExpectedPartnerHostname": "switch-02",
"loopbackAddressIpv4": "10.0.0.1",
"loopbackAddressIpv6": "fe80::1",
"asn": 65001,
"vtepAddressIpv4": "10.1.1.1",
"vtepAddressIpv6": "fe80::2",
"mlagSystemMac": "00:1A:2B:3C:4D:5F",
"mlagDomainId": 10,
"quarantineVlan": 999,
"variablesMaterializedForOSAssets": {
"osVersion": "1.2.3"
},
"secretsMaterializedForOSAssets": {
"apiKey": "secret-key"
},
"bootstrapReadinessCheckResult": {
"status": "ready"
},
"isGateway": false,
"extensionInfo": {
"lastRun": "2025-09-17T12:00:00Z"
},
"links": [
{
"rel": "self",
"href": "/networkdevices/ND-001"
}
]
}
}
For serverInstanceGroupCreateDNS, serverInstanceGroupUpdateDNS, serverInstanceGroupDeleteDNS, serverInstanceUpdateDNS and serverInstanceDeleteDNS, check the RecordSet object in the API documentation. A server DNS record set object similar to this is provided:
"serverInstanceGroupDNSRecordSet": {
"zone": {
"zoneName": "eveng-qa02.metalcloud.io",
"soaEmail": "admin.eveng-qa02.metalcloud.io",
"nameServers": [
"ns1.evenq-qa02.metalcloud.io"
],
"ttl": 3600,
"isDefault": true
},
"infrastructureId": 3870,
"serverInstanceGroup": {
"label": "instance-array-3386"
},
"hostname": "lambda",
"fqdn": "lambda.eveng-qa02.metalcloud.io",
"ips": [
{
"status": "allocated",
"ip": "10.20.50.36"
}
]
}
For serverCreateDNS and serverDeleteDNS, an object similar to the following is provided in variables.json:
"serverDNSRecordSet": {
"zone": {
"zoneName": "us08.metalsoft.io",
"soaEmail": "admin.us08.metalsoft.io",
"nameServers": ["n1.metalsoft.io"],
"ttl": 3600,
"isDefault": true
},
"serverId": 10,
"serialNumber": "serial-number",
"managementAddress": "192.168.100.100",
"hostname": "server-10",
"fqdn": "server-10.us08.metalsoft.io",
"ip": {
"status": "allocated",
"ip": "192.168.100.100"
},
"operation": "create"
}
For switchCreateDNS and switchDeleteDNS, the following payload is provided:
"switchDNSRecordSet": {
"zone": {
"zoneName": "us08.metalsoft.io",
"soaEmail": "admin.us08.metalsoft.io",
"nameServers": ["n1.metalsoft.io"],
"ttl": 3600,
"isDefault": true
},
"switchId": 10,
"managementAddress": "192.168.100.100",
"hostname": "switch-10",
"fqdn": "switch-10.us08.metalsoft.io",
"ip": {
"status": "allocated",
"ip": "192.168.100.100"
},
"operation": "create"
}
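The server and switch DNS payloads share the same shape, so one small function can turn either into a record change for a DNS backend. This is a sketch under two assumptions: that serverDNSRecordSet or switchDNSRecordSet appears as a top-level key of variables.json, and that the address is IPv4 (hence the "A" record type). The function name and the output field names are hypothetical.

```python
def dns_change_from_variables(variables):
    """Map a server or switch DNS payload to a flat record-change dict."""
    for key in ("serverDNSRecordSet", "switchDNSRecordSet"):
        record_set = variables.get(key)
        if record_set is None:
            continue
        return {
            "zone": record_set["zone"]["zoneName"],
            "name": record_set["fqdn"],
            "type": "A",  # assumption: the payload carries an IPv4 address
            "content": record_set["ip"]["ip"],
            "ttl": record_set["zone"]["ttl"],
            "operation": record_set["operation"],
        }
    # Neither DNS payload is present (for example, a different stage).
    return None


# Trimmed copy of the switch example above.
variables = {
    "switchDNSRecordSet": {
        "zone": {"zoneName": "us08.metalsoft.io", "ttl": 3600},
        "fqdn": "switch-10.us08.metalsoft.io",
        "ip": {"status": "allocated", "ip": "192.168.100.100"},
        "operation": "create",
    }
}

change = dns_change_from_variables(variables)
print(change)
```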
Extension Example¶
{
  "kind": "ExtensionDefinition",
  "schemaVersion": "1.1",
  "name": "powerdns-automation",
  "label": "powerdnsautomation",
  "extensionType": "workflow",
  "vendor": "MetalSoft",
  "extensionVersion": "1.0.0",
  "description": "Manages DNS records via PowerDNS API during server lifecycle",
  "icon": "dns",
  "dependencies": {
    "controllerVersion": "string"
  },
  "inputs": [],
  "outputs": [],
  "assets": [
    {
      "label": "power-dns-configuration",
      "name": "power-dns-configuration",
      "assetType": "Bundle",
      "url": "https://repo.metalsoft.io/.extensions_ms/workflows/power_dns.zip"
    }
  ],
  "onAssetChange": [
    {
      "stage": "serverInstanceGroupCreateDNS",
      "tasks": [
        {
          "label": "create-dns-records-for-instance-group",
          "taskType": "ExtensionTaskAnsible",
          "options": {
            "asset": "power-dns-configuration",
            "playbook": "deploy_dns_flexible.yaml"
          }
        }
      ]
    }
  ]
}
Other examples¶
Other examples are available on GitHub:
Accessing the ansible logs¶
The logs, as well as the extracted Ansible archive, are available inside the <task_uuid>/logs directory inside the volume attached to the ansible-runner docker container. An example directory path:
/opt/metalsoft/ansible-jobs/5ee17203-13ba-40c9-9ed3-db12d111ee5e/ansible/
Troubleshooting¶
playbook not found: This is often caused by a zip file that does not have the playbook in its root directory but rather inside a folder. Rezip the bundle and upload it to the repository. Also check the playbook file name in the extension definition.
variable not found or undefined: If the variable name does not match what is provided in variables.json, try extracting the variables.json file to view its contents. It is available in the ansible-jobs directory inside the ansible-runner docker container.
Other Ansible-related errors: To access the logs of the Ansible execution, see the logs directory described under Accessing the ansible logs above.
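When chasing an undefined-variable error, it can help to check the variable paths a playbook uses against an extracted variables.json. The sketch below does this for dotted paths; the function name and the sample data are illustrative assumptions, with the second path misspelled on purpose.

```python
def missing_paths(variables, dotted_paths):
    """Return the dotted paths that cannot be resolved in the variables dict."""
    missing = []
    for path in dotted_paths:
        node = variables
        for part in path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                # Stop at the first segment that does not exist.
                missing.append(path)
                break
    return missing


# In practice, load this with json.load() from the extracted variables.json.
variables = {"server": {"model": "PowerEdge R450", "serialNumber": "6SBT0R3"}}

# "serial_number" is a deliberate misspelling of "serialNumber".
result = missing_paths(variables, ["server.model", "server.serial_number"])
print(result)
```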
Known issues¶
In some cases, when there is an issue with the Ansible execution, the MetalSoft task in the deployment graph might hang for a long time (approx. 1h) before showing the error in the MetalSoft UI. WORKAROUND: Use the logs in the ansible-jobs directory to diagnose the issue, then kill the task in the graph to be able to retry or skip it.
In some cases, killing the task will not kill the Ansible processes, leaving running (and retrying) Ansible processes in the Ansible runner. WORKAROUND: Delete all files in the ansible-jobs directory in the ansible-runner docker container on the Site Controller.
Simply retrying the workflow task does not re-download the updated Ansible bundle. Retry the tasks above the workflow tasks to force the re-download.