Troubleshooting server registrations using the Operation’s Graph (AFC tree)
Once you start registering a server, whether through the server registration form or ZTP, you can open the AFC graph in one of two ways: navigate to the Job Queue under Advanced in the Admin Dashboard, or click on the server under the Servers tab and then click on the AFC Group ID under Allocation.
If using the Job Queue, search for the job that has the ID of the server being deployed and click on the Graph number.
This will show the AFC graph. You can use the + and - on the right to zoom in and out.
AFC steps which have passed will be in green and will be marked with Success.
AFC steps which are running will be in orange and will be marked with Running.
AFC steps which are retrying will be in orange and will be marked with Retrying, and will then go back to Running.
AFC steps which have failed will be in red and will be marked with Failed.
To see details of an AFC step, click on the specific AFC. From here you will be able to see its status and, if it has failed, the reason for the failure.
In some instances, you can simply retry the job by clicking on Retry in the top right. Skip and the advanced steps should be avoided unless you know what you are doing and have found the cause of the issue.
To help narrow down the problem, the name of the AFC step indicates what the step is doing (in this example, server_start_cleanup_via_oob).
There are various reasons for an AFC to fail, including network issues, hardware failures and others.
Below are some guidelines to assist in troubleshooting failed AFC steps.
If you wish to cancel the registration:
You can use Kill AFC Running Process and then skip from any AFC to clear the registration. After that, you can delete the failed server from the Servers/Advanced tab.
If any of the following fail:
“server_registering_disks_setup”
“applyDefaultRaidProfile”
“server_setup_sol”
“server_detach_virtual_media”
“server_start_cleanup_via_oob”
“server_setup_tpm”
“server_setup_inteltxt”
“server_vnc_console_enable”
“server_enable_ipmi_over_lan_via_redfish”
“server_management_snmp_change_if_not_set”
“server_firmware_info_update”
“server_registering_collect_monitoring_metadata”
“server_sol_support_set”
Please connect to the BMC of the server and check the BMC logs for errors. If the BMC is not operating as expected, you may need to reset it. If the job has completed, you should be able to retry the event.
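If you triage failures from a script, the grouping above can be captured in a small lookup. The following is a minimal Python sketch: the step names are copied from the list above, while the names `BMC_RELATED_STEPS` and `needs_bmc_check` are hypothetical helpers for illustration and are not part of any product API.

```python
# Step names copied from the list above. Everything else here is illustrative.
BMC_RELATED_STEPS = frozenset({
    "server_registering_disks_setup",
    "applyDefaultRaidProfile",
    "server_setup_sol",
    "server_detach_virtual_media",
    "server_start_cleanup_via_oob",
    "server_setup_tpm",
    "server_setup_inteltxt",
    "server_vnc_console_enable",
    "server_enable_ipmi_over_lan_via_redfish",
    "server_management_snmp_change_if_not_set",
    "server_firmware_info_update",
    "server_registering_collect_monitoring_metadata",
    "server_sol_support_set",
})


def needs_bmc_check(step_name: str) -> bool:
    """Return True if a failure of this step suggests checking the BMC logs."""
    return step_name.strip() in BMC_RELATED_STEPS


if __name__ == "__main__":
    for step in ("server_setup_tpm", "server_boot_bdk_from_virtual_media"):
        hint = "check the BMC logs" if needs_bmc_check(step) else "see the other guidelines"
        print(f"{step}: {hint}")
```

A frozen set keeps the lookup constant-time and prevents the list from being modified by accident elsewhere in a script.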
If “server_boot_bdk_from_virtual_media” fails:
This could be an issue with either NFS or the Agents VM. Check the BMC to see if the BDK ISO has been mounted and if the server has booted from the BDK. If it has not, then retry this step.
If “server_gather_nic_info_via_sol” or “server_registering_interfaces_setup” fails:
This normally happens if LLDP is not set on the switch ports, if there are cabling issues (flapping links or a faulty cable), or if the port has been disabled by your networking team. You can troubleshoot further by logging into the booted BDK image and checking lldptool. Once the networking team has resolved the issue, you can retry this step. If retrying this step does not work, then the BDK may no longer be booted and you may need to retry “server_boot_bdk_from_virtual_media” and then retry “server_gather_nic_info_via_sol”.
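If Python is available where you run the check (an assumption; the BDK image may only offer a shell), the LLDP neighbor query can be wrapped roughly as below. This is a sketch only: the interface name `eth0` is a placeholder, the function names are hypothetical, and `lldptool -t -n -i <interface>` is the generic lldpad invocation for printing neighbor TLVs rather than anything specific to this product.

```python
import shutil
import subprocess
from typing import List, Optional


def lldp_neighbor_command(interface: str) -> List[str]:
    """Build the lldptool command that prints the neighbor TLVs for an interface."""
    # -t: get TLVs, -n: query the neighbor (switch side), -i: interface name
    return ["lldptool", "-t", "-n", "-i", interface]


def query_lldp_neighbor(interface: str, timeout: float = 10.0) -> Optional[str]:
    """Return lldptool's output, or None if the tool is missing, fails or times out."""
    if shutil.which("lldptool") is None:
        return None
    try:
        result = subprocess.run(
            lldp_neighbor_command(interface),
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    output = query_lldp_neighbor("eth0")  # "eth0" is a placeholder interface name
    print(output if output else "No LLDP neighbor data returned; check the switch port and cabling.")
```

An empty or missing result for an interface that should be connected is consistent with the causes listed above (LLDP not set on the switch port, a cabling fault, or a disabled port).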
If “server_bdk_network_setup_stage” or “server_registering_interfaces_setup” fails:
This is normally a networking issue. Please see the step above. “server_bdk_network_setup_stage” can be retried, but if “server_registering_interfaces_setup” fails, then you may need to start by retrying “server_boot_bdk_from_virtual_media”, then “server_gather_nic_info_via_sol”, then “server_power_set”, then “server_detach_virtual_media” and lastly “server_registering_interfaces_setup”.
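The retry orders described in these guidelines can be written down as a small lookup, which may help if you keep runbook notes in code. This is a minimal Python sketch: the step names and their order come from the text above, while `RETRY_PLANS` and `retry_plan` are hypothetical names and do not call any product API. Each plan is the fallback sequence to use when a plain retry of the failed step has not helped.

```python
from typing import Dict, List

# Fallback retry sequences, in the order given in the guidelines above.
RETRY_PLANS: Dict[str, List[str]] = {
    "server_boot_bdk_from_virtual_media": [
        "server_boot_bdk_from_virtual_media",
    ],
    "server_bdk_network_setup_stage": [
        "server_bdk_network_setup_stage",
    ],
    "server_gather_nic_info_via_sol": [
        "server_boot_bdk_from_virtual_media",
        "server_gather_nic_info_via_sol",
    ],
    "server_registering_interfaces_setup": [
        "server_boot_bdk_from_virtual_media",
        "server_gather_nic_info_via_sol",
        "server_power_set",
        "server_detach_virtual_media",
        "server_registering_interfaces_setup",
    ],
}


def retry_plan(failed_step: str) -> List[str]:
    """Return the ordered steps to retry for a failed step.

    Steps without a documented sequence fall back to retrying just that step.
    """
    return list(RETRY_PLANS.get(failed_step, [failed_step]))


if __name__ == "__main__":
    for position, step in enumerate(retry_plan("server_registering_interfaces_setup"), start=1):
        print(f"{position}. retry {step}")
```

Resolve the underlying network or BMC issue before working through a sequence; retrying in order does not help if the root cause is still present.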