AI Ops overview
ELI (now at version 2.0) is MetalSoft’s AI component. It is designed to support admins and users in day-to-day activities, reduce toil and improve efficiency. As an consolidated system, MetalSoft leverages clean cross-function data and enables agents with capabilities unmatched by other solutions.
| Capability | Description | Strategic Value |
|---|---|---|
| Tenant Infrastructure Operation and Rightsizing | Tenants can use Natural Language to describe what they need. This is an agentic system in the sense that it can implement a multi-step process to implement the desired intent | This, connected with Service Now or a Service Desk system could implement the user’s requirements optimally. |
| Admin Operation and Troubleshooting | Admins can use the AI assistant to quickly retrieve information from the Database or from devices | This helps reduce toil and learning curves. |
| Root Cause Analsyis | If enabled ELI can perform automatic troubleshooting when an event is matched. | This helps expedite ticket resolution, MTTR and generally reduce toil. |
ELI offers multiple services:
- A user-level chat interface that can be used to build or modify tenant infrastructures.
- An admin-level chat interface that can be used to perform troubleshooting and generally to interact with both the MetalSoft database and with MetalSoft-managed equipment (readonly for the moment)
- Automatic RCA (Root Cause Analysis) that, if enabled, will perform a troubleshooting session that goes beyond the traditional methods of RCA and actively plans and executes a troubleshooting process.
Safety and security
Section titled “Safety and security”Types of access
Section titled “Types of access”There are multiple types of access ELI uses:
- ELI in the infrastructure editor uses the user’s access level. It is no different than using the API key to access the API. Only the respective functions will be accesable.
- ELI in the Admin uses the Admin user’s access.
- ELI in the troubleshooting process uses readonly admin access.
Data access
Section titled “Data access”An important advantage of using MetalSoft in conjunction with AI is that our System maintains accurate System of Record for all relevant information such as that the AI can use directly:
| MCP Tool | Description | Strategic Value to an agent |
|---|---|---|
| Customer intent | Stores the desired end-state for all infrastructure elements. | Critical: Detects drift and suggests automated fixes. |
| Inventory data | Provides a real-time, accurate catalog of available hardware and resources. | Optimizes resource allocation for specific use cases. |
| Topology information | Mapping of physical cabling and high-level logical interconnects. | Visualizes dependencies for rapid troubleshooting. |
| Log & Event Access | Real-time stream of equipment logs and system events. | Ingests live telemetry to diagnose incidents. |
| MetalSoft Metadata | Access to server profiles, network policies, and system states. | Performs impact analysis before executing actions. |
| Live Protocol Data | Direct access to live switch info (LLDP, BGP, etc.). | Provides ground truth for protocol-level repair. |
This data access allows the Agent to make proper planning in the case of troubleshooting, compare with what is actually expected, determine anomalies and more.
It does not however have access to:
- Credentials
- PII data
Equipment access
Section titled “Equipment access”MetalSoft provides safe access to equipment to the agents by interposing our deterministic services between the agent and the devices. The agents never have access to the credentials of the devices or access to execute arbitrary commands. For example to be able to retrieve the logs of a switch the Agent calls an MCP tool called “retrieve logs from switch x”.
Generally the agents have only read-only access from an Admin perspective. The agents do not have access to:
- Deploy tenant infrastructures. It can design them but the human must press the deploy button.
- Deploy underlay or other switch changes.
- Delete servers or effect changes on the servers, storage or other equipment at the Admin level.
Models used
Section titled “Models used”Closed source models
Section titled “Closed source models”ELI can be connected to any model that is thinking and can use tools such as those provided by OpenAI or Google. ELI expects an OpenAI-compatible interface.
Open Source Models
Section titled “Open Source Models”ELI can be deployed fully on-prem, air-gapped environments, leveraging an open-source model such as qwen3:235b or Mistral. At the minimum 2x H100 NVIDIA GPUs are required. MetalSoft provides all of the the required components for a fully dedicated setup but a shared Enterprise-wide AI model inference setup can also be leveraged.
ELI in the Infrastructure editor
Section titled “ELI in the Infrastructure editor”Inside the Infrastructure editor you can use the new Agent manage, optimize, rightsize infrastructure elements using natural language. For example:
- “Create a kubernetes cluster for 100 cores, select the appropriate server types”
- “Create a server group with the smallest available bare metal server”
- “Create a Terraform manifest for implementing this infrastructure”

Everything that can be done via the UI can also be done via the new agent.
ELI in the Admin interface
Section titled “ELI in the Admin interface”The Admin agent has access to a subset of the functionality available via the API.
Example prompts:
Section titled “Example prompts:”These skills can be combined in any way in complex prompts such as:
get me all the vlans configured on ports where LLDP neighbors show a switch
or
what is the mac address of the Management0 switch connected to switch 15?
Eli will execute a multi-stage process to retrieve the needed information and finally combine it into a single result etc. Eli is able to iterate of elements, do complex logic and more.

Getting the switch ports of a switch
Section titled “Getting the switch ports of a switch”prompt example: what switch ports are admin up on switch 5?
output:
The following switch ports are admin up on switch 5:ethernet1/1/2ethernet1/1/13:2ethernet1/1/13:4ethernet1/1/14:2ethernet1/1/14:4ethernet1/1/15:4ethernet1/1/7ethernet1/1/11ethernet1/1/15:1ethernet1/1/15:2ethernet1/1/15:3ethernet1/1/5ethernet1/1/6ethernet1/1/8ethernet1/1/12ethernet1/1/13:1ethernet1/1/14:1ethernet1/1/1ethernet1/1/13:3ethernet1/1/14:3Getting the VLANs configurations on a switch
Section titled “Getting the VLANs configurations on a switch”prompt example: get all vlans configured on switch 15
Ports with VLAN 5 configured on switch 15: Eth1/2/2, Eth1/1/1, Eth1/2/3, Eth1/2/4, Eth1/4/2, Eth1/3/3, Eth1/1/2, PortChannel256, Eth1/4/1, Eth1/3/4, Eth1/1/3, Eth1/2/1, Eth1/3/1, Eth1/1/4, Eth1/4/4, Eth1/3/2, Eth1/4/3another prompt example:is there any port that is down configured with vlan 5 on switch 15?
Analysis result
The following down ports are configured with VLAN 5 on switch 15:Eth1/1/3Eth1/3/3Eth1/2/1Eth1/4/2Eth1/1/2Eth1/2/3Eth1/2/4Eth1/3/2Eth1/1/1Eth1/3/4Eth1/4/3Eth1/2/2Eth1/4/4Eth1/3/1Eth1/1/4Eth1/4/1Getting BGP Sessions status of a switch
Section titled “Getting BGP Sessions status of a switch”prompt example:get the bgp sessions of switch 5
output:
BGP Sessions for switch 5:Session 1: Neighbor Name: 192.168.122.1 Enabled: true Active: true Uptime Seconds: 1203847756 State: established Last Error: Reset by peerGetting LLDP neighbors of a switch
Section titled “Getting LLDP neighbors of a switch”prompt example: is there any lldp neighbour of type switch on switch 15?
LLDP Neighbors Analysis Results
The following LLDP neighbors of type 'switch' were found on switch 15:Neighbor Name: sw1-dell-s5232fNeighbor Chassis MAC Address: c4:5a:b1:48:e4:85Neighbor Name: US_CHG_QTS01_01_MJ40_ML43_01Neighbor Chassis MAC Address: d8:94:03:96:5c:fbNeighbor Name: sw1-dell-s5232fNeighbor Chassis MAC Address: c4:5a:b1:48:e4:85Neighbor Name: US_CHG_QTS01_01_MJ40_MGMT_01Neighbor Chassis MAC Address: d0:7e:28:ec:3b:e8Neighbor Name: US_CHG_QTS01_01_MJ40_ML43_01Neighbor Chassis MAC Address: d8:94:03:96:5c:fb- Getting a switch log
prompt examples:
get the switch logs of switch 5output:
Thu Jun 27 22:25:26 2024 informational - LACP_PORT_UNGROUPED port-channel30 : ethernet1/1/4Thu Jun 27 22:25:26 2024 informational - VLT_PORT_CHANNEL_DOWN 30 is downThu Jun 27 22:25:53 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Thu Jun 27 22:25:54 2024 informational - IFM_OSTATE_DN :ethernet1/1/4Thu Jun 27 22:25:55 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Thu Jun 27 22:26:10 2024 informational - VLT_PORT_CHANNEL_UP 30 is upThu Jun 27 22:26:11 2024 informational - IFM_OSTATE_DN :ethernet1/1/4Thu Jun 27 22:26:11 2024 informational - LACP_PORT_GROUPED port-channel30 : ethernet1/1/4Thu Jun 27 22:26:11 2024 informational - LACP_PORT_UNGROUPED port-channel30 : ethernet1/1/4Thu Jun 27 22:26:11 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Thu Jun 27 22:26:11 2024 informational - VLT_PORT_CHANNEL_DOWN 30 is downThu Jun 27 22:26:11 2024 informational - IFM_OSTATE_DN :ethernet1/1/4Thu Jun 27 22:26:12 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Thu Jun 27 22:26:27 2024 informational - VLT_PORT_CHANNEL_UP 30 is upThu Jun 27 22:30:13 2024 informational - VLT_PORT_CHANNEL_DOWN 30 is downThu Jun 27 22:30:15 2024 informational - IFM_OSTATE_UP :port-channel30Thu Jun 27 22:30:15 2024 informational - LACP_PORT_GROUPED port-channel30 : ethernet1/1/4Thu Jun 27 22:30:15 2024 informational - VLT_PORT_CHANNEL_UP 30 is upFri Jun 28 01:29:05 2024 informational - IFM_OSTATE_DN :ethernet1/1/4Fri Jun 28 01:29:06 2024 informational - IFM_OSTATE_DN :port-channel30Fri Jun 28 01:29:06 2024 informational - LACP_PORT_UNGROUPED port-channel30 : ethernet1/1/4Fri Jun 28 01:29:06 2024 informational - VLT_PORT_CHANNEL_DOWN 30 is downFri Jun 28 01:29:34 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Fri Jun 28 01:29:34 2024 informational - IFM_OSTATE_DN :ethernet1/1/4Fri Jun 28 01:29:35 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Fri Jun 28 01:29:50 2024 informational - VLT_PORT_CHANNEL_UP 30 is upFri Jun 28 01:29:51 2024 informational - IFM_OSTATE_DN :ethernet1/1/4Fri Jun 28 01:29:51 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Fri Jun 28 01:29:51 2024 informational - VLT_PORT_CHANNEL_DOWN 30 is downFri Jun 28 01:29:52 2024 informational - IFM_OSTATE_DN :ethernet1/1/4Fri Jun 28 01:29:53 2024 informational - IFM_OSTATE_UP :ethernet1/1/4Fri Jun 28 01:30:08 2024 informational - VLT_PORT_CHANNEL_UP 30 is upFri Jun 28 01:33:50 2024 informational - VLT_PORT_CHANNEL_DOWN 30 is downFri Jun 28 01:33:52 2024 informational - LACP_PORT_GROUPED port-channel30 : ethernet1/1/4Fri Jun 28 01:33:52 2024 informational - IFM_OSTATE_UP :port-channel30Fri Jun 28 01:33:53 2024 informational - VLT_PORT_CHANNEL_UP 30 is upPrompt example: is there any error in the log for switch 15?

Using the topology information
Section titled “Using the topology information”Note that this part is subject to change.
The topology information is available to the AI component in the form of a graph with nodes. Note that the AI might confuse lldp-based toplogy with the known topology. When questioned if switch 5 is connected to server 15 it will try an lldp query first. To force it to use the known topology use explicit prompts:
prompt: based on the topology what are the peers of switch 15?
ELI for Automatic Troubleshooting
Section titled “ELI for Automatic Troubleshooting”ELI can also be automatically invoked when an equipment-generated syslog event is triggered. To enable this behaviour check the “Send to ELI” checkbox in the Global Configurations -> Device Alerts for the rules that need to invoke the agent.
The Alert emails that will sent our will then contain the output of the investigation. Note that this will delay somewhat the alert from being received with perhaps a few minutes.
