A universal environment interaction protocol for AI agents uses five meta-commands—do, get, look, wait, and assert—to unify UI, terminal, API, and database automation. It addresses three major pain points: tightly coupled scripts, poor cross-environment portability, and weak agent generalization. Keywords: meta-command protocol, adapter architecture, AI automation.
The technical specification snapshot defines the protocol at a glance
| Parameter | Details |
|---|---|
| Core theme | Universal meta-command automation protocol |
| Interaction targets | UI, CLI, HTTP API, databases |
| Number of meta-commands | 5 (do / get / look / wait / assert) |
| Architecture pattern | Protocol router + pluggable adapters |
| Reference language | C# / generic protocol implementation |
| Key interface | IEnvironmentAdapter |
| Protocol goals | Domain-agnostic, discoverable, generalizable |
| Core dependencies | Asynchronous task model, JSON parameters, pluggable adapter layer |
| Protocol value | Unified automation and improved AI operability |
This protocol elevates automation from interface scripting to an environment interaction layer
Traditional automation frameworks usually split by environment: UI automation handles clicks and text input, terminals handle command execution, and APIs handle request-response flows. As a result, scripts are hard to migrate, interpreters are hard to reuse, and AI agents must learn a separate operating model for each environment.
The key idea in this design is not to keep expanding UI-specific instructions. Instead, it abstracts “operating on an interface” into “interacting with an environment.” Once that abstraction is in place, interfaces, terminals, and server-side endpoints become just different execution surfaces.
Traditional and universal designs differ in several critical ways
| Dimension | Traditional automation | Universal protocol |
|---|---|---|
| Verb semantics | UI-colored and domain-specific | Domain-agnostic |
| Target object | Buttons, input fields, and similar controls | Any addressable entity |
| Execution engine | Tightly coupled to a specific framework | Decoupled through adapters |
| Portability | Low | High |
| AI generalization | Weak | Strong |
Five meta-commands form the minimum viable interaction primitives
The protocol keeps only five actions: do, get, look, wait, and assert. Together, they cover five classes of behavior: applying an effect, reading state, perceiving the environment, synchronizing through waiting, and validating outcomes.
The value of this design is that the verbs themselves carry no domain knowledge. do can mean clicking a button, sending an HTTP request, or even executing a SQL write. The adapter interprets the actual semantics, not the command itself.
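This idea can be sketched with a small dispatch table: the same verb resolves to different handlers depending on which adapter is active. The adapter classes and handler names below are illustrative, not part of the protocol specification.

```python
# Hypothetical sketch: "do" carries no domain knowledge; each adapter
# interprets the action through its own handler table.

class UiAdapter:
    def do(self, action, target, params=None):
        handlers = {"click": lambda: f"clicked {target}"}
        return handlers[action]()

class HttpAdapter:
    def do(self, action, target, params=None):
        method = (params or {}).get("method", "GET")
        handlers = {"request": lambda: f"{method} {target}"}
        return handlers[action]()

ui_result = UiAdapter().do("click", "btnSubmit")            # "do" means a click here
api_result = HttpAdapter().do("request", "/data", {"method": "POST"})  # "do" means a request here
```

The verb stays identical in both calls; only the adapter's interpretation changes.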
do [action] [target] [params] # Apply an action to the environment
get [entity] [query] # Read state or a value from the environment
look [scope] # Retrieve the current world model
wait [condition] [timeout] # Wait until a condition is satisfied
assert [predicate] [params] # Verify whether an assertion holds
These five meta-commands cover most automation loops: observe first, act next, read again, wait if needed, and validate at the end.
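The loop above can be sketched against a toy in-memory environment. Everything here is illustrative: the environment, its state keys, and the method names are assumptions made for the example, not part of the protocol.

```python
# Minimal sketch of the observe -> act -> read -> wait -> validate loop
# over an illustrative in-memory environment.

class ToyEnv:
    def __init__(self):
        self.state = {"status": "idle"}

    def look(self, scope=None):
        return dict(self.state)              # perceive: return the world model

    def do(self, action, target, params=None):
        if action == "start":
            self.state["status"] = "done"    # apply an effect

    def get(self, entity):
        return self.state.get(entity)        # read a single value

    def wait(self, condition, timeout_ms=1000):
        return condition(self.state)         # toy env settles immediately

    def check(self, predicate):              # "assert" is a keyword in Python
        return predicate(self.state)

env = ToyEnv()
env.look()                                       # 1. observe
env.do("start", "job")                           # 2. act
status = env.get("status")                       # 3. read
env.wait(lambda s: s["status"] == "done")        # 4. synchronize
ok = env.check(lambda s: s["status"] == "done")  # 5. validate
```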
look is the foundational capability for AI environment perception
Unlike traditional automation, look does not require a fixed return format. It only requires the adapter to return the observable state for that domain. For a UI, it can return a component tree. For a CLI, it can return terminal output. For an API, it can return a resource list or an OpenAPI description.
This allows AI to stop depending on hard-coded element locators and instead understand the current environment structure at runtime.
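A rough sketch of what three domains might return from look, with entirely illustrative structures, shows why no fixed schema is needed:

```python
# Sketch: look has no fixed return format -- each adapter returns whatever
# observable state fits its domain. All shapes below are hypothetical.

def ui_look(scope):
    return {"kind": "component-tree",
            "root": {"id": scope, "children": ["btnSubmit", "lblStatus"]}}

def cli_look(scope):
    return {"kind": "terminal-output", "cwd": scope, "lines": ["total 0"]}

def api_look(scope):
    return {"kind": "resource-list", "base": scope,
            "endpoints": ["/data", "/health"]}

world = ui_look("mainWindow")
# An agent can discover targets at runtime instead of hard-coding locators:
targets = world["root"]["children"]
```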
Pluggable adapters are the bridge that makes the protocol practical
Once the protocol is unified, the remaining question is how to connect concrete environments. The answer is to define a standard adapter interface and let each domain handle its own translation.
public interface IEnvironmentAdapter
{
    Task<OperationResult> DoAsync(string action, string? target, Dictionary<string, object>? parameters); // Execute an action
    Task<OperationResult> GetAsync(string entity, Dictionary<string, object>? queryParams);               // Read state
    Task<OperationResult> LookAsync(string? scope, Dictionary<string, object>? options);                  // Retrieve the world model
    Task<OperationResult> WaitAsync(string condition, int timeoutMs);                                     // Wait for a condition
    Task<OperationResult> AssertAsync(string predicate, Dictionary<string, object>? parameters);          // Validate an assertion
    Task<OperationResult<List<string>>> DiscoverAsync(string? scope);                                     // Discover operable entities
}
This interface cleanly separates the protocol layer from the execution layer. The core framework only knows that it should call DoAsync or LookAsync. Whether the underlying engine is Selenium, a shell, an HTTP client, or a database driver is entirely up to the adapter.
Code purpose: This interface defines a unified environment adapter contract so any execution engine can plug into the same protocol.
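A Python rendering of the same contract can make the separation concrete. This is a sketch under assumptions: the abstract base class mirrors the C# interface above, the dict-shaped results stand in for OperationResult, and the ShellAdapter is a purely illustrative stand-in for a real CLI executor.

```python
# Sketch: the same adapter contract in Python, with an illustrative
# CLI adapter plugged in. Result dicts stand in for OperationResult.

import asyncio
from abc import ABC, abstractmethod

class EnvironmentAdapter(ABC):
    @abstractmethod
    async def do_async(self, action, target=None, parameters=None): ...
    @abstractmethod
    async def get_async(self, entity, query_params=None): ...
    @abstractmethod
    async def look_async(self, scope=None, options=None): ...
    @abstractmethod
    async def wait_async(self, condition, timeout_ms): ...
    @abstractmethod
    async def assert_async(self, predicate, parameters=None): ...
    @abstractmethod
    async def discover_async(self, scope=None): ...

class ShellAdapter(EnvironmentAdapter):
    """Illustrative CLI adapter: 'do execute' maps to running a command."""
    async def do_async(self, action, target=None, parameters=None):
        if action == "execute":
            return {"ok": True, "ran": target}     # a real adapter would shell out here
        return {"ok": False, "error": f"unknown action: {action}"}
    async def get_async(self, entity, query_params=None):
        return {"ok": True, "value": None}
    async def look_async(self, scope=None, options=None):
        return {"ok": True, "output": f"listing of {scope or '.'}"}
    async def wait_async(self, condition, timeout_ms):
        return {"ok": True}
    async def assert_async(self, predicate, parameters=None):
        return {"ok": True}
    async def discover_async(self, scope=None):
        return {"ok": True, "entities": ["execute"]}

result = asyncio.run(ShellAdapter().do_async("execute", "ls -la"))
```

The framework only ever sees the abstract base class; swapping ShellAdapter for a UI or HTTP adapter requires no change to the caller.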
A command flows through the UI adapter in a predictable sequence
Take do click btnSubmit as an example. The router first parses the verb, action, and target. It then dispatches the request to the currently active UI adapter. Finally, the adapter maps click to a concrete framework call, such as Selenium’s click().
This shows that the protocol layer does not need to understand what a “button” is. It only preserves interaction intent, while all environment semantics are pushed down into the adapter.
command = {
"verb": "do",
"action": "click",
"target": "btnSubmit"
}
# Select the adapter based on context
adapter = registry.get_active_adapter("ui")
# Route the abstract command to the concrete environment
result = await adapter.do_async(
    command["action"],  # The adapter interprets the action
    command["target"],  # The adapter resolves the target
    None                # No extra parameters for this command
)
Code purpose: This example shows how a command router hands an abstract meta-command to a concrete environment for execution.
The command interpreter should be refactored into a pure router
Legacy interpreters often pack element location, event triggering, and retry logic into the core layer, eventually producing an unmaintainable coupling mess. A better approach is to reduce the interpreter to a CommandRouter.
It should do only three things: parse commands, select adapters, and return results. No domain behavior should appear inside the router. This is the only way to keep the core stable and testing straightforward.
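A minimal router with exactly those three responsibilities might look like the sketch below. The registry, the Command shape, and the fake adapter are all illustrative assumptions; the point is that no domain behavior appears in the router itself.

```python
# Sketch: a CommandRouter reduced to parse -> select -> dispatch.

from dataclasses import dataclass, field

@dataclass
class Command:
    verb: str
    action: str = ""
    target: str = ""
    parameters: dict = field(default_factory=dict)

class CommandRouter:
    def __init__(self, registry):
        self.registry = registry                 # maps context name -> adapter

    def parse(self, line):
        parts = line.split()
        return Command(verb=parts[0],
                       action=parts[1] if len(parts) > 1 else "",
                       target=parts[2] if len(parts) > 2 else "")

    def route(self, context, line):
        command = self.parse(line)               # 1. parse the command
        adapter = self.registry[context]         # 2. select an adapter
        handler = getattr(adapter, command.verb) # 3. dispatch; no domain logic here
        return handler(command.action, command.target, command.parameters)

class FakeUiAdapter:
    def do(self, action, target, parameters):
        return {"ok": True, "performed": f"{action} on {target}"}

router = CommandRouter({"ui": FakeUiAdapter()})
result = router.route("ui", "do click btnSubmit")
```

Because the router touches nothing environment-specific, it can be unit-tested with fake adapters and never changes when new environments are added.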
The router’s responsibility boundary must stay minimal
- Parse the command into Command { Verb, Target, Parameters }
- Select an adapter based on context
- Call the unified interface and return an OperationResult
An architecture with this kind of clean boundary is naturally suited for AI agents, because AI needs a consistent interface more than it needs environment-specific details.
This protocol gives AI a unified operational view
When the protocol is exposed to AI, the agent sees the same set of primitives when dealing with GUIs, terminals, and APIs—instead of three completely different automation systems.
# GUI
do click btnLoad
get lblStatus
look mainWindow
# CLI
do execute ls -la directory=/var/log
get fileSize /var/log/syslog
look .
# HTTP API
do request method=POST url=https://api.example.com/data
get responseHeader X-Request-Id
look /endpoints
Code purpose: This example demonstrates how the same protocol expresses interactions consistently across three environment types.
The combination of look and DiscoverAsync is especially important. It allows AI to inspect the environment first and construct actions afterward, rather than depending entirely on manually orchestrated scripts. That is the foundation for zero-configuration generalization across environments.
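The discover-first pattern can be sketched as follows. The adapter, its world-model shape, and the control names are illustrative assumptions; what matters is that the action is constructed from observation, not hard-coded.

```python
# Sketch: perceive first with look/discover, then build the action
# from whatever the environment reports as operable.

class DemoAdapter:
    def look(self, scope=None):
        return {"controls": [{"id": "btnLoad", "actions": ["click"]},
                             {"id": "lblStatus", "actions": ["read"]}]}

    def discover(self, scope=None):
        return [c["id"] for c in self.look()["controls"]]

    def do(self, action, target, params=None):
        return {"ok": True, "performed": f"{action} {target}"}

adapter = DemoAdapter()
# 1. Perceive: no locators are hard-coded anywhere.
clickable = [c for c in adapter.look()["controls"] if "click" in c["actions"]]
# 2. Act on what was actually observed.
result = adapter.do("click", clickable[0]["id"])
```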
The architecture can absorb and extend existing automation systems
This approach does not replace existing UI automation. Instead, it wraps existing executors inside a unified protocol. Existing configuration files can become the world model returned by look, existing component calls can map to do and get, and existing AI integration hubs can evolve into multi-adapter aggregation entry points.
That means previous investments do not become obsolete. On the contrary, they gain cross-environment orchestration capabilities through protocol unification.
The design matters not just because it is more general, but because it is more evolvable
The long-term value of this model is that even if AR/VR, robotic terminals, or new interface protocols emerge in the future, the system only needs new adapters. It does not need to rewrite the core interaction logic.
The protocol stays stable while environments keep changing. The core remains constant while the boundary expands. That is exactly the kind of abstraction that creates lasting engineering value.
Frequently asked questions
1. Why not directly extend existing UI automation frameworks?
Because UI commands naturally carry interface-specific semantics. Once you extend them to CLI or API scenarios, you introduce conceptual pollution and execution coupling. A universal protocol separates interaction intent from execution surfaces through domain-agnostic verbs.
2. What is the difference between look and get?
get reads the value of a known entity and is ideal for precise queries. look returns the overall or partial world model of the current environment and is better suited for perception, exploration, and dynamic decision-making.
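The contrast can be shown with a toy adapter (all state and names here are illustrative):

```python
# Sketch: get answers a precise question about a known entity;
# look returns the broader observable state for exploration.

class StatusAdapter:
    _state = {"lblStatus": "ready", "btnSubmit": "enabled"}

    def get(self, entity):
        return self._state[entity]     # precise read of one known entity

    def look(self, scope=None):
        return dict(self._state)       # whole world model, for perception

adapter = StatusAdapter()
value = adapter.get("lblStatus")       # a single value
world = adapter.look()                 # everything observable
```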
3. Which scenarios is this protocol best suited for?
It is ideal for AI agents, multi-system orchestration, automated testing platforms, digital worker platforms, and engineering systems that need a unified entry point for GUI, terminal, API, and database operations.
Core summary
This article reconstructs a universal meta-command protocol for AI agents and automation systems. It uses five domain-agnostic verbs—do, get, look, wait, and assert—to unify UI, terminal, HTTP API, and database interactions, while adapters and routers provide cross-environment reuse, testability, and zero-configuration generalization.