Kuavo Robot ROS Tutorial: AprilTag Grasping, IK Integration, Simulation, and Real-World Deployment

This article walks through a standard Kuavo robot workflow in a ROS environment, covering package compilation, simulation and hardware startup, AprilTag perception, inverse kinematics, and grasp execution. It helps developers avoid common integration issues such as incorrect startup order, pose errors, and simulation lag. Keywords: Kuavo robot, AprilTag, ROS.

Technical Specification Snapshot

Core Languages: Python, Shell, YAML
Middleware / Framework: ROS1, catkin
Main Protocols / Interfaces: ROS Topics, WebSocket, AprilTag perception pipeline
Simulation Environment: MuJoCo
Core Dependencies: humanoid_controllers, kuavo_sdk, motion_capture_ik, humanoid_plan_arm_trajectory
Typical Tasks: Tag-based grasping, arm trajectory planning, action script execution

This workflow provides a standard integration path for Kuavo robot development

The core value of the original material is not any single command. It is the complete closed-loop workflow from the low-level controller to the high-level grasping script. Developers can proceed in the order of build, launch, perceive, verify, and execute to avoid the dependency breaks that commonly occur in ROS systems.

For humanoid robot development, this workflow is especially important. If the controller is not ready, the IK service is not running, or vision data is not being published, a grasping script may appear to start successfully while the robot performs no actual motion.

Compilation and the base environment must be stable first

First, build the three key packages: the controller, the SDK, and the inverse kinematics module. These packages handle robot control, interface capabilities, and the mapping from target poses to executable joint motions.

cd <kuavo_ros_opensource>   # Enter the workspace
sudo su                     # Switch to root for a consistent build environment
catkin build humanoid_controllers   # Build the controller
catkin build kuavo_sdk              # Build the SDK
catkin build motion_capture_ik      # Build the inverse kinematics module

These commands create the minimum runnable environment for the Kuavo robot grasping example.

The robot startup method must match the runtime environment

The source material clearly defines two startup paths: MuJoCo simulation and the physical robot. The controller launch files are different and must not be mixed. Otherwise, topics, hardware interfaces, or calibration flows may not align.

roslaunch humanoid_controllers load_kuavo_mujoco_sim.launch   # Start in simulation
roslaunch humanoid_controllers load_kuavo_real.launch cali:=true   # Start on real hardware and calibrate

This step determines the host environment for all subsequent perception and motion data flows.

The IK service is the control hub of the grasping motion

The vision layer only provides the target tag pose. The IK service is what actually moves the arm to the target position. After the robot controller is up, you must start the IK node.

source devel/setup.bash           # Load the workspace environment
roslaunch motion_capture_ik ik_node.launch   # Start the IK service

These commands map the target pose to a joint-space solution that the arm can execute.

The AprilTag data source determines whether the grasp target is visible

In simulation, you can use a mock publisher directly. On real hardware, you must start the head or related sensor pipeline. The essential difference is virtual tag data versus real visual perception input.

python3 src/demo/arm_capture_apriltag/mock_tag_publisher.py   # Simulated tag data

cd kuavo_ros_application
sros1                                           # Switch to the ROS1 environment
roslaunch dynamic_biped load_robot_head.launch  # Start sensors on real hardware

This step continuously publishes tag localization data such as /robot_tag_info into the system.
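For reference, a mock publisher can be a short loop that republishes a fixed pose at a steady rate. The sketch below is illustrative rather than a copy of the shipped mock_tag_publisher.py; it assumes the tag pose travels as a geometry_msgs/PoseStamped on /robot_tag_info, and the frame name and pose values are placeholders.

#!/usr/bin/env python3
# Illustrative mock tag publisher; message type and frame name are assumptions.
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("mock_tag_publisher")
pub = rospy.Publisher("/robot_tag_info", PoseStamped, queue_size=1)
rate = rospy.Rate(10)  # steady 10 Hz stream so downstream nodes see live data

while not rospy.is_shutdown():
    msg = PoseStamped()
    msg.header.stamp = rospy.Time.now()
    msg.header.frame_id = "camera_link"  # placeholder camera frame
    msg.pose.position.x = 0.5            # fixed pose in front of the robot, meters
    msg.pose.position.z = 0.3
    msg.pose.orientation.w = 1.0         # identity orientation
    pub.publish(msg)
    rate.sleep()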

Validate the data pipeline before executing motion

In any robotic system, checking topics before running actions is a more reliable engineering practice than executing scripts immediately. Only after tag data is being published correctly does the grasping program have a realistic chance to succeed.

rostopic list | grep tag      # Check whether tag-related topics exist
rostopic echo /robot_tag_info # Inspect live tag data

These commands verify that the perception pipeline is actually working end to end.
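The same gate can sit inside a script so it aborts before commanding any motion. A minimal sketch, assuming /robot_tag_info carries a geometry_msgs/PoseStamped (adjust the type to whatever rostopic echo reports):

# Abort early if tag data is not flowing; the message type is an assumption.
import rospy
from geometry_msgs.msg import PoseStamped

rospy.init_node("tag_pipeline_check")
try:
    msg = rospy.wait_for_message("/robot_tag_info", PoseStamped, timeout=5.0)
    rospy.loginfo("Tag pipeline OK, last stamp: %s", msg.header.stamp)
except rospy.ROSException:
    rospy.logerr("No data on /robot_tag_info within 5 s; start the tag source first")
    raise SystemExit(1)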

The grasping script distinguishes simulation from real hardware through an offset switch

The original example includes both a dexterous-hand version and a claw version. The main difference is the end effector, but the input assumptions are the same: the robot is running, IK is working, and the tag is visible.

python3 src/demo/arm_capture_apriltag/arm_capture_apriltag.py --offset_start True   # Real robot with dexterous hand
python3 src/demo/arm_capture_apriltag/arm_capture_apriltag.py --offset_start False  # Simulation with dexterous hand
python3 src/demo/arm_capture_apriltag/claw_capture_apriltag.py --offset_start True  # Claw version

These scripts convert tag localization results into concrete grasping actions.
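The switch itself is ordinary argument parsing. A hypothetical sketch of how such a flag might gate real-hardware offsets follows; the shipped script's internals may differ, and the offset values are placeholders.

# Hypothetical sketch of the --offset_start switch; not the shipped script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--offset_start", type=lambda s: s.lower() == "true",
                    default=False, help="apply real-hardware offsets")
args = parser.parse_args()

if args.offset_start:
    # Real robot: compensate camera mounting and calibration error.
    grasp_offset = {"x": 0.02, "y": 0.00, "z": -0.05}  # placeholder values
else:
    # Simulation: the tag pose is exact, so no compensation is applied.
    grasp_offset = {"x": 0.00, "y": 0.00, "z": 0.00}

Note that a plain type=bool would treat the string "False" as truthy, which is why the lambda compares against the literal text.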

Key configuration values directly affect grasp accuracy and system stability

The AprilTag must come from the 36h11 family with ID 0, and it should be attached directly above the bottle cap, because the default grasp point is defined directly below the tag. In other words, the visual reference frame and the grasp reference frame are already baked into the code; if the target is not arranged to match that assumption, the grasp error grows accordingly.

standalone_tags:
  - {id: 0, size: 0.042, name: 'tag_0'}   # Configure the tag ID and physical size

This configuration ensures that the perception module interprets the tag pose at the correct scale.
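A size mismatch is easy to catch before it shows up as grasp error by reading the configuration back and comparing it with the measured tag. A minimal sketch, with the file path left as a placeholder:

# Sanity-check the configured tag size against the measured physical tag.
import yaml

MEASURED_SIZE_M = 0.042  # measure the printed tag's black border, in meters
TAGS_YAML = "tags.yaml"  # placeholder path; point this at your apriltag config

with open(TAGS_YAML) as f:
    cfg = yaml.safe_load(f)

tag0 = next(t for t in cfg["standalone_tags"] if t["id"] == 0)
if abs(tag0["size"] - MEASURED_SIZE_M) > 1e-3:
    raise SystemExit(f"tags.yaml size {tag0['size']} != measured {MEASURED_SIZE_M}")
print("Tag size configuration matches the physical tag")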

Offset parameters are the main handles for real-world fine-tuning

If the grasp point is too high or too low, adjust offset_z. If the error is primarily forward or backward, modify temp_x_l/r. Left and right deviation corresponds to temp_y_l/r. These parameters compensate for camera mounting error, end-effector calibration error, and object placement error.

If the simulation stutters, increase the motion-time parameter passed to publish_arm_target_poses and lengthen the time.sleep delay between updates. This reduces trajectory jitter caused by target updates arriving too quickly.
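Conceptually, the tuning loop is just per-axis corrections added to the raw tag pose, plus pacing between target updates. A minimal sketch using the parameter names from the source for the left arm; the values and the surrounding code are illustrative:

# Shift a detected tag pose to the grasp point; left-arm offsets shown.
import time

offset_z = -0.05  # grasp point too high or too low: tune this first
temp_x_l = 0.02   # forward/backward correction for the left arm
temp_y_l = 0.00   # left/right correction for the left arm

def tag_to_grasp_target(tag_x, tag_y, tag_z):
    # Apply the calibration offsets to the raw tag position.
    return tag_x + temp_x_l, tag_y + temp_y_l, tag_z + offset_z

target = tag_to_grasp_target(0.55, 0.10, 0.30)  # example detection, meters

# If the simulation stutters: pass a longer motion time to
# publish_arm_target_poses, then pace the next target update.
time.sleep(0.5)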

Simulation and real-hardware motion examples share the same upper-layer structure

In addition to the grasping example, the source material also provides a standard startup flow for arm trajectory planning and the WebSocket action service. This flow is suitable for demonstration motions, transport tasks, and remote-control integration.

source devel/setup.zsh
roslaunch humanoid_plan_arm_trajectory humanoid_plan_arm_trajectory.launch   # Start arm planning
roslaunch planarmwebsocketservice plan_arm_action_websocket_server.launch    # Start the action service
python3 ./src/demo/video/bump_salute.py                                      # Run the salute demo

These commands verify that the planning layer and the action service layer work together correctly.
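On the client side, remote-control integration only needs a WebSocket connection to the action server. The sketch below uses the common websocket-client package; the port and the payload schema are assumptions, not the documented protocol.

# Minimal client sketch; port and payload schema are assumptions.
import json
from websocket import create_connection  # pip install websocket-client

ws = create_connection("ws://localhost:8888")   # placeholder host and port
ws.send(json.dumps({"action": "bump_salute"}))  # hypothetical payload
print("server replied:", ws.recv())
ws.close()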

The essential difference between simulation and real hardware lies in the control entry point, not the business script

Simulation uses load_kuavo_mujoco_sim.launch, while real hardware uses load_kuavo_real.launch. The upper-layer motion scripts remain mostly the same, which shows that the system design decouples the environment access layer from the task execution layer.

Action files are stored uniformly at /home/lab/.config/lejuconfig/action_files/xxx.tact. This means motion assets can be reused across environments, although low-level safety policies still prioritize real hardware constraints.

Developers should follow a fixed sequence before deployment

The safest sequence is to start the robot controller first, then the IK service, then the tag data source, then inspect the topics, and finally run the grasping script. This order covers nearly all of the hidden dependencies in the original example.
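That ordering can be encoded as a preflight check so a grasp script refuses to run while a dependency is missing. A minimal sketch using standard ROS1 introspection; the node-name substrings are assumptions and should be matched against your own rosnode list output.

# Preflight check before running the grasp script; node names are assumptions.
import rosnode
import rospy

REQUIRED_SUBSTRINGS = ["controller", "ik"]  # match against running node names

rospy.init_node("grasp_preflight", anonymous=True)
running = rosnode.get_node_names()

for needle in REQUIRED_SUBSTRINGS:
    if not any(needle in name for name in running):
        raise SystemExit(f"No running node matching '{needle}'; fix the startup order")

topics = [name for name, _ in rospy.get_published_topics()]
if "/robot_tag_info" not in topics:
    raise SystemExit("/robot_tag_info is not published; start the tag source")
print("Preflight OK: controller, IK, and tag data are all available")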

If you plan to extend the system, abstract these three layers first: the perception input layer, the pose-to-joint IK layer, and the end-effector action layer. This structure is better suited for supporting different tags, different end effectors, and different task objects.
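One way to express that layering is with three narrow interfaces, so that a new tag family, IK backend, or gripper only touches one class. A structural sketch with illustrative names:

# Illustrative layering for extension: perception -> IK -> end effector.
from abc import ABC, abstractmethod

class PerceptionInput(ABC):
    @abstractmethod
    def get_target_pose(self):
        """Return the latest target pose, e.g. from an AprilTag detector."""

class IKLayer(ABC):
    @abstractmethod
    def solve(self, pose):
        """Map a Cartesian pose to a joint-space solution."""

class EndEffector(ABC):
    @abstractmethod
    def grasp(self):
        """Close the dexterous hand or claw."""

def run_grasp(perception: PerceptionInput, ik: IKLayer, hand: EndEffector):
    pose = perception.get_target_pose()
    joints = ik.solve(pose)
    # hand the joint solution to the controller here, then close the effector
    hand.grasp()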

FAQ

Why does the grasping script start, but the robot does not move?

In most cases, an upstream dependency is missing. First check whether the controller started successfully, whether the IK service is online, and whether /robot_tag_info has continuous data. If any of these is missing, the script may run without producing motion.

Why is the grasp position always slightly off on the real robot?

The most common causes are mismatches in AprilTag placement, tag size configuration, or offset parameters. First verify that the size value in tags.yaml matches the physical tag size, then tune offset_z, temp_x_l/r, and temp_y_l/r.

What should I tune first if the simulation is not running smoothly?

Start by increasing the motion publish duration and the sleep interval, then check whether the trajectory becomes smoother. If the issue persists, inspect the MuJoCo simulation load, the ROS publish frequency, and whether arm-planning latency is keeping pace with the control loop.

AI Readability Summary

This article reconstructs a practical Kuavo robot workflow and focuses on package compilation, controller startup, AprilTag perception integration, IK service startup, and grasp-script execution in a ROS environment. It also provides a structured explanation of the differences between simulation and real hardware, key tuning parameters, and common troubleshooting paths.