Build a HarmonyOS 6 PC Spatial Interaction Workbench with Face AR, Body AR, and Gesture-Driven 3D Modeling

This PC-based spatial interaction design workbench, built on HarmonyOS 6 (API 23), integrates Face AR expressions and gaze points with Body AR gestures and posture into the 3D modeling workflow, addressing the limitations of traditional mouse-and-keyboard 3D manipulation. Keywords: HarmonyOS 6, Face AR, Body AR.

The technical specification snapshot provides a quick overview

Parameter | Description
Language | ArkTS / TypeScript
Platform | HarmonyOS 6 (API 23) for PC
Core protocols/capabilities | Face AR, Body AR, multi-window UI, AppStorage state synchronization
Core dependencies | @hms.core.ar.arengine 6.1.0, @hms.core.ar.arview 6.1.0, @kit.ArkUI, @kit.AbilityKit

Figure: Spatial interaction design workbench interface for a large-screen PC, showing an immersive dark primary canvas, floating control panels, and an AR monitoring area organized around multi-window collaboration with the 3D design canvas.

This approach elevates PC AR from entertainment to a productivity entry point

The value of HarmonyOS 6 is not in simply moving mobile AR onto a computer. It lies in redefining 3D interaction on the PC. A larger display, higher-resolution camera options, and coordinated GPU and NPU processing allow Face AR and Body AR to serve design-centric workflows simultaneously.

Traditional 3D software relies on mapping 2D input devices into 3D space, which creates a steep learning curve and makes precision adjustments cumbersome. This approach maps interaction directly to natural designer actions: raise your eyebrows to switch materials, open your mouth to confirm, pinch with both hands to zoom, and lean forward to enter detail mode.

PC capability upgrades appear across input, compute, and workflow layers

Dimension | Mobile | PC
Camera input | 1080P is common | Optional 4K/8K peripherals
Tracking frame rate | 30fps | 60fps
Interaction model | Usually single modality (Face or Body) | Concurrent Face + Body
UI organization | Single window | Main window + subwindow + HUD

The configuration below puts these PC capabilities to work in the AR session:
const config = new ARConfig();
config.featureType = ARFeatureType.ARENGINE_FEATURE_TYPE_FACE
  | ARFeatureType.ARENGINE_FEATURE_TYPE_BODY; // Enable dual-modality tracking
config.multiFaceMode = ARMultiFaceMode.MULTIFACE_ENABLE; // Support multi-user review scenarios
config.imageResolution = { width: 1920, height: 1080 }; // Enable high-definition input
// The config must be applied to the session before starting it; the exact call
// depends on the @hms.core.ar.arengine session API
await this.arSession.start(); // Start the AR session

This configuration enables the PC AR session to run in high-definition, dual-modality, and multi-target mode.

The system architecture relies on three coordinated layers: the main workspace, monitoring window, and floating panels

The main window hosts the 3D modeling canvas and handles model rendering, material presentation, and view mode switching. The AR monitoring subwindow visualizes tracking results to reduce debugging complexity. The bottom HUD provides access to the material library, tracking status, and gesture mapping hints.

The key idea is not simply opening multiple windows. It is building a one-way data flow with AppStorage: the AR Engine produces state, and UI components subscribe to it as needed, which avoids direct coupling between windows.

Global state bus design is the core of multi-window synchronization

AppStorage.setOrCreate('face_expression', expressionMap); // Synchronize expression parameters
AppStorage.setOrCreate('gaze_point', { x: gazeX, y: gazeY }); // Synchronize the gaze point
AppStorage.setOrCreate('body_gesture', gesture); // Synchronize gesture state
AppStorage.setOrCreate('body_posture', posture); // Synchronize posture mode

This code writes Face AR and Body AR recognition results into global state so multiple windows can consume them in parallel.
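
On the consuming side, any window binds to these keys with ArkUI's @StorageProp decorator and re-renders when the AR loop publishes new state. A minimal sketch; the GazePoint class and GazeIndicator component are illustrative names, not from the original:

class GazePoint {
  x: number = 0;
  y: number = 0;
}

@Component
struct GazeIndicator {
  // Re-renders automatically whenever the AR loop writes a new gaze point
  @StorageProp('gaze_point') gazePoint: GazePoint = new GazePoint();

  build() {
    Circle({ width: 12, height: 12 })
      .position({ x: this.gazePoint.x, y: this.gazePoint.y })
      .fill('#00FF88')
  }
}

Because the component only subscribes, it never calls back into the AR layer, which preserves the one-way data flow described above.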

AR engine initialization must be tailored for PC scenarios

PC-side initialization should focus on three areas: window mode configuration, high-resolution camera setup, and the AR data loop. The main window is well suited for fullscreen immersive rendering, while subwindows work better as always-on-top floating panels for debugging tracking output.
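
A minimal sketch of the window setup inside the UIAbility's onWindowStageCreate; the subwindow name, page path, and panel size are illustrative, and always-on-top behavior is omitted because the exact call varies by API level:

import { window } from '@kit.ArkUI';

async function setupWorkbenchWindows(windowStage: window.WindowStage): Promise<void> {
  // Main window: immersive fullscreen layout for the 3D canvas
  const mainWin = windowStage.getMainWindowSync();
  await mainWin.setWindowLayoutFullScreen(true);

  // Subwindow: floating AR monitoring panel for debugging tracking output
  const monitor = await windowStage.createSubWindow('ar_monitor');
  await monitor.setUIContent('pages/ArMonitor'); // Assumed page path
  await monitor.resize(480, 360);
  await monitor.showWindow();
}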

Face AR primarily provides BlendShape data and facial landmarks, while Body AR provides skeletal landmarks and posture data. Both are collected inside a unified loop so the UI layer always sees a fused state from the same moment in time.

Face AR and Body AR processing logic should remain decoupled

// 'frame' is the current frame from the AR session's update loop;
// faceAnchors and bodies are the face and body tracking results it exposes
if (faceAnchors.length > 0) {
  const primaryFace = faceAnchors[0]; // Track the primary face only
  this.processFaceData(primaryFace); // Process expressions and gaze points
}

if (bodies.length > 0) {
  const primaryBody = bodies[0]; // Track the primary body only
  this.processBodyData(primaryBody); // Process gestures and posture
}

frame.release(); // Release frame resources to avoid buffer buildup

This loop collects face and body data within the same frame and releases resources promptly.
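
Turning skeletal landmarks into the discrete gesture labels used later is the step processBodyData has to perform. A hedged sketch based on the distance between the two hand landmarks; the landmark extraction, field names, and the 0.01 threshold are illustrative assumptions:

interface Point3D { x: number; y: number; z: number; }
interface GestureSample { type: string; distance: number; }

function classifyTwoHandGesture(leftHand: Point3D, rightHand: Point3D, prevDistance: number): GestureSample {
  // Distance between the two hand landmarks in normalized camera space
  const distance = Math.hypot(leftHand.x - rightHand.x, leftHand.y - rightHand.y);

  // Shrinking distance -> pinch (zoom out); growing -> spread (zoom in)
  if (distance < prevDistance - 0.01) {
    return { type: 'pinch', distance };
  }
  if (distance > prevDistance + 0.01) {
    return { type: 'spread', distance };
  }
  return { type: 'none', distance };
}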

The 3D design canvas maps natural actions into model transformations

On the main canvas, pinch and spread gestures trigger zoom out and zoom in, one-hand pointing drives rotation, leaning forward switches into detail mode, and leaning back returns to global mode. Facial expressions do not act directly on geometry. Instead, they control the material system and confirmation actions.

This separation matters. Body AR is better suited for continuous control, while Face AR is better suited for discrete commands. The former handles transformation, and the latter handles selection.

Gesture and expression mapping should be implemented in separate layers

switch (gesture.type) {
  case 'pinch':
    this.modelTransform.scale = Math.max(0.3, this.modelTransform.scale - 0.02); // Pinch to scale down
    break;
  case 'spread':
    this.modelTransform.scale = Math.min(3.0, this.modelTransform.scale + 0.02); // Spread to scale up
    break;
  case 'point':
    this.modelTransform.rotationY += 2; // One-hand pointing drives rotation
    break;
}

This logic maps skeletal recognition results to model scaling and rotation behavior.
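
The expression layer can mirror this pattern with thresholded discrete commands. A sketch assuming common BlendShape key names (browInnerUp, jawOpen; the AR Engine's actual keys may differ) and hypothetical helpers cycleMaterial and confirmSelection:

private readonly BROW_RAISE_THRESHOLD = 0.6; // Illustrative; calibrate in the monitoring window
private readonly MOUTH_OPEN_THRESHOLD = 0.5;

private processExpressions(expressionMap: Map<string, number>): void {
  const browRaise = expressionMap.get('browInnerUp') ?? 0;
  const mouthOpen = expressionMap.get('jawOpen') ?? 0;

  if (browRaise > this.BROW_RAISE_THRESHOLD) {
    this.cycleMaterial(); // Raise eyebrows to switch materials (discrete command)
  }
  if (mouthOpen > this.MOUTH_OPEN_THRESHOLD) {
    this.confirmSelection(); // Open mouth to confirm the current selection
  }
}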

The AR monitoring window significantly reduces interaction debugging cost

In this design, the monitoring page displays Face AR state, expression parameter bars, a gaze-point heatmap, and Body AR gesture status at the same time. For debugging, this is far more effective than watching only the main canvas because developers can quickly determine whether a problem originates in the recognition layer or the rendering layer.

This is especially important on PCs, where external cameras, distance changes, and lighting conditions all affect recognition stability. A dedicated monitoring window makes it easier to calibrate thresholds and interaction mappings quickly.

The monitoring view should retain status indicators and key parameter bars

// 'ctx' is the CanvasRenderingContext2D of the monitoring window's Canvas component
ctx.fillStyle = this.faceDetected ? '#00FF88' : '#FF4444'; // Green when tracking, red when lost
ctx.beginPath();
ctx.arc(15, 45, 5, 0, Math.PI * 2); // Draw the detection status indicator
ctx.fill();
ctx.fillText(this.faceDetected ? 'Face detected' : 'No face detected', 28, 50);

This code displays recognition state intuitively in the monitoring window, making it easier to debug the camera and tracking pipeline.
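
The expression parameter bars mentioned above can be drawn in the same canvas pass. A sketch with illustrative layout values, assuming this.expressionMap holds the BlendShape coefficients published earlier:

let barY = 70;
this.expressionMap.forEach((value: number, name: string) => {
  ctx.fillStyle = '#333333';
  ctx.fillRect(28, barY, 100, 6); // Bar track
  ctx.fillStyle = '#00FF88';
  ctx.fillRect(28, barY, 100 * value, 6); // Fill proportional to the coefficient (0..1)
  ctx.fillText(name, 135, barY + 6); // BlendShape name label
  barY += 14;
});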

Performance optimization must focus on frame-rate isolation and state throttling

The bottleneck in this system usually does not come from a single recognition pass. It comes from the combined cost of 60fps tracking, multi-window refresh, and canvas rendering. The right strategy is not to refresh everything blindly. Instead, keep AR tracking at high frequency and let the UI consume updates at 30fps or lower.

Gesture recognition also needs debouncing to prevent skeletal jitter from causing the model to oscillate. The monitoring window can run at a lower frame rate as well, ensuring the main canvas gets GPU priority.

const now = Date.now(); // Current timestamp in milliseconds
if (now - this.lastUIUpdate > 33) {
  AppStorage.setOrCreate('face_expression', expressionMap); // Update the UI at about 30fps
  this.lastUIUpdate = now;
}

This throttling logic reduces UI update frequency and helps stabilize the overall rendering frame rate.
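
Gesture debouncing layers on top of the throttle. A sketch that commits a gesture only after it stays stable for several consecutive frames; the frame count is an illustrative tuning value:

private candidateGesture: string = '';
private stableFrames: number = 0;
private readonly STABLE_FRAMES_REQUIRED = 5; // Illustrative; raise if skeletal jitter persists

private debounceGesture(raw: string): string | null {
  if (raw === this.candidateGesture) {
    this.stableFrames++;
  } else {
    this.candidateGesture = raw; // New candidate; restart the stability counter
    this.stableFrames = 1;
  }
  // Commit only once the same gesture has persisted long enough
  return this.stableFrames >= this.STABLE_FRAMES_REQUIRED ? raw : null;
}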

Device adaptation strategy should be built around camera quality and tracking distance

A built-in 720P camera can support basic Face AR interaction, but complex gestures and multi-user scenarios are better served by an external 4K camera or a depth camera. A recommended recognition distance is 1 to 2 meters to avoid faces being too close or skeletons moving out of frame.

For third-party PCs, add capability detection and graceful fallback strategies: enable only Face AR, lower the camera resolution, or reduce BlendShape precision so lower-end devices can still run the core workflow, as sketched below.
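
A hedged sketch of that fallback ladder; the capability flags are assumptions to be replaced with the engine's actual availability checks:

interface ArCapabilityPlan {
  enableFace: boolean;
  enableBody: boolean;
  width: number;
  height: number;
}

function planCapabilities(hasHighResCamera: boolean, supportsBodyTracking: boolean): ArCapabilityPlan {
  if (hasHighResCamera && supportsBodyTracking) {
    // Full experience: dual-modality tracking at 1080p
    return { enableFace: true, enableBody: true, width: 1920, height: 1080 };
  }
  // Fallback: Face AR only at reduced resolution keeps the core workflow usable
  return { enableFace: true, enableBody: false, width: 1280, height: 720 };
}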

FAQ provides structured answers to common implementation questions

Q1: Why is the PC better suited than mobile for combining Face AR and Body AR in 3D modeling?

A: PCs provide a larger interaction space, higher-resolution camera options, and stronger GPU/NPU coordination. That makes them better suited for processing expressions, gaze points, gestures, and posture at the same time while presenting both design and monitoring interfaces in a multi-window layout.

Q2: Why should Face AR and Body AR be mapped to different functions?

A: Face AR is better for discrete commands such as material switching, confirmation, and gaze-based focus. Body AR is better for continuous control such as rotation, scaling, and translation. This separation reduces accidental input and improves interaction stability.

Q3: How do you prevent frame drops and lag in a multi-window AR application?

A: Use a layered refresh strategy with high-frame-rate AR tracking and lower-frame-rate UI updates. Debounce gesture output, lower the monitoring window refresh rate, and use AppStorage for state broadcasting to avoid direct cross-window calls that introduce coupling and blocking.

Summary: This article presents a PC-based AR design workbench built on HarmonyOS 6 (API 23). It covers Face AR, Body AR, multi-window architecture, 3D canvas integration, and performance optimization to help developers quickly understand how to build a contactless 3D modeling system.