This article builds an intelligent workspace on HarmonyOS 6 (API 23) using Face AR and Body AR: it recognizes emotion, understands posture, and drives adaptive UI behavior. The design addresses the limitations of traditional AI assistants, which rely on passive responses and cumbersome operations and lack contextual awareness. Keywords: HarmonyOS 6, Face AR, Body AR.
The technical specification snapshot outlines the implementation baseline
| Parameter | Description |
|---|---|
| Platform | HarmonyOS 6 / API 23 |
| Core Languages | ArkTS, ETS, JSON5 |
| Interaction Protocols / Capabilities | Face AR, Body AR, ArkUI multi-window |
| Typical Scenarios | PC smart workspace, touchless AI assistant |
| Core Dependencies | @hms.core.ar.arengine, @kit.ArkUI, @kit.AbilityKit, @kit.SensorServiceKit |
This solution upgrades AI assistants from passive response to proactive perception
Traditional assistants depend on clicks, wake words, and menu navigation. They cannot understand the user’s current emotional state or level of attention. The result is long interaction chains, mechanical feedback, and frequent workflow interruptions.
This design closes the loop across camera perception, on-device analysis, adaptive UI, and gesture control. Face AR identifies expression and fatigue, Body AR detects posture and motion, and ArkUI rewrites interface feedback in real time.
[Figure] An immersive PC-oriented workspace interface. Core elements include bottom-floating navigation, an emotion-driven ambient glow, and an independent floating AI panel, reflecting a desktop-class interaction design built on layered UI overlays and emotion-linked themes.
The system architecture consists of three layers
The first layer is the AR perception layer, which captures facial BlendShapes, skeletal keypoints, pose offsets, and gaze features. The second layer is the on-device decision layer, which maps emotion, fatigue, and focus into recommendations or interaction actions. The third layer is the ArkUI presentation layer, which dynamically adjusts color, transparency, animation speed, and navigation complexity.
// Core idea: write AR perception results into global state to drive real-time UI updates
AppStorage.setOrCreate('current_emotion', stableEmotion) // Store stable emotion
AppStorage.setOrCreate('fatigue_level', fatigue) // Store fatigue level
AppStorage.setOrCreate('attention_level', attention) // Store attention level
AppStorage.setOrCreate('body_gesture', gesture) // Store gesture state
This code persists recognition results as global state so navigation, panels, and the main page can react in a coordinated way.
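As a minimal sketch of the consumer side, any component can bind those keys with the standard ArkUI @StorageLink decorator; the component name and the fatigue threshold below are illustrative rather than taken from the project:

```typescript
// Reads the shared keys written by the perception loop above.
// Updates to AppStorage automatically re-render this component.
@Component
struct EmotionStatusBar {
  @StorageLink('current_emotion') currentEmotion: string = 'neutral'
  @StorageLink('fatigue_level') fatigueLevel: number = 0

  build() {
    Row({ space: 12 }) {
      Text(`Emotion: ${this.currentEmotion}`)
      Text(`Fatigue: ${this.fatigueLevel.toFixed(2)}`)
    }
    // Illustrative rule: dim the bar slightly once fatigue gets high.
    .opacity(this.fatigueLevel > 0.7 ? 0.6 : 1.0)
  }
}
```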
Environment initialization must enable both Face AR and Body AR capabilities
At the dependency level, the project is built around AR Engine, ArkUI, AbilityKit, and sensor services. The key goal here is not simply to make the app run, but to establish a unified capability foundation for multimodal interaction.
{
  "dependencies": {
    "@hms.core.ar.arengine": "^6.1.0",
    "@kit.ArkUI": "^6.1.0",
    "@kit.AbilityKit": "^6.1.0",
    "@kit.SensorServiceKit": "^6.1.0"
  }
}
This configuration declares the core SDKs required by the workspace and serves as the foundation for compilation and perception.
The Ability must manage both windowing and the AR session lifecycle
In EmotionAIAbility.ets, the window is configured in freeform and floating mode, which fits the floating assistant scenario on PCs. At the same time, the ARSession must enable both face and body features and start the recognition loop after the page loads.
const config = new ARConfig()
// Enable face and body capabilities in the same session
config.featureType = ARFeatureType.ARENGINE_FEATURE_TYPE_FACE |
  ARFeatureType.ARENGINE_FEATURE_TYPE_BODY
config.maxDetectedBodyNum = 1 // Limit body detection count to reduce overhead
config.cameraLensFacing = arEngine.ARCameraLensFacing.FRONT // Use the front camera
this.arSession.configure(config) // Apply the dual-modal configuration
await this.arSession.start() // Start the recognition loop
This code completes dual-modal AR engine initialization, which is the prerequisite for running emotion recognition and gesture control at the same time.
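Because the Ability also owns the session lifecycle, perception should pause when the workspace leaves the foreground. The following is a sketch under assumptions: onForeground and onBackground are standard UIAbility hooks, while stop() is assumed to be the counterpart of the start() call above and should be checked against the AR Engine SDK you target.

```typescript
import { UIAbility } from '@kit.AbilityKit'
// Assumption: ARSession is exported by the AR Engine module from the dependency list.
import { ARSession } from '@hms.core.ar.arengine'

export default class EmotionAIAbility extends UIAbility {
  private arSession?: ARSession

  onBackground(): void {
    // Assumption: stop() halts the recognition loop while the workspace is hidden.
    this.arSession?.stop()
  }

  onForeground(): void {
    // Resume recognition once the workspace is visible again.
    this.arSession?.start()
  }
}
```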
The emotion recognition logic uses a rule engine with sliding-window stabilization
The original design infers localized features such as smiling, frowning, mouth opening, and eye widening from BlendShape parameters, then maps them to states such as happiness, sadness, irritation, surprise, and anxiety. To avoid single-frame jitter, the system uses a 30-second time window to compute a stable emotion.
This approach works well for prototypes and demo systems because it is interpretable, low-latency, and fully runnable on-device. In production, you should consider replacing the rule engine with a lightweight classification model, while keeping the sliding-window stabilization mechanism.
if (mouthSmile > 0.6 && eyeWide > 0.3) return EmotionState.HAPPY // Smile + wide eyes => happy
if (mouthFrown > 0.5 && browDown > 0.4) return EmotionState.SAD // Frown + lowered brows => sad
if (browDown > 0.7 && noseWrinkle > 0.3) return EmotionState.ANGRY // Lowered brows + wrinkled nose => irritated
return EmotionState.NEUTRAL // Fall back to a neutral state by default
This rule block converts local expression parameters into discrete emotion labels, making them easy to map into downstream UI behavior and AI recommendations.
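The sliding-window stabilization described earlier can live outside the rule block. Below is an illustrative implementation, not the original source: the EmotionState values mirror the rules above, and the 30-second window reports the most frequent label seen within it.

```typescript
enum EmotionState { NEUTRAL = 'neutral', HAPPY = 'happy', SAD = 'sad', ANGRY = 'angry' }

interface EmotionSample {
  state: EmotionState
  timestamp: number
}

class EmotionStabilizer {
  private samples: EmotionSample[] = []
  private readonly windowMs: number = 30000 // 30-second stabilization window

  // Feed one per-frame label; get back the stabilized emotion for the UI.
  push(state: EmotionState): EmotionState {
    const now = Date.now()
    this.samples.push({ state: state, timestamp: now })
    // Discard samples that have fallen out of the window.
    this.samples = this.samples.filter((s: EmotionSample) => now - s.timestamp <= this.windowMs)
    return this.majority()
  }

  private majority(): EmotionState {
    const counts = new Map<EmotionState, number>()
    for (const s of this.samples) {
      counts.set(s.state, (counts.get(s.state) ?? 0) + 1)
    }
    let best: EmotionState = EmotionState.NEUTRAL
    let bestCount = 0
    counts.forEach((count: number, state: EmotionState) => {
      if (count > bestCount) {
        best = state
        bestCount = count
      }
    })
    return best
  }
}
```

The stabilized value is what gets written into AppStorage as current_emotion, so single-frame spikes never reach the interface.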
Gesture control turns body keypoints into executable commands
Body AR recognizes gestures such as raised hands, pinching, spreading, forward leaning, and backward leaning through the positional relationships of keypoints like shoulders, wrists, and nose. Its value is not novelty alone. It reduces switching between mouse and keyboard and makes the AI assistant a low-interruption interaction entry point.
In this design, raising both hands wakes the assistant, moving both hands closer minimizes it, spreading both hands expands the panel, and leaning forward enters focus mode. The gesture layer also includes a 500 ms debounce interval to prevent repeated triggers.
if (leftUp && rightUp && distance < 80) {
  return { type: 'pinch', confidence: 0.9 } // Both hands close together => pinch
}
if (leftUp && rightUp && distance > 200) {
  return { type: 'spread', confidence: 0.85 } // Both hands apart => spread
}
if (leftUp || rightUp) {
  return { type: 'point', confidence: 0.75 } // One hand raised => point
}
This logic maps skeletal coordinate relationships directly into high-level gesture events, making it straightforward to drive window behavior.
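The 500 ms debounce mentioned above can wrap this mapping. A minimal sketch, with the class name and event shape chosen for illustration:

```typescript
interface GestureEvent {
  type: string
  confidence: number
}

class GestureDebouncer {
  private lastFiredAt: number = 0
  private readonly intervalMs: number = 500 // Matches the debounce interval above

  // Returns the event only if enough time has passed since the last accepted one.
  accept(event: GestureEvent | null): GestureEvent | null {
    if (!event) {
      return null
    }
    const now = Date.now()
    if (now - this.lastFiredAt < this.intervalMs) {
      return null // Still inside the debounce window: swallow the event
    }
    this.lastFiredAt = now
    return event
  }
}
```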
Emotion-adaptive navigation is the most product-ready interface practice in the stack
The core of EmotionAdaptiveNav.ets is not just a navigation bar, but a mapping table from emotion to visual parameters. Color, transparency, animation speed, glow intensity, and whether to simplify the UI are all driven by a unified emotional state.
For example, in anxious or fatigued states, the navigation reduces information density and keeps only Home, AI Assistant, and Profile. In happy states, it increases lighting effects and animation energy to reinforce positive feedback. This “perception becomes interface” design significantly improves the system’s sense of companionship.
The key emotion-to-UI mapping table is shown below
| Emotion | Primary Color | Animation Speed | UI Strategy | Special Behavior |
|---|---|---|---|---|
| Neutral | #4A90E2 | 1.0x | Standard mode | Normal navigation |
| Happy | #FF9500 | 1.2x | Enhanced feedback | Stronger lighting effects |
| Sad | #5B8BD4 | 0.8x | Simplified interface | Encouraging prompts |
| Irritated | #E74C3C | 0.5x | Reduced distraction | Noise-reduction suggestions |
| Anxious | #2ECC71 | 0.6x | Streamlined content | Breathing guidance |
| Fatigued | #95A5A6 | 0.4x | Strong reminders | Rest mode |
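One way to keep this table authoritative in code is a single typed lookup that navigation, the panel, and the main page all read from. A sketch with illustrative field names; the colors and speeds copy the table above:

```typescript
interface EmotionTheme {
  primaryColor: string
  animationSpeed: number
  simplifiedNav: boolean
}

const EMOTION_THEMES = new Map<string, EmotionTheme>([
  ['neutral', { primaryColor: '#4A90E2', animationSpeed: 1.0, simplifiedNav: false }],
  ['happy', { primaryColor: '#FF9500', animationSpeed: 1.2, simplifiedNav: false }],
  ['sad', { primaryColor: '#5B8BD4', animationSpeed: 0.8, simplifiedNav: true }],
  ['irritated', { primaryColor: '#E74C3C', animationSpeed: 0.5, simplifiedNav: true }],
  ['anxious', { primaryColor: '#2ECC71', animationSpeed: 0.6, simplifiedNav: true }],
  ['fatigued', { primaryColor: '#95A5A6', animationSpeed: 0.4, simplifiedNav: true }]
])

// Example lookup driven by the global state written by the perception layer.
const emotion: string = AppStorage.get<string>('current_emotion') ?? 'neutral'
const theme: EmotionTheme = EMOTION_THEMES.get(emotion) ?? EMOTION_THEMES.get('neutral')!
```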
The AI assistant panel uses subwindow mechanics for desktop-class interaction
GestureAIPanel.ets uses createSubWindow to create an independent child window that supports free movement, minimization, and always-on-top display. That means the AI assistant is no longer just an in-page component. It becomes a persistent object in the desktop workflow.
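Creating such a child window relies on the standard window.WindowStage API. A minimal sketch, assuming the window name, page route, and geometry as placeholders; always-on-top handling depends on the window capabilities of the target API level:

```typescript
import { window } from '@kit.ArkUI'

// Called from the Ability once the main window stage is ready.
async function createAIPanel(windowStage: window.WindowStage): Promise<void> {
  // Independent child window that hosts the assistant panel.
  const panel = await windowStage.createSubWindow('gesture_ai_panel')
  await panel.setUIContent('pages/GestureAIPanel') // Hypothetical page route
  await panel.moveWindowTo(100, 100) // Initial position of the floating card
  await panel.resize(420, 560) // Panel size in px
  await panel.showWindow()
}
```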
The panel itself listens to global gesture events and focus state. When the user leans forward and enters focus mode, the panel automatically docks to the edge to avoid blocking content. When a wake gesture is detected, the panel slides in from the bottom and forms a stable interaction pattern.
// Inside the panel component: bind the wake-up key and react to every change
@StorageLink('ai_awaken') @Watch('onAwaken') awakenTimestamp: number = 0

onAwaken(): void {
  if (this.awakenTimestamp) {
    this.panelVisible = true // Show the panel after gesture wake-up
    this.panelMinimized = false // Exit minimized state at the same time
    this.animatePanelEntry() // Run the entry animation
  }
}
This watcher turns gesture events into window behavior and delivers true touchless panel control.
Performance and privacy strategies determine whether the solution can ship
In multimodal perception systems, the main risks are not recognition accuracy alone, but the performance cost of continuous computation and the privacy exposure of camera data. The original design points in the right direction: throttling, on-device processing, desensitized outputs, and explicit user control.
private emotionThrottleMs = 100 // Process once every 100 ms to reduce CPU/GPU pressure
AppStorage.setOrCreate('ar_perception_enabled', true) // Provide a privacy toggle
const storeEmotionSummaryOnly = true // Store only emotion summaries, not raw images
This configuration reflects three practical principles: compute less, store less, and allow users to turn it off.
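The throttle itself is a few lines. A sketch of the 100 ms principle above, with illustrative names; processFrame stands in for the emotion and gesture analysis:

```typescript
class PerceptionThrottle {
  private lastRunAt: number = 0
  private readonly minIntervalMs: number

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs
  }

  // True only if the previous accepted frame is older than the interval.
  shouldRun(): boolean {
    const now = Date.now()
    if (now - this.lastRunAt < this.minIntervalMs) {
      return false // Drop this frame to keep CPU/GPU load bounded
    }
    this.lastRunAt = now
    return true
  }
}

const emotionThrottle = new PerceptionThrottle(100)

// In the camera frame callback: respect both the throttle and the privacy toggle.
// if (emotionThrottle.shouldRun() && AppStorage.get<boolean>('ar_perception_enabled')) {
//   processFrame(frame)
// }
```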
You should validate four things first during on-device debugging
- Confirm that camera permissions are fully granted (a permission request sketch follows this list).
- Ensure the face and hands are under clear lighting conditions.
- Keep the user-to-camera distance within 0.5 to 1.5 meters.
- Verify that the z-order between the main window and the AI subwindow remains stable.
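A sketch of the first checklist item using the standard AccessToken APIs; the function name is illustrative and the context is assumed to be the UIAbility context:

```typescript
import { abilityAccessCtrl, common } from '@kit.AbilityKit'

async function ensureCameraPermission(context: common.UIAbilityContext): Promise<boolean> {
  const atManager = abilityAccessCtrl.createAtManager()
  const result = await atManager.requestPermissionsFromUser(context, ['ohos.permission.CAMERA'])
  // Each granted permission is reported as 0 in authResults.
  return result.authResults.every((status: number) => status === 0)
}
```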
This type of application is best suited to office, creative, and design-oriented desktop scenarios
This solution is best suited to PC environments where users need long periods of focus, frequent multi-window switching, and fewer interruptions from traditional input methods. It is more than an AR demo. It is closer to a perception-driven system interaction prototype.
If you extend it further, you can integrate distributed soft bus synchronization for cross-device state sharing, or combine it with large language models to generate personalized recommendations. That would elevate emotion recognition from a UI trigger into a productivity orchestrator.
FAQ
1. Can rule-based Face AR recognition be used directly in production?
It works for prototype validation and moderately complex scenarios. For large-scale production, however, you should introduce a lightweight classification model while keeping sliding-window debouncing and exception fallback mechanisms.
2. What is the biggest challenge of Body AR gesture control in PC scenarios?
The biggest challenge is false triggering and environmental dependency. You need to tune camera angle, gesture thresholds, debounce timing, and the user activity area to avoid frequent misrecognition.
3. How can you balance emotion perception with user privacy?
The best practice is on-device inference, storing only emotion summaries, never uploading raw images, and providing an explicit toggle, clear permission messaging, and a degraded interaction path when the feature is disabled.
Core Summary
This article reconstructs an emotion-aware smart workspace solution built on HarmonyOS 6 API 23. It covers Face AR emotion recognition, Body AR gesture control, ArkUI adaptive interfaces, a multi-window AI assistant, and privacy optimization, making it a strong reference for interactive office application development on HarmonyOS PCs.