Build a Smart Interactive Classroom on HarmonyOS 6 with Face AR Attention Tracking and Body AR Gesture Control

Technical Specifications at a Glance

Platform: HarmonyOS 6 (API 23)
Language: ETS / TypeScript
Core Protocols: Distributed device discovery, on-device AR data processing
Key Dependencies: @hms.core.ar.arengine 6.1.0, @kit.ArkUI, @kit.AbilityKit, @kit.DistributedServiceKit, @kit.SensorServiceKit

Built on HarmonyOS 6 and AR Engine 6.1.0, this teacher-side smart interactive classroom system uses Face AR to evaluate student attention in real time, Body AR to enable touchless teaching controls, and the distributed soft bus to synchronize learning-state data across devices. It addresses three common online teaching pain points: invisible student status, inefficient interaction, and no usable data for post-class review. Keywords: HarmonyOS 6, Face AR, Body AR.

This solution restructures the teacher-side digital classroom experience around real teaching pain points

The biggest challenge in online teaching is not content production. It is the lack of continuous visibility into student state. In traditional live classes, teachers can only rely on students speaking up, chat feedback, or session replays, which makes pacing highly dependent on experience.

This solution brings HarmonyOS 6 Face AR and Body AR into the teaching workflow. Face AR identifies student expressions, gaze direction, and fatigue signals. Body AR recognizes teacher gestures, creating a dual closed loop of learning-state awareness and touchless control.

The classroom system architecture is divided into four layers

  1. Student devices collect Face AR data.
  2. The teacher device aggregates attention and emotional state.
  3. Body AR maps teacher movements to page turns, zoom, and menu controls.
  4. The UI layer delivers real-time feedback through floating navigation and immersive lighting effects.

Figure: Smart interactive classroom architecture. A large teacher-side display works with multiple student devices: the left side is the courseware presentation area, the right side is an AR learning-state overlay, and the student endpoints connect at the bottom. The key technical idea is that the teacher display uses a split-screen layout to host both teaching content and a status panel, while student devices continuously send back Face AR results, so courseware control, attention monitoring, and interaction feedback are unified in a single interface.
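To make the data flow concrete, here is a minimal sketch of the per-student record that a student device might send back to the teacher device over the distributed soft bus. All field names and shapes are illustrative assumptions, not the original project's data model.

// Per-student learning-state record synchronized from a student device to the teacher device
interface StudentFaceState {
  studentId: string;            // Device-local identifier rather than a real name (assumed field)
  focusScore: number;           // 0-100 attention score computed on the student device
  fatigueLevel: number;         // 0-100, higher means more fatigued
  engagement: number;           // 0-100 expression-activity score
  gazeCell: [number, number];   // Row and column hit in the 10x10 gaze heatmap
  timestamp: number;            // Sampling time in milliseconds
}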

Environment setup must center on the AR engine and distributed capabilities

The critical project dependencies are not in the UI layer, but in AR Engine and distributed services. The minimum visible dependency set includes AR Engine, ArkUI, AbilityKit, DistributedServiceKit, and sensor services.

{
  "dependencies": {
    "@hms.core.ar.arengine": "^6.1.0",
    "@kit.ArkUI": "^6.1.0",
    "@kit.AbilityKit": "^6.1.0",
    "@kit.DistributedServiceKit": "^6.1.0",
    "@kit.SensorServiceKit": "^6.1.0"
  }
}

This configuration defines the minimum runtime dependencies for the classroom system. The core objective is to make AR perception, window capabilities, and cross-device collaboration available at the same time.

The teacher window should be designed to show both courseware and learning-state panels

On the teacher PC, use a fullscreen freeform window and disable the title bar and system gestures to prevent accidental touches during instruction. For layout, allocate 70% of the screen to courseware on the left and 30% to the learning-state panel on the right. This better matches the primary workflow of live teaching.

// Assumes the file imports the window module: import { window } from '@kit.ArkUI';
private async setupTeacherWindow(windowStage: window.WindowStage): Promise<void> {
  this.mainWindow = windowStage.getMainWindowSync();
  await this.mainWindow.setWindowMode(window.WindowMode.FULLSCREEN); // Fullscreen teaching mode
  await this.mainWindow.setWindowTitleBarEnable(false); // Hide the title bar
  await this.mainWindow.setWindowGestureDisabled(true); // Disable system gestures to avoid accidental touches
  windowStage.loadContent('pages/TeacherClassroomPage'); // Load the main classroom page
}

This code completes immersive initialization of the teaching window and provides the foundation for a stable classroom experience.
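For the 70/30 split described above, a minimal ArkUI layout sketch of TeacherClassroomPage might look as follows; the panel contents are placeholders and the layoutWeight values are one way to express the ratio, not the original implementation.

@Entry
@Component
struct TeacherClassroomPage {
  build() {
    Row() {
      // Left 70%: courseware presentation area
      Column() {
        // Courseware rendering component goes here (placeholder)
      }
      .layoutWeight(7)

      // Right 30%: learning-state panel
      Column() {
        // Attention and emotion panel goes here (placeholder)
      }
      .layoutWeight(3)
    }
    .width('100%')
    .height('100%')
  }
}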

The attention analysis engine determines whether the system is explainable

In the original solution, FocusAnalyzer is the most valuable module. It does not stop at face recognition. Instead, it breaks attention into three dimensions: focus, fatigue, and engagement. It then aggregates them with weighted scoring to produce a final attention score from 0 to 100.

Focus is derived from gaze stability, fatigue is derived from blink and yawning signals, and engagement is derived from facial expression activity. The advantage of this design is explainability: teachers can understand why the system marked a student as distracted.

const focusScore = Math.round(
  attention * 0.4 +      // Focus contributes 40%
  (100 - fatigue) * 0.3 + // Lower fatigue produces a higher score
  engagement * 0.3       // Engagement contributes 30%
);

This formula compresses multidimensional facial signals into a unified score, making it easier to run class-level statistics and trend analysis.
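To show how the three dimensions could feed that formula, here is a minimal sketch of a FocusAnalyzer-style computation. The FaceFrame fields and the thresholds are illustrative assumptions, not the original module's API.

interface FaceFrame {
  gazeOffset: number;          // Normalized gaze deviation from screen center, 0..1 (assumed field)
  blinkRate: number;           // Blinks per minute (assumed field)
  yawnDetected: boolean;       // Whether a yawn was detected in this frame (assumed field)
  expressionActivity: number;  // Facial expression activity, 0..1 (assumed field)
}

class FocusAnalyzer {
  analyze(frame: FaceFrame): number {
    // Focus: the steadier the gaze, the higher the score
    const attention = Math.max(0, 100 - frame.gazeOffset * 100);
    // Fatigue: driven by blink frequency and yawning signals
    const fatigue = Math.min(100, frame.blinkRate * 2 + (frame.yawnDetected ? 30 : 0));
    // Engagement: driven by facial expression activity
    const engagement = Math.min(100, frame.expressionActivity * 100);
    // Weighted aggregation into a single 0-100 attention score
    return Math.round(attention * 0.4 + (100 - fatigue) * 0.3 + engagement * 0.3);
  }
}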

Gaze heatmaps turn learning-state data from numbers into spatial distribution

The system does not stop at a single score. It also outputs a 10×10 gaze heatmap. This allows teachers to see not only who is losing focus, but also whether students’ gaze is drifting away from the center of the screen.

In practice, this type of heatmap is more useful than raw numbers because it provides a visual diagnostic entry point. If the hotspot shifts broadly across the whole class, the issue may not be the students. It may be the courseware design or camera placement.
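As an illustration, a 10×10 heatmap can be accumulated from normalized gaze points with a few lines of code; the data shape below is an assumption for the sketch.

class GazeHeatmap {
  private static readonly SIZE = 10;
  // 10x10 grid of gaze hit counts, row-major
  private cells: number[][] = Array.from({ length: GazeHeatmap.SIZE }, () =>
    new Array<number>(GazeHeatmap.SIZE).fill(0)
  );

  // x and y are normalized gaze coordinates in [0, 1]
  addSample(x: number, y: number): void {
    const col = Math.min(GazeHeatmap.SIZE - 1, Math.floor(x * GazeHeatmap.SIZE));
    const row = Math.min(GazeHeatmap.SIZE - 1, Math.floor(y * GazeHeatmap.SIZE));
    this.cells[row][col]++;
  }

  // Scale counts to 0..1 so the UI can map them to colors
  normalized(): number[][] {
    const max = Math.max(1, ...this.cells.flat());
    return this.cells.map(row => row.map(v => v / max));
  }
}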

Body AR translates teacher gestures into classroom control commands

The teacher gesture control layer maps body movements to courseware commands. Typical mappings include: raise the left hand for the previous slide, raise the right hand for the next slide, raise both hands to toggle the menu, pinch to zoom out, spread to zoom in, and lean forward to enter laser-pointer mode.

private executeGestureCommand(gesture: TeacherGesture): void {
  switch (gesture) {
    case TeacherGesture.LEFT_HAND_UP:
      this.broadcastSlideCommand({ type: 'prev', timestamp: Date.now() }); // Left hand up: previous slide
      break;
    case TeacherGesture.RIGHT_HAND_UP:
      this.broadcastSlideCommand({ type: 'next', timestamp: Date.now() }); // Right hand up: next slide
      break;
    case TeacherGesture.BOTH_HANDS_UP:
      this.navExpanded = !this.navExpanded; // Both hands up: expand or collapse the menu
      break;
  }
}

This logic converts Body AR recognition output into broadcastable teaching commands and enables true touchless classroom interaction.

Floating navigation and immersive lighting handle real-time state expression

The UI layer is not decorative. It is the feedback system. The floating panel stays minimized during lecture delivery and expands during interaction, reducing visual obstruction. Immersive lighting changes color based on average classroom attention: green for high attention, orange for fluctuation, and red for low attention.

This design lets teachers judge the overall classroom state through color cues and a small set of metrics, without reading complex dashboards. That aligns with the need to reduce cognitive load in high-pressure teaching environments.
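A minimal sketch of that color mapping is shown below; the attention thresholds are illustrative assumptions.

// Map average classroom attention (0-100) to an ambient lighting color
function ambientColorFor(averageAttention: number): string {
  if (averageAttention >= 75) {
    return '#4CAF50'; // Green: high attention
  } else if (averageAttention >= 50) {
    return '#FF9800'; // Orange: fluctuating attention
  }
  return '#F44336'; // Red: low attention
}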

Deployment should prioritize multi-device stability and privacy compliance

This system is naturally suited for large live classes, small interactive classes, dual-teacher classrooms, and smart classrooms. However, real deployment depends on four prerequisites: stable device connectivity, controllable lighting on student devices, calibratable gesture thresholds, and on-device facial data processing.

Privacy governance is especially important in education. Raw student facial data must not be uploaded to the cloud. Upload only anonymized aggregate metrics such as focusScore, trend, and alertStudents.
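As an illustration of that boundary, an uploadable class-level report might look like the sketch below. Only focusScore, trend, and alertStudents come from the source; the other fields and exact types are assumptions.

// Class-level aggregate that is safe to upload: no raw frames, no per-student identities
interface ClassAttentionReport {
  sessionId: string;               // Anonymized class or session identifier (assumed field)
  timestamp: number;               // Sampling time in milliseconds
  focusScore: number;              // Average attention score, 0-100
  trend: 'up' | 'flat' | 'down';   // Short-term attention trend
  alertStudents: number;           // Count of students below the alert threshold, not their identities
}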

Developers should add three capabilities before production rollout

First, build a historical trend engine that supports 30-second and 5-minute comparison windows. Second, add gesture debouncing and false-trigger suppression to prevent repeated page turns. Third, export classroom reports so real-time perception becomes a usable post-class review asset.
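For the first of these, a minimal rolling-window sketch is enough to compare the last 30 seconds against a 5-minute baseline; the class and method names are assumptions.

class AttentionTrend {
  private samples: { time: number; score: number }[] = [];

  addSample(score: number, time: number = Date.now()): void {
    this.samples.push({ time, score });
    // Keep only the last 5 minutes of samples
    const cutoff = time - 5 * 60 * 1000;
    this.samples = this.samples.filter(s => s.time >= cutoff);
  }

  // Average score over the last windowMs milliseconds
  average(windowMs: number, now: number = Date.now()): number {
    const recent = this.samples.filter(s => s.time >= now - windowMs);
    if (recent.length === 0) {
      return 0;
    }
    return recent.reduce((sum, s) => sum + s.score, 0) / recent.length;
  }

  // Positive delta: the last 30 s is above the 5 min baseline, i.e. attention is improving
  delta(now: number = Date.now()): number {
    return this.average(30 * 1000, now) - this.average(5 * 60 * 1000, now);
  }
}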

If you continue extending the architecture, you can also integrate an AI teaching assistant to enable pace recommendations, question-risk alerts, and individualized intervention suggestions.

FAQ

1. Why does Face AR attention scoring use multiple dimensions?

Because a single expression or a one-time gaze shift is not enough to represent real attention. Breaking the score into focus, fatigue, and engagement makes the system more stable and makes the result easier for teachers to interpret.

2. How can Body AR gesture control avoid false triggers?

You can introduce a minimum confidence threshold, multi-frame confirmation, and a cooldown mechanism. Trigger page-turning or zoom commands only when a gesture appears consistently across multiple frames.
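A minimal sketch of that pattern combines all three mechanisms; the specific thresholds below are illustrative assumptions.

class GestureDebouncer {
  private static readonly MIN_CONFIDENCE = 0.8; // Ignore low-confidence detections
  private static readonly CONFIRM_FRAMES = 5;   // Require the same gesture across consecutive frames
  private static readonly COOLDOWN_MS = 1500;   // Minimum gap between two triggered commands

  private lastGesture = '';
  private streak = 0;
  private lastTriggerTime = 0;

  // Returns true only when a gesture should actually fire a command
  shouldTrigger(gesture: string, confidence: number, now: number = Date.now()): boolean {
    if (confidence < GestureDebouncer.MIN_CONFIDENCE) {
      this.streak = 0;
      return false;
    }
    this.streak = gesture === this.lastGesture ? this.streak + 1 : 1;
    this.lastGesture = gesture;

    const confirmed = this.streak >= GestureDebouncer.CONFIRM_FRAMES;
    const cooledDown = now - this.lastTriggerTime >= GestureDebouncer.COOLDOWN_MS;
    if (confirmed && cooledDown) {
      this.lastTriggerTime = now;
      this.streak = 0; // Require re-confirmation for the next trigger
      return true;
    }
    return false;
  }
}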

3. Is this solution suitable for real classroom deployment?

Yes, but only after completing on-device compute evaluation, device compatibility validation, and privacy-compliant design. The prototype already covers the core workflow, but a production system still needs logging, exception recovery, and stronger data governance.

Core summary

This article reconstructs a smart interactive classroom solution built on HarmonyOS 6 (API 23). It focuses on Face AR student attention analysis, Body AR teacher gesture control, distributed device collaboration, and immersive lighting feedback, making it a practical reference for AR application development in education scenarios.