This spatial photo gallery for HarmonyOS 6 PC uses Face AR to detect expressions for page turning and favorites, Body AR for zoom, rotation, and mode switching, and photo-driven immersive lighting to create a more natural and engaging browsing experience.
Keywords: HarmonyOS 6, Face AR, immersive lighting
Technical specifications at a glance
| Parameter | Details |
|---|---|
| Target platform | HarmonyOS 6 (API 23) PC |
| Development language | ArkTS / ETS |
| Core protocols/capabilities | Face AR, Body AR, media library access, immersive materials |
| Core dependencies | @hms.core.ar.arengine, @kit.MediaLibraryKit, @kit.UIDesignKit, @kit.SensorServiceKit, @kit.Graphics2DKit |
| Code organization | Controllers + components + pages + theme engine |
This project redefines photo gallery interaction on PC
Traditional photo galleries rely on the mouse wheel, touchpad, or keyboard to switch images. The interaction is linear, and the experience feels flat. This solution turns photo browsing into spatial interaction.
The core design has three layers: the perception layer uses Face AR and Body AR to capture facial and skeletal data; the semantic mapping layer translates expressions and poses into browsing commands; and the UI layer adjusts lighting and materials in real time based on photo content.
Figure: a preview of the spatial gallery interface. The main area is a large photo canvas, overlaid with an immersive title bar and a floating bottom navigation bar; the UI lighting is driven by photo content, reflecting a centered presentation model for large-screen PC scenarios with minimal control occlusion.
The interaction mapping layer determines whether the system is usable
Face AR handles high-frequency, lightweight commands such as raising eyebrows for next photo, frowning for previous photo, and opening the mouth to favorite a photo. Body AR handles low-frequency but continuous spatial transformations such as zooming with both hands apart, rotating with one hand, and leaning forward to enter detail mode.
export enum PhotoCommand {
NEXT = 'NEXT',
PREV = 'PREV',
FAVORITE = 'FAVORITE',
FULLSCREEN = 'FULLSCREEN'
}
// Map expression recognition results to business commands
function mapBlendShapeToCommand(face: any): PhotoCommand | null {
const shapes = face.getBlendShapes(); // Read BlendShape parameters
if (shapes.browInnerUp > 0.55) return PhotoCommand.NEXT; // Raise eyebrows to turn page
if (shapes.jawOpen > 0.35) return PhotoCommand.FAVORITE; // Open mouth to favorite
return null;
}
This code shows the smallest closed loop from Face AR data to gallery commands.
The system architecture should center on perception, interpretation, and rendering
The perception layer comes from AR Engine 6.1.0 and outputs raw data such as facial BlendShapes, head pose, and skeletal keypoints. The interpretation layer is handled by ArkTS controllers, which are responsible for threshold evaluation, state cooldown, conflict priority, and feedback broadcasting.
The rendering layer is built with ArkUI and HDS components. The title bar, floating navigation, photo canvas, and ambient light background are all controlled by a unified theme engine. This matters because it keeps interaction and visual feedback within one coherent system.
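To make the handoff between these layers concrete, the following sketch shows one way to type the data flow in ArkTS. The interface and parameter names here are illustrative assumptions, not the project's actual types.
interface SkeletonPoint {
  x: number;
  y: number;
  z: number;
}

interface PerceptionFrame {
  blendShapes: Map<string, number>; // Facial BlendShape coefficients from AR Engine
  skeleton: SkeletonPoint[];        // Skeletal keypoints in camera space
  trackingQuality: number;          // Tracking quality score in the range 0..1
}

interface InterpretationResult {
  command: PhotoCommand | null;     // Discrete command from the expression controller
  scale: number;                    // Continuous zoom factor from the gesture controller
  rotationDeg: number;              // Continuous rotation angle from the gesture controller
}

// The page drives one loop per AR frame: perceive -> interpret -> render
function onArFrame(
  frame: PerceptionFrame,
  interpret: (f: PerceptionFrame) => InterpretationResult,
  render: (r: InterpretationResult) => void
): void {
  render(interpret(frame));
}
Keeping the frame-to-command translation behind a function boundary like this also makes it easier to swap mock frame data for a real ARSession later.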
Dependencies and permissions are the first deployment barrier
To run this solution, you need at minimum AR Engine, media library access, UI Design Kit, haptic feedback support, and 2D graphics capabilities. At the permission level, the critical items are camera access, image reading, and network access.
{
"dependencies": {
"@hms.core.ar.arengine": "^6.1.0",
"@kit.MediaLibraryKit": "^6.0.0",
"@kit.UIDesignKit": "^6.0.0",
"@kit.SensorServiceKit": "^6.0.0",
"@kit.Graphics2DKit": "^6.0.0"
}
}
This configuration defines the core capability boundary required to run the spatial gallery.
{
"module": {
"requestPermissions": [
{ "name": "ohos.permission.CAMERA" },
{ "name": "ohos.permission.READ_IMAGEVIDEO" },
{ "name": "ohos.permission.INTERNET" }
]
}
}
This permission declaration ensures that camera sensing, gallery access, and network access work correctly.
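Declaring the permissions is only half the work: CAMERA is a user-granted permission, so the app must also request it from the user at runtime. A minimal sketch using the standard AbilityKit access-control API (error handling and the declined-permission UI are omitted):
import { abilityAccessCtrl, common, Permissions } from '@kit.AbilityKit';

async function requestCameraPermission(context: common.UIAbilityContext): Promise<boolean> {
  const permissions: Array<Permissions> = ['ohos.permission.CAMERA'];
  const atManager = abilityAccessCtrl.createAtManager();
  const result = await atManager.requestPermissionsFromUser(context, permissions);
  // authResults contains one entry per requested permission; 0 means granted
  return result.authResults.every((status: number) => status === 0);
}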
The color extraction engine determines whether immersive lighting feels real
The most distinctive part of the project is not AR itself, but the idea that the photo becomes the light source. The implementation samples the current PixelMap at a reduced scale, computes dominant color buckets, and then derives the primary color, secondary color, ambient light, and light or dark theme.
The dominant color affects not only the title bar background opacity, but also shadows, borders, the navigation panel, and the global ambient glow. This coupling makes the UI feel less like an overlay and more like something that grows out of the photo itself.
import { image } from '@kit.ImageKit';

async function extractTheme(pixelMap: image.PixelMap) {
  const info = await pixelMap.getImageInfo();
  // scale() takes scale factors, so convert the 64x64 sampling target into ratios;
  // in practice, work on a separately decoded copy so the displayed photo is not resized
  await pixelMap.scale(64 / info.size.width, 64 / info.size.height);
  const buffer = new ArrayBuffer(64 * 64 * 4); // RGBA, 4 bytes per pixel
  await pixelMap.readPixelsToBuffer(buffer); // Read the downscaled pixel data
  // Continue with color quantization, sorting, and luminance calculation
}
This code outlines the entry point for extracting the dominant theme color from a photo.
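One plausible way to continue from that buffer is to bucket the RGBA samples into a coarse histogram, pick the largest bucket as the dominant color, and use relative luminance to decide between a light and dark theme. The bucket size and luminance cutoff below are illustrative values, not the project's actual ones.
interface DominantColorResult {
  color: string;   // Dominant color as a hex string, e.g. '#3a6ea5'
  isDark: boolean; // Whether the extracted color suggests a dark theme
}

function extractDominantColor(buffer: ArrayBuffer): DominantColorResult {
  const pixels = new Uint8Array(buffer);
  const counts = new Map<number, number>();
  for (let i = 0; i < pixels.length; i += 4) {
    // Quantize each RGB channel to 16 levels so similar colors fall into the same bucket
    const key = ((pixels[i] >> 4) << 8) | ((pixels[i + 1] >> 4) << 4) | (pixels[i + 2] >> 4);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let bestKey = 0;
  let bestCount = -1;
  counts.forEach((count: number, key: number) => {
    if (count > bestCount) {
      bestCount = count;
      bestKey = key;
    }
  });
  // Expand the winning bucket back to 8-bit channel values (15 * 17 = 255)
  const r = ((bestKey >> 8) & 0xF) * 17;
  const g = ((bestKey >> 4) & 0xF) * 17;
  const b = (bestKey & 0xF) * 17;
  const luminance = 0.2126 * r + 0.7152 * g + 0.0722 * b; // Rec. 709 relative luminance
  const toHex = (v: number): string => v.toString(16).padStart(2, '0');
  return { color: `#${toHex(r)}${toHex(g)}${toHex(b)}`, isDark: luminance < 128 };
}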
The expression controller needs a cooldown mechanism to prevent false triggers
Expression recognition naturally contains jitter. For example, speaking may accidentally trigger mouth-open favorite, and blinking may be misclassified as fullscreen. The controller therefore needs thresholds, cooldown windows, and bilateral validation.
The original solution uses an 800 ms cooldown and requires bilateral activation for frowning and squinting. This is more stable than simply increasing thresholds because it balances sensitivity and accuracy.
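A minimal sketch of how that 800 ms window might be enforced per command (CooldownGate is an illustrative name, not a class from the project):
class CooldownGate {
  private lastFired: Map<PhotoCommand, number> = new Map();
  private windowMs: number;

  constructor(windowMs: number = 800) {
    this.windowMs = windowMs;
  }

  tryFire(command: PhotoCommand): boolean {
    const now = Date.now();
    const last = this.lastFired.get(command) ?? 0;
    if (now - last < this.windowMs) {
      return false; // Still inside the cooldown window: ignore the repeated detection
    }
    this.lastFired.set(command, now);
    return true; // Let the command through and restart its cooldown
  }
}
The expression controller would call tryFire() before emitting a command, so repeated detections of the same expression inside the window are dropped silently.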
The gesture controller is responsible for continuous transformation computation
The value of Body AR is not just recognizing a gesture, but continuously outputting computable spatial coordinates. The distance between the wrists can map to zoom, the right hand offset relative to the shoulder can map to rotation, and the midpoint between both hands can map to translation.
Leaning forward and backward can switch between detail mode and gallery mode, upgrading photo presentation logic from 2D control switching to posture-driven scene switching.
if (handDistance < 0.12) {
newScale = Math.max(0.5, lastScale - 0.05); // Pinch with both hands to zoom out
} else if (handDistance > 0.45) {
newScale = Math.min(3.0, lastScale + 0.05); // Spread both hands to zoom in
}
This code demonstrates the core mapping from skeletal distance to view scaling.
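The same pattern extends to rotation: the horizontal offset of the right wrist relative to the right shoulder can be mapped to an angle. A sketch with assumed keypoint fields and an illustrative ±60° output range:
interface Keypoint {
  x: number; // Normalized horizontal position, 0..1 in camera space
  y: number; // Normalized vertical position, 0..1 in camera space
}

function mapOffsetToRotation(rightWrist: Keypoint, rightShoulder: Keypoint): number {
  const offset = rightWrist.x - rightShoulder.x;          // Positive when the hand moves outward
  const clamped = Math.max(-0.3, Math.min(0.3, offset));  // Limit the usable interaction range
  return (clamped / 0.3) * 60;                            // Map to a rotation angle in degrees
}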
The immersive title bar and floating navigation create a distinctive UI
The title bar uses SystemMaterialEffect.IMMERSIVE to create a translucent material, then layers on dominant-photo-color shadows, a tracking-quality status indicator, and expression hints. It is not static decoration, but a feedback panel for AR tracking state.
The bottom navigation uses the floating style of HdsTabs, preserves whitespace around the edges, supports opacity transitions, and avoids covering the main photo subject. Gallery, favorites, gesture help, and settings are integrated into one floating layer, which matches the low-interruption principle of large-screen scenarios.
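Because the exact HdsTabs and SystemMaterialEffect APIs are not shown here, the following is a generic ArkUI approximation of the title bar idea rather than the project's HDS implementation: background, shadow, and a small tracking hint all follow the extracted photo theme.
@Component
struct ImmersiveTitleBar {
  @Prop themeColor: string = '#1a1a2e'; // Dominant color extracted from the current photo
  @Prop trackingQuality: number = 1.0;  // Doubles as a lightweight AR status indicator

  build() {
    Row() {
      Text('Spatial Gallery').fontColor(Color.White)
      Blank()
      Text(this.trackingQuality > 0.6 ? 'Tracking OK' : 'Hold still')
        .fontColor(Color.White)
        .opacity(0.7)
    }
    .width('100%')
    .padding(12)
    .backgroundColor(this.themeColor)
    .opacity(0.85)                                   // Translucent, immersive look
    .shadow({ radius: 16, color: this.themeColor })  // Shadow tinted by the photo's dominant color
    .animation({ duration: 300 })                    // Smooth transition when the photo changes
  }
}
In the real project the HDS components would replace this hand-rolled Row, but the data flow stays the same: theme color in, translucent feedback surface out.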
The main page orchestrates the AR loop, gallery data, and visual state
The main page should not just stack controls. Its role is to unify the state source: load gallery assets, switch the current image, synchronize the theme, reset transformations, drive the AR loop, and broadcast states such as trackingQuality, photo_theme, and photo_index to child components.
This structure makes the page the orchestration layer, the controllers responsible for interpretation logic, and the components responsible for presentation logic. It also makes it easier to replace mock frame-processing code when integrating a real ARSession later.
private handlePhotoCommand(command: PhotoCommand): void {
switch (command) {
case PhotoCommand.NEXT:
this.loadPhotoAtIndex(this.currentIndex + 1); // Next photo
break;
case PhotoCommand.FAVORITE:
this.isFavorite = !this.isFavorite; // Toggle favorite state
break;
}
}
This code shows how the main page consumes business commands produced by the expression controller.
Performance optimization should be linked to tracking quality
In AR interaction, higher frame rates are not always better. You need to dynamically balance latency, power consumption, and stability according to tracking quality. Use 60 fps when tracking is high quality, drop to 30 fps in normal conditions, and reduce further to 15 fps when quality is low.
It is also recommended to add first-frame calibration, low-light compensation, valid interaction region limits, and smoothing filters for skeletal points. Otherwise, the experience will fluctuate significantly in PC camera setups with elevated viewing angles and complex lighting conditions.
private adjustARPerformance(trackingQuality: number): void {
if (trackingQuality > 0.9) {
this.arSession?.setTargetFps(60); // Increase frame rate when tracking is stable
} else if (trackingQuality > 0.6) {
this.arSession?.setTargetFps(30); // Balance performance and power consumption
} else {
this.arSession?.setTargetFps(15); // Reduce resource usage during low-quality tracking
}
}
This code dynamically adjusts AR session performance based on tracking quality.
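For the skeletal smoothing mentioned earlier, a per-keypoint exponential moving average is usually enough to suppress jitter without adding visible lag. The smoothing factor below is an assumed value:
class KeypointSmoother {
  private previous: Map<string, number> = new Map();
  private alpha: number;

  constructor(alpha: number = 0.3) {
    this.alpha = alpha; // Weight of the newest sample; smaller values smooth more aggressively
  }

  smooth(name: string, value: number): number {
    const prev = this.previous.get(name);
    // First sample passes through unchanged; later samples are blended with history
    const next = prev === undefined ? value : this.alpha * value + (1 - this.alpha) * prev;
    this.previous.set(name, next);
    return next;
  }
}
Applying the smoothed coordinates before computing values such as handDistance keeps the zoom mapping from flickering around its thresholds.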
The engineering value of this solution lies in its portability
Although this example is a spatial photo gallery, its abstract pattern can be directly reused for readers, music workstations, fitness mirrors, and data dashboards. What is reusable is not a single page, but the combined paradigm of AR signal mapping, content-driven UI, and floating-layer interaction.
If you want to extend it further, the recommended priorities are: integrate a real ARSession, add personalized user threshold calibration, improve favorite data persistence, introduce distributed gallery synchronization, and then consider multi-user collaboration and recommendation algorithms.
FAQ
1. Why does Face AR expression control require a cooldown window?
Because facial expressions change continuously. Without a cooldown, page turns or favorite actions may be triggered multiple times within one second. A cooldown window significantly reduces jitter and false positives.
2. Why scale the image down to 64×64 before extracting the dominant color?
Downscaled sampling greatly reduces the cost of pixel traversal while preserving the main color distribution characteristics, which makes it well suited for real-time theme updates.
3. What problem should you solve first when integrating real AR data into the project?
Prioritize tracking stability and threshold calibration, then optimize UI animation. Without stable input, even the best immersive lighting cannot produce a reliable experience.
Summary: This article presents a spatial photo gallery solution for HarmonyOS 6 (API 23). It covers Face AR expression-based page turning, Body AR gesture-based zoom and rotation, dominant-color extraction from photos, an immersive lighting title bar, floating navigation design, ArkTS code skeletons, permission configuration, and performance optimization strategies.