HarmonyOS 6.0 AVCodec Kit Synchronous Video Encoding and Decoding: API Guide, Performance Analysis, and Best Practices

HarmonyOS 6.0 introduces synchronous video encoding and decoding to AVCodec Kit. Developers can actively poll input and output buffers to gain tighter timing control, a more linear code structure, and more predictable latency. This article focuses on API evolution, hands-on encoding and decoding workflows, and guidance for choosing the right mode. Keywords: HarmonyOS, AVCodec Kit, video encoding and decoding.

Technical specifications are summarized below:

Platform: HarmonyOS 6.0
Capability module: AVCodec Kit
Development language: C / C++
API form: Native NDK API
New feature: Synchronous video encoding and decoding mode
Initial API version: 20
Video codec example: H.264 / AVC
Input and output modes: Surface / Buffer
Core dependencies: libnative_media_codecbase.so, libnative_media_core.so, libnative_media_venc.so, libnative_media_vdec.so

[Figure: AVCodec Kit synchronous mode. The codec pipeline is driven by the application thread: the developer actively acquires input buffers, pushes data, and polls output results, replacing the traditional callback-driven model.]

HarmonyOS 6.0 expands AVCodec Kit into parallel synchronous and asynchronous control models

AVCodec Kit is a key entry point into HarmonyOS low-level multimedia capabilities. It covers audio and video encoding and decoding, media data input, parsing, and packaging. In HarmonyOS 6.0, synchronous video encoding and decoding officially joins the supported API set, starting from API version 20.

The core of this evolution is not simply adding another entry point. Instead, it returns more control of the codec lifecycle to the application layer. In the past, developers mainly relied on asynchronous callbacks. Now, AVCodec Kit also supports active buffer polling, making it easier to build media pipelines with stronger determinism.

The essence of synchronous mode is that the application thread actively drives the codec forward

The strength of asynchronous mode lies in its non-blocking behavior. When input becomes available or output is ready to read, the system notifies the application through callbacks. That makes it a good fit for high-concurrency pipeline scenarios such as media players and real-time communication.

However, asynchronous mode also introduces the cost of state machines, thread coordination, and callback management.
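To make that cost concrete, here is a minimal sketch of the callback wiring asynchronous mode requires before starting a codec. It assumes the AVBuffer-based OH_AVCodecCallback interface from native_avcodec_base.h and a previously created decoder instance videoDec; treat the exact callback names as version-dependent and verify them against your SDK headers.

// Sketch of asynchronous-mode setup: four callbacks must be registered,
// and each one runs on a codec-owned thread, not the business thread.
static void OnError(OH_AVCodec *codec, int32_t errorCode, void *userData) { /* handle codec errors */ }
static void OnStreamChanged(OH_AVCodec *codec, OH_AVFormat *format, void *userData) { /* e.g. resolution change */ }
static void OnNeedInputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData) {
    // Input must be handed off through a queue or state machine
}
static void OnNewOutputBuffer(OH_AVCodec *codec, uint32_t index, OH_AVBuffer *buffer, void *userData) {
    // Output likewise arrives on a codec thread and must be marshalled back
}

OH_AVCodecCallback callback = {OnError, OnStreamChanged, OnNeedInputBuffer, OnNewOutputBuffer};
OH_VideoDecoder_RegisterCallback(videoDec, callback, nullptr); // userData is an optional context pointer

Every one of these handlers is a potential thread boundary, which is exactly the coordination overhead synchronous mode removes.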

Synchronous mode is closer to sequential programming. The application actively acquires an input buffer, writes data, submits it, then actively acquires an output buffer and processes the result. For transcoding tools, offline analysis, and secondary bitstream processing, this model is often more natural.

Application thread -> Acquire input buffer -> Write data -> PushInputBuffer
Application thread -> Acquire output buffer -> Read result -> FreeOutputBuffer

This flow shows that, in synchronous mode, the encoding or decoding rhythm is fully controlled by the business thread, resulting in a more linear code path.

Synchronous mode is a strong fit for workloads that require precise buffer lifecycle control

When an application needs to inject SEI messages into the output bitstream, modify NAL units, perform offline transcoding, or run frame-by-frame analysis, synchronous mode can significantly reduce implementation complexity. Because reading, processing, and releasing all happen within the same execution context, cross-thread synchronization overhead stays lower.

In addition, synchronous mode is not inherently slower. With a well-designed thread model, it can reduce callback scheduling and context-switch overhead while improving cache locality. However, it should not run directly on the UI thread, because polling and waiting can amplify jank risk.
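A minimal sketch of that thread model, using a hypothetical StartEncodeWorker helper that wraps the polling loop shown later in this article:

#include <atomic>
#include <thread>

// Sketch: run the entire acquire/push/poll cycle on a dedicated worker
// thread so the UI thread never blocks on codec buffers.
void StartEncodeWorker(OH_AVCodec *videoEnc, std::atomic<bool> &running) {
    std::thread worker([videoEnc, &running]() {
        while (running.load()) {
            // The synchronous encode loop goes here (see the loop section below)
        }
    });
    worker.detach(); // Or keep the handle and join() during teardown
}

Setting running to false from outside gives the worker a clean, deterministic exit point.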

Native projects must prepare linking and headers first

target_link_libraries(sample PUBLIC libnative_media_codecbase.so)
target_link_libraries(sample PUBLIC libnative_media_core.so)
target_link_libraries(sample PUBLIC libnative_media_venc.so) # Video encoding library
target_link_libraries(sample PUBLIC libnative_media_vdec.so) # Video decoding library

This configuration links the core shared libraries required by AVCodec Kit in CMake.

#include <multimedia/player_framework/native_avcodec_videoencoder.h>
#include <multimedia/player_framework/native_avcodec_videodecoder.h>
#include <multimedia/player_framework/native_avcapability.h>
#include <multimedia/player_framework/native_avcodec_base.h>
#include <multimedia/player_framework/native_avbuffer.h>
#include <multimedia/player_framework/native_avformat.h>
#include <fstream>
#include <thread>

These headers cover the key APIs for the encoder, decoder, capability queries, and buffer operations.

Video encoding should begin with capability discovery and format configuration

The first step in synchronous encoding is not creating an instance immediately. You should first query the capabilities supported by the system, then create the hardware codec by encoding type. This step helps avoid compatibility issues caused by differences across device specifications.

OH_AVCapability *capability = OH_AVCodec_GetCapabilityByCategory(
    OH_AVCODEC_MIMETYPE_VIDEO_AVC,
    true,      // true queries encoder capabilities; false would query decoders
    HARDWARE); // Prefer a hardware-backed implementation

const char *name = OH_AVCapability_GetName(capability);
OH_AVCodec *videoEnc = OH_VideoEncoder_CreateByName(name); // Create the encoder by capability name

This code selects an H.264 hardware encoder and instantiates it.

Next, configure width, height, pixel format, bitrate, frame rate, and keyframe interval. These parameters directly determine output quality, compression efficiency, and whether the hardware path is available.

OH_AVFormat *format = OH_AVFormat_Create();
OH_AVFormat_SetIntValue(format, OH_MD_KEY_WIDTH, 1920);          // Video width in pixels
OH_AVFormat_SetIntValue(format, OH_MD_KEY_HEIGHT, 1080);         // Video height in pixels
OH_AVFormat_SetIntValue(format, OH_MD_KEY_PIXEL_FORMAT, AV_PIXEL_FORMAT_NV12); // Input pixel format (Buffer mode)
OH_AVFormat_SetDoubleValue(format, OH_MD_KEY_FRAME_RATE, 30.0);  // Frame rate is a double value
OH_AVFormat_SetLongValue(format, OH_MD_KEY_BITRATE, 4000000);    // Bitrate is int64_t; here 4 Mbps
OH_AVFormat_SetIntValue(format, OH_MD_KEY_I_FRAME_INTERVAL, 2000); // Keyframe interval in milliseconds (2 s)
OH_VideoEncoder_Configure(videoEnc, format);
OH_AVFormat_Destroy(format); // The format object can be destroyed after Configure

This configuration creates a baseline parameter template for 1080p H.264 encoding.
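Before calling Configure, it is also worth validating the target parameters against the queried capability so unsupported devices fail fast. A short sketch, assuming the OH_AVCapability_IsVideoSizeSupported helper from native_avcapability.h:

// Sketch: confirm the hardware encoder actually supports 1920x1080
// before committing to the configuration.
if (!OH_AVCapability_IsVideoSizeSupported(capability, 1920, 1080)) {
    // Fall back to a supported resolution or to a software encoder
}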

Surface mode is the preferred input option for high-performance encoding pipelines

The value of Surface mode lies in its zero-copy tendency. The graphics pipeline, camera, or OpenGL rendering output can write directly to the Surface consumed by the encoder, avoiding an extra pixel copy in the application layer.

OHNativeWindow *nativeWindow = nullptr;
OH_VideoEncoder_GetSurface(videoEnc, &nativeWindow); // Obtain the native window (Surface) bound to the encoder

This code retrieves the encoder Surface so it can connect to a camera or rendering producer.
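The surface must also be obtained at the right point in the codec lifecycle. The sketch below shows the typical ordering, assuming the standard Configure / Prepare / Start sequence:

// Typical Surface-mode lifecycle ordering (sketch):
OH_VideoEncoder_Configure(videoEnc, format);         // 1. Set parameters first
OH_VideoEncoder_GetSurface(videoEnc, &nativeWindow); // 2. Then obtain the input surface
OH_VideoEncoder_Prepare(videoEnc);                   // 3. Prepare internal resources
OH_VideoEncoder_Start(videoEnc);                     // 4. Start; the producer can now draw

Calling GetSurface before Configure, or after Start, is a common source of hard-to-diagnose errors.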

The core of the synchronous encoding loop is polling input and output while controlling cadence

After the encoder starts, the application thread must repeatedly acquire input and output buffers. In Buffer mode, you manually write pixel data. In Surface mode, the upstream producer fills the image data.

OH_VideoEncoder_Start(videoEnc);
while (true) { // In production, exit once the EOS flag has been sent and drained
    uint32_t inputIndex = 0;
    // Query an available input slot; the timeout (in microseconds) avoids busy-spinning
    if (OH_VideoEncoder_QueryInputBuffer(videoEnc, &inputIndex, 10000) == AV_ERR_OK) {
        OH_AVBuffer *inputBuffer = OH_VideoEncoder_GetInputBuffer(videoEnc, inputIndex);
        // In Buffer mode, write a raw frame here; in Surface mode, the producer fills it
        OH_VideoEncoder_PushInputBuffer(videoEnc, inputIndex); // Submit the input buffer by index
    }

    uint32_t outputIndex = 0;
    if (OH_VideoEncoder_QueryOutputBuffer(videoEnc, &outputIndex, 10000) == AV_ERR_OK) {
        OH_AVBuffer *outputBuffer = OH_VideoEncoder_GetOutputBuffer(videoEnc, outputIndex);
        uint8_t *data = OH_AVBuffer_GetAddr(outputBuffer); // Address of the encoded bitstream
        OH_VideoEncoder_FreeOutputBuffer(videoEnc, outputIndex); // Return the buffer to the encoder
    }
}

This loop shows the pull-based skeleton of synchronous encoding: query a buffer index with a timeout, fetch the buffer, then push input or free output by index.
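In Surface mode there is no input buffer on which to set an EOS flag, so the end of the stream is signaled with a dedicated call instead; a sketch using OH_VideoEncoder_NotifyEndOfStream:

// Surface mode: signal end-of-stream explicitly, then drain the encoder.
OH_VideoEncoder_NotifyEndOfStream(videoEnc);
// Keep polling the output side until a buffer carrying the EOS flag
// arrives, then stop and release the encoder.
OH_VideoEncoder_Stop(videoEnc);
OH_VideoEncoder_Destroy(videoEnc);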

The decoding flow mirrors encoding, but focuses more on bitstream input and rendering strategy

Video decoding uses the same active model of feeding input and pulling output. The main difference is that input usually comes from a file or network bitstream, while output may be a YUV buffer or may be rendered directly to a Surface.

OH_VideoDecoder_Start(videoDec);
while (hasMoreData) {
    uint32_t inputIndex = 0;
    if (OH_VideoDecoder_QueryInputBuffer(videoDec, &inputIndex, 10000) == AV_ERR_OK) {
        OH_AVBuffer *inputBuffer = OH_VideoDecoder_GetInputBuffer(videoDec, inputIndex);
        // Copy one compressed frame into inputBuffer and set its attributes (pts, size, flags)
        OH_VideoDecoder_PushInputBuffer(videoDec, inputIndex); // Submit data for decoding
    }

    uint32_t outputIndex = 0;
    if (OH_VideoDecoder_QueryOutputBuffer(videoDec, &outputIndex, 10000) == AV_ERR_OK) {
        OH_AVBuffer *outputBuffer = OH_VideoDecoder_GetOutputBuffer(videoDec, outputIndex);
        // Read YUV data here for rendering or analysis (Buffer mode)
        OH_VideoDecoder_FreeOutputBuffer(videoDec, outputIndex); // Release the decoded output
    }
}

This code summarizes the basic skeleton of synchronous decoding and can be extended into a player or analysis tool.
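One detail the skeleton glosses over: in Buffer mode, each compressed input frame needs its metadata attached before the push. A sketch, assuming OH_AVBuffer_SetBufferAttr and hypothetical framePtsUs / frameSizeBytes / isLastFrame values produced by the bitstream reader:

// Sketch: describe one compressed frame before submitting it.
OH_AVCodecBufferAttr attr = {};
attr.pts = framePtsUs;        // Presentation timestamp in microseconds
attr.size = frameSizeBytes;   // Number of valid bytes written into the buffer
attr.offset = 0;              // Data starts at the beginning of the buffer
attr.flags = isLastFrame ? AVCODEC_BUFFER_FLAGS_EOS : AVCODEC_BUFFER_FLAGS_NONE;
OH_AVBuffer_SetBufferAttr(inputBuffer, &attr);
OH_VideoDecoder_PushInputBuffer(videoDec, inputIndex); // Now the frame is fully described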

Mode selection should be driven by the thread model, latency targets, and data path

If your workload emphasizes pipeline concurrency, audio-video synchronization, and coordination with system-level scheduling, asynchronous mode is usually the safer choice. If your workload emphasizes per-frame control, sequential processing, and lower implementation complexity, synchronous mode is usually a better fit.

The choice between Surface and Buffer determines the performance ceiling and processing flexibility

Surface mode is closer to a zero-copy pipeline and works well for screen recording, camera capture, game recording, and real-time high-resolution scenarios. Buffer mode gives up some performance in exchange for greater programmability, making it suitable for filters, scaling, AI analysis, and custom processing.

In production, a recommended default is a synchronous design that combines a worker thread with Surface mode. When you must access raw pixels or bitstream details, switch to Buffer mode and combine it with batching and rate-limiting to avoid saturating the CPU with polling.
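A simple way to implement that rate limiting is to lean on the query timeout and back off when an iteration makes no progress; a sketch, assuming a 10 ms timeout and the standard <chrono> and <thread> headers:

// Sketch: the query timeout acts as a natural rate limiter; add a short
// sleep when neither input nor output advanced in this iteration.
bool madeProgress = false;
uint32_t index = 0;
if (OH_VideoEncoder_QueryOutputBuffer(videoEnc, &index, 10000 /* 10 ms */) == AV_ERR_OK) {
    madeProgress = true;
    // ... consume and free the output buffer ...
}
if (!madeProgress) {
    std::this_thread::sleep_for(std::chrono::milliseconds(5)); // Yield the CPU
}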

HarmonyOS AVCodec Kit synchronous mode improves engineering control over core media primitives

The addition of synchronous mode in HarmonyOS 6.0 effectively fills a gap in the multimedia framework by providing a high-control interface. It is especially suitable for toolchain applications, media middleware, bitstream processing, offline analysis, and scenarios that require deterministic timing.

In the future, if it is combined with distributed media pipelines, NPU-based image enhancement, and finer-grained frame-level control APIs, AVCodec Kit will likely become even more important as media infrastructure within the HarmonyOS ecosystem.

FAQ

Q1: Is HarmonyOS AVCodec Kit synchronous mode always faster than asynchronous mode?

Not necessarily. The strengths of synchronous mode are control, determinism, and lower callback overhead. The strengths of asynchronous mode are better support for high-concurrency pipelines. Real performance depends on thread design, the input source, hardware capability, and whether you use a zero-copy Surface pipeline.

Q2: Which thread should run synchronous mode?

Run it on a dedicated worker thread. Calls such as OH_VideoEncoder_QueryInputBuffer and OH_VideoDecoder_QueryOutputBuffer may block while waiting for resources to become available. If you place them on the main thread, they can affect UI responsiveness, especially at high bitrates or on lower-performance devices.

Q3: Should I choose Surface mode or Buffer mode first during development?

Choose Surface mode first if you want higher throughput and lower copy overhead. Use Buffer mode when you need access to raw YUV data, want to insert filters, run AI inference, or modify bitstream metadata.

AI Readability Summary

This article provides a restructured analysis of the new synchronous video encoding and decoding mode added to AVCodec Kit in HarmonyOS 6.0. It covers API evolution, the differences between synchronous and asynchronous execution, C/C++ Native development setup, encoding and decoding workflows, Surface versus Buffer trade-offs, performance optimization, and scenario-based architecture guidance.