How to Build a WAV File from Scratch in C++: Handcrafting RIFF, fmt, and data Chunks

This article uses a runnable C++ example to generate a 440Hz sine wave WAV file from scratch, breaks down the three core chunks—RIFF, fmt, and data—and explains how to calculate key parameters such as sample rate, bit depth, and byte rate. It addresses the common pain point of file formats being hard to understand. Keywords: WAV, RIFF, binary file.

This article focuses on how a WAV file is constructed by hand

| Parameter | Value |
| --- | --- |
| Primary language | C++ |
| File protocol/format | RIFF / WAVE / PCM |
| Stars | Not provided in the source material |
| Core dependencies | cstdio, cstdint, cstring, cmath |
| Output file | test.wav |
| Example signal | 440Hz sine wave |

The core topic is building a WAV file from zero. The emphasis is not on calling an audio player API, but on directly writing specification-compliant binary data into a file.

This approach is ideal for learning file formats. Once you understand field layout, length, and byte order, many formats that appear complicated can be assembled chunk by chunk in code.

A WAV file consists of three key chunks written in sequence

WAV is an implementation of the RIFF container format, and PCM is the most common audio encoding used within it. For a minimal playable WAV file, you must include at least three chunks: RIFF, fmt, and data.

  • RIFF: declares that the file type is WAVE
  • fmt: describes the audio parameters
  • data: stores the actual sample data
struct RiffHeader {
    char chunkId[4];      // Fixed as RIFF
    uint32_t chunkSize;   // Total file size - 8
    char format[4];       // Fixed as WAVE
};

This structure defines the outermost file container and tells the parser, “this is a WAV file.”

The RIFF chunk determines whether the parser can identify the file type

ChunkID must be RIFF, and Format must be WAVE. The ChunkSize field in the middle specifies the number of bytes from the end of that field to the end of the file, which is the total file size minus the first 8 bytes.

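For mono, 16-bit, 44100Hz audio with a duration of 5 seconds, the sample count is 44100 × 5 = 220500. Each sample uses 2 bytes, so the data section size is 220500 × 2 = 441000 bytes. Everything between the ChunkSize field and the first sample adds up to 36 bytes: 4 for the WAVE tag, 24 for the fmt chunk, and 8 for the data chunk header. Therefore, RIFF.ChunkSize = 36 + DataSize = 441036.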
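The arithmetic above can be sketched as a small helper. This is a minimal sketch, assuming uncompressed PCM with a 16-byte fmt body; `WavSizes` and `computeWavSizes` are hypothetical names introduced for illustration, not part of the original program.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the size-related values a PCM WAV writer needs.
struct WavSizes {
    uint32_t numSamples;  // Sample frames in the file
    uint32_t dataSize;    // Bytes of raw sample data (data.dataSize)
    uint32_t riffSize;    // Value stored in RIFF.ChunkSize
    uint32_t fileSize;    // Total bytes on disk
};

inline WavSizes computeWavSizes(uint32_t sampleRate, uint16_t numChannels,
                                uint16_t bitsPerSample, uint32_t seconds) {
    assert(bitsPerSample % 8 == 0);  // Assumption: whole bytes per sample
    WavSizes s{};
    s.numSamples = sampleRate * seconds;
    s.dataSize = s.numSamples * numChannels * (bitsPerSample / 8);
    // 36 = 4 ("WAVE") + 24 (fmt chunk: 8-byte header + 16-byte body)
    //    + 8 (data chunk header)
    s.riffSize = 36 + s.dataSize;
    // The 8 bytes of "RIFF" + ChunkSize are not counted in ChunkSize itself.
    s.fileSize = s.riffSize + 8;
    return s;
}
```

Calling `computeWavSizes(44100, 1, 16, 5)` reproduces the numbers in the text: 220500 samples, 441000 data bytes, and a RIFF chunk size of 441036.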

The fmt chunk describes how the player should interpret the sample data

The fmt chunk acts as the audio parameter specification. It tells the player how many samples to read per second, how many bits each sample contains, how many channels to read at a time, and how many bytes must be consumed per second.

struct FmtChunk {
    char chunkId[4];         // Fixed as fmt 
    uint32_t chunkSize;      // Fixed at 16 for PCM
    uint16_t audioFormat;    // 1 means PCM
    uint16_t numChannels;    // 1 for mono, 2 for stereo
    uint32_t sampleRate;     // Sample rate
    uint32_t byteRate;       // Bytes per second
    uint16_t blockAlign;     // Bytes per frame
    uint16_t bitsPerSample;  // Bit depth
};

This structure defines how the data should be decoded. Without it, the data section is just a meaningless byte stream.

The formula for ByteRate is SampleRate × NumChannels × BitsPerSample / 8. The formula for BlockAlign is NumChannels × BitsPerSample / 8. These two fields are among the most common places where beginners make mistakes.
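Both formulas translate directly into code. The sketch below assumes the bit depth is a multiple of 8; the function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstdint>

// BlockAlign: bytes occupied by one frame (one sample for every channel).
inline uint16_t computeBlockAlign(uint16_t numChannels, uint16_t bitsPerSample) {
    assert(bitsPerSample % 8 == 0);  // Assumption: whole bytes per sample
    return static_cast<uint16_t>(numChannels * bitsPerSample / 8);
}

// ByteRate: bytes the player consumes per second of audio.
// SampleRate * NumChannels * BitsPerSample / 8 == SampleRate * BlockAlign.
inline uint32_t computeByteRate(uint32_t sampleRate, uint16_t numChannels,
                                uint16_t bitsPerSample) {
    return sampleRate * computeBlockAlign(numChannels, bitsPerSample);
}
```

For the mono, 16-bit, 44100Hz case used in this article, BlockAlign is 2 and ByteRate is 88200. Switching to stereo doubles both values.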

The data chunk is the section that actually carries the sound

The data chunk header is very short and contains only two fields: an identifier and a size. It is followed immediately by the raw PCM sample data. For 16-bit PCM, samples are signed little-endian integers, typically written as int16_t values.

struct DataChunk {
    char chunkId[4];       // Fixed as data
    uint32_t dataSize;     // Total byte size of the sample data
};

This structure defines the entry point of the audio payload, and all subsequent samples are appended continuously after it.
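Writing these structs straight to disk only works if the compiler inserts no padding between fields. A hedged way to confirm that is a set of compile-time checks, sketched below with the three structs repeated so the block stands alone. The constant `kHeaderBytes` is a name introduced for this example.

```cpp
#include <cstddef>
#include <cstdint>

struct RiffHeader {
    char chunkId[4];
    uint32_t chunkSize;
    char format[4];
};

struct FmtChunk {
    char chunkId[4];
    uint32_t chunkSize;
    uint16_t audioFormat;
    uint16_t numChannels;
    uint32_t sampleRate;
    uint32_t byteRate;
    uint16_t blockAlign;
    uint16_t bitsPerSample;
};

struct DataChunk {
    char chunkId[4];
    uint32_t dataSize;
};

// If any of these fail, the in-memory layout differs from the on-disk layout
// and the structs must not be written with a single fwrite.
static_assert(sizeof(RiffHeader) == 12, "RiffHeader must be 12 bytes");
static_assert(sizeof(FmtChunk) == 24, "FmtChunk must be 24 bytes");
static_assert(sizeof(DataChunk) == 8, "DataChunk must be 8 bytes");
static_assert(offsetof(FmtChunk, sampleRate) == 12, "unexpected padding");
static_assert(offsetof(FmtChunk, blockAlign) == 20, "unexpected padding");

// Total header size in front of the first sample.
constexpr std::size_t kHeaderBytes =
    sizeof(RiffHeader) + sizeof(FmtChunk) + sizeof(DataChunk);
static_assert(kHeaderBytes == 44, "canonical PCM WAV header is 44 bytes");
```

These checks cover layout only. Byte order is a separate concern: the fields are little-endian on disk, so a raw struct write is correct only on a little-endian host.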

Generating a 440Hz sine wave is the smallest practical test for validating WAV output

The sample program generates the A4 pitch, which is a 440Hz sine wave, using a mathematical function. The core logic is simple: convert the sample index into time, compute the current amplitude with the sine function, and then map it into the 16-bit integer range.

for (uint32_t i = 0; i < numSamples; ++i) {
    float t = static_cast<float>(i) / 44100.0f;           // Convert the sample index to time
    float y = sinf(t * 440.0f * 2.0f * 3.1415926f);       // Generate a 440Hz sine wave
    int16_t sample = static_cast<int16_t>(y * 32767);     // Map to the 16-bit PCM range
    fwrite(&sample, sizeof(int16_t), 1, fp);              // Write one sample point
}

This loop continuously outputs sample points and produces a pure tone file that a standard audio player can play directly.
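A variation on the loop above collects the samples in memory first, which makes them easy to inspect before anything is written. This sketch is not the original code: it swaps in double-precision math and rounding to the nearest integer to limit rounding drift over long durations, and `generateSine` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Generate numSamples of a sine wave as signed 16-bit PCM values.
inline std::vector<int16_t> generateSine(double frequencyHz, uint32_t sampleRate,
                                         uint32_t numSamples) {
    assert(sampleRate > 0);
    const double kTwoPi = 6.283185307179586;
    std::vector<int16_t> samples;
    samples.reserve(numSamples);
    for (uint32_t i = 0; i < numSamples; ++i) {
        const double t = static_cast<double>(i) / sampleRate;  // Index -> seconds
        const double y = std::sin(kTwoPi * frequencyHz * t);   // Amplitude in [-1, 1]
        // Scale to the 16-bit range and round instead of truncating.
        samples.push_back(static_cast<int16_t>(std::lround(y * 32767.0)));
    }
    return samples;
}
```

For example, `generateSine(440.0, 44100, 220500)` yields the five seconds of A4 described above, starting at amplitude 0.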

Writing binary data in order is the essential method for constructing file formats

The most important part of this kind of code is not a complex algorithm, but strict ordering. You must write RIFF first, then fmt, then the data header, and finally the sample values in sequence. Misaligned fields, incorrect lengths, or mismatched types will cause the player to fail when parsing the file.
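The write order can be sketched end to end as a single function. This is a minimal sketch under stated assumptions: mono 16-bit PCM only, and each field written byte by byte in little-endian order so the result does not depend on struct padding or host byte order. `writeLE` and `writeSineWav` are hypothetical names, not the original program's.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Write the low `bytes` bytes of value, least significant byte first.
static bool writeLE(std::FILE* fp, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        const unsigned char b = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
        if (std::fputc(b, fp) == EOF) return false;
    }
    return true;
}

bool writeSineWav(const char* path, uint32_t sampleRate, uint32_t seconds,
                  double frequencyHz) {
    assert(path != nullptr);
    const uint16_t numChannels = 1;
    const uint16_t bitsPerSample = 16;
    const uint16_t blockAlign = static_cast<uint16_t>(numChannels * bitsPerSample / 8);
    const uint32_t numSamples = sampleRate * seconds;
    const uint32_t dataSize = numSamples * blockAlign;

    std::FILE* fp = std::fopen(path, "wb");  // Binary mode: no newline translation
    if (!fp) return false;

    bool ok = true;
    // 1. RIFF chunk
    ok = ok && std::fwrite("RIFF", 1, 4, fp) == 4;
    ok = ok && writeLE(fp, 36 + dataSize, 4);
    ok = ok && std::fwrite("WAVE", 1, 4, fp) == 4;
    // 2. fmt chunk
    ok = ok && std::fwrite("fmt ", 1, 4, fp) == 4;
    ok = ok && writeLE(fp, 16, 4);                       // fmt body size for PCM
    ok = ok && writeLE(fp, 1, 2);                        // audioFormat: 1 = PCM
    ok = ok && writeLE(fp, numChannels, 2);
    ok = ok && writeLE(fp, sampleRate, 4);
    ok = ok && writeLE(fp, sampleRate * blockAlign, 4);  // byteRate
    ok = ok && writeLE(fp, blockAlign, 2);
    ok = ok && writeLE(fp, bitsPerSample, 2);
    // 3. data chunk header
    ok = ok && std::fwrite("data", 1, 4, fp) == 4;
    ok = ok && writeLE(fp, dataSize, 4);
    // 4. Sample values
    const double kTwoPi = 6.283185307179586;
    for (uint32_t i = 0; ok && i < numSamples; ++i) {
        const double y = std::sin(kTwoPi * frequencyHz * i / sampleRate);
        const int16_t sample = static_cast<int16_t>(std::lround(y * 32767.0));
        ok = writeLE(fp, static_cast<uint16_t>(sample), 2);
    }
    // Always close the file, then report whether every write succeeded.
    return std::fclose(fp) == 0 && ok;
}
```

A call such as `writeSineWav("test.wav", 44100, 5, 440.0)` would produce the file described in this article. Byte-by-byte output is slower than one large `fwrite`, but it keeps each step of the ordering visible.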

More broadly, WAV is simply an easy-to-observe example. TXT, BMP, MP4, and EXE follow the same underlying principle: a program is not interpreting a “mysterious file,” but parsing a predefined binary layout.

A more robust implementation should explicitly include the standard headers

The original code snippet omits header includes and some edge-case handling. For reliable compilation in a real project, you should include at least the required standard library headers.

#include <cstdio>    // File I/O: fopen/fwrite/fclose
#include <cstdint>   // Fixed-width integers: uint32_t/int16_t
#include <cstring>   // Memory operations: memcpy
#include <cmath>     // Math functions: sinf

These headers cover the minimum compilation requirements for the WAV generation example.

Understanding file formats directly improves your low-level understanding of software systems

Once you adopt the perspective that a file is simply structured binary data governed by rules, many software concepts become easier to understand. An editor reads a format, modifies its in-memory representation, and writes it back according to the specification. Audio, image, and video software all follow the same pattern at a fundamental level.

That is why learning a simple format is so valuable. The WAV structure is compact, has few fields, and introduces almost no compression complexity, which makes it an excellent first step toward understanding file formats, media data, and encoding schemes.


FAQ

Why is WAV better than MP3 for teaching file formats?

Because WAV typically stores PCM data directly in common use cases, its structure is simple, its fields are fixed, and you do not need to understand compression codecs first. That makes it an ideal entry-level example for learning binary file formats.

Why is there a trailing space after fmt?

Because the field must occupy exactly 4 bytes, and the specification defines the identifier as the three letters fmt followed by a space. The space is padding that belongs to the identifier. It is not a string terminator, and no '\0' byte is written after it.
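In code, this means copying exactly four bytes into the identifier field and leaving the string literal's terminating '\0' behind. A minimal sketch, with `ChunkHeader` and `makeChunkHeader` as hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Generic 8-byte chunk header: a four-character identifier plus a size.
struct ChunkHeader {
    char chunkId[4];
    uint32_t chunkSize;
};

inline ChunkHeader makeChunkHeader(const char* fourCc, uint32_t size) {
    assert(std::strlen(fourCc) == 4);   // "fmt " counts the trailing space
    ChunkHeader h{};
    std::memcpy(h.chunkId, fourCc, 4);  // Copy 4 bytes only, never the '\0'
    h.chunkSize = size;
    return h;
}
```

`makeChunkHeader("fmt ", 16)` stores the bytes `f`, `m`, `t`, and a space. Passing `"fmt"` without the space would trip the assertion, because the identifier would be one byte short.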

Which fields must change if I switch to stereo or 8-bit samples?

At a minimum, you must update NumChannels, ByteRate, BlockAlign, BitsPerSample, and DataSize. If the sample data type changes, you must also adjust the sample generation logic and the byte length passed to fwrite.
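As an illustration of the sample-generation changes, the sketch below builds 8-bit stereo data. Two format details matter here: 8-bit PCM in WAV is unsigned, with silence at 128, and stereo frames are interleaved as left then right. The function names are hypothetical, and the inputs are assumed to be normalized amplitudes in [-1, 1].

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map a normalized amplitude to an unsigned 8-bit sample centered on 128.
inline uint8_t toUnsigned8(double y) {
    assert(y >= -1.0 && y <= 1.0);
    return static_cast<uint8_t>(std::lround(y * 127.0) + 128);
}

// Interleave two channels into frames of [left, right] bytes.
inline std::vector<uint8_t> interleaveStereo8(const std::vector<double>& left,
                                              const std::vector<double>& right) {
    assert(left.size() == right.size());
    std::vector<uint8_t> bytes;
    bytes.reserve(left.size() * 2);  // BlockAlign is 2 for 8-bit stereo
    for (std::size_t i = 0; i < left.size(); ++i) {
        bytes.push_back(toUnsigned8(left[i]));
        bytes.push_back(toUnsigned8(right[i]));
    }
    return bytes;
}
```

The resulting buffer can be written with a single `fwrite(bytes.data(), 1, bytes.size(), fp)`, and `bytes.size()` is the value to store in DataSize.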

Core takeaway: Using a hand-written WAV example in C++, this article systematically breaks down the three major chunks—RIFF, fmt, and data—their field meanings, calculation rules, and write order. It shows that computer files are fundamentally binary data organized according to a format specification, making this a practical introduction to file formats, audio encoding, and low-level file generation logic.