This article uses a runnable C++ example to generate a 440Hz sine wave WAV file from scratch, breaks down the three core chunks—RIFF, fmt, and data—and explains how to calculate key parameters such as sample rate, bit depth, and byte rate. It addresses the common pain point of file formats being hard to understand. Keywords: WAV, RIFF, binary file.
This article focuses on how a WAV file is constructed by hand
| Parameter | Value |
|---|---|
| Primary language | C++ |
| File protocol/format | RIFF / WAVE / PCM |
| Stars | Not provided in the source material |
| Core dependencies | cstdio, cstdint, cstring, cmath |
| Output file | test.wav |
| Example signal | 440Hz sine wave |
The source page contains some noise in both the title and body, but the actual core topic is building a WAV file from zero. The emphasis is not on calling an audio player API, but on directly writing specification-compliant binary data into a file.
This approach is ideal for learning file formats. Once you understand field layout, length, and byte order, many formats that appear complicated can be assembled chunk by chunk in code.
A WAV file consists of three key chunks written in sequence
WAV is an implementation of the RIFF container format, and PCM is the most common audio encoding used within it. For a minimal playable WAV file, you must include at least three chunks: RIFF, fmt, and data.
RIFF: declares that the file type is WAVEfmt: describes the audio parametersdata: stores the actual sample data
struct RiffHeader {
char chunkId[4]; // Fixed as RIFF
uint32_t chunkSize; // Total file size - 8
char format[4]; // Fixed as WAVE
};
This structure defines the outermost file container and tells the parser, “this is a WAV file.”
The RIFF chunk determines whether the parser can identify the file type
ChunkID must be RIFF, and Format must be WAVE. The ChunkSize field in the middle specifies the number of bytes from the end of that field to the end of the file, which is the total file size minus the first 8 bytes.
For mono, 16-bit, 44100Hz audio with a duration of 5 seconds, the sample count is 44100 × 5 = 220500. Each sample uses 2 bytes, so the data section size is 441000 bytes. Therefore, RIFF.ChunkSize = 36 + DataSize.
The fmt chunk describes how the player should interpret the sample data
The fmt chunk acts as the audio parameter specification. It tells the player how many samples to read per second, how many bits each sample contains, how many channels to read at a time, and how many bytes must be consumed per second.
struct FmtChunk {
char chunkId[4]; // Fixed as fmt
uint32_t chunkSize; // Fixed at 16 for PCM
uint16_t audioFormat; // 1 means PCM
uint16_t numChannels; // 1 for mono, 2 for stereo
uint32_t sampleRate; // Sample rate
uint32_t byteRate; // Bytes per second
uint16_t blockAlign; // Bytes per frame
uint16_t bitsPerSample; // Bit depth
};
This structure defines how the data should be decoded. Without it, the data section is just a meaningless byte stream.
The formula for ByteRate is SampleRate × NumChannels × BitsPerSample / 8. The formula for BlockAlign is NumChannels × BitsPerSample / 8. These two fields are among the most common places where beginners make mistakes.
The data chunk is the section that actually carries the sound
The data chunk header is very short and contains only two fields: an identifier and a size. It is followed immediately by the raw PCM sample data. For 16-bit PCM, samples are typically written as int16_t values.
struct DataChunk {
char chunkId[4]; // Fixed as data
uint32_t dataSize; // Total byte size of the sample data
};
This structure defines the entry point of the audio payload, and all subsequent samples are appended continuously after it.
Generating a 440Hz sine wave is the smallest practical test for validating WAV output
The sample program generates the A4 pitch, which is a 440Hz sine wave, using a mathematical function. The core logic is simple: convert the sample index into time, compute the current amplitude with the sine function, and then map it into the 16-bit integer range.
for (uint32_t i = 0; i < numSamples; ++i) {
float t = static_cast
<float>(i) / 44100.0f; // Convert the sample index to time
float y = sinf(t * 440.0f * 2.0f * 3.1415926f); // Generate a 440Hz sine wave
int16_t sample = static_cast
<int16_t>(y * 32767); // Map to the 16-bit PCM range
fwrite(&sample, sizeof(int16_t), 1, fp); // Write one sample point
}
This loop continuously outputs sample points and produces a pure tone file that a standard audio player can play directly.
Writing binary data in order is the essential method for constructing file formats
The most important part of this kind of code is not a complex algorithm, but strict ordering. You must write RIFF first, then fmt, then the data header, and finally the sample values in sequence. Misaligned fields, incorrect lengths, or mismatched types will cause the player to fail when parsing the file.
More broadly, WAV is simply an easy-to-observe example. TXT, BMP, MP4, and EXE follow the same underlying principle: a program is not interpreting a “mysterious file,” but parsing a predefined binary layout.
A more robust implementation should explicitly include the standard headers
The original code snippet omits header includes and some edge-case handling. For reliable compilation in a real project, you should include at least the required standard library headers.
#include
<cstdio> // File I/O: fopen/fwrite/fclose
#include
<cstdint> // Fixed-width integers: uint32_t/int16_t
#include
<cstring> // Memory operations: memcpy
#include
<cmath> // Math functions: sinf
These headers cover the minimum compilation requirements for the WAV generation example.
Understanding file formats directly improves your low-level understanding of software systems
Once you adopt the perspective that a file is simply structured binary data governed by rules, many software concepts become easier to understand. An editor reads a format, modifies its in-memory representation, and writes it back according to the specification. Audio, image, and video software all follow the same pattern at a fundamental level.
That is why learning a simple format is so valuable. The WAV structure is compact, has few fields, and introduces almost no compression complexity, which makes it an excellent first step toward understanding file systems, media data, and encoding formats.
 AI Visual Insight: This image comes from a sidebar advertisement on the source page. It does not show the WAV binary structure, waveform, byte layout, or file header parsing results, so it provides no direct value for understanding the technical details of RIFF, fmt, and data. You can treat it as weakly related page noise.
FAQ
Why is WAV better than MP3 for teaching file formats?
Because WAV typically stores PCM data directly in common use cases, its structure is simple, its fields are fixed, and you do not need to understand compression codecs first. That makes it an ideal entry-level example for learning binary file formats.
Why is there a trailing space after fmt?
Because the field must occupy exactly 4 bytes, and the specification defines the identifier as fmt with a space as the final character. It is not an ellipsis and not a string terminator.
Which fields must change if I switch to stereo or 8-bit samples?
At a minimum, you must update NumChannels, ByteRate, BlockAlign, BitsPerSample, and DataSize. If the sample data type changes, you must also adjust the sample generation logic and the byte length passed to fwrite.
Core takeaway: Using a hand-written WAV example in C++, this article systematically breaks down the three major chunks—RIFF, fmt, and data—their field meanings, calculation rules, and write order. It shows that computer files are fundamentally binary data organized according to a format specification, making this a practical introduction to file formats, audio encoding, and low-level file generation logic.