LZ77 Compression Format
Documentation of the LZ77 compression system used in DOKAPON! Sword of Fury.
Table of contents
- Overview
- File Header
- Compression Token Format
- Decompression Algorithm
- Block Type Markers
- Window Management
- Block Sequences
- File-Specific Variations
- Data Alignment
- Implementation Notes
- Example Implementation
- Research Status
- See Also
Overview
DOKAPON! Sword of Fury uses a Nintendo-style LZ77 (LZSS variant) compression format for many of its asset files. This format is used in:
- MDL model files
- SPRANM animation files
- Some texture files
- Font files
File Header
The LZ77 header is 16 bytes:
Offset Size Description
------ ---- -----------
0x00 4 Magic: "LZ77" (0x4C5A3737)
0x04 4 Decompressed size (little-endian)
0x08 4 Flag1 (compression parameters)
0x0C 4 Flag2 (additional parameters)
Header Example
4C 5A 37 37 ; "LZ77" magic
A8 E9 09 00 ; Decompressed size: 649,400 bytes
B5 82 01 00 ; Flag1: 0x000182B5
67 30 00 00 ; Flag2: 0x00003067
Compression Token Format
The compression uses a byte-oriented LZSS stream starting at offset 0x10:
Token Encoding
| Bit 7 | Type | Description |
|---|---|---|
| 0 | Literal | Token is a literal byte to output |
| 1 | Reference | Token is a back-reference |
Back-Reference Format
When bit 7 is set:
Token byte: [1][LLLLL][OO]
Next byte: [OOOOOOOO]
Length = ((token & 0x7C) >> 2) + 3 ; 5 bits, range 3-34
Offset = (((token & 0x03) << 8) | next_byte) + 1 ; 10 bits, range 1-1024
Token Decoding Example
Token: 0x88 = 10001000
^ ^^^^^ ^^
| | |
| | +-- Offset high bits: 0
| +------- Length bits: 00010 = 2 → length = 2 + 3 = 5
+--------------- Bit 7 set: back-reference
Next byte: 0x40 = 64
Offset = ((0 << 8) | 64) + 1 = 65
Result: Copy 5 bytes from position (current - 65)
Decompression Algorithm
def decompress_lz77(data: bytes) -> bytes:
"""Decompress LZ77 data from DOKAPON! Sword of Fury."""
# Validate header
if data[:4] != b'LZ77':
raise ValueError("Invalid LZ77 header")
decompressed_size = int.from_bytes(data[4:8], 'little')
output = bytearray()
pos = 16 # Start after header
while len(output) < decompressed_size and pos < len(data):
token = data[pos]
pos += 1
if (token & 0x80) == 0:
# Literal byte
output.append(token)
else:
# Back-reference
if pos >= len(data):
break
next_byte = data[pos]
pos += 1
length = ((token & 0x7C) >> 2) + 3
offset = (((token & 0x03) << 8) | next_byte) + 1
# Copy from sliding window
for i in range(length):
if len(output) >= offset:
output.append(output[-offset])
else:
output.append(0) # Handle underflow
return bytes(output[:decompressed_size])
Block Type Markers
After decompression, data contains block markers that identify content type:
| Marker | Hex Value | Type | Description |
|---|---|---|---|
| Vertex | 0x0000C000 | Geometry | 3× float32 (X, Y, Z) per vertex |
| Normal | 0x000040C1 | Geometry | 3× float32 normalized vectors |
| Index | 0x00004000 | Geometry | uint16 triangle indices |
| Frame | 0x000080B9 | Animation | 52-byte animation frame |
| Float | 0x3F800000 | Data | Float value (1.0) |
| Align | 0xAAAAAAAA | Structure | Alignment padding |
| Structure | 0x55555555 | Structure | Section marker |
| Transform | 0x000080BA | Transform | 4×4 matrix row |
Window Management
The compression uses different window sizes based on content type:
| Block Type | Window Size | Description |
|---|---|---|
| Geometry | 32-64 KB | Large window for vertex data |
| Animation | 16 KB | Medium for frame sequences |
| Float | 8 KB | Smaller for numeric data |
| Normal | 12 KB | Optimized for vectors |
| Index | 4 KB | Compact for indices |
Window Size Selection
The header flags influence window size:
base_size = flags & 0xFFFF0000
# Adjust for block type
if block_type == 'geometry':
window_size = base_size * 2 # Double for geometry
elif block_type == 'animation':
window_size = base_size // 2 # Half for animation
else:
window_size = base_size
# Cap at maximum
window_size = min(window_size, 65536)
Block Sequences
Common block sequence patterns:
Geometry Sequence
Structure (0x55555555)
↓
Vertex (0x0000C000)
↓
Normal (0x000040C1)
↓
Index (0x00004000)
Animation Sequence
Animation (0x000080B9)
↓
Float (0x3F800000)
↓
Data
Transform Sequence
Float
↓
Normal
↓
Geometry
File-Specific Variations
MDL Model Files
- Decompressed size: 280 KB - 720 KB typical
- Contains trailing raw data (34-36 KB after compressed stream)
- Uses all block types
Example sizes:
E000.mdl: Compressed 146,424 → Decompressed 649,400
E001.mdl: Compressed varies → Decompressed 716,072
E002.mdl: Compressed 85,256 → Decompressed 286,856
SPRANM Animation Files
Two variants:
- Compressed Format
- Small files (5-13 KB)
- Animation control data
- Uses standard LZ77
- Uncompressed Format
- Larger files (43-550 KB)
- Contains embedded PNG
- No LZ77 header
Data Alignment
Within decompressed data:
| Data Type | Alignment | Stride |
|---|---|---|
| Vertex | 12 bytes | 3× float32 |
| Normal | 12 bytes | 3× float32 |
| Index | 2 bytes | uint16 |
| Matrix | 16 bytes | 4× float32 |
| Frame | 52 bytes | Animation data |
Implementation Notes
Trailing Data
MDL files may have trailing data after the compressed stream:
# Compressed stream ends when decompressed_size reached
# Remaining bytes (typically 34-36 KB) are raw data
trailing_data = compressed_data[compressed_end:]
Error Handling
Common decompression issues:
- Invalid offset (exceeds window size)
- Premature end of stream
- Size mismatch
# Safe copy with offset validation
if offset > len(output):
# Handle invalid offset
output.append(output[-1] if output else 0)
else:
output.append(output[-offset])
Example Implementation
Complete Python implementation:
import struct
class LZ77Decompressor:
def __init__(self):
self.output = bytearray()
def decompress(self, data: bytes) -> bytes:
"""Decompress LZ77 data."""
# Validate and parse header
if data[:4] != b'LZ77':
raise ValueError("Not a valid LZ77 file")
decompressed_size = struct.unpack('<I', data[4:8])[0]
flags1 = struct.unpack('<I', data[8:12])[0]
flags2 = struct.unpack('<I', data[12:16])[0]
self.output = bytearray()
pos = 16
while len(self.output) < decompressed_size and pos < len(data):
token = data[pos]
pos += 1
if (token & 0x80) == 0:
# Literal
self.output.append(token)
else:
# Back-reference
if pos >= len(data):
break
next_byte = data[pos]
pos += 1
length = ((token & 0x7C) >> 2) + 3
offset = (((token & 0x03) << 8) | next_byte) + 1
self._copy_reference(offset, length)
return bytes(self.output[:decompressed_size])
def _copy_reference(self, offset: int, length: int):
"""Copy bytes from sliding window."""
for _ in range(length):
if offset <= len(self.output):
self.output.append(self.output[-offset])
else:
self.output.append(0)
Research Status
Decompression Accuracy: Current implementation achieves 100% accuracy for file sizes matching the header declaration. Block marker identification is complete for geometry and animation data types.
See Also
- MDL Model Format - Uses LZ77 compression
- SPRANM Format - Animation files with LZ77
- Dokapon Extract Tool - Tool implementing LZ77