Documentation
Overview ¶
Package writer provides HDF5 file writing infrastructure.
The Allocator manages free space allocation in HDF5 files. For v0.11.0-beta MVP, it uses a simple end-of-file allocation strategy with no freed space reuse.
See ALLOCATOR_DESIGN.md for comprehensive design documentation.
Index ¶
- type AllocatedBlock
- type Allocator
- type BZIP2Filter
- type ChunkCoordinator
- func (cc *ChunkCoordinator) ChunkDims() []uint64
- func (cc *ChunkCoordinator) DatasetDims() []uint64
- func (cc *ChunkCoordinator) ExtractChunkData(data []byte, coord []uint64, elemSize uint32) []byte
- func (cc *ChunkCoordinator) GetChunkCoordinate(index uint64) []uint64
- func (cc *ChunkCoordinator) GetChunkSize(coord []uint64) []uint64
- func (cc *ChunkCoordinator) GetTotalChunks() uint64
- func (cc *ChunkCoordinator) NumChunks() []uint64
- type CreateMode
- type DenseAttributeWriter
- type DenseGroupWriter
- type FileWriter
- func (w *FileWriter) Allocate(size uint64) (uint64, error)
- func (w *FileWriter) Allocator() *Allocator
- func (w *FileWriter) Close() error
- func (w *FileWriter) EndOfFile() uint64
- func (w *FileWriter) File() *os.File
- func (w *FileWriter) Flush() error
- func (w *FileWriter) ReadAt(buf []byte, addr int64) (int, error)
- func (w *FileWriter) Reader() io.ReaderAt
- func (w *FileWriter) Seek(offset int64, whence int) (int64, error)
- func (w *FileWriter) WriteAt(data []byte, offset int64) (int, error)
- func (w *FileWriter) WriteAtAddress(data []byte, addr uint64) error
- func (w *FileWriter) WriteAtWithAllocation(data []byte) (uint64, error)
- type Filter
- type FilterID
- type FilterPipeline
- func (fp *FilterPipeline) AddFilter(f Filter)
- func (fp *FilterPipeline) AddFilterAtStart(f Filter)
- func (fp *FilterPipeline) Apply(data []byte) ([]byte, error)
- func (fp *FilterPipeline) Count() int
- func (fp *FilterPipeline) EncodePipelineMessage() ([]byte, error)
- func (fp *FilterPipeline) IsEmpty() bool
- func (fp *FilterPipeline) Remove(data []byte) ([]byte, error)
- type Fletcher32Filter
- type GZIPFilter
- type LZFFilter
- type SZIPFilter
- type ShuffleFilter
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type AllocatedBlock ¶
type AllocatedBlock struct {
Offset uint64 // Starting address in file
Size uint64 // Size of allocated block in bytes
}
AllocatedBlock tracks an allocated region of the file.
Each block represents a contiguous region that has been allocated and must not be overwritten or reused (in the MVP).
Blocks are tracked to prevent overlapping allocations and to validate allocator integrity during testing.
type Allocator ¶
type Allocator struct {
// contains filtered or unexported fields
}
Allocator manages space allocation in HDF5 files.
Strategy (MVP v0.11.0-beta):
- End-of-file allocation: All allocations occur at end of file
- No freed space reuse: Once allocated, space is never reclaimed
- No fragmentation: Perfect sequential layout
- Overlap prevention: All allocations tracked
Thread Safety:
- NOT thread-safe: Use external synchronization if needed
- Designed for single-threaded FileWriter
Performance:
- Allocate: O(1) - constant time
- IsAllocated: O(n) - linear scan over blocks
- Blocks: O(n log n) - copy and sort
- ValidateNoOverlaps: O(n log n) - sort and scan
Advanced features (deferred to v0.11.0-RC):
- Free space reuse (best-fit, first-fit strategies)
- Fragmentation management
- Thread safety (optional mutex)
- Alignment enforcement (8-byte)
See ALLOCATOR_DESIGN.md for detailed design documentation.
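The end-of-file strategy described above can be sketched in a few lines. This is a simplified standalone model, not the package's actual implementation; the names eofAllocator, block, and allocate are hypothetical:

```go
package main

import "errors"

// block mirrors the idea of AllocatedBlock: a contiguous allocated region.
type block struct {
	offset, size uint64
}

// eofAllocator is a minimal end-of-file allocator: every allocation is
// appended at the current end of file and recorded; nothing is ever freed.
type eofAllocator struct {
	eof    uint64
	blocks []block
}

// allocate reserves size bytes at the current end of file in O(1) time.
func (a *eofAllocator) allocate(size uint64) (uint64, error) {
	if size == 0 {
		return 0, errors.New("cannot allocate zero bytes")
	}
	addr := a.eof
	a.blocks = append(a.blocks, block{offset: addr, size: size})
	a.eof += size
	return addr, nil
}
```

Because every allocation lands at the current end of file, addresses are strictly increasing and the layout is gap-free, which is why overlaps cannot occur in normal operation.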
func NewAllocator ¶
func NewAllocator(initialOffset uint64) *Allocator
NewAllocator creates a space allocator.
The allocator tracks all allocations and manages free space in the HDF5 file. It uses end-of-file allocation strategy (no freed space reuse in MVP).
Parameters:
- initialOffset: Starting address for allocations (typically after superblock)
- For superblock v2 (48 bytes): initialOffset = 48
- For superblock v0 (variable size): initialOffset = superblock_size + driver_info_size
Returns:
- *Allocator ready to allocate space
Example:
alloc := NewAllocator(48) // Start after superblock v2
addr, err := alloc.Allocate(1024)
if err != nil {
return err
}
func (*Allocator) Allocate ¶
func (a *Allocator) Allocate(size uint64) (uint64, error)
Allocate reserves a block of space at the end of the file.
The block is allocated at the current end-of-file address and tracked to prevent overlapping allocations. This is the primary method for obtaining space for HDF5 objects (datasets, groups, attributes, metadata).
Strategy:
- Allocates at current end-of-file (sequential allocation)
- Updates end-of-file pointer to addr + size
- Tracks allocation in internal block list
- No alignment enforcement (deferred to RC)
- No size limit validation (OS will reject impossible sizes)
Parameters:
- size: Number of bytes to allocate (must be > 0)
Returns:
- address: File offset where block is allocated
- error: Non-nil if allocation fails
Errors:
- "cannot allocate zero bytes": Size must be greater than 0
Thread Safety:
- NOT thread-safe: Do not call concurrently
Example:
addr, err := allocator.Allocate(1024) // Allocate 1KB
if err != nil {
return err
}
// Use addr to write data to file
file.WriteAt(data, int64(addr))
func (*Allocator) Blocks ¶
func (a *Allocator) Blocks() []AllocatedBlock
Blocks returns a copy of all allocated blocks, sorted by offset.
The returned slice is a copy, so modifications do not affect the allocator's internal state. Blocks are sorted by offset in ascending order for consistent iteration and display.
Returns:
- []AllocatedBlock: Copy of all allocated blocks, sorted by offset
Performance:
- Time: O(n log n) where n is number of blocks (due to sorting)
- Space: O(n) - allocates copy of blocks
Use Cases:
- Debugging allocation patterns
- Testing allocator state
- Visualizing file layout
- Calculating total allocated space
Example:
blocks := alloc.Blocks()
for _, block := range blocks {
fmt.Printf("Block: [%d, %d) size=%d\n",
block.Offset, block.Offset+block.Size, block.Size)
}
// Calculate total allocated space
var total uint64
for _, block := range blocks {
total += block.Size
}
func (*Allocator) EndOfFile ¶
func (a *Allocator) EndOfFile() uint64
EndOfFile returns the current end-of-file address.
This is where the next allocation would occur. It represents the total file size including all allocated blocks.
Returns:
- uint64: Current end-of-file address (next allocation address)
Performance:
- Time: O(1) - constant time
- Space: O(1) - no allocations
Use Cases:
- Determine total file size
- Verify space usage
- Track file growth
Example:
eof := alloc.EndOfFile()
fmt.Printf("File size: %d bytes\n", eof)
func (*Allocator) IsAllocated ¶
func (a *Allocator) IsAllocated(offset, size uint64) bool
IsAllocated checks if an address range overlaps with any allocated blocks.
This method is useful for validation and debugging to ensure no overlapping writes occur. It performs a linear scan over all allocated blocks.
Overlap Detection Logic:
- Two ranges [a1,a2) and [b1,b2) overlap if: a1 < b2 && b1 < a2
- Adjacent blocks (touching boundaries) do NOT overlap
- Zero-size ranges never overlap (returns false)
Parameters:
- offset: Starting address of range to check
- size: Size of range to check
Returns:
- true: Range overlaps with at least one allocated block
- false: Range is free (or size is 0)
Performance:
- Time: O(n) where n is number of allocated blocks
- Space: O(1) - no allocations
Use Cases:
- Validation before writing to file
- Debugging overlap issues
- Testing allocation correctness
Example:
if alloc.IsAllocated(1000, 100) {
fmt.Println("Warning: Range [1000, 1100) already allocated!")
}
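The half-open interval test documented above can be expressed directly. A standalone sketch of the documented logic (the function name rangesOverlap is hypothetical):

```go
package main

// rangesOverlap reports whether half-open ranges [a1,a2) and [b1,b2)
// intersect. Adjacent ranges (touching at a boundary) do not overlap, and
// zero-size ranges never overlap, matching the documented semantics.
func rangesOverlap(a1, a2, b1, b2 uint64) bool {
	if a1 == a2 || b1 == b2 { // zero-size range
		return false
	}
	return a1 < b2 && b1 < a2
}
```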
func (*Allocator) ValidateNoOverlaps ¶
func (a *Allocator) ValidateNoOverlaps() error
ValidateNoOverlaps checks that no allocated blocks overlap.
This method is primarily for debugging and testing to ensure the allocator maintains correct state. In a correctly functioning allocator with end-of-file allocation, overlaps should NEVER occur.
Detection Logic:
- Sorts blocks by offset
- Checks that each block ends before the next block starts
- Adjacent blocks (touching boundaries) are NOT considered overlapping
Returns:
- nil: No overlaps detected (allocator state is valid)
- error: Overlap detected (indicates allocator bug)
Performance:
- Time: O(n log n) where n is number of blocks (due to sorting)
- Space: O(n) - allocates sorted copy of blocks
Use Cases:
- Debugging allocator implementation
- Pre-release validation
- Testing allocation correctness
- Detecting memory corruption
Example:
if err := alloc.ValidateNoOverlaps(); err != nil {
panic(fmt.Sprintf("BUG: Allocator corrupted: %v", err))
}
type BZIP2Filter ¶
type BZIP2Filter struct {
// contains filtered or unexported fields
}
BZIP2Filter implements BZIP2 compression (FilterID = 307). BZIP2 is a high-quality compression algorithm designed by Julian Seward. It provides better compression than GZIP (typically 10-15% smaller) but is slower.
BZIP2 is commonly used for scientific datasets where storage space is critical. Filter ID 307 is registered with the HDF Group.
Reference: https://sourceware.org/bzip2/
HDF5 Registration: https://github.com/HDFGroup/hdf5_plugins
func NewBZIP2Filter ¶
func NewBZIP2Filter(blockSize int) *BZIP2Filter
NewBZIP2Filter creates a BZIP2 compression filter. blockSize specifies compression level (1-9):
- 1 = fastest, lowest compression (100KB blocks)
- 9 = slowest, highest compression (900KB blocks) - default
func (*BZIP2Filter) Apply ¶
func (f *BZIP2Filter) Apply(_ []byte) ([]byte, error)
Apply compresses data using BZIP2 algorithm. Returns compressed data suitable for storage.
NOTE: Go stdlib compress/bzip2 only provides decompression. For write support, consider using github.com/dsnet/compress/bzip2 or waiting for future implementation.
func (*BZIP2Filter) Encode ¶
func (f *BZIP2Filter) Encode() (flags uint16, cdValues []uint32)
Encode returns the filter parameters for the Pipeline message.
For BZIP2 in HDF5, the client data typically contains:
- cd_values[0]: Block size (1-9, in 100KB units)
Reference: https://github.com/HDFGroup/hdf5_plugins/blob/master/BZIP2/src/H5Zbzip2.c
func (*BZIP2Filter) ID ¶
func (f *BZIP2Filter) ID() FilterID
ID returns the HDF5 filter identifier for BZIP2.
type ChunkCoordinator ¶
type ChunkCoordinator struct {
// contains filtered or unexported fields
}
ChunkCoordinator handles N-dimensional dataset chunking.
This coordinator manages the mapping between:
- Dataset dimensions and chunk dimensions
- Linear chunk indices and N-dimensional chunk coordinates
- Dataset data layout and chunk data extraction
Key Concepts:
- Dataset dimensions: Total size of dataset in each dimension
- Chunk dimensions: Size of each chunk in each dimension
- Chunk coordinates: Scaled indices [dim0, dim1, ..., dimN] where coordinate[i] = element_index[i] / chunk_dim[i]
- Edge chunks: Partial chunks at dataset boundaries
Example (2D dataset):
Dataset: 25x35 elements
Chunks: 10x10 elements
Result: 3x4 = 12 total chunks
- Chunk [0,0]: 10x10 (full)
- Chunk [0,3]: 10x5 (partial in dim 1)
- Chunk [2,0]: 5x10 (partial in dim 0)
- Chunk [2,3]: 5x5 (partial in both dims)
func NewChunkCoordinator ¶
func NewChunkCoordinator(datasetDims, chunkDims []uint64) (*ChunkCoordinator, error)
NewChunkCoordinator creates a chunk coordinator.
Calculates the number of chunks needed in each dimension using ceiling division: numChunks[i] = ceil(datasetDims[i] / chunkDims[i])
Parameters:
- datasetDims: Dataset size in each dimension
- chunkDims: Chunk size in each dimension
Returns:
- ChunkCoordinator: Ready to use
- error: Non-nil if dimension counts differ
Example:
// 2D dataset: 100x200 elements, chunks: 10x20
coord, err := NewChunkCoordinator(
[]uint64{100, 200},
[]uint64{10, 20},
)
// Result: 10x10 = 100 total chunks
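The ceiling division used to size the chunk grid can be sketched as follows (a standalone illustration; numChunksPerDim is a hypothetical name, not the package's API):

```go
package main

import "fmt"

// numChunksPerDim computes ceil(datasetDims[i] / chunkDims[i]) for each
// dimension — the chunk-grid size described above — using the integer
// ceiling-division idiom (a + b - 1) / b.
func numChunksPerDim(datasetDims, chunkDims []uint64) ([]uint64, error) {
	if len(datasetDims) != len(chunkDims) {
		return nil, fmt.Errorf("dimension count mismatch: %d vs %d",
			len(datasetDims), len(chunkDims))
	}
	n := make([]uint64, len(datasetDims))
	for i := range datasetDims {
		n[i] = (datasetDims[i] + chunkDims[i] - 1) / chunkDims[i]
	}
	return n, nil
}
```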
func (*ChunkCoordinator) ChunkDims ¶
func (cc *ChunkCoordinator) ChunkDims() []uint64
ChunkDims returns chunk dimensions (read-only copy).
func (*ChunkCoordinator) DatasetDims ¶
func (cc *ChunkCoordinator) DatasetDims() []uint64
DatasetDims returns dataset dimensions (read-only copy).
func (*ChunkCoordinator) ExtractChunkData ¶
func (cc *ChunkCoordinator) ExtractChunkData(data []byte, coord []uint64, elemSize uint32) []byte
ExtractChunkData extracts chunk data from full dataset.
Extracts the data for a specific chunk from the full dataset buffer. The dataset is laid out in row-major order (C order), and the chunk data is extracted maintaining this layout.
Parameters:
- data: Full dataset buffer (row-major layout)
- coord: Chunk coordinate to extract
- elemSize: Size of each element in bytes
Returns:
- []byte: Extracted chunk data (contiguous buffer)
Example (2D, dataset 20x30 uint32, chunks 10x10):
chunk [0,0]: extract data[0:10, 0:10]
chunk [0,1]: extract data[0:10, 10:20]
chunk [1,0]: extract data[10:20, 0:10]
Algorithm:
For each element in chunk:
1. Calculate position in dataset coordinates
2. Calculate linear offset in dataset buffer
3. Copy element to chunk buffer
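A 2D sketch of this extraction, copying one chunk row at a time from a row-major buffer. This is an illustration of the technique, not the package's N-dimensional implementation, and extractChunk2D is a hypothetical name:

```go
package main

// extractChunk2D copies chunk (cr, cc) out of a row-major 2D dataset buffer.
// rows/cols are the dataset dims, chR/chC the chunk dims, elemSize the bytes
// per element. Edge chunks are clipped to the dataset boundary.
func extractChunk2D(data []byte, rows, cols, chR, chC, cr, cc, elemSize uint64) []byte {
	r0, c0 := cr*chR, cc*chC                     // chunk origin in dataset coords
	r1, c1 := minU(r0+chR, rows), minU(c0+chC, cols) // clipped chunk end
	out := make([]byte, 0, (r1-r0)*(c1-c0)*elemSize)
	for r := r0; r < r1; r++ {
		start := (r*cols + c0) * elemSize // linear offset of this chunk row
		out = append(out, data[start:start+(c1-c0)*elemSize]...)
	}
	return out
}

func minU(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}
```

Copying whole rows instead of single elements is the usual optimization: the innermost dimension of a chunk is contiguous in the dataset buffer.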
func (*ChunkCoordinator) GetChunkCoordinate ¶
func (cc *ChunkCoordinator) GetChunkCoordinate(index uint64) []uint64
GetChunkCoordinate converts linear index to N-D coordinate.
Uses row-major layout to convert a linear chunk index to its N-dimensional coordinate.
Row-major layout means:
- Rightmost dimension varies fastest
- Leftmost dimension varies slowest
Parameters:
- index: Linear chunk index (0 to GetTotalChunks()-1)
Returns:
- []uint64: N-dimensional chunk coordinate
Example (2D, 3x4 chunks):
index=0 → [0,0]
index=1 → [0,1]
index=3 → [0,3]
index=4 → [1,0]
index=11 → [2,3]
Algorithm:
coord[N-1] = index % numChunks[N-1]
coord[N-2] = (index / numChunks[N-1]) % numChunks[N-2]
...
coord[0] = index / (numChunks[1] * numChunks[2] * ... * numChunks[N-1])
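The algorithm above amounts to repeated div/mod from the rightmost dimension. A standalone sketch (chunkCoordinate is a hypothetical name):

```go
package main

// chunkCoordinate converts a linear chunk index to its N-dimensional
// coordinate in row-major order: the rightmost dimension varies fastest.
func chunkCoordinate(index uint64, numChunks []uint64) []uint64 {
	coord := make([]uint64, len(numChunks))
	for i := len(numChunks) - 1; i >= 0; i-- {
		coord[i] = index % numChunks[i] // position within this dimension
		index /= numChunks[i]           // carry into the next-slower dimension
	}
	return coord
}
```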
func (*ChunkCoordinator) GetChunkSize ¶
func (cc *ChunkCoordinator) GetChunkSize(coord []uint64) []uint64
GetChunkSize returns actual chunk size (may be partial).
Edge chunks at dataset boundaries may be smaller than the nominal chunk size. This method calculates the actual size of a chunk given its coordinate.
Parameters:
- coord: Chunk coordinate [dim0, dim1, ..., dimN]
Returns:
- []uint64: Actual chunk size in each dimension
Example (dataset 25x35, chunks 10x10):
[0,0] → [10,10] (full chunk)
[0,3] → [10,5] (partial in dim 1)
[2,0] → [5,10] (partial in dim 0)
[2,3] → [5,5] (partial in both)
Algorithm:
start[i] = coord[i] * chunkDims[i]
end[i] = min(start[i] + chunkDims[i], datasetDims[i])
size[i] = end[i] - start[i]
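The clipping step above translates directly to code. A standalone sketch (chunkSize is a hypothetical name, not the package's API):

```go
package main

// chunkSize computes the actual extent of the chunk at coord, clipping
// edge chunks to the dataset boundary per dimension.
func chunkSize(coord, chunkDims, datasetDims []uint64) []uint64 {
	size := make([]uint64, len(coord))
	for i := range coord {
		start := coord[i] * chunkDims[i]
		end := start + chunkDims[i]
		if end > datasetDims[i] {
			end = datasetDims[i] // partial edge chunk
		}
		size[i] = end - start
	}
	return size
}
```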
func (*ChunkCoordinator) GetTotalChunks ¶
func (cc *ChunkCoordinator) GetTotalChunks() uint64
GetTotalChunks returns total chunk count.
Calculates the total number of chunks by multiplying the number of chunks in each dimension.
Returns:
- uint64: Total number of chunks in dataset
Example:
// Dataset: 100x200, chunks: 10x20
// numChunks = [10, 10]
// total = 10 * 10 = 100
func (*ChunkCoordinator) NumChunks ¶
func (cc *ChunkCoordinator) NumChunks() []uint64
NumChunks returns number of chunks per dimension (read-only copy).
type CreateMode ¶
type CreateMode int
CreateMode specifies the file creation/opening behavior.
const (
    // ModeTruncate creates a new file, truncating if it exists.
    // Equivalent to os.Create() behavior.
    ModeTruncate CreateMode = iota
    // ModeExclusive creates a new file, fails if it exists.
    // Equivalent to os.O_CREATE | os.O_EXCL.
    ModeExclusive
    // ModeReadWrite opens an existing file for reading and writing.
    // Used for read-modify-write operations on existing HDF5 files.
    ModeReadWrite
    // ModeReadOnly opens an existing file for reading only.
    // Used when opening files without modification intent.
    ModeReadOnly
)
type DenseAttributeWriter ¶
type DenseAttributeWriter struct {
// contains filtered or unexported fields
}
DenseAttributeWriter manages dense attribute storage for a single object.
Dense attributes (8+ attributes) use:
- Fractal Heap: Storage for attribute data (name + type + space + value)
- B-tree v2: Index for fast attribute lookup by name
- Attribute Info Message: Metadata with heap/B-tree addresses
This writer reuses existing infrastructure from dense groups:
- structures.WritableFractalHeap
- structures.WritableBTreeV2
Reference: H5Adense.c - H5A__dense_create(), H5A__dense_insert().
func NewDenseAttributeWriter ¶
func NewDenseAttributeWriter(objectAddr uint64) *DenseAttributeWriter
NewDenseAttributeWriter creates a new dense attribute writer.
Parameters:
- objectAddr: Address of object header (for reference)
Returns:
- DenseAttributeWriter ready to use
func (*DenseAttributeWriter) AddAttribute ¶
func (daw *DenseAttributeWriter) AddAttribute(attr *core.Attribute, sb *core.Superblock) error
AddAttribute adds an attribute to dense storage.
Process:
1. Encode attribute (name + type + space + data)
2. Insert into fractal heap → get heap ID
3. Insert into B-tree v2 (name → heap ID)
Parameters:
- attr: Attribute to add
- sb: Superblock for encoding
Returns:
- error: Non-nil if add fails or duplicate name
Reference: H5Adense.c - H5A__dense_insert().
func (*DenseAttributeWriter) WriteToFile ¶
func (daw *DenseAttributeWriter) WriteToFile(fw *FileWriter, allocator *Allocator, sb *core.Superblock) (*core.AttributeInfoMessage, error)
WriteToFile writes dense attribute storage to file.
Process:
1. Write fractal heap → get heap address
2. Write B-tree v2 → get B-tree address
3. Create Attribute Info Message with addresses
4. Return Attribute Info Message (caller adds to object header)
Parameters:
- fw: FileWriter for write operations
- allocator: Space allocator (pointer to match existing infrastructure)
- sb: Superblock
Returns:
- *core.AttributeInfoMessage: Message to add to object header
- error: Non-nil if write fails
Reference: H5Adense.c - H5A__dense_create().
type DenseGroupWriter ¶
type DenseGroupWriter struct {
// contains filtered or unexported fields
}
DenseGroupWriter manages dense group creation.
Dense groups (HDF5 1.8+) use:
- Link Info Message: Metadata about link storage
- Fractal Heap: Storage for link names and messages
- B-tree v2: Index for fast link lookup by name
This coordinator:
- Creates Fractal Heap for link storage
- Creates B-tree v2 for link indexing
- Stores link names and metadata in heap
- Indexes links in B-tree
- Builds Link Info Message with addresses
- Constructs object header with all messages
Reference: H5Gdense.c - H5G_dense_create(), H5G_dense_insert().
func NewDenseGroupWriter ¶
func NewDenseGroupWriter(name string) *DenseGroupWriter
NewDenseGroupWriter creates a new dense group writer.
Parameters:
- name: Group name (for error messages)
Returns:
- DenseGroupWriter ready to accept links
Reference: H5Gdense.c - H5G_dense_create().
func (*DenseGroupWriter) AddLink ¶
func (dgw *DenseGroupWriter) AddLink(name string, targetAddr uint64) error
AddLink adds a hard link to the dense group.
For MVP: Only hard links are supported (targetAddr points to an object header).
Future: Soft links, external links.
Parameters:
- name: Link name (UTF-8 string)
- targetAddr: File address of target object header
Returns:
- error: Non-nil if the name is empty, a duplicate, or invalid
Reference: H5Gdense.c - H5G_dense_insert().
func (*DenseGroupWriter) WriteToFile ¶
func (dgw *DenseGroupWriter) WriteToFile(fw *FileWriter, allocator *Allocator, sb *core.Superblock) (uint64, error)
WriteToFile writes dense group to file, returns object header address.
This method:
- For each link:
  a. Create a link message (hard link format)
  b. Insert the link message into the fractal heap
  c. Insert (name, heapID) into the B-tree v2
- Write fractal heap to file
- Write B-tree v2 to file
- Create Link Info Message with heap/B-tree addresses
- Create object header with Link Info + other messages
- Write object header to file
Parameters:
- fw: FileWriter for write operations
- allocator: Space allocator
- sb: Superblock for encoding parameters
Returns:
- uint64: File address of group's object header
- error: Non-nil if write fails
Reference: H5Gdense.c - H5G_dense_create() + H5G_dense_insert().
type FileWriter ¶
type FileWriter struct {
// contains filtered or unexported fields
}
FileWriter wraps an os.File for writing HDF5 files. It provides:
- Space allocation tracking (via Allocator)
- Write-at-address operations
- End-of-file tracking
- Flush control
Thread-safety: Not thread-safe. Caller must synchronize access.
func NewFileWriter ¶
func NewFileWriter(filename string, mode CreateMode, initialOffset uint64) (*FileWriter, error)
NewFileWriter creates a writer for a new HDF5 file. The file is opened for reading and writing.
Parameters:
- filename: Path to file to create
- mode: Creation mode (truncate or exclusive)
- initialOffset: Starting address for allocations (typically superblock size)
For HDF5 files:
- Superblock v2 is 48 bytes, so initialOffset would be 48
- The superblock itself at offset 0 is not tracked by the allocator
Returns:
- FileWriter ready for use
- Error if file creation fails
func OpenFileWriter ¶
func OpenFileWriter(filename string, mode CreateMode, initialOffset uint64) (*FileWriter, error)
OpenFileWriter opens an existing HDF5 file for read-modify-write operations. Unlike NewFileWriter which creates a new file, this opens an existing file.
Parameters:
- filename: Path to existing HDF5 file
- mode: Open mode (ModeReadWrite or ModeReadOnly)
- initialOffset: Current end-of-file offset (for allocation tracking)
For existing files:
- initialOffset should be set to the current file size
- New allocations will occur after existing data
- Allocator tracks next free address
Returns:
- FileWriter ready for RMW operations
- Error if file doesn't exist or open fails
Example:
// Open existing file for modification
fw, err := OpenFileWriter("data.h5", ModeReadWrite, existingFileSize)
if err != nil {
return err
}
defer fw.Close()
// Now you can allocate new space and write data
addr, _ := fw.Allocate(1024)
fw.WriteAt(newData, int64(addr))
func (*FileWriter) Allocate ¶
func (w *FileWriter) Allocate(size uint64) (uint64, error)
Allocate reserves a block of space in the file. Returns the address where the block was allocated. The space is not zeroed - caller must write data to the allocated block.
For MVP:
- Allocation always occurs at end of file
- No alignment requirements
Example:
addr, err := writer.Allocate(1024)
if err != nil {
return err
}
// Now write data at addr
_, err = writer.WriteAt(data, int64(addr))
func (*FileWriter) Allocator ¶
func (w *FileWriter) Allocator() *Allocator
Allocator returns the space allocator. Useful for debugging and testing allocation patterns.
func (*FileWriter) Close ¶
func (w *FileWriter) Close() error
Close closes the underlying file. This does NOT automatically flush - call Flush() first if needed. After Close(), the writer cannot be used.
func (*FileWriter) EndOfFile ¶
func (w *FileWriter) EndOfFile() uint64
EndOfFile returns the current end-of-file address. This is where the next allocation would occur.
func (*FileWriter) File ¶
func (w *FileWriter) File() *os.File
File returns the underlying *os.File. Use with caution - direct file operations may break allocation tracking. Primarily for reading operations or advanced use cases.
func (*FileWriter) Flush ¶
func (w *FileWriter) Flush() error
Flush ensures all writes are committed to disk. This should be called before closing or when data durability is required.
func (*FileWriter) ReadAt ¶
func (w *FileWriter) ReadAt(buf []byte, addr int64) (int, error)
ReadAt reads data at a specific address. Useful for reading back metadata immediately after writing. Implements io.ReaderAt interface for compatibility.
func (*FileWriter) Reader ¶
func (w *FileWriter) Reader() io.ReaderAt
Reader returns an io.ReaderAt interface for reading from the file. This is the preferred method for reading operations as it returns an interface rather than a concrete type, improving testability and following Go best practices.
Use this for:
- Reading back written data
- Object header modifications
- Integration tests (can be mocked)
Example:
reader := fw.Reader()
oh, err := core.ReadObjectHeader(reader, addr, sb)
func (*FileWriter) Seek ¶
func (w *FileWriter) Seek(offset int64, whence int) (int64, error)
Seek implements io.Seeker interface for compatibility. Note: HDF5 uses absolute addressing, so seeking is rarely needed.
func (*FileWriter) WriteAt ¶
func (w *FileWriter) WriteAt(data []byte, offset int64) (int, error)
WriteAt writes data at a specific address in the file. Implements io.WriterAt interface.
The address should typically be obtained from Allocate().
Note: This does not automatically track the write as an allocation. For metadata tracking, use Allocate() first, then WriteAt().
Example:
addr, _ := writer.Allocate(uint64(len(data)))
_, err := writer.WriteAt(data, int64(addr))
func (*FileWriter) WriteAtAddress ¶
func (w *FileWriter) WriteAtAddress(data []byte, addr uint64) error
WriteAtAddress writes data at a specific address (convenience method with uint64 address).
func (*FileWriter) WriteAtWithAllocation ¶
func (w *FileWriter) WriteAtWithAllocation(data []byte) (uint64, error)
WriteAtWithAllocation is a convenience method that allocates space and writes data. Returns the address where data was written.
This is equivalent to:
addr, err := writer.Allocate(uint64(len(data)))
if err != nil { return 0, err }
_, err = writer.WriteAt(data, int64(addr))
return addr, err
type Filter ¶
type Filter interface {
// ID returns the HDF5 filter identifier.
ID() FilterID
// Name returns human-readable filter name.
Name() string
// Apply applies filter to data (compression/checksum on write path).
// Returns transformed data.
Apply(data []byte) ([]byte, error)
// Remove reverses filter (decompression/verification on read path).
// Returns original data.
Remove(data []byte) ([]byte, error)
// Encode encodes filter parameters for Pipeline message.
// Returns: flags, cd_values (client data array).
Encode() (flags uint16, cdValues []uint32)
}
Filter interface for data transformation. Filters are applied in sequence during write (e.g., Shuffle → GZIP → Fletcher32) and reversed during read (Fletcher32 → GZIP → Shuffle).
type FilterID ¶
type FilterID uint16
FilterID represents HDF5 standard filter identifiers.
const (
    FilterNone        FilterID = 0     // No filter
    FilterGZIP        FilterID = 1     // GZIP compression (deflate)
    FilterShuffle     FilterID = 2     // Byte shuffle
    FilterFletcher32  FilterID = 3     // Fletcher32 checksum
    FilterSZIP        FilterID = 4     // SZIP (not implemented)
    FilterNBIT        FilterID = 5     // NBIT (not implemented)
    FilterScaleOffset FilterID = 6     // Scale+offset (not implemented)
    FilterBZIP2       FilterID = 307   // BZIP2 compression
    FilterLZF         FilterID = 32000 // LZF compression (PyTables/h5py)
)
HDF5 standard filter constants.
type FilterPipeline ¶
type FilterPipeline struct {
// contains filtered or unexported fields
}
FilterPipeline manages a chain of filters applied to chunk data. Filters are applied in sequence on write and reversed on read.
Example pipeline for numeric data compression:
- Shuffle (reorder bytes for better compression)
- GZIP (compress data)
- Fletcher32 (add checksum)
On write: data → Shuffle → GZIP → Fletcher32 → stored. On read: stored → Fletcher32 → GZIP → Shuffle → data.
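The forward/reverse ordering can be sketched with toy filters. This is an illustration of the pipeline technique, not the package's Filter interface; toyFilter, applyPipeline, and removePipeline are hypothetical names:

```go
package main

import "fmt"

// toyFilter appends a marker byte on the write path and strips/verifies it
// on the read path — a stand-in for real filters, to show ordering only.
type toyFilter struct{ marker byte }

func (f toyFilter) apply(data []byte) []byte { return append(data, f.marker) }

func (f toyFilter) remove(data []byte) ([]byte, error) {
	if len(data) == 0 || data[len(data)-1] != f.marker {
		return nil, fmt.Errorf("marker %q not found", f.marker)
	}
	return data[:len(data)-1], nil
}

// applyPipeline runs filters front-to-back (write path).
func applyPipeline(filters []toyFilter, data []byte) []byte {
	for _, f := range filters {
		data = f.apply(data)
	}
	return data
}

// removePipeline runs filters back-to-front (read path), undoing applyPipeline.
func removePipeline(filters []toyFilter, data []byte) ([]byte, error) {
	var err error
	for i := len(filters) - 1; i >= 0; i-- {
		if data, err = filters[i].remove(data); err != nil {
			return nil, err
		}
	}
	return data, nil
}
```

Reversing the iteration order on the read path is what makes the pipeline an involution: the last filter applied is the first one removed.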
func NewFilterPipeline ¶
func NewFilterPipeline() *FilterPipeline
NewFilterPipeline creates an empty filter pipeline.
func (*FilterPipeline) AddFilter ¶
func (fp *FilterPipeline) AddFilter(f Filter)
AddFilter adds a filter to the end of the pipeline. Filters are applied in the order they are added during write operations.
func (*FilterPipeline) AddFilterAtStart ¶
func (fp *FilterPipeline) AddFilterAtStart(f Filter)
AddFilterAtStart inserts a filter at the beginning of the pipeline. This is useful for filters that should be applied first (e.g., Shuffle before GZIP).
func (*FilterPipeline) Apply ¶
func (fp *FilterPipeline) Apply(data []byte) ([]byte, error)
Apply applies all filters in sequence (write path). Example: Shuffle → GZIP → Fletcher32
If any filter fails, the operation stops and returns an error.
func (*FilterPipeline) Count ¶
func (fp *FilterPipeline) Count() int
Count returns the number of filters in the pipeline.
func (*FilterPipeline) EncodePipelineMessage ¶
func (fp *FilterPipeline) EncodePipelineMessage() ([]byte, error)
EncodePipelineMessage encodes the filter pipeline as an HDF5 Pipeline message (0x000B). This message is stored in the dataset's object header to describe which filters are applied to the data.
Returns the encoded message bytes ready to be written to the object header. Returns an error if the pipeline is empty.
func (*FilterPipeline) IsEmpty ¶
func (fp *FilterPipeline) IsEmpty() bool
IsEmpty returns true if the pipeline has no filters.
type Fletcher32Filter ¶
type Fletcher32Filter struct{}
Fletcher32Filter implements Fletcher32 checksum (FilterID = 3).
The Fletcher32 filter adds a 4-byte checksum to the end of data to detect corruption during storage or transmission. It uses the Fletcher32 algorithm, which is faster than CRC32 but less robust against intentional tampering.
The filter is commonly used in HDF5 to ensure data integrity, especially for compressed data where corruption could affect decompression.
On write: checksum is calculated and appended (original_data + 4 bytes). On read: checksum is verified and stripped (returns original_data).
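The append-on-write / verify-and-strip-on-read behavior can be sketched as below. The checksum here is a generic Fletcher-32 over 16-bit words; the HDF5 library's H5_checksum_fletcher32 has its own precise word-assembly and odd-byte handling, so this is illustrative only, and all names are hypothetical:

```go
package main

import (
	"encoding/binary"
	"errors"
)

// fletcher32 computes a generic Fletcher-32 checksum over 16-bit words
// (big-endian word assembly is an assumption of this sketch).
func fletcher32(data []byte) uint32 {
	var sum1, sum2 uint32
	for i := 0; i < len(data); i += 2 {
		w := uint32(data[i]) << 8
		if i+1 < len(data) {
			w |= uint32(data[i+1])
		} // odd trailing byte is zero-padded
		sum1 = (sum1 + w) % 65535
		sum2 = (sum2 + sum1) % 65535
	}
	return sum2<<16 | sum1
}

// appendChecksum models the write path: original data + 4-byte checksum.
func appendChecksum(data []byte) []byte {
	out := make([]byte, len(data)+4)
	copy(out, data)
	binary.LittleEndian.PutUint32(out[len(data):], fletcher32(data))
	return out
}

// verifyAndStrip models the read path: verify the checksum, return the data.
func verifyAndStrip(data []byte) ([]byte, error) {
	if len(data) < 4 {
		return nil, errors.New("data too short for checksum")
	}
	payload := data[:len(data)-4]
	stored := binary.LittleEndian.Uint32(data[len(data)-4:])
	if fletcher32(payload) != stored {
		return nil, errors.New("fletcher32 checksum mismatch")
	}
	return payload, nil
}
```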
func NewFletcher32Filter ¶
func NewFletcher32Filter() *Fletcher32Filter
NewFletcher32Filter creates a Fletcher32 checksum filter.
func (*Fletcher32Filter) Apply ¶
func (f *Fletcher32Filter) Apply(data []byte) ([]byte, error)
Apply calculates Fletcher32 checksum and appends it to the data.
The returned data is 4 bytes longer than the input, with the checksum stored in little-endian format at the end.
func (*Fletcher32Filter) Encode ¶
func (f *Fletcher32Filter) Encode() (flags uint16, cdValues []uint32)
Encode returns the filter parameters for the Pipeline message.
Fletcher32 has no parameters, so this returns empty values.
func (*Fletcher32Filter) ID ¶
func (f *Fletcher32Filter) ID() FilterID
ID returns the HDF5 filter identifier for Fletcher32.
func (*Fletcher32Filter) Name ¶
func (f *Fletcher32Filter) Name() string
Name returns the HDF5 filter name.
func (*Fletcher32Filter) Remove ¶
func (f *Fletcher32Filter) Remove(data []byte) ([]byte, error)
Remove verifies and strips the Fletcher32 checksum.
This method:
- Extracts the 4-byte checksum from the end of data
- Calculates the checksum of the original data
- Verifies they match
- Returns the original data without the checksum
Returns an error if the checksum doesn't match (data corruption detected).
type GZIPFilter ¶
type GZIPFilter struct {
// contains filtered or unexported fields
}
GZIPFilter implements GZIP compression (FilterID = 1). This filter uses the DEFLATE compression algorithm to reduce data size. In HDF5, this filter is named "deflate" following zlib terminology.
Compression levels:
1 = fastest compression, larger files
6 = balanced (default)
9 = best compression, slower
func NewGZIPFilter ¶
func NewGZIPFilter(level int) *GZIPFilter
NewGZIPFilter creates a GZIP filter with the specified compression level.
Valid levels:
1 = Fast compression, lower ratio
6 = Default (balanced)
9 = Best compression, slower
Invalid levels are automatically adjusted to 6 (default).
func (*GZIPFilter) Apply ¶
func (f *GZIPFilter) Apply(data []byte) ([]byte, error)
Apply compresses data using GZIP/DEFLATE algorithm. Returns compressed data suitable for storage.
The compressed data includes GZIP headers and CRC32 checksum.
func (*GZIPFilter) Encode ¶
func (f *GZIPFilter) Encode() (flags uint16, cdValues []uint32)
Encode returns the filter parameters for the Pipeline message.
For GZIP, the client data contains a single value: the compression level. Flags are always 0 for GZIP.
func (*GZIPFilter) ID ¶
func (f *GZIPFilter) ID() FilterID
ID returns the HDF5 filter identifier for GZIP.
func (*GZIPFilter) Name ¶
func (f *GZIPFilter) Name() string
Name returns the HDF5 filter name. HDF5 uses "deflate" (the underlying algorithm) rather than "gzip".
type LZFFilter ¶
type LZFFilter struct{}
LZFFilter implements LZF compression (FilterID = 32000). LZF is a very fast compression algorithm designed by Marc Lehmann. It provides ~40-50% compression with 3-5x faster compression than GZIP and 2x faster decompression.
This filter is commonly used by PyTables and h5py for fast compression. Filter ID 32000 was registered by Francesc Alted (PyTables maintainer).
Reference: http://oldhome.schmorp.de/marc/liblzf.html
HDF5 Registration: https://portal.hdfgroup.org/display/support/Filters
func NewLZFFilter ¶
func NewLZFFilter() *LZFFilter
NewLZFFilter creates an LZF compression filter. LZF has no configuration parameters; it uses a fixed algorithm.
func (*LZFFilter) Apply ¶
func (f *LZFFilter) Apply(data []byte) ([]byte, error)
Apply compresses data using LZF algorithm. Returns compressed data suitable for storage.
LZF algorithm characteristics:
- Hash-based pattern matching (LZ77 family)
- 8KB sliding window
- Very fast compression (near memcpy speed)
- Typical compression ratio: 40-50%
func (*LZFFilter) Encode ¶
func (f *LZFFilter) Encode() (flags uint16, cdValues []uint32)
Encode returns the filter parameters for the Pipeline message.
For LZF in HDF5, the client data typically contains:
- cd_values[0]: Plugin revision number (usually 0)
- cd_values[1]: LZF filter version (usually 0)
- cd_values[2]: Pre-computed chunk size (0 = not pre-computed)
For this implementation, we use minimal parameters.
type SZIPFilter ¶
type SZIPFilter struct {
// contains filtered or unexported fields
}
SZIPFilter implements SZIP compression (FilterID = 4). SZIP uses extended Golomb-Rice coding as defined in CCSDS 121.0-B-3 standard. It was designed by NASA for satellite imagery compression and is widely used in scientific data compression.
SZIP is implemented by libaec (Adaptive Entropy Coding) library in C. Patents on the SZIP algorithm expired in 2017, making it freely usable.
However, no pure Go implementation exists as of 2026. The algorithm is complex and requires significant effort to implement:
- Adaptive entropy coding (extended Golomb-Rice)
- Preprocessing options (NN predictor, EC option encoder)
- Block-based compression with configurable parameters
For HDF5 files requiring SZIP, users should:
- Use HDF5 C library with libaec
- Use h5py (Python) which links to C library
- Re-compress files using GZIP (filter ID 1) for pure Go compatibility
Reference: https://github.com/MathisRosenhauer/libaec
CCSDS Standard: https://public.ccsds.org/Pubs/121x0b3.pdf
HDF Group: https://docs.hdfgroup.org/hdf5/latest/group___s_z_i_p.html
func NewSZIPFilter ¶
func NewSZIPFilter(optionMask, pixelsPerBlock, bitsPerPixel, pixelsPerScan uint32) *SZIPFilter
NewSZIPFilter creates an SZIP compression filter. Parameters match the SZIP specification:
- optionMask: Compression options (NN=32, EC=4, LSB=1, MSB=2, RAW=128)
- pixelsPerBlock: Number of pixels per block (must be even, typically 8-32)
- bitsPerPixel: Bits per pixel (1-32)
- pixelsPerScan: Pixels per scanline for 2D data (0 for 1D)
Common configurations:
- NN predictor with EC encoder: optionMask = 36 (32 + 4)
- RAW mode (no preprocessing): optionMask = 128
func (*SZIPFilter) Apply ¶
func (f *SZIPFilter) Apply(_ []byte) ([]byte, error)
Apply compresses data using SZIP algorithm. Returns compressed data suitable for storage.
NOTE: SZIP compression requires the libaec library (C implementation). No pure Go implementation exists as of 2026. This is a stub that returns a "not implemented" error.
For SZIP compression, consider:
- Using CGo with libaec
- Using HDF5 C library
- Using alternative compression (GZIP filter ID 1)
func (*SZIPFilter) Encode ¶
func (f *SZIPFilter) Encode() (flags uint16, cdValues []uint32)
Encode returns the filter parameters for the Pipeline message.
For SZIP in HDF5, the client data contains:
- cd_values[0]: Bits per pixel (1-32)
- cd_values[1]: Coding method (NN=32, EC=4, LSB=1, MSB=2, RAW=128)
- cd_values[2]: Pixels per block (even number, 8-32)
- cd_values[3]: Pixels per scanline (0 for 1D data)
Reference: https://github.com/HDFGroup/hdf5/blob/develop/src/H5Zszip.c
func (*SZIPFilter) ID ¶
func (f *SZIPFilter) ID() FilterID
ID returns the HDF5 filter identifier for SZIP.
func (*SZIPFilter) Remove ¶
func (f *SZIPFilter) Remove(_ []byte) ([]byte, error)
Remove decompresses SZIP-compressed data. Returns the original uncompressed data.
NOTE: SZIP decompression requires the libaec library (C implementation). No pure Go implementation exists as of 2026. This is a stub that returns a "not implemented" error.
type ShuffleFilter ¶
type ShuffleFilter struct {
// contains filtered or unexported fields
}
ShuffleFilter implements byte shuffle (FilterID = 2).
The shuffle filter reorders bytes in the data to improve compression ratios for numeric data. It works by transposing byte order from element-by-element to byte-by-byte.
For example, with 4-byte integers [A1 A2 A3 A4][B1 B2 B3 B4][C1 C2 C3 C4]:
Original: [A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4]
Shuffled: [A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4]
This transformation groups similar bytes together (all first bytes, then all second bytes, etc.), which typically compresses much better with algorithms like GZIP.
The shuffle filter is especially effective for:
- Integer arrays with slowly changing values
- Floating-point arrays with similar magnitudes
- Multi-dimensional arrays with spatial locality
Note: Shuffle should always be applied BEFORE compression filters like GZIP.
func NewShuffleFilter ¶
func NewShuffleFilter(elementSize uint32) *ShuffleFilter
NewShuffleFilter creates a shuffle filter with the specified element size.
The element size should match the datatype size:
- int32, float32: elementSize = 4
- int64, float64: elementSize = 8
- int16: elementSize = 2
- int8: elementSize = 1
For compound or array types, use the size of the base element.
func (*ShuffleFilter) Apply ¶
func (f *ShuffleFilter) Apply(data []byte) ([]byte, error)
Apply performs byte shuffle on the data.
The shuffle algorithm:
- Divide data into elements of size elementSize
- For each byte position in an element (0 to elementSize-1): extract that byte from every element, then write all of those bytes consecutively
Example with elementSize=4, 3 elements:
Input:  [a1 a2 a3 a4][b1 b2 b3 b4][c1 c2 c3 c4]
Output: [a1 b1 c1][a2 b2 c2][a3 b3 c3][a4 b4 c4]
This groups similar bytes together, improving compression with GZIP.
func (*ShuffleFilter) Encode ¶
func (f *ShuffleFilter) Encode() (flags uint16, cdValues []uint32)
Encode returns the filter parameters for the Pipeline message.
For shuffle, the client data contains a single value: the element size. Flags are always 0 for shuffle.
func (*ShuffleFilter) ID ¶
func (f *ShuffleFilter) ID() FilterID
ID returns the HDF5 filter identifier for shuffle.
func (*ShuffleFilter) Name ¶
func (f *ShuffleFilter) Name() string
Name returns the HDF5 filter name.
func (*ShuffleFilter) Remove ¶
func (f *ShuffleFilter) Remove(data []byte) ([]byte, error)
Remove reverses the byte shuffle (unshuffle).
This operation reverses Apply, restoring the original byte order.
Example with elementSize=4, 3 elements:
Input:  [a1 b1 c1][a2 b2 c2][a3 b3 c3][a4 b4 c4]
Output: [a1 a2 a3 a4][b1 b2 b3 b4][c1 c2 c3 c4]