package writer

v0.0.0-...-c75269d
Published: Feb 4, 2026 License: MIT Imports: 11 Imported by: 0

Documentation

Overview

Package writer provides HDF5 file writing infrastructure.

The Allocator manages free space allocation in HDF5 files. For v0.11.0-beta MVP, it uses a simple end-of-file allocation strategy with no freed space reuse.

See ALLOCATOR_DESIGN.md for comprehensive design documentation.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type AllocatedBlock

type AllocatedBlock struct {
	Offset uint64 // Starting address in file
	Size   uint64 // Size of allocated block in bytes
}

AllocatedBlock tracks an allocated region of the file.

Each block represents a contiguous region that has been allocated and must not be overwritten or reused (in the MVP version).

Blocks are tracked to prevent overlapping allocations and to validate allocator integrity during testing.

type Allocator

type Allocator struct {
	// contains filtered or unexported fields
}

Allocator manages space allocation in HDF5 files.

Strategy (MVP v0.11.0-beta):

  • End-of-file allocation: All allocations occur at end of file
  • No freed space reuse: Once allocated, space is never reclaimed
  • No fragmentation: Perfect sequential layout
  • Overlap prevention: All allocations tracked

Thread Safety:

  • NOT thread-safe: Use external synchronization if needed
  • Designed for single-threaded FileWriter

Performance:

  • Allocate: O(1) - constant time
  • IsAllocated: O(n) - linear scan over blocks
  • Blocks: O(n log n) - copy and sort
  • ValidateNoOverlaps: O(n log n) - sort and scan

Advanced features (deferred to v0.11.0-RC):

  • Free space reuse (best-fit, first-fit strategies)
  • Fragmentation management
  • Thread safety (optional mutex)
  • Alignment enforcement (8-byte)

See ALLOCATOR_DESIGN.md for detailed design documentation.

func NewAllocator

func NewAllocator(initialOffset uint64) *Allocator

NewAllocator creates a space allocator.

The allocator tracks all allocations and manages free space in the HDF5 file. It uses end-of-file allocation strategy (no freed space reuse in MVP).

Parameters:

  • initialOffset: Starting address for allocations (typically after superblock)
  • For superblock v2 (48 bytes): initialOffset = 48
  • For superblock v0 (variable size): initialOffset = superblock_size + driver_info_size

Returns:

  • *Allocator ready to allocate space

Example:

alloc := NewAllocator(48) // Start after superblock v2
addr, err := alloc.Allocate(1024)
if err != nil {
    return err
}

func (*Allocator) Allocate

func (a *Allocator) Allocate(size uint64) (uint64, error)

Allocate reserves a block of space at the end of the file.

The block is allocated at the current end-of-file address and tracked to prevent overlapping allocations. This is the primary method for obtaining space for HDF5 objects (datasets, groups, attributes, metadata).

Strategy:

  • Allocates at current end-of-file (sequential allocation)
  • Updates end-of-file pointer to addr + size
  • Tracks allocation in internal block list
  • No alignment enforcement (deferred to RC)
  • No size limit validation (OS will reject impossible sizes)

Parameters:

  • size: Number of bytes to allocate (must be > 0)

Returns:

  • address: File offset where block is allocated
  • error: Non-nil if allocation fails

Errors:

  • "cannot allocate zero bytes": Size must be greater than 0

Thread Safety:

  • NOT thread-safe: Do not call concurrently

Example:

addr, err := allocator.Allocate(1024) // Allocate 1KB
if err != nil {
    return err
}
// Use addr to write data to file
file.WriteAt(data, int64(addr))

func (*Allocator) Blocks

func (a *Allocator) Blocks() []AllocatedBlock

Blocks returns a copy of all allocated blocks, sorted by offset.

The returned slice is a copy, so modifications do not affect the allocator's internal state. Blocks are sorted by offset in ascending order for consistent iteration and display.

Returns:

  • []AllocatedBlock: Copy of all allocated blocks, sorted by offset

Performance:

  • Time: O(n log n) where n is number of blocks (due to sorting)
  • Space: O(n) - allocates copy of blocks

Use Cases:

  • Debugging allocation patterns
  • Testing allocator state
  • Visualizing file layout
  • Calculating total allocated space

Example:

blocks := alloc.Blocks()
for _, block := range blocks {
    fmt.Printf("Block: [%d, %d) size=%d\n",
        block.Offset, block.Offset+block.Size, block.Size)
}

// Calculate total allocated space
var total uint64
for _, block := range blocks {
    total += block.Size
}

func (*Allocator) EndOfFile

func (a *Allocator) EndOfFile() uint64

EndOfFile returns the current end-of-file address.

This is where the next allocation would occur. It represents the total file size including all allocated blocks.

Returns:

  • uint64: Current end-of-file address (next allocation address)

Performance:

  • Time: O(1) - constant time
  • Space: O(1) - no allocations

Use Cases:

  • Determine total file size
  • Verify space usage
  • Track file growth

Example:

eof := alloc.EndOfFile()
fmt.Printf("File size: %d bytes\n", eof)

func (*Allocator) IsAllocated

func (a *Allocator) IsAllocated(offset, size uint64) bool

IsAllocated checks if an address range overlaps with any allocated blocks.

This method is useful for validation and debugging to ensure no overlapping writes occur. It performs a linear scan over all allocated blocks.

Overlap Detection Logic:

  • Two ranges [a1,a2) and [b1,b2) overlap if: a1 < b2 && b1 < a2
  • Adjacent blocks (touching boundaries) do NOT overlap
  • Zero-size ranges never overlap (returns false)

Parameters:

  • offset: Starting address of range to check
  • size: Size of range to check

Returns:

  • true: Range overlaps with at least one allocated block
  • false: Range is free (or size is 0)

Performance:

  • Time: O(n) where n is number of allocated blocks
  • Space: O(1) - no allocations

Use Cases:

  • Validation before writing to file
  • Debugging overlap issues
  • Testing allocation correctness

Example:

if alloc.IsAllocated(1000, 100) {
    fmt.Println("Warning: Range [1000, 1100) already allocated!")
}

func (*Allocator) ValidateNoOverlaps

func (a *Allocator) ValidateNoOverlaps() error

ValidateNoOverlaps checks that no allocated blocks overlap.

This method is primarily for debugging and testing to ensure the allocator maintains correct state. In a correctly functioning allocator with end-of-file allocation, overlaps should NEVER occur.

Detection Logic:

  • Sorts blocks by offset
  • Checks that each block ends before the next block starts
  • Adjacent blocks (touching boundaries) are NOT considered overlapping

Returns:

  • nil: No overlaps detected (allocator state is valid)
  • error: Overlap detected (indicates allocator bug)

Performance:

  • Time: O(n log n) where n is number of blocks (due to sorting)
  • Space: O(n) - allocates sorted copy of blocks

Use Cases:

  • Debugging allocator implementation
  • Pre-release validation
  • Testing allocation correctness
  • Detecting memory corruption

Example:

if err := alloc.ValidateNoOverlaps(); err != nil {
    panic(fmt.Sprintf("BUG: Allocator corrupted: %v", err))
}

type BZIP2Filter

type BZIP2Filter struct {
	// contains filtered or unexported fields
}

BZIP2Filter implements BZIP2 compression (FilterID = 307). BZIP2 is a high-quality compression algorithm designed by Julian Seward. It provides better compression than GZIP (typically 10-15% smaller) but is slower.

BZIP2 is commonly used for scientific datasets where storage space is critical. Filter ID 307 is registered with the HDF Group.

Reference: https://sourceware.org/bzip2/
HDF5 Registration: https://github.com/HDFGroup/hdf5_plugins

func NewBZIP2Filter

func NewBZIP2Filter(blockSize int) *BZIP2Filter

NewBZIP2Filter creates a BZIP2 compression filter. blockSize specifies compression level (1-9):

  • 1 = fastest, lowest compression (100KB blocks)
  • 9 = slowest, highest compression (900KB blocks) - default

func (*BZIP2Filter) Apply

func (f *BZIP2Filter) Apply(_ []byte) ([]byte, error)

Apply compresses data using BZIP2 algorithm. Returns compressed data suitable for storage.

NOTE: Go stdlib compress/bzip2 only provides decompression. For write support, consider using github.com/dsnet/compress/bzip2 or waiting for future implementation.

func (*BZIP2Filter) Encode

func (f *BZIP2Filter) Encode() (flags uint16, cdValues []uint32)

Encode returns the filter parameters for the Pipeline message.

For BZIP2 in HDF5, the client data typically contains:

  • cd_values[0]: Block size (1-9, in 100KB units)

Reference: https://github.com/HDFGroup/hdf5_plugins/blob/master/BZIP2/src/H5Zbzip2.c

func (*BZIP2Filter) ID

func (f *BZIP2Filter) ID() FilterID

ID returns the HDF5 filter identifier for BZIP2.

func (*BZIP2Filter) Name

func (f *BZIP2Filter) Name() string

Name returns the HDF5 filter name.

func (*BZIP2Filter) Remove

func (f *BZIP2Filter) Remove(data []byte) ([]byte, error)

Remove decompresses BZIP2-compressed data. Returns the original uncompressed data.

This uses Go's stdlib compress/bzip2 for decompression.

type ChunkCoordinator

type ChunkCoordinator struct {
	// contains filtered or unexported fields
}

ChunkCoordinator handles N-dimensional dataset chunking.

This coordinator manages the mapping between:

  • Dataset dimensions and chunk dimensions
  • Linear chunk indices and N-dimensional chunk coordinates
  • Dataset data layout and chunk data extraction

Key Concepts:

  • Dataset dimensions: Total size of dataset in each dimension
  • Chunk dimensions: Size of each chunk in each dimension
  • Chunk coordinates: Scaled indices [dim0, dim1, ..., dimN] where coordinate[i] = element_index[i] / chunk_dim[i]
  • Edge chunks: Partial chunks at dataset boundaries

Example (2D dataset):

Dataset: 25x35 elements
Chunks: 10x10 elements
Result: 3x4 = 12 total chunks
  - Chunk [0,0]: 10x10 (full)
  - Chunk [0,3]: 10x5 (partial in dim 1)
  - Chunk [2,0]: 5x10 (partial in dim 0)
  - Chunk [2,3]: 5x5 (partial in both dims)

func NewChunkCoordinator

func NewChunkCoordinator(datasetDims, chunkDims []uint64) (*ChunkCoordinator, error)

NewChunkCoordinator creates a chunk coordinator.

Calculates the number of chunks needed in each dimension using ceiling division: numChunks[i] = ceil(datasetDims[i] / chunkDims[i])

Parameters:

  • datasetDims: Dataset size in each dimension
  • chunkDims: Chunk size in each dimension

Returns:

  • *ChunkCoordinator: Ready to use
  • error: Non-nil if dimensions mismatch

Example:

// 2D dataset: 100x200 elements, chunks: 10x20
coord, err := NewChunkCoordinator(
    []uint64{100, 200},
    []uint64{10, 20},
)
// Result: 10x10 = 100 total chunks

func (*ChunkCoordinator) ChunkDims

func (cc *ChunkCoordinator) ChunkDims() []uint64

ChunkDims returns chunk dimensions (read-only copy).

func (*ChunkCoordinator) DatasetDims

func (cc *ChunkCoordinator) DatasetDims() []uint64

DatasetDims returns dataset dimensions (read-only copy).

func (*ChunkCoordinator) ExtractChunkData

func (cc *ChunkCoordinator) ExtractChunkData(data []byte, coord []uint64, elemSize uint32) []byte

ExtractChunkData extracts chunk data from full dataset.

Extracts the data for a specific chunk from the full dataset buffer. The dataset is laid out in row-major order (C order), and the chunk data is extracted maintaining this layout.

Parameters:

  • data: Full dataset buffer (row-major layout)
  • coord: Chunk coordinate to extract
  • elemSize: Size of each element in bytes

Returns:

  • []byte: Extracted chunk data (contiguous buffer)

Example (2D, dataset 20x30 uint32, chunks 10x10):

chunk [0,0]: extract data[0:10, 0:10]
chunk [0,1]: extract data[0:10, 10:20]
chunk [1,0]: extract data[10:20, 0:10]

Algorithm:

For each element in chunk:
  1. Calculate position in dataset coordinates
  2. Calculate linear offset in dataset buffer
  3. Copy element to chunk buffer

func (*ChunkCoordinator) GetChunkCoordinate

func (cc *ChunkCoordinator) GetChunkCoordinate(index uint64) []uint64

GetChunkCoordinate converts linear index to N-D coordinate.

Uses row-major layout to convert a linear chunk index to its N-dimensional coordinate.

Row-major layout means:

  • Rightmost dimension varies fastest
  • Leftmost dimension varies slowest

Parameters:

  • index: Linear chunk index (0 to GetTotalChunks()-1)

Returns:

  • []uint64: N-dimensional chunk coordinate

Example (2D, 3x4 chunks):

index=0  → [0,0]
index=1  → [0,1]
index=3  → [0,3]
index=4  → [1,0]
index=11 → [2,3]

Algorithm:

coord[N-1] = index % numChunks[N-1]
coord[N-2] = (index / numChunks[N-1]) % numChunks[N-2]
...
coord[0] = index / (numChunks[1] * numChunks[2] * ... * numChunks[N-1])

func (*ChunkCoordinator) GetChunkSize

func (cc *ChunkCoordinator) GetChunkSize(coord []uint64) []uint64

GetChunkSize returns actual chunk size (may be partial).

Edge chunks at dataset boundaries may be smaller than the nominal chunk size. This method calculates the actual size of a chunk given its coordinate.

Parameters:

  • coord: Chunk coordinate [dim0, dim1, ..., dimN]

Returns:

  • []uint64: Actual chunk size in each dimension

Example (dataset 25x35, chunks 10x10):

[0,0] → [10,10] (full chunk)
[0,3] → [10,5]  (partial in dim 1)
[2,0] → [5,10]  (partial in dim 0)
[2,3] → [5,5]   (partial in both)

Algorithm:

start[i] = coord[i] * chunkDims[i]
end[i] = min(start[i] + chunkDims[i], datasetDims[i])
size[i] = end[i] - start[i]

func (*ChunkCoordinator) GetTotalChunks

func (cc *ChunkCoordinator) GetTotalChunks() uint64

GetTotalChunks returns total chunk count.

Calculates the total number of chunks by multiplying the number of chunks in each dimension.

Returns:

  • uint64: Total number of chunks in dataset

Example:

// Dataset: 100x200, chunks: 10x20
// numChunks = [10, 10]
// total = 10 * 10 = 100

func (*ChunkCoordinator) NumChunks

func (cc *ChunkCoordinator) NumChunks() []uint64

NumChunks returns number of chunks per dimension (read-only copy).

type CreateMode

type CreateMode int

CreateMode specifies the file creation/opening behavior.

const (
	// ModeTruncate creates a new file, truncating if it exists.
	// Equivalent to os.Create() behavior.
	ModeTruncate CreateMode = iota

	// ModeExclusive creates a new file, fails if it exists.
	// Equivalent to os.O_CREATE | os.O_EXCL.
	ModeExclusive

	// ModeReadWrite opens an existing file for reading and writing.
	// Used for read-modify-write operations on existing HDF5 files.
	ModeReadWrite

	// ModeReadOnly opens an existing file for reading only.
	// Used when opening files without modification intent.
	ModeReadOnly
)

type DenseAttributeWriter

type DenseAttributeWriter struct {
	// contains filtered or unexported fields
}

DenseAttributeWriter manages dense attribute storage for a single object.

Dense attributes (8+ attributes) use:

  • Fractal Heap: Storage for attribute data (name + type + space + value)
  • B-tree v2: Index for fast attribute lookup by name
  • Attribute Info Message: Metadata with heap/B-tree addresses

This writer REUSES infrastructure from dense groups:

  • structures.WritableFractalHeap (already exists!)
  • structures.WritableBTreeV2 (already exists!)

Reference: H5Adense.c - H5A__dense_create(), H5A__dense_insert().

func NewDenseAttributeWriter

func NewDenseAttributeWriter(objectAddr uint64) *DenseAttributeWriter

NewDenseAttributeWriter creates new dense attribute writer.

Parameters:

  • objectAddr: Address of object header (for reference)

Returns:

  • DenseAttributeWriter ready to use

func (*DenseAttributeWriter) AddAttribute

func (daw *DenseAttributeWriter) AddAttribute(attr *core.Attribute, sb *core.Superblock) error

AddAttribute adds an attribute to dense storage.

Process:

  1. Encode attribute (name + type + space + data)
  2. Insert into fractal heap → get heap ID
  3. Insert into B-tree v2 (name → heap ID)

Parameters:

  • attr: Attribute to add
  • sb: Superblock for encoding

Returns:

  • error: Non-nil if add fails or duplicate name

Reference: H5Adense.c - H5A__dense_insert().

func (*DenseAttributeWriter) WriteToFile

func (daw *DenseAttributeWriter) WriteToFile(fw *FileWriter, allocator *Allocator, sb *core.Superblock) (*core.AttributeInfoMessage, error)

WriteToFile writes dense attribute storage to file.

Process:

  1. Write fractal heap → get heap address
  2. Write B-tree v2 → get B-tree address
  3. Create Attribute Info Message with addresses
  4. Return Attribute Info Message (caller adds to object header)

Parameters:

  • fw: FileWriter for write operations
  • allocator: Space allocator (pointer to match existing infrastructure)
  • sb: Superblock

Returns:

  • *core.AttributeInfoMessage: Message to add to object header
  • error: Non-nil if write fails

Reference: H5Adense.c - H5A__dense_create().

type DenseGroupWriter

type DenseGroupWriter struct {
	// contains filtered or unexported fields
}

DenseGroupWriter manages dense group creation.

Dense groups (HDF5 1.8+) use:

  • Link Info Message: Metadata about link storage
  • Fractal Heap: Storage for link names and messages
  • B-tree v2: Index for fast link lookup by name

This coordinator:

  1. Creates Fractal Heap for link storage
  2. Creates B-tree v2 for link indexing
  3. Stores link names and metadata in heap
  4. Indexes links in B-tree
  5. Builds Link Info Message with addresses
  6. Constructs object header with all messages

Reference: H5Gdense.c - H5G_dense_create(), H5G_dense_insert().

func NewDenseGroupWriter

func NewDenseGroupWriter(name string) *DenseGroupWriter

NewDenseGroupWriter creates new dense group writer.

Parameters:

  • name: Group name (for error messages)

Returns:

  • DenseGroupWriter ready to accept links

Reference: H5Gdense.c - H5G_dense_create().

func (dgw *DenseGroupWriter) AddLink(name string, targetAddr uint64) error

AddLink adds hard link to dense group.

For MVP: Only hard links are supported (targetAddr points to an object header). Future: Soft links, external links.

Parameters:

  • name: Link name (UTF-8 string)
  • targetAddr: File address of target object header

Returns:

  • error if name empty, duplicate, or invalid

Reference: H5Gdense.c - H5G_dense_insert().

func (*DenseGroupWriter) WriteToFile

func (dgw *DenseGroupWriter) WriteToFile(fw *FileWriter, allocator *Allocator, sb *core.Superblock) (uint64, error)

WriteToFile writes dense group to file, returns object header address.

This method:

  1. For each link:
     a. Create link message (hard link format)
     b. Insert link message into fractal heap
     c. Insert (name, heapID) into B-tree v2
  2. Write fractal heap to file
  3. Write B-tree v2 to file
  4. Create Link Info Message with heap/B-tree addresses
  5. Create object header with Link Info + other messages
  6. Write object header to file

Parameters:

  • fw: FileWriter for write operations
  • allocator: Space allocator
  • sb: Superblock for encoding parameters

Returns:

  • uint64: File address of group's object header
  • error: Non-nil if write fails

Reference: H5Gdense.c - H5G_dense_create() + H5G_dense_insert().

type FileWriter

type FileWriter struct {
	// contains filtered or unexported fields
}

FileWriter wraps an os.File for writing HDF5 files. It provides:

  • Space allocation tracking (via Allocator)
  • Write-at-address operations
  • End-of-file tracking
  • Flush control

Thread-safety: Not thread-safe. Caller must synchronize access.

func NewFileWriter

func NewFileWriter(filename string, mode CreateMode, initialOffset uint64) (*FileWriter, error)

NewFileWriter creates a writer for a new HDF5 file. The file is opened for reading and writing.

Parameters:

  • filename: Path to file to create
  • mode: Creation mode (truncate or exclusive)
  • initialOffset: Starting address for allocations (typically superblock size)

For HDF5 files:

  • Superblock v2 is 48 bytes, so initialOffset would be 48
  • The superblock itself at offset 0 is not tracked by the allocator

Returns:

  • FileWriter ready for use
  • Error if file creation fails

func OpenFileWriter

func OpenFileWriter(filename string, mode CreateMode, initialOffset uint64) (*FileWriter, error)

OpenFileWriter opens an existing HDF5 file for read-modify-write operations. Unlike NewFileWriter which creates a new file, this opens an existing file.

Parameters:

  • filename: Path to existing HDF5 file
  • mode: Open mode (ModeReadWrite or ModeReadOnly)
  • initialOffset: Current end-of-file offset (for allocation tracking)

For existing files:

  • initialOffset should be set to the current file size
  • New allocations will occur after existing data
  • Allocator tracks next free address

Returns:

  • FileWriter ready for RMW operations
  • Error if file doesn't exist or open fails

Example:

// Open existing file for modification
fw, err := OpenFileWriter("data.h5", ModeReadWrite, existingFileSize)
if err != nil {
    return err
}
defer fw.Close()

// Now you can allocate new space and write data
addr, _ := fw.Allocate(1024)
fw.WriteAt(newData, int64(addr))

func (*FileWriter) Allocate

func (w *FileWriter) Allocate(size uint64) (uint64, error)

Allocate reserves a block of space in the file. Returns the address where the block was allocated. The space is not zeroed - caller must write data to the allocated block.

For MVP:

  • Allocation always occurs at end of file
  • No alignment requirements

Example:

addr, err := writer.Allocate(1024)
if err != nil {
    return err
}
// Now write data at addr
err = writer.WriteAt(data, addr)

func (*FileWriter) Allocator

func (w *FileWriter) Allocator() *Allocator

Allocator returns the space allocator. Useful for debugging and testing allocation patterns.

func (*FileWriter) Close

func (w *FileWriter) Close() error

Close closes the underlying file. This does NOT automatically flush - call Flush() first if needed. After Close(), the writer cannot be used.

func (*FileWriter) EndOfFile

func (w *FileWriter) EndOfFile() uint64

EndOfFile returns the current end-of-file address. This is where the next allocation would occur.

func (*FileWriter) File

func (w *FileWriter) File() *os.File

File returns the underlying *os.File. Use with caution - direct file operations may break allocation tracking. Primarily for reading operations or advanced use cases.

func (*FileWriter) Flush

func (w *FileWriter) Flush() error

Flush ensures all writes are committed to disk. This should be called before closing or when data durability is required.

func (*FileWriter) ReadAt

func (w *FileWriter) ReadAt(buf []byte, addr int64) (int, error)

ReadAt reads data at a specific address. Useful for reading back metadata immediately after writing. Implements io.ReaderAt interface for compatibility.

func (*FileWriter) Reader

func (w *FileWriter) Reader() io.ReaderAt

Reader returns an io.ReaderAt interface for reading from the file. This is the preferred method for reading operations as it returns an interface rather than a concrete type, improving testability and following Go best practices.

Use this for:

  • Reading back written data
  • Object header modifications
  • Integration tests (can be mocked)

Example:

reader := fw.Reader()
oh, err := core.ReadObjectHeader(reader, addr, sb)

func (*FileWriter) Seek

func (w *FileWriter) Seek(offset int64, whence int) (int64, error)

Seek implements io.Seeker interface for compatibility. Note: HDF5 uses absolute addressing, so seeking is rarely needed.

func (*FileWriter) WriteAt

func (w *FileWriter) WriteAt(data []byte, offset int64) (int, error)

WriteAt writes data at a specific address in the file. Implements io.WriterAt interface.

The address should typically be obtained from Allocate().

Note: This does not automatically track the write as an allocation. For metadata tracking, use Allocate() first, then WriteAt().

Example:

addr, _ := writer.Allocate(uint64(len(data)))
_, err := writer.WriteAt(data, int64(addr))

func (*FileWriter) WriteAtAddress

func (w *FileWriter) WriteAtAddress(data []byte, addr uint64) error

WriteAtAddress writes data at a specific address (convenience method with uint64 address).

func (*FileWriter) WriteAtWithAllocation

func (w *FileWriter) WriteAtWithAllocation(data []byte) (uint64, error)

WriteAtWithAllocation is a convenience method that allocates space and writes data. Returns the address where data was written.

This is equivalent to:

addr, err := writer.Allocate(uint64(len(data)))
if err != nil { return 0, err }
_, err = writer.WriteAt(data, int64(addr))
return addr, err

type Filter

type Filter interface {
	// ID returns the HDF5 filter identifier.
	ID() FilterID

	// Name returns human-readable filter name.
	Name() string

	// Apply applies filter to data (compression/checksum on write path).
	// Returns transformed data.
	Apply(data []byte) ([]byte, error)

	// Remove reverses filter (decompression/verification on read path).
	// Returns original data.
	Remove(data []byte) ([]byte, error)

	// Encode encodes filter parameters for Pipeline message.
	// Returns: flags, cd_values (client data array).
	Encode() (flags uint16, cdValues []uint32)
}

Filter interface for data transformation. Filters are applied in sequence during write (e.g., Shuffle → GZIP → Fletcher32) and reversed during read (Fletcher32 → GZIP → Shuffle).

type FilterID

type FilterID uint16

FilterID represents HDF5 standard filter identifiers.

const (
	FilterNone        FilterID = 0     // No filter
	FilterGZIP        FilterID = 1     // GZIP compression (deflate)
	FilterShuffle     FilterID = 2     // Byte shuffle
	FilterFletcher32  FilterID = 3     // Fletcher32 checksum
	FilterSZIP        FilterID = 4     // SZIP (not implemented)
	FilterNBIT        FilterID = 5     // NBIT (not implemented)
	FilterScaleOffset FilterID = 6     // Scale+offset (not implemented)
	FilterBZIP2       FilterID = 307   // BZIP2 compression
	FilterLZF         FilterID = 32000 // LZF compression (PyTables/h5py)
)

HDF5 standard filter constants.

type FilterPipeline

type FilterPipeline struct {
	// contains filtered or unexported fields
}

FilterPipeline manages a chain of filters applied to chunk data. Filters are applied in sequence on write and reversed on read.

Example pipeline for numeric data compression:

  1. Shuffle (reorder bytes for better compression)
  2. GZIP (compress data)
  3. Fletcher32 (add checksum)

On write: data → Shuffle → GZIP → Fletcher32 → stored. On read: stored → Fletcher32 → GZIP → Shuffle → data.

func NewFilterPipeline

func NewFilterPipeline() *FilterPipeline

NewFilterPipeline creates an empty filter pipeline.

func (*FilterPipeline) AddFilter

func (fp *FilterPipeline) AddFilter(f Filter)

AddFilter adds a filter to the end of the pipeline. Filters are applied in the order they are added during write operations.

func (*FilterPipeline) AddFilterAtStart

func (fp *FilterPipeline) AddFilterAtStart(f Filter)

AddFilterAtStart inserts a filter at the beginning of the pipeline. This is useful for filters that should be applied first (e.g., Shuffle before GZIP).

func (*FilterPipeline) Apply

func (fp *FilterPipeline) Apply(data []byte) ([]byte, error)

Apply applies all filters in sequence (write path). Example: Shuffle → GZIP → Fletcher32

If any filter fails, the operation stops and returns an error.

func (*FilterPipeline) Count

func (fp *FilterPipeline) Count() int

Count returns the number of filters in the pipeline.

func (*FilterPipeline) EncodePipelineMessage

func (fp *FilterPipeline) EncodePipelineMessage() ([]byte, error)

EncodePipelineMessage encodes the filter pipeline as an HDF5 Pipeline message (0x000B). This message is stored in the dataset's object header to describe which filters are applied to the data.

Returns the encoded message bytes ready to be written to the object header. Returns an error if the pipeline is empty.

func (*FilterPipeline) IsEmpty

func (fp *FilterPipeline) IsEmpty() bool

IsEmpty returns true if the pipeline has no filters.

func (*FilterPipeline) Remove

func (fp *FilterPipeline) Remove(data []byte) ([]byte, error)

Remove applies each filter's inverse in reverse order (read path). Example: Fletcher32 → GZIP → Shuffle

Filters must be removed in reverse order to correctly restore the original data.

type Fletcher32Filter

type Fletcher32Filter struct{}

Fletcher32Filter implements Fletcher32 checksum (FilterID = 3).

The Fletcher32 filter adds a 4-byte checksum to the end of data to detect corruption during storage or transmission. It uses the Fletcher32 algorithm, which is faster than CRC32 but less robust against certain error patterns.

The filter is commonly used in HDF5 to ensure data integrity, especially for compressed data where corruption could affect decompression.

On write: checksum is calculated and appended (original_data + 4 bytes). On read: checksum is verified and stripped (returns original_data).

func NewFletcher32Filter

func NewFletcher32Filter() *Fletcher32Filter

NewFletcher32Filter creates a Fletcher32 checksum filter.

func (*Fletcher32Filter) Apply

func (f *Fletcher32Filter) Apply(data []byte) ([]byte, error)

Apply calculates Fletcher32 checksum and appends it to the data.

The returned data is 4 bytes longer than the input, with the checksum stored in little-endian format at the end.

func (*Fletcher32Filter) Encode

func (f *Fletcher32Filter) Encode() (flags uint16, cdValues []uint32)

Encode returns the filter parameters for the Pipeline message.

Fletcher32 has no parameters, so this returns empty values.

func (*Fletcher32Filter) ID

func (f *Fletcher32Filter) ID() FilterID

ID returns the HDF5 filter identifier for Fletcher32.

func (*Fletcher32Filter) Name

func (f *Fletcher32Filter) Name() string

Name returns the HDF5 filter name.

func (*Fletcher32Filter) Remove

func (f *Fletcher32Filter) Remove(data []byte) ([]byte, error)

Remove verifies and strips the Fletcher32 checksum.

This method:

  1. Extracts the 4-byte checksum from the end of data
  2. Calculates the checksum of the original data
  3. Verifies they match
  4. Returns the original data without the checksum

Returns an error if the checksum doesn't match (data corruption detected).

type GZIPFilter

type GZIPFilter struct {
	// contains filtered or unexported fields
}

GZIPFilter implements GZIP compression (FilterID = 1). This filter uses the DEFLATE compression algorithm to reduce data size. In HDF5, this filter is named "deflate" following zlib terminology.

Compression levels:

1 = fastest compression, larger files
6 = balanced (default)
9 = best compression, slower

func NewGZIPFilter

func NewGZIPFilter(level int) *GZIPFilter

NewGZIPFilter creates a GZIP filter with the specified compression level.

Valid levels:

1 = Fast compression, lower ratio
6 = Default (balanced)
9 = Best compression, slower

Invalid levels are automatically adjusted to 6 (default).

func (*GZIPFilter) Apply

func (f *GZIPFilter) Apply(data []byte) ([]byte, error)

Apply compresses data using GZIP/DEFLATE algorithm. Returns compressed data suitable for storage.

The compressed data includes GZIP headers and CRC32 checksum.

func (*GZIPFilter) Encode

func (f *GZIPFilter) Encode() (flags uint16, cdValues []uint32)

Encode returns the filter parameters for the Pipeline message.

For GZIP, the client data contains a single value: the compression level. Flags are always 0 for GZIP.

func (*GZIPFilter) ID

func (f *GZIPFilter) ID() FilterID

ID returns the HDF5 filter identifier for GZIP.

func (*GZIPFilter) Name

func (f *GZIPFilter) Name() string

Name returns the HDF5 filter name. HDF5 uses "deflate" (the underlying algorithm) rather than "gzip".

func (*GZIPFilter) Remove

func (f *GZIPFilter) Remove(data []byte) ([]byte, error)

Remove decompresses GZIP-compressed data. Returns the original uncompressed data.

This method reverses the Apply operation, restoring the original data.

type LZFFilter

type LZFFilter struct {
}

LZFFilter implements LZF compression (FilterID = 32000). LZF is a very fast compression algorithm designed by Marc Lehmann. It typically achieves a 40-50% compression ratio while compressing 3-5x faster than GZIP and decompressing about 2x faster.

This filter is commonly used by PyTables and h5py for fast compression. Filter ID 32000 was registered by Francesc Alted (PyTables maintainer).

Reference: http://oldhome.schmorp.de/marc/liblzf.html
HDF5 Registration: https://portal.hdfgroup.org/display/support/Filters

func NewLZFFilter

func NewLZFFilter() *LZFFilter

NewLZFFilter creates an LZF compression filter. LZF has no configuration parameters; it uses a fixed algorithm.

func (*LZFFilter) Apply

func (f *LZFFilter) Apply(data []byte) ([]byte, error)

Apply compresses data using LZF algorithm. Returns compressed data suitable for storage.

LZF algorithm characteristics:

  • Hash-based pattern matching (LZ77 family)
  • 8KB sliding window
  • Very fast compression (near memcpy speed)
  • Typical compression ratio: 40-50%

func (*LZFFilter) Encode

func (f *LZFFilter) Encode() (flags uint16, cdValues []uint32)

Encode returns the filter parameters for the Pipeline message.

For LZF in HDF5, the client data typically contains:

  • cd_values[0]: Plugin revision number (usually 0)
  • cd_values[1]: LZF filter version (usually 0)
  • cd_values[2]: Pre-computed chunk size (0 = not pre-computed)

For this implementation, we use minimal parameters.

func (*LZFFilter) ID

func (f *LZFFilter) ID() FilterID

ID returns the HDF5 filter identifier for LZF.

func (*LZFFilter) Name

func (f *LZFFilter) Name() string

Name returns the HDF5 filter name.

func (*LZFFilter) Remove

func (f *LZFFilter) Remove(data []byte) ([]byte, error)

Remove decompresses LZF-compressed data. Returns the original uncompressed data.

This method reverses the Apply operation, restoring the original data.

type SZIPFilter

type SZIPFilter struct {
	// contains filtered or unexported fields
}

SZIPFilter implements SZIP compression (FilterID = 4). SZIP uses extended Golomb-Rice coding as defined in the CCSDS 121.0-B-3 standard. It was designed by NASA for satellite imagery compression and is widely used in scientific data compression.

SZIP is implemented in C by the libaec (Adaptive Entropy Coding) library. Patents on the SZIP algorithm expired in 2017, making the algorithm freely usable.

However, no pure Go implementation exists as of 2026. The algorithm is complex and requires significant effort to implement:

  • Adaptive entropy coding (extended Golomb-Rice)
  • Preprocessing options (NN predictor, EC option encoder)
  • Block-based compression with configurable parameters

For HDF5 files requiring SZIP, users should:

  1. Use HDF5 C library with libaec
  2. Use h5py (Python) which links to C library
  3. Re-compress files using GZIP (filter ID 1) for pure Go compatibility

Reference: https://github.com/MathisRosenhauer/libaec
CCSDS Standard: https://public.ccsds.org/Pubs/121x0b3.pdf
HDF Group: https://docs.hdfgroup.org/hdf5/latest/group___s_z_i_p.html

func NewSZIPFilter

func NewSZIPFilter(optionMask, pixelsPerBlock, bitsPerPixel, pixelsPerScan uint32) *SZIPFilter

NewSZIPFilter creates an SZIP compression filter. Parameters match the SZIP specification:

  • optionMask: Compression options (NN=32, EC=4, LSB=1, MSB=2, RAW=128)
  • pixelsPerBlock: Number of pixels per block (must be even, typically 8-32)
  • bitsPerPixel: Bits per pixel (1-32)
  • pixelsPerScan: Pixels per scanline for 2D data (0 for 1D)

Common configurations:

  • NN predictor with EC encoder: optionMask = 36 (32 + 4)
  • RAW mode (no preprocessing): optionMask = 128

func (*SZIPFilter) Apply

func (f *SZIPFilter) Apply(_ []byte) ([]byte, error)

Apply compresses data using SZIP algorithm. Returns compressed data suitable for storage.

NOTE: SZIP compression requires the libaec library (C implementation). No pure Go implementation exists as of 2026. This is a stub that returns a "not implemented" error.

For SZIP compression, consider:

  1. Using CGo with libaec
  2. Using HDF5 C library
  3. Using alternative compression (GZIP filter ID 1)

func (*SZIPFilter) Encode

func (f *SZIPFilter) Encode() (flags uint16, cdValues []uint32)

Encode returns the filter parameters for the Pipeline message.

For SZIP in HDF5, the client data contains:

  • cd_values[0]: Bits per pixel (1-32)
  • cd_values[1]: Coding method (NN=32, EC=4, LSB=1, MSB=2, RAW=128)
  • cd_values[2]: Pixels per block (even number, 8-32)
  • cd_values[3]: Pixels per scanline (0 for 1D data)

Reference: https://github.com/HDFGroup/hdf5/blob/develop/src/H5Zszip.c

func (*SZIPFilter) ID

func (f *SZIPFilter) ID() FilterID

ID returns the HDF5 filter identifier for SZIP.

func (*SZIPFilter) Name

func (f *SZIPFilter) Name() string

Name returns the HDF5 filter name.

func (*SZIPFilter) Remove

func (f *SZIPFilter) Remove(_ []byte) ([]byte, error)

Remove decompresses SZIP-compressed data. Returns the original uncompressed data.

NOTE: SZIP decompression requires the libaec library (C implementation). No pure Go implementation exists as of 2026. This is a stub that returns a "not implemented" error.

type ShuffleFilter

type ShuffleFilter struct {
	// contains filtered or unexported fields
}

ShuffleFilter implements byte shuffle (FilterID = 2).

The shuffle filter reorders bytes in the data to improve compression ratios for numeric data. It works by transposing byte order from element-by-element to byte-by-byte.

For example, with 4-byte integers [A1 A2 A3 A4][B1 B2 B3 B4][C1 C2 C3 C4]:

Original: [A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4]
Shuffled: [A1 B1 C1 A2 B2 C2 A3 B3 C3 A4 B4 C4]

This transformation groups similar bytes together (all first bytes, then all second bytes, etc.), which typically compresses much better with algorithms like GZIP.

The shuffle filter is especially effective for:

  • Integer arrays with slowly changing values
  • Floating-point arrays with similar magnitudes
  • Multi-dimensional arrays with spatial locality

Note: Shuffle should always be applied BEFORE compression filters like GZIP.

func NewShuffleFilter

func NewShuffleFilter(elementSize uint32) *ShuffleFilter

NewShuffleFilter creates a shuffle filter with the specified element size.

The element size should match the datatype size:

  • int32, float32: elementSize = 4
  • int64, float64: elementSize = 8
  • int16: elementSize = 2
  • int8: elementSize = 1

For compound or array types, use the size of the base element.

func (*ShuffleFilter) Apply

func (f *ShuffleFilter) Apply(data []byte) ([]byte, error)

Apply performs byte shuffle on the data.

The shuffle algorithm:

  1. Divide data into elements of size elementSize
  2. For each byte position in an element (0 to elementSize-1):
     a. Extract that byte from each element
     b. Write all those bytes consecutively

Example with elementSize=4, 3 elements:

Input:  [a1 a2 a3 a4][b1 b2 b3 b4][c1 c2 c3 c4]
Output: [a1 b1 c1][a2 b2 c2][a3 b3 c3][a4 b4 c4]

This groups similar bytes together, improving compression with GZIP.

func (*ShuffleFilter) Encode

func (f *ShuffleFilter) Encode() (flags uint16, cdValues []uint32)

Encode returns the filter parameters for the Pipeline message.

For shuffle, the client data contains a single value: the element size. Flags are always 0 for shuffle.

func (*ShuffleFilter) ID

func (f *ShuffleFilter) ID() FilterID

ID returns the HDF5 filter identifier for shuffle.

func (*ShuffleFilter) Name

func (f *ShuffleFilter) Name() string

Name returns the HDF5 filter name.

func (*ShuffleFilter) Remove

func (f *ShuffleFilter) Remove(data []byte) ([]byte, error)

Remove reverses the byte shuffle (unshuffle).

This operation reverses Apply, restoring the original byte order.

Example with elementSize=4, 3 elements:

Input:  [a1 b1 c1][a2 b2 c2][a3 b3 c3][a4 b4 c4]
Output: [a1 a2 a3 a4][b1 b2 b3 b4][c1 c2 c3 c4]
