blob

Published: Jan 16, 2026 · License: Apache-2.0, MIT · Imports: 17 · Imported by: 0

README

Blob

A file archive format for OCI container registries with random access via HTTP range requests.

Blob stores files in two OCI blobs: a small index containing metadata, and a data blob containing file contents sorted by path. This design enables reading individual files without downloading entire archives, efficient directory fetches with single range requests, and content-addressed caching with automatic deduplication.

Installation

go get github.com/meigma/blob

Requires Go 1.25 or later.

Usage

Creating an Archive

import (
    "context"
    "bytes"
    "github.com/meigma/blob"
)

var indexBuf, dataBuf bytes.Buffer
err := blob.Create(context.Background(), "/path/to/source", &indexBuf, &dataBuf)

Reading Files

import "github.com/meigma/blob"

// Open archive with index data and a ByteSource for the data blob
archive, err := blob.New(indexData, source)

// Read a file
content, err := archive.ReadFile("config/app.json")

// List directory contents
entries, err := archive.ReadDir("src")

// Use as fs.FS
f, err := archive.Open("main.go")

Remote Archives via HTTP

import (
    "github.com/meigma/blob"
    "github.com/meigma/blob/http"
)

source, err := http.NewSource(dataURL,
    http.WithHeader("Authorization", "Bearer "+token),
)
archive, err := blob.New(indexData, source)

Caching

import (
    "github.com/meigma/blob/cache"
    "github.com/meigma/blob/cache/disk"
)

diskCache, err := disk.New("/var/cache/blob")
cached := cache.New(archive, diskCache)

// First read fetches from source and caches
content, err := cached.ReadFile("lib/utils.go")

// Second read returns from cache
content, err = cached.ReadFile("lib/utils.go")

Features

  • Random access - Read any file without downloading the entire archive
  • Directory fetches - Path-sorted storage enables single-request directory reads
  • Integrity verification - Per-file SHA256 hashes
  • Compression - Per-file zstd compression preserves random access
  • Content-addressed caching - Automatic deduplication across archives
  • Standard interfaces - Implements fs.FS, fs.ReadFileFS, fs.ReadDirFS

Documentation

Full documentation is available at the documentation site.

License

Licensed under either of Apache License, Version 2.0 or MIT License at your option.

Documentation

Overview

Package blob provides a file archive format optimized for random access via HTTP range requests against OCI registries.

Archives consist of two OCI blobs:

  • Index blob: FlatBuffers-encoded file metadata enabling O(log n) lookups
  • Data blob: Concatenated file contents, sorted by path for efficient directory fetches

The package implements fs.FS and related interfaces for stdlib compatibility.
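To see why a path-sorted index makes both lookups and directory fetches cheap, here is a standalone sketch using a plain sorted string slice. This is an illustration of the idea, not the package's actual index code:

```go
package main

import (
	"fmt"
	"sort"
)

// lookup finds a single path by binary search: O(log n), as the
// FlatBuffers index enables.
func lookup(paths []string, name string) (int, bool) {
	i := sort.SearchStrings(paths, name)
	return i, i < len(paths) && paths[i] == name
}

// dirRange returns the half-open index range of entries under dir.
// Because entries are path-sorted, a directory is one contiguous run,
// which is why a single HTTP range request can fetch it.
func dirRange(paths []string, dir string) (lo, hi int) {
	lo = sort.SearchStrings(paths, dir+"/")
	hi = sort.SearchStrings(paths, dir+"0") // '0' is the byte after '/'
	return lo, hi
}

func main() {
	paths := []string{ // sorted by path, as in the data blob
		"cmd/main.go",
		"src/a.go",
		"src/b.go",
		"src/util/io.go",
	}
	i, ok := lookup(paths, "src/b.go")
	fmt.Println(i, ok) // 2 true

	lo, hi := dirRange(paths, "src")
	fmt.Println(paths[lo:hi]) // [src/a.go src/b.go src/util/io.go]
}
```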

Index

Constants

const (
	CompressionNone = blobtype.CompressionNone
	CompressionZstd = blobtype.CompressionZstd
)

Compression constants re-exported from internal/blobtype.

const (
	DefaultIndexName = "index.blob"
	DefaultDataName  = "data.blob"
)

Default file names for blob archives.

const DefaultMaxFiles = 200_000

DefaultMaxFiles is the default limit used when no MaxFiles option is set.

Variables

var (
	// ErrHashMismatch is returned when file content does not match its hash.
	ErrHashMismatch = blobtype.ErrHashMismatch

	// ErrDecompression is returned when decompression fails.
	ErrDecompression = blobtype.ErrDecompression

	// ErrSizeOverflow is returned when byte counts exceed supported limits.
	ErrSizeOverflow = blobtype.ErrSizeOverflow
)

Sentinel errors re-exported from internal/blobtype.

var (
	// ErrSymlink is returned when a symlink is encountered where not allowed.
	ErrSymlink = errors.New("blob: symlink")

	// ErrTooManyFiles is returned when the file count exceeds the configured limit.
	ErrTooManyFiles = errors.New("blob: too many files")
)

Sentinel errors specific to the blob package.

var DefaultSkipCompression = write.DefaultSkipCompression

DefaultSkipCompression returns a SkipCompressionFunc that skips small files and known already-compressed extensions.

var EntryFromViewWithPath = blobtype.EntryFromViewWithPath

EntryFromViewWithPath creates an Entry from an EntryView with the given path.

Functions

func Create

func Create(ctx context.Context, dir string, indexW, dataW io.Writer, opts ...CreateOption) error

Create builds an archive from the contents of dir.

Files are written to the data writer in path-sorted order, enabling efficient directory fetches via single range requests. The index is written as a FlatBuffers-encoded blob to the index writer.

Create builds the entire index in memory; memory use scales with entry count and path length. Rough guide: ~30-50MB for 100k files with ~60B average paths (entries plus FlatBuffers buffer).

Create walks dir recursively, including all regular files. Empty directories are not preserved. Symbolic links are not followed.

The context can be used for cancellation of long-running archive creation.

Types

type Blob

type Blob struct {
	// contains filtered or unexported fields
}

Blob provides random access to archive files.

Blob implements fs.FS, fs.StatFS, fs.ReadFileFS, and fs.ReadDirFS for compatibility with the standard library.

func New

func New(indexData []byte, source ByteSource, opts ...Option) (*Blob, error)

New creates a Blob for accessing files in the archive.

The indexData is the FlatBuffers-encoded index blob and source provides access to file content. Options can be used to configure size and decoder limits.

func (*Blob) CopyDir

func (b *Blob) CopyDir(destDir, prefix string, opts ...CopyOption) error

CopyDir extracts all files under a directory prefix to a destination.

If prefix is "" or ".", all files in the archive are extracted.

Files are written atomically using temp files and renames by default. CopyWithCleanDest clears the destination prefix and writes directly to the final path, which is faster but less safe.

Parent directories are created as needed.

By default:

  • Existing files are skipped (use CopyWithOverwrite to overwrite)
  • File modes and times are not preserved (use CopyWithPreserveMode/Times)
  • Range reads are pipelined (when beneficial) with concurrency 4 (use CopyWithReadConcurrency to change)

func (*Blob) CopyTo

func (b *Blob) CopyTo(destDir string, paths ...string) error

CopyTo extracts specific files to a destination directory.

Parent directories are created as needed.

By default:

  • Existing files are skipped (use CopyWithOverwrite to overwrite)
  • File modes and times are not preserved (use CopyWithPreserveMode/Times)
  • Range reads are pipelined (when beneficial) with concurrency 4 (use CopyWithReadConcurrency to change)

func (*Blob) CopyToWithOptions

func (b *Blob) CopyToWithOptions(destDir string, paths []string, opts ...CopyOption) error

CopyToWithOptions extracts specific files with options.

func (*Blob) Entries

func (b *Blob) Entries() iter.Seq[EntryView]

Entries returns an iterator over all entries as read-only views.

The returned views are only valid while the Blob remains alive.

func (*Blob) EntriesWithPrefix

func (b *Blob) EntriesWithPrefix(prefix string) iter.Seq[EntryView]

EntriesWithPrefix returns an iterator over entries with the given prefix as read-only views.

The returned views are only valid while the Blob remains alive.

func (*Blob) Entry

func (b *Blob) Entry(path string) (EntryView, bool)

Entry returns a read-only view of the entry for the given path.

The returned view is only valid while the Blob remains alive.

func (*Blob) IndexData

func (b *Blob) IndexData() []byte

IndexData returns the raw FlatBuffers-encoded index data. This is useful for creating new Blobs with different data sources.

func (*Blob) Len

func (b *Blob) Len() int

Len returns the number of entries in the archive.

func (*Blob) Open

func (b *Blob) Open(name string) (fs.File, error)

Open implements fs.FS.

Open returns an fs.File for reading the named file. The returned file verifies the content hash on Close (unless disabled by WithVerifyOnClose) and returns ErrHashMismatch if verification fails. Callers must read to EOF or Close to ensure integrity; partial reads may return unverified data.

func (*Blob) ReadDir

func (b *Blob) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir implements fs.ReadDirFS.

ReadDir returns directory entries for the named directory, sorted by name. Directory entries are synthesized from file paths—the archive does not store directories explicitly.

func (*Blob) ReadFile

func (b *Blob) ReadFile(name string) ([]byte, error)

ReadFile implements fs.ReadFileFS.

ReadFile reads and returns the entire contents of the named file. The content is decompressed if necessary and verified against its hash.

func (*Blob) Reader

func (b *Blob) Reader() *file.Reader

Reader returns the underlying file reader. This is useful for cached readers that need to share the decompression pool.

func (*Blob) Save

func (b *Blob) Save(indexPath, dataPath string) error

Save writes the blob archive to the specified paths.

Uses atomic writes (temp file + rename) to prevent partial writes on failure. Parent directories are created as needed.

func (*Blob) Stat

func (b *Blob) Stat(name string) (fs.FileInfo, error)

Stat implements fs.StatFS.

Stat returns file info for the named file without reading its content. For directories (paths that are prefixes of other entries), Stat returns synthetic directory info.

func (*Blob) Stream

func (b *Blob) Stream() io.Reader

Stream returns a reader that streams the entire data blob from beginning to end. This is useful for copying or transmitting the complete data content.

type BlobFile

type BlobFile struct {
	*Blob
	// contains filtered or unexported fields
}

BlobFile wraps a Blob with its underlying data file handle. Close must be called to release file resources.

func CreateBlob

func CreateBlob(ctx context.Context, srcDir, destDir string, opts ...CreateBlobOption) (*BlobFile, error)

CreateBlob creates a blob archive from srcDir and writes it to destDir.

By default, files are named "index.blob" and "data.blob". Use CreateBlobWithIndexName and CreateBlobWithDataName to override.

Returns a BlobFile that must be closed to release file handles.

func OpenFile

func OpenFile(indexPath, dataPath string, opts ...Option) (*BlobFile, error)

OpenFile opens a blob archive from index and data files.

The index file is read into memory; the data file is opened for random access. The returned BlobFile must be closed to release file resources.

func (*BlobFile) Close

func (bf *BlobFile) Close() error

Close closes the underlying data file.

type ByteSource

type ByteSource interface {
	io.ReaderAt
	Size() int64
}

ByteSource provides random access to the data blob.

Implementations exist for local files (*os.File) and HTTP range requests.

type ChangeDetection

type ChangeDetection uint8

ChangeDetection controls how strictly file changes are detected during creation.

const (
	ChangeDetectionNone ChangeDetection = iota
	ChangeDetectionStrict
)

type Compression

type Compression = blobtype.Compression

Compression identifies the compression algorithm used for a file.

type CopyOption

type CopyOption func(*copyConfig)

CopyOption configures CopyTo and CopyDir operations.

func CopyWithCleanDest

func CopyWithCleanDest(enabled bool) CopyOption

CopyWithCleanDest clears the destination prefix before copying and writes directly to the final path (no temp files). This is only supported by CopyDir.

func CopyWithOverwrite

func CopyWithOverwrite(overwrite bool) CopyOption

CopyWithOverwrite allows overwriting existing files. By default, existing files are skipped.

func CopyWithPreserveMode

func CopyWithPreserveMode(preserve bool) CopyOption

CopyWithPreserveMode preserves file permission modes from the archive. By default, modes are not preserved (files use umask defaults).

func CopyWithPreserveTimes

func CopyWithPreserveTimes(preserve bool) CopyOption

CopyWithPreserveTimes preserves file modification times from the archive. By default, times are not preserved (files use current time).

func CopyWithReadAheadBytes

func CopyWithReadAheadBytes(limit uint64) CopyOption

CopyWithReadAheadBytes caps the total size of buffered group data. A value of 0 disables the byte budget.

func CopyWithReadConcurrency

func CopyWithReadConcurrency(n int) CopyOption

CopyWithReadConcurrency sets the number of concurrent range reads. Use 1 to force serial reads. Zero uses the default concurrency (4).

func CopyWithWorkers

func CopyWithWorkers(n int) CopyOption

CopyWithWorkers sets the number of workers for parallel processing. Values < 0 force serial processing. Zero uses automatic heuristics. Values > 0 force a specific worker count.

type CreateBlobOption

type CreateBlobOption func(*createBlobConfig)

CreateBlobOption configures CreateBlob.

func CreateBlobWithChangeDetection

func CreateBlobWithChangeDetection(cd ChangeDetection) CreateBlobOption

CreateBlobWithChangeDetection sets the change detection mode.

func CreateBlobWithCompression

func CreateBlobWithCompression(compression Compression) CreateBlobOption

CreateBlobWithCompression sets the compression algorithm.

func CreateBlobWithDataName

func CreateBlobWithDataName(name string) CreateBlobOption

CreateBlobWithDataName sets the data file name (default: "data.blob").

func CreateBlobWithIndexName

func CreateBlobWithIndexName(name string) CreateBlobOption

CreateBlobWithIndexName sets the index file name (default: "index.blob").

func CreateBlobWithMaxFiles

func CreateBlobWithMaxFiles(n int) CreateBlobOption

CreateBlobWithMaxFiles limits the number of files in the archive.

func CreateBlobWithSkipCompression

func CreateBlobWithSkipCompression(fns ...SkipCompressionFunc) CreateBlobOption

CreateBlobWithSkipCompression adds skip compression predicates.

type CreateOption

type CreateOption func(*createConfig)

CreateOption configures archive creation.

func CreateWithChangeDetection

func CreateWithChangeDetection(cd ChangeDetection) CreateOption

CreateWithChangeDetection controls whether the writer verifies files did not change during archive creation. The zero value disables change detection to reduce syscalls; enable ChangeDetectionStrict for stronger guarantees.

func CreateWithCompression

func CreateWithCompression(c Compression) CreateOption

CreateWithCompression sets the compression algorithm to use. Use CompressionNone to store files uncompressed, CompressionZstd for zstd.

func CreateWithMaxFiles

func CreateWithMaxFiles(n int) CreateOption

CreateWithMaxFiles limits the number of files included in the archive. Zero uses DefaultMaxFiles. Negative means no limit.

func CreateWithSkipCompression

func CreateWithSkipCompression(fns ...SkipCompressionFunc) CreateOption

CreateWithSkipCompression adds predicates that decide to store a file uncompressed. If any predicate returns true, compression is skipped for that file. These checks are on the hot path, so keep them cheap.

type Entry

type Entry = blobtype.Entry

Entry represents a file in the archive.

type EntryView

type EntryView = blobtype.EntryView

EntryView provides a read-only view of an index entry.

type Option

type Option func(*Blob)

Option configures a Blob.

func WithDecoderConcurrency

func WithDecoderConcurrency(n int) Option

WithDecoderConcurrency sets the zstd decoder concurrency (default: 1). Values < 0 are treated as 0 (use GOMAXPROCS).

func WithDecoderLowmem

func WithDecoderLowmem(enabled bool) Option

WithDecoderLowmem sets whether the zstd decoder should use low-memory mode (default: false).

func WithMaxDecoderMemory

func WithMaxDecoderMemory(limit uint64) Option

WithMaxDecoderMemory limits the maximum memory used by the zstd decoder. Set limit to 0 to disable the limit.

func WithMaxFileSize

func WithMaxFileSize(limit uint64) Option

WithMaxFileSize limits the maximum per-file size (compressed and uncompressed). Set limit to 0 to disable the limit.

func WithVerifyOnClose

func WithVerifyOnClose(enabled bool) Option

WithVerifyOnClose controls whether Close drains the file to verify the hash.

When false, Close returns without reading the remaining data. Integrity is only guaranteed when callers read to EOF.

type SkipCompressionFunc

type SkipCompressionFunc = write.SkipCompressionFunc

SkipCompressionFunc returns true when a file should be stored uncompressed. It is called once per file and should be inexpensive.
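A custom predicate might skip formats that are already compressed, much as DefaultSkipCompression is described as doing. The signature below is an assumption for illustration (a path and a size); the real type is defined in the internal write package and may differ:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// SkipCompressionFunc is assumed here to receive the file path and
// size; the actual signature lives in the internal write package.
type SkipCompressionFunc func(path string, size int64) bool

// skipMedia skips compression for already-compressed formats and for
// files too small for zstd framing overhead to pay off. Kept cheap,
// since predicates run once per file on the hot path.
func skipMedia(path string, size int64) bool {
	if size < 128 {
		return true
	}
	switch strings.ToLower(filepath.Ext(path)) {
	case ".png", ".jpg", ".gz", ".zst", ".zip":
		return true
	}
	return false
}

func main() {
	var _ SkipCompressionFunc = skipMedia
	fmt.Println(skipMedia("logo.png", 4096)) // true
	fmt.Println(skipMedia("main.go", 4096))  // false
	fmt.Println(skipMedia("tiny.txt", 16))   // true
}
```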

Directories

Path	Synopsis
cache	Package cache provides content-addressed caching for blob archives.
cache/disk	Package disk provides a disk-backed cache implementation.
cmd	profiler command
http	Package http provides a ByteSource backed by HTTP range requests.
internal/batch	Package batch provides batch processing for reading multiple entries from a blob archive.
internal/blobtype	Package blobtype defines shared types used across the blob package and its internal packages.
internal/fb
internal/file	Package file provides internal file reading operations for the blob package.
internal/sizing	Package sizing provides safe size arithmetic and conversions to prevent overflow.
internal/write	Package write provides internal file writing operations for the blob package.
