c4fs

v0.9.1
Published: Dec 12, 2025 License: MIT Imports: 13 Imported by: 0

README

c4fs - Content-Addressable Filesystem

A content-addressable filesystem for the absfs ecosystem using C4 IDs (SMPTE ST 2114:2017) with transparent hydration/dehydration, snapshots, and efficient sync.

Overview

c4fs is a C4-based content-addressable filesystem that provides:

  • Content Addressing: Uses C4 IDs for cryptographically verifiable content identification
  • C4M Format: Filesystem metadata stored in C4 Manifest (C4M) format
  • Transparent Operations: Automatic hydration/dehydration of file content
  • Copy-on-Write: Immutable base layer + mutable overlay architecture
  • Deduplication: Same content stored once regardless of filename
  • Instant Snapshots: Manifest is the snapshot
  • Efficient Sync: Transfer manifest, fetch only missing C4 IDs

Core Concepts

Dehydration (Write)
File content → C4 Store → Get C4 ID → Store in C4M

When writing files, content is stored in the C4 Store and the resulting C4 ID is recorded in the C4M manifest.

Hydration (Read)
C4M entry → Lookup C4 ID → C4 Store.Get() → File content

When reading files, the C4 ID is looked up in the manifest and the content is retrieved from the C4 Store.
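The dehydrate/hydrate cycle can be sketched with plain maps. This is a minimal illustration, not the c4fs implementation. The `toyStore` and `entry` types and both functions are hypothetical, and a hex-encoded SHA-512 digest stands in for a real C4 ID. The sketch also shows why reads report the original size: the size lives in the manifest entry, not in the ID.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"fmt"
)

// entry is a hypothetical manifest record: the content ID plus the
// metadata that Stat would report (size comes from here, not from the ID).
type entry struct {
	ID   string
	Size int64
}

// toyStore is a hypothetical content store keyed by hex SHA-512 digest.
type toyStore map[string][]byte

// dehydrate stores content and records its ID and size in the manifest.
func dehydrate(s toyStore, manifest map[string]entry, name string, data []byte) {
	sum := sha512.Sum512(data)
	id := hex.EncodeToString(sum[:])
	s[id] = append([]byte(nil), data...) // identical content maps to the same key
	manifest[name] = entry{ID: id, Size: int64(len(data))}
}

// hydrate looks the name up in the manifest and fetches the bytes by ID.
func hydrate(s toyStore, manifest map[string]entry, name string) ([]byte, bool) {
	e, ok := manifest[name]
	if !ok {
		return nil, false
	}
	data, ok := s[e.ID]
	return data, ok
}

func main() {
	s := toyStore{}
	m := map[string]entry{}
	dehydrate(s, m, "hello.txt", []byte("Hello, C4FS!"))
	data, _ := hydrate(s, m, "hello.txt")
	fmt.Println(string(data), m["hello.txt"].Size) // Hello, C4FS! 12
}
```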

Metadata Preservation

Files appear with their original size and attributes, not the 90-byte C4 ID size. The C4M manifest preserves:

  • File size
  • Unix permissions
  • Timestamps (modified, accessed, created)
  • File type (regular file, directory, symlink)
Copy-on-Write Architecture
Immutable base C4M (snapshot/version)
         +
Mutable overlay C4M (layer)
         ↓
   Combined view

Reads check layer first, then fall back to base. Writes always go to the layer. The Flatten() operation merges base + layer into a new manifest.
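A minimal sketch of these overlay semantics follows. It assumes hypothetical `overlay` and `record` types with string IDs. Lookups consult the layer before the base, and removals write a tombstone into the layer. Flattening copies the base and then applies the layer on top, dropping tombstoned paths. The base map is never modified.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical manifest entry; Tombstone marks a deletion in the layer.
type record struct {
	ID        string
	Tombstone bool
}

type overlay struct {
	base  map[string]record // immutable snapshot
	layer map[string]record // mutable changes
}

// lookup checks the layer first and falls back to the base.
func (o *overlay) lookup(name string) (record, bool) {
	if r, ok := o.layer[name]; ok {
		if r.Tombstone {
			return record{}, false
		}
		return r, true
	}
	r, ok := o.base[name]
	return r, ok
}

func (o *overlay) write(name, id string) { o.layer[name] = record{ID: id} }
func (o *overlay) remove(name string)    { o.layer[name] = record{Tombstone: true} }

// flatten merges base and layer into a new map, dropping tombstoned entries.
func (o *overlay) flatten() map[string]record {
	out := make(map[string]record, len(o.base)+len(o.layer))
	for k, v := range o.base {
		out[k] = v
	}
	for k, v := range o.layer {
		if v.Tombstone {
			delete(out, k)
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	o := &overlay{
		base:  map[string]record{"config.json": {ID: "v1"}, "readme.md": {ID: "r1"}},
		layer: map[string]record{},
	}
	o.write("config.json", "v2")
	o.remove("readme.md")
	o.write("new-file.txt", "n1")

	snap := o.flatten()
	names := make([]string, 0, len(snap))
	for k := range snap {
		names = append(names, k)
	}
	sort.Strings(names)
	fmt.Println(names, snap["config.json"].ID) // [config.json new-file.txt] v2
}
```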

Architecture

absfs.FileSystem Interface
        ↓
    c4fs.FS (base + layer manifests)
        ↓
C4 Store (content by C4 ID)
        ↓
Backend (osfs, s3fs, memfs, etc.)

Key Features

  • Automatic Content Deduplication: Same content = same C4 ID = stored once
  • Instant Snapshots: Manifest is the snapshot - no data copying required
  • Efficient Sync: Transfer manifest, fetch missing C4 IDs only
  • Cryptographic Verification: C4 IDs are SHA-512 based
  • Versioning: Maintain manifest history for time-travel
  • Copy-on-Write Efficiency: Minimal storage overhead for snapshots
  • Backend Agnostic: Works with any c4/store backend (RAM, Local, S3, etc.)
  • Full Symlink Support: Unix-style symbolic links with relative/absolute paths
  • High Performance: O(1) path lookups with hash map indexing (up to 21x faster renames than the previous linear scans)
  • Standard Library Compliance: Full io/fs interface implementation
  • Garbage Collection Ready: Track referenced C4 IDs for orphaned content cleanup
  • Thread-Safe: Concurrent operations protected with read/write locks

Components

1. C4 Store Interface
type Store interface {
    // Put stores content and returns its C4 ID
    Put(io.Reader) (c4.ID, error)

    // Get retrieves content by C4 ID
    Get(c4.ID) (io.ReadCloser, error)

    // Has checks if content exists
    Has(c4.ID) bool

    // Delete removes content (with refcounting)
    Delete(c4.ID) error
}
2. Store Implementations
LocalStore

Disk-based storage with directory hierarchy:

/var/c4/[first-2]/[next-2]/[full-c4-id]

Example:

/var/c4/c4/1a/c41a2b3c...rest-of-id
MemoryStore

In-memory map for testing and temporary storage.

S3Store

S3 bucket with C4 IDs as object keys.

CachedStore

Local cache + remote backend for performance.

3. Filesystem (c4fs.FS)
type FS struct {
    base  *c4m.Manifest  // Immutable base (snapshot)
    layer *c4m.Manifest  // Mutable overlay (starts empty)
    store Store          // Content storage
}

Operations:

  • Base manifest: Immutable, read-only
  • Layer manifest: Mutable overlay, starts empty
  • Copy-on-write: Reads check layer → base, writes go to layer
  • Flatten(): Merge base + layer into new manifest
4. File Operations
Read
  1. Lookup entry in manifest (layer first, then base)
  2. Get C4 ID from entry
  3. Hydrate from C4 Store
  4. Return content with metadata from manifest
Write
  1. Buffer content in memory or temp file
  2. On close, dehydrate to C4 Store
  3. Get C4 ID
  4. Update layer manifest with entry
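The write steps can be sketched as a buffered writer that dehydrates on Close. The `writer` and `layerEntry` types are hypothetical stand-ins: plain maps play the store and layer, and a hex SHA-512 digest replaces a C4 ID. The point is that nothing reaches the layer until Close runs.

```go
package main

import (
	"bytes"
	"crypto/sha512"
	"encoding/hex"
	"errors"
	"fmt"
)

// layerEntry is a hypothetical layer-manifest record.
type layerEntry struct {
	ID   string
	Size int64
}

// writer buffers writes and dehydrates them when closed (hypothetical).
type writer struct {
	name   string
	buf    bytes.Buffer          // step 1: buffer content in memory
	store  map[string][]byte     // stand-in for the C4 store
	layer  map[string]layerEntry // stand-in for the layer manifest
	closed bool
}

func (w *writer) Write(p []byte) (int, error) {
	if w.closed {
		return 0, errors.New("write on closed file")
	}
	return w.buf.Write(p)
}

// Close performs steps 2-4: store the content, derive its ID
// (hex SHA-512 as a stand-in for a C4 ID), and record it in the layer.
func (w *writer) Close() error {
	if w.closed {
		return errors.New("file already closed")
	}
	w.closed = true
	data := w.buf.Bytes()
	sum := sha512.Sum512(data)
	id := hex.EncodeToString(sum[:])
	w.store[id] = append([]byte(nil), data...)
	w.layer[w.name] = layerEntry{ID: id, Size: int64(len(data))}
	return nil
}

func main() {
	store := map[string][]byte{}
	layer := map[string]layerEntry{}
	w := &writer{name: "notes.txt", store: store, layer: layer}
	w.Write([]byte("part one, "))
	w.Write([]byte("part two"))
	fmt.Println(len(layer)) // 0: nothing is visible before Close
	w.Close()
	fmt.Println(layer["notes.txt"].Size) // 18
}
```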
Metadata

All metadata preserved from manifest:

  • Size: Original file size, not C4 ID size
  • Mode: Unix permissions
  • Timestamps: Modified, accessed, created times

Implementation Phases

Phase 1: Core Store
  • Store interface definition
  • LocalStore implementation
  • MemoryStore implementation
  • Reference counting for garbage collection
Phase 2: C4M Integration
  • Parse C4M manifests (github.com/Avalanche-io/c4/c4m)
  • Generate C4M manifests
  • Merge base + layer manifests
  • Diff manifests
Phase 3: Basic Filesystem
  • Read operations (Open, Stat, ReadDir)
  • Hydration: C4M entry → Store.Get()
  • absfs.FileSystem interface (read-only first)
Phase 4: Write Operations
  • Dehydrating file wrapper
  • Write buffering
  • Close() triggers dehydration
  • Layer manifest updates
Phase 5: Copy-on-Write
  • Base + layer architecture
  • Overlay semantics
  • Flatten() operation
  • Snapshot management
Phase 6: Advanced Features
  • Garbage collection (remove unreferenced C4 IDs)
  • Efficient sync protocol
  • S3Store implementation
  • CachedStore implementation

Implementation Status

✅ Completed Features
  • Core Filesystem Operations: Open, Stat, ReadDir, ReadFile, WriteFile, Create
  • Directory Operations: Mkdir, MkdirAll, Remove, RemoveAll, Rename
  • Copy-on-Write: Base + layer architecture with Flatten() operation
  • Symbolic Links: Full Unix-style symlink support with loop detection
  • Metadata Operations: Chmod, Chtimes, Lstat
  • Pattern Matching: Glob support
  • Subtrees: Sub() for filesystem subtrees
  • Standard Interfaces: Implements io/fs interfaces (FS, ReadDirFS, StatFS, GlobFS, SubFS)
  • Performance: O(1) path lookups with hash map indexing
  • Garbage Collection: ReferencedIDs() for identifying orphaned content
  • Root Directory: Proper handling of "/", ".", and "" as root
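Treating `/`, `.`, and `""` as the same root matters for an O(1) index, because every spelling of a path must hash to one key. The `canonical` helper below is a hypothetical normalizer built on the standard `path.Clean`. The package's internal key format is not documented, so `.` as the root key is an assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// canonical is a hypothetical key normalizer: "", ".", and "/" all become ".",
// and other names are cleaned and stripped of their leading slash.
func canonical(name string) string {
	cleaned := path.Clean("/" + name) // rooting first also discards leading ".."
	if cleaned == "/" {
		return "."
	}
	return strings.TrimPrefix(cleaned, "/")
}

func main() {
	index := map[string]int{".": 0, "docs/a.txt": 1} // O(1) lookups by canonical key
	for _, name := range []string{"", ".", "/", "/docs/a.txt", "docs//a.txt"} {
		_, found := index[canonical(name)]
		fmt.Println(name, "->", canonical(name), found)
	}
}
```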
🎯 Performance Characteristics

Recent optimizations (path indexing) provide dramatic speedups:

  • Stat operations: 9-91% faster (up to 12x speedup for base manifest lookups)
  • Rename operations: 82-95% faster (up to 21x speedup)
  • Remove operations: 87% faster (7.8x speedup)
  • O(1) lookups: Hash map indexing instead of linear scans
  • Concurrent-safe: All operations protected with read/write locks

Benchmark results (on typical workloads):

  • Stat: ~200 ns/op
  • ReadFile (1KB): ~50 μs/op
  • WriteFile (1KB): ~100 μs/op
  • ReadDir (100 files): ~50 μs/op
  • Rename: ~50 μs/op for files, ~5 ms for directories with 1000 children
🔄 Future Enhancements
  • Advanced GC: Automatic garbage collection with mark-and-sweep
  • Compression: Optional content compression in store
  • Encryption: Encrypted content storage
  • Caching: Multi-level caching for remote stores
  • Sync Protocol: Efficient filesystem synchronization
  • Watches: File change notifications

Usage Examples

Quick Start
package main

import (
    "fmt"
    "log"

    "github.com/Avalanche-io/c4/store"
    "github.com/absfs/c4fs"
)

func main() {
    // Create a RAM-based store for testing (or use store.NewLocal(path) for disk)
    adapter := c4fs.NewStoreAdapter(store.NewRAM())

    // Create a new filesystem with empty base manifest
    fs := c4fs.New(nil, adapter)

    // Write a file - content is automatically dehydrated to C4 store
    err := fs.WriteFile("hello.txt", []byte("Hello, C4FS!"), 0644)
    if err != nil {
        log.Fatal(err)
    }

    // Read the file back - content is automatically hydrated
    data, err := fs.ReadFile("hello.txt")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Content: %s\n", data)

    // Take a snapshot
    snapshot := fs.Flatten()
    fmt.Printf("Snapshot has %d entries\n", len(snapshot.Entries))
}
Working with Snapshots
// Take a snapshot of current state
snapshot := fs.Flatten()

// Save snapshot to file (manifest only, not content)
f, err := os.Create("backup.c4m")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
snapshot.WriteTo(f)

// Later: restore from snapshot.
// The content is still in the C4 store; only the manifest is loaded.
data, err := os.ReadFile("backup.c4m")
if err != nil {
    log.Fatal(err)
}
restoredManifest := c4m.Parse(data)
restoredFS := c4fs.New(restoredManifest, adapter)

// The filesystem is now exactly as it was when snapshot was taken
Automatic Deduplication
// Write the same content to multiple files
content := []byte("This content appears in many files")

fs.WriteFile("file1.txt", content, 0644)
fs.WriteFile("file2.txt", content, 0644)
fs.WriteFile("docs/copy.txt", content, 0644)

// All three files reference the same C4 ID,
// so the content is stored only once in the C4 store.

// Verify deduplication
refs := fs.ReferencedIDs()
fmt.Printf("Unique content blobs: %d\n", len(refs)) // 1, assuming fs held no other content
Symbolic Links
// Create a file
fs.WriteFile("target.txt", []byte("target content"), 0644)

// Create a symbolic link
err := fs.Symlink("target.txt", "link.txt")
if err != nil {
    log.Fatal(err)
}

// Read through the symlink
data, err := fs.ReadFile("link.txt")
// data contains "target content"

// Check what the symlink points to
target, err := fs.ReadLink("link.txt")
fmt.Printf("Link points to: %s\n", target) // "target.txt"

// Stat without following symlink
info, err := fs.Lstat("link.txt")
// info.Mode() will show it's a symlink
Copy-on-Write Layering
// Create base filesystem with initial content
baseFS := c4fs.New(nil, adapter)
baseFS.WriteFile("config.json", []byte(`{"version": 1}`), 0644)
baseFS.WriteFile("readme.md", []byte("# Project"), 0644)

// Take snapshot as base layer
base := baseFS.Flatten()

// Create new filesystem with base as immutable layer
layeredFS := c4fs.New(base, adapter)

// Make changes - these go to the mutable layer
layeredFS.WriteFile("config.json", []byte(`{"version": 2}`), 0644)
layeredFS.Remove("readme.md")
layeredFS.WriteFile("new-file.txt", []byte("new content"), 0644)

// The base manifest is unchanged
// All changes are in the layer
// Flatten creates a new merged snapshot
newSnapshot := layeredFS.Flatten()
Garbage Collection
// Get all currently referenced C4 IDs
refs := fs.ReferencedIDs()

// Example: iterate and check which IDs are actually used
for id := range refs {
    fmt.Printf("Referenced: %s\n", id.String())
}

// Use this to identify orphaned content in your store
// and clean it up with adapter.Delete(id)

// Note: Be careful with GC across multiple filesystems/snapshots!
// An ID might be orphaned in one FS but referenced in another
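An ID orphaned in one filesystem may still be referenced by another snapshot. A safe sweep should therefore union the referenced sets of every manifest you intend to keep. The mark-and-sweep sketch below is hypothetical. Strings stand in for `c4.ID`, and the IDs are made-up placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// orphans returns stored IDs referenced by none of the given sets.
// Each set plays the role of one filesystem's or snapshot's
// ReferencedIDs() result (hypothetical string keys instead of c4.ID).
func orphans(stored []string, referenced ...map[string]bool) []string {
	live := map[string]bool{} // mark phase: union of all live IDs
	for _, refs := range referenced {
		for id, ok := range refs {
			if ok {
				live[id] = true
			}
		}
	}
	var dead []string // sweep phase: anything left unmarked
	for _, id := range stored {
		if !live[id] {
			dead = append(dead, id)
		}
	}
	sort.Strings(dead)
	return dead
}

func main() {
	stored := []string{"c4aaa", "c4bbb", "c4ccc"} // placeholder IDs
	current := map[string]bool{"c4aaa": true}
	oldSnapshot := map[string]bool{"c4bbb": true}
	fmt.Println(orphans(stored, current, oldSnapshot)) // [c4ccc]
	fmt.Println(orphans(stored, current))              // [c4bbb c4ccc]
}
```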
Directory Operations
// Create directories
err := fs.Mkdir("project", 0755)
err = fs.MkdirAll("project/src/components", 0755)

// Write files in directories
fs.WriteFile("project/src/main.go", []byte("package main"), 0644)

// List directory contents
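entries, err := fs.ReadDir("project/src")
for _, entry := range entries {
    // fs.DirEntry has no Size method; fetch its FileInfo for the size
    info, err := entry.Info()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s (%d bytes)\n", entry.Name(), info.Size())
}

// Glob pattern matching (io/fs syntax: "*" matches within one path segment)
matches, err := fs.Glob("project/*/*.go")
for _, match := range matches {
    fmt.Printf("Go file: %s\n", match)
}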
Rename and Move Operations
// Rename a file
err := fs.Rename("old-name.txt", "new-name.txt")

// Move file to different directory
err = fs.Rename("file.txt", "archive/file.txt")

// Rename a directory (renames all children too)
err = fs.Rename("old-dir", "new-dir")
// All files like "old-dir/file.txt" become "new-dir/file.txt"
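In a flat path index, a directory rename amounts to rewriting every key under the old prefix. The sketch below uses a hypothetical `renameTree` over a path-to-ID map. Its cost grows with the number of entries, which is one plausible reason directory renames are slower than file renames. Matching on `oldDir + "/"` keeps siblings such as `old-dir-2` untouched.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renameTree moves oldDir and everything beneath it to newDir in a
// hypothetical path->ID index.
func renameTree(index map[string]string, oldDir, newDir string) {
	prefix := oldDir + "/"
	moved := map[string]string{}
	for p, id := range index { // deleting during range is safe in Go
		switch {
		case p == oldDir:
			moved[newDir] = id
			delete(index, p)
		case strings.HasPrefix(p, prefix):
			moved[newDir+"/"+strings.TrimPrefix(p, prefix)] = id
			delete(index, p)
		}
	}
	for p, id := range moved { // insert after the scan so new keys are not revisited
		index[p] = id
	}
}

func main() {
	index := map[string]string{
		"old-dir":           "dir",
		"old-dir/file.txt":  "c4f1",
		"old-dir/sub/x.txt": "c4f2",
		"old-dir-2/y.txt":   "c4f3",
	}
	renameTree(index, "old-dir", "new-dir")
	paths := make([]string, 0, len(index))
	for p := range index {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths) // [new-dir new-dir/file.txt new-dir/sub/x.txt old-dir-2/y.txt]
}
```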

Comparison with Traditional Filesystems

Aspect | Traditional FS | C4FS
--- | --- | ---
Content | Filename → Content (mutable) | Filename → C4 ID → Content (immutable)
Storage | Duplicate content stored multiple times | Deduplicated by C4 ID
Snapshots | Copy all data | Just save manifest
Sync | Transfer all files | Transfer manifest + missing C4 IDs
Verification | None or checksums | Cryptographic C4 IDs
Versioning | Complex (requires external tools) | Simple (manifest history)

Benefits

Space Efficiency

Automatic deduplication means same content is stored once, regardless of how many files reference it.

Instant Snapshots

Snapshots are just manifests - no data copying required. Create thousands of snapshots with minimal overhead.

Cryptographic Integrity

C4 IDs are based on SHA-512, providing cryptographic verification of content integrity.

Efficient Remote Sync
  1. Transfer manifest (small)
  2. Compare local and remote C4 IDs
  3. Only fetch missing content
  4. Verify with C4 IDs
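Steps 2 and 3 reduce to a set difference. In the hypothetical sketch below, the remote manifest is a path-to-ID map and `haveLocal` stands in for a store's Has method. Each missing ID is requested once, even when several paths share it. The IDs are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// missing returns the distinct IDs referenced by the remote manifest
// (path -> content ID) that the local store does not have.
func missing(remote map[string]string, haveLocal func(string) bool) []string {
	seen := map[string]bool{}
	var need []string
	for _, id := range remote {
		if seen[id] || haveLocal(id) {
			continue
		}
		seen[id] = true
		need = append(need, id)
	}
	sort.Strings(need)
	return need
}

func main() {
	remote := map[string]string{
		"a.txt":    "c4x",
		"b.txt":    "c4y",
		"copy.txt": "c4y", // shares content with b.txt
		"c.txt":    "c4z",
	}
	local := map[string]bool{"c4x": true}
	need := missing(remote, func(id string) bool { return local[id] })
	fmt.Println(need) // [c4y c4z]
}
```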
Versioning and Time-Travel

Maintain manifest history to access any previous state of the filesystem.

Git-like Content Addressing

Similar to how Git stores objects by hash, but for entire filesystems.

Integration with absfs Ecosystem

c4fs integrates seamlessly with other absfs filesystem wrappers:

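// Compose with other wrappers. The c4store.NewS3 helper and the wrapper
// constructors below are schematic; check each package for its real API.
data, err := os.ReadFile("snapshot.c4m")
if err != nil {
    log.Fatal(err)
}
base := c4m.Parse(data)
adapter := c4fs.NewStoreAdapter(c4store.NewS3("my-bucket")) // c4fs.New takes a *StoreAdapter

fs := c4fs.New(base, adapter)
fs = cachefs.New(fs)          // Add caching
fs = encryptfs.New(fs)        // Add encryption
fs = metricsfs.New(fs)        // Add metrics
fs = retryfs.New(fs)          // Add retry logic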

Technical Details

C4 ID Format
  • Length: 90 characters
  • Format: c4 + 88 base58-encoded characters
  • Based on: SHA-512 hash of content
  • Example: c41a2b3c... (truncated; a full ID is 90 characters drawn from the base58 alphabet, which excludes 0, O, I, and l)
Storage Layout

Content-addressable directory structure:

/var/c4/
  c4/
    1a/
      c41a2b3c...  (full C4 ID as filename)
    2b/
      c42b3c4d...
Manifest Format

C4M v1.0 manifest format (C4 IDs themselves are standardized as SMPTE ST 2114:2017):

  • JSON-based manifest
  • Entry per file/directory
  • Stores: path, C4 ID, size, mode, timestamps
  • Hierarchical directory structure
Metadata Storage

All metadata stored in C4M manifest:

  • Unix permissions (mode)
  • Timestamps (modified, accessed, created)
  • Original file size
  • File type (regular, directory, symlink)
Reference Counting

For garbage collection:

  • Track C4 ID references across all manifests
  • Only delete when refcount reaches zero
  • Incremental GC to avoid performance impact
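Per-manifest reference counting can be sketched with a hypothetical `refCounter`. Adding a manifest increments each ID it references once. Dropping it returns the IDs whose count reached zero, which are the only ones safe to delete. IDs are plain strings here, and the sets mirror the `map[c4.ID]bool` shape returned by ReferencedIDs.

```go
package main

import (
	"fmt"
	"sort"
)

// refCounter tracks how many manifests reference each content ID (hypothetical).
type refCounter struct{ counts map[string]int }

func newRefCounter() *refCounter { return &refCounter{counts: map[string]int{}} }

// addManifest counts each ID once per manifest, even if several paths share it.
func (r *refCounter) addManifest(ids map[string]bool) {
	for id := range ids {
		r.counts[id]++
	}
}

// dropManifest decrements counts and returns the IDs whose count reached zero,
// i.e. content that is now safe to delete from the store.
func (r *refCounter) dropManifest(ids map[string]bool) []string {
	var freed []string
	for id := range ids {
		r.counts[id]--
		if r.counts[id] <= 0 {
			delete(r.counts, id)
			freed = append(freed, id)
		}
	}
	sort.Strings(freed)
	return freed
}

func main() {
	rc := newRefCounter()
	snapA := map[string]bool{"c4x": true, "c4y": true}
	snapB := map[string]bool{"c4y": true}
	rc.addManifest(snapA)
	rc.addManifest(snapB)
	fmt.Println(rc.dropManifest(snapA)) // [c4x]
	fmt.Println(rc.dropManifest(snapB)) // [c4y]
}
```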
Copy-on-Write Semantics
  • Base manifest: Immutable
  • Layer manifest: Mutable
  • Read priority: Layer → Base
  • Write destination: Always layer
  • Flatten: Merge into new immutable manifest

Testing Strategy

Unit Tests
  • Store implementations (Local, Memory, S3)
  • C4M parsing and generation
  • Reference counting
  • Manifest merging and diffing
Integration Tests
  • Full hydration/dehydration cycle
  • Copy-on-write operations
  • Snapshot creation and restoration
  • Deduplication verification
Performance Benchmarks
  • Store operations (Put, Get, Has, Delete)
  • Hydration/dehydration throughput
  • Manifest operations (merge, diff, flatten)
  • Large directory listing
Correctness Tests
  • Metadata preservation
  • Content integrity verification
  • Reference counting accuracy
  • Concurrent access safety

License

MIT License - see LICENSE file for details.

Contributing

Contributions welcome! Please ensure:

  • Tests pass: go test ./...
  • Code formatted: go fmt ./...
  • Linter clean: golangci-lint run

Related Projects

  • absfs - Abstract filesystem interface
  • c4 - C4 ID implementation
  • osfs - OS filesystem wrapper
  • memfs - In-memory filesystem
  • s3fs - S3 filesystem wrapper

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type FS

type FS struct {
	// contains filtered or unexported fields
}

FS implements a content-addressable filesystem using C4 IDs. It uses a copy-on-write architecture with an immutable base manifest and a mutable layer manifest for changes.

func New

func New(base *c4m.Manifest, store *StoreAdapter) *FS

New creates a new C4FS filesystem. If base is nil, an empty manifest is created.

func NewWithLayer

func NewWithLayer(base, layer *c4m.Manifest, store *StoreAdapter) *FS

NewWithLayer creates a new C4FS filesystem with an existing layer.

func (*FS) Base

func (c4fs *FS) Base() *c4m.Manifest

Base returns a copy of the base manifest.

func (*FS) Chmod

func (c4fs *FS) Chmod(name string, mode fs.FileMode) error

Chmod changes the mode of the named file in the layer.

func (*FS) Chown

func (c4fs *FS) Chown(name string, uid, gid int) error

Chown changes the owner and group ids of the named file. This is a no-op for content-addressable filesystems since ownership is not part of the content identity.

func (*FS) Chtimes

func (c4fs *FS) Chtimes(name string, atime, mtime time.Time) error

Chtimes changes the access and modification times of the named file in the layer.

func (*FS) Create

func (c4fs *FS) Create(name string) (File, error)

Create creates a file for writing.

func (*FS) Exists

func (c4fs *FS) Exists(name string) bool

Exists checks if a file or directory exists.

func (*FS) Flatten

func (c4fs *FS) Flatten() *c4m.Manifest

Flatten merges the base and layer manifests into a new manifest. This creates a new snapshot of the current filesystem state. Tombstones in the layer cause corresponding base entries to be excluded.

func (*FS) Glob

func (c4fs *FS) Glob(pattern string) ([]string, error)

Glob returns the names of all files matching pattern. This implements fs.GlobFS for pattern matching.

func (*FS) IsDir

func (c4fs *FS) IsDir(name string) bool

IsDir checks if the path is a directory.

func (*FS) IsFile

func (c4fs *FS) IsFile(name string) bool

IsFile checks if the path is a regular file.

func (*FS) Layer

func (c4fs *FS) Layer() *c4m.Manifest

Layer returns a copy of the layer manifest.

func (*FS) Lstat

func (c4fs *FS) Lstat(name string) (fs.FileInfo, error)

Lstat returns file information for the named file without following symlinks. This is like Stat but doesn't follow symbolic links.

func (*FS) Mkdir

func (c4fs *FS) Mkdir(name string, perm fs.FileMode) error

Mkdir creates a new directory.

func (*FS) MkdirAll

func (c4fs *FS) MkdirAll(name string, perm fs.FileMode) error

MkdirAll creates a directory and all necessary parents.

func (*FS) Open

func (c4fs *FS) Open(name string) (fs.File, error)

Open opens the named file for reading. This follows symbolic links.

func (*FS) OpenFile

func (c4fs *FS) OpenFile(name string, flag int, perm fs.FileMode) (File, error)

OpenFile opens a file with the specified flags and permissions. This is required by the absfs.Filer interface.

func (*FS) ReadDir

func (c4fs *FS) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir reads the named directory and returns a list of its directory entries.

func (*FS) ReadFile

func (c4fs *FS) ReadFile(name string) ([]byte, error)

ReadFile reads the named file and returns its contents.

func (*FS) ReadLink

func (c4fs *FS) ReadLink(name string) (string, error)

ReadLink reads the target of a symbolic link. It returns the target path without resolving it.

func (*FS) ReferencedIDs

func (c4fs *FS) ReferencedIDs() map[c4.ID]bool

ReferencedIDs returns a set of all C4 IDs currently referenced by the filesystem. This includes IDs from both the base and layer manifests, excluding tombstones and shadowed entries. The returned map can be used for garbage collection to identify orphaned content.

func (*FS) Remove

func (c4fs *FS) Remove(name string) error

Remove removes the named file or empty directory. In a copy-on-write filesystem, this adds a tombstone marker to the layer.

func (*FS) RemoveAll

func (c4fs *FS) RemoveAll(name string) error

RemoveAll removes a path and any children it contains. For directories, it recursively removes all contents.

func (*FS) Rename

func (c4fs *FS) Rename(oldname, newname string) error

Rename renames (moves) oldname to newname. For directories, all children are recursively renamed.

func (*FS) Size

func (c4fs *FS) Size(name string) (int64, error)

Size returns the size of the named file.

func (*FS) Stat

func (c4fs *FS) Stat(name string) (fs.FileInfo, error)

Stat returns file information for the given path. Unlike Lstat, this follows symbolic links.

func (*FS) Store

func (c4fs *FS) Store() *StoreAdapter

Store returns the underlying content store.

func (*FS) Sub

func (c4fs *FS) Sub(dir string) (fs.FS, error)

Sub returns an FS corresponding to the subtree rooted at dir. This implements fs.SubFS for better composability.

func (*FS) Symlink

func (c4fs *FS) Symlink(target, name string) error

Symlink creates a symbolic link at name pointing to target.

func (*FS) WriteFile

func (c4fs *FS) WriteFile(name string, data []byte, perm fs.FileMode) error

WriteFile writes data to the named file, creating it if necessary. This is a dehydration operation: content → C4 ID → layer manifest.

type File

type File interface {
	fs.File // Embeds Read, Close, Stat

	// Write operations
	Write(p []byte) (n int, err error)
	WriteAt(p []byte, off int64) (n int, err error)
	WriteString(s string) (n int, err error)

	// Read operations
	ReadAt(p []byte, off int64) (n int, err error)

	// Seek operation
	Seek(offset int64, whence int) (int64, error)

	// Sync operation
	Sync() error

	// Truncate operation
	Truncate(size int64) error

	// Directory operations
	Readdirnames(n int) (names []string, err error)
	ReadDir(n int) ([]fs.DirEntry, error)
}

File represents an open file with read/write capabilities.

type FileInfo

type FileInfo = fs.FileInfo

FileInfo is an alias for fs.FileInfo for convenience.

type FileSystem

type FileSystem interface {
	// Read operations (from io/fs.FS)
	Open(name string) (fs.File, error)

	// Extended read operations
	Stat(name string) (fs.FileInfo, error)
	ReadDir(name string) ([]fs.DirEntry, error)
	ReadFile(name string) ([]byte, error)

	// Write operations
	Create(name string) (File, error)
	Mkdir(name string, perm fs.FileMode) error
	MkdirAll(name string, perm fs.FileMode) error
	Remove(name string) error
	RemoveAll(name string) error
	WriteFile(name string, data []byte, perm fs.FileMode) error

	// Utility operations
	Rename(oldname, newname string) error
}

FileSystem represents a filesystem interface compatible with io/fs.FS and extended with write operations.

type StoreAdapter

type StoreAdapter struct {
	// contains filtered or unexported fields
}

StoreAdapter wraps a c4/store.Store and provides high-level Put/Get operations that compute C4 IDs from content.

func NewStoreAdapter

func NewStoreAdapter(s store.Store) *StoreAdapter

NewStoreAdapter creates a StoreAdapter from a c4/store.Store.

func (*StoreAdapter) Delete

func (s *StoreAdapter) Delete(id c4.ID) error

Delete removes content for the given C4 ID.

func (*StoreAdapter) Get

func (s *StoreAdapter) Get(id c4.ID) (io.ReadCloser, error)

Get retrieves content by C4 ID. Returns an error if the content does not exist.

func (*StoreAdapter) Has

func (s *StoreAdapter) Has(id c4.ID) bool

Has checks if content exists for the given C4 ID. This is a best-effort check: it tries to open the content and immediately closes it.

func (*StoreAdapter) Put

func (s *StoreAdapter) Put(r io.Reader) (c4.ID, error)

Put stores content and returns its C4 ID. The C4 ID is computed from the content using SHA-512. If the content already exists in the store, it returns the ID without error.
