Storage Layer Redesign

Document Status

  • Created: 2026-03-13
  • Status: Draft
  • Author: Collaborative design session

Problems with Current Architecture

1. Confusing Separation of Concerns

The current design splits storage operations between Storage and IndexManager without clear boundaries:

  • Storage handles entity metadata, properties, children, AND configs
  • IndexManager handles type indices, category members, catalogue groups, AND n-grams
  • It's unclear which class should handle what

2. Awkward Coupling in EmbeddedEngine

// EmbeddedEngine must cast to BasicStorage to get Dbm for IndexManager
var basic_storage = (_storage as Storage.BasicStorage);
if (basic_storage != null) {
    _index_manager = new Storage.IndexManager(basic_storage.dbm);
}

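This downcast is a code smell: EmbeddedEngine only gets an IndexManager when the Storage instance happens to be a BasicStorage, and _index_manager is silently left unset for any other implementation. The Storage abstraction is not doing its job.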

3. Unclear Value of Storage Interface

The Storage interface has only one implementation (BasicStorage). It's unclear what alternative implementations would look like or why an interface is needed.

4. Naming Confusion

  • Storage.add_child() / get_children() - Structural child names
  • IndexManager.add_to_category() / get_category_members() - Indexed member documents
  • Both deal with "children" but mean different things

5. Key Prefix Sprawl

Both classes define key prefixes independently, making it hard to see the overall storage schema.

Proposed Architecture

Design Principles

  1. One class per prefix - Each key prefix gets its own focused class
  2. Entity facades compose prefix stores - High-level APIs for each entity type
  3. Engine holds facade references - Clean dependency graph
  4. All stores are concrete classes - No unnecessary interfaces

Architecture Overview

graph TB
    subgraph Low-Level Prefix Stores
        EMS[EntityMetadataStorage<br/>entity: prefix]
        PS[PropertiesStorage<br/>props: prefix]
        CS[ChildrenStorage<br/>children: prefix]
        CCS[CategoryConfigStorage<br/>config: prefix]
        CLCS[CatalogueConfigStorage<br/>catcfg: prefix]
        TIS[TypeIndexStorage<br/>typeidx: prefix]
        CATIS[CategoryIndexStorage<br/>cat: prefix]
        CLIS[CatalogueIndexStorage<br/>catl: prefix]
        TXIS[TextIndexStorage<br/>idx: prefix]
    end
    
    subgraph High-Level Entity Facades
        ES[EntityStore<br/>metadata + type index]
        DS[DocumentStore<br/>properties]
        CNS[ContainerStore<br/>children]
        CAS[CategoryStore<br/>config + index + children]
        CLS[CatalogueStore<br/>config + index]
        IDS[IndexStore<br/>config + text index]
    end
    
    subgraph Engine
        E[EmbeddedEngine]
    end
    
    EMS --> ES
    TIS --> ES
    PS --> DS
    CS --> CNS
    CCS --> CAS
    CATIS --> CAS
    CS --> CAS
    CLCS --> CLS
    CLIS --> CLS
    CCS --> IDS
    TXIS --> IDS
    
    E --> ES
    E --> DS
    E --> CNS
    E --> CAS
    E --> CLS
    E --> IDS

Low-Level Prefix Stores

Each prefix store handles exactly one key prefix and provides type-safe operations.

Naming Convention: Prefix stores use the Storage suffix (e.g., EntityMetadataStorage).
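
As an illustration of the "one class, one prefix" idea, here is a minimal sketch of what a prefix store could look like. It is not the proposed implementation: MemoryDbm is a hypothetical in-memory stand-in for Dbm (whose real API is not shown in this document), paths are plain strings instead of EntityPath, and values are already-serialized strings.

```vala
// Hypothetical in-memory stand-in for the project's Dbm.
public class MemoryDbm : Object {
    private HashTable<string, string> _data =
        new HashTable<string, string> (str_hash, str_equal);

    public void put (string key, string value) { _data.insert (key, value); }
    public string? get_value (string key) { return _data.lookup (key); }
}

// Sketch of a prefix store: this class is the only place that builds "props:" keys.
public class PropertiesStorageSketch : Object {
    private const string PREFIX = "props:";
    private MemoryDbm _dbm;

    public PropertiesStorageSketch (MemoryDbm dbm) { _dbm = dbm; }

    private static string key_for (string path) { return PREFIX + path; }

    public void store (string path, string serialized) {
        _dbm.put (key_for (path), serialized);
    }

    public string? load (string path) {
        return _dbm.get_value (key_for (path));
    }
}

void main () {
    var dbm = new MemoryDbm ();
    var props = new PropertiesStorageSketch (dbm);

    props.store ("/users/alice", "{\"name\":\"Alice\"}");

    // The value lands under the store's own prefix and nowhere else.
    assert (dbm.get_value ("props:/users/alice") != null);
    assert (props.load ("/users/bob") == null);
}
```

Because key construction lives in one private helper, callers never see or assemble raw keys themselves.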

EntityMetadataStorage

Prefix: entity:

Key Format: entity:<path>

Value: Serialized (EntityType type, string? type_label)

public class EntityMetadataStorage : Object {
    public EntityMetadataStorage(Dbm dbm);
    
    public void store_metadata(EntityPath path, EntityType type, string? type_label) throws StorageError;
    public EntityType? get_type(EntityPath path) throws StorageError;
    public string? get_type_label(EntityPath path) throws StorageError;
    public bool exists(EntityPath path);
    public void delete(EntityPath path) throws StorageError;
}

PropertiesStorage

Prefix: props:

Key Format: props:<path>

Value: Serialized Properties dictionary

public class PropertiesStorage : Object {
    public PropertiesStorage(Dbm dbm);
    
    public void store(EntityPath path, Properties properties) throws StorageError;
    public Properties? load(EntityPath path) throws StorageError;
    public void delete(EntityPath path) throws StorageError;
}

ChildrenStorage

Prefix: children:

Key Format: children:<parent_path>

Value: Serialized array of child names

public class ChildrenStorage : Object {
    public ChildrenStorage(Dbm dbm);
    
    public void add_child(EntityPath parent, string child_name) throws StorageError;
    public void remove_child(EntityPath parent, string child_name) throws StorageError;
    public bool has_child(EntityPath parent, string child_name) throws StorageError;
    public Enumerable<string> get_children(EntityPath parent) throws StorageError;
    public void delete(EntityPath parent) throws StorageError;
}

CategoryConfigStorage

Prefix: config:

Key Format: config:<path>

Value: Serialized (string type_label, string expression)

public class CategoryConfigStorage : Object {
    public CategoryConfigStorage(Dbm dbm);
    
    public void store(EntityPath path, string type_label, string expression) throws StorageError;
    public CategoryConfig? load(EntityPath path) throws StorageError;
    public void delete(EntityPath path) throws StorageError;
}

public class CategoryConfig : Object {
    public string type_label { get; construct set; }
    public string expression { get; construct set; }
}

CatalogueConfigStorage

Prefix: catcfg:

Key Format: catcfg:<path>

Value: Serialized (string type_label, string expression)

public class CatalogueConfigStorage : Object {
    public CatalogueConfigStorage(Dbm dbm);
    
    public void store(EntityPath path, string type_label, string expression) throws StorageError;
    public CatalogueConfig? load(EntityPath path) throws StorageError;
    public void delete(EntityPath path) throws StorageError;
}

public class CatalogueConfig : Object {
    public string type_label { get; construct set; }
    public string expression { get; construct set; }
}

TypeIndexStorage

Prefix: typeidx:

Key Format: typeidx:<type_label>

Value: Serialized array of document paths

public class TypeIndexStorage : Object {
    public TypeIndexStorage(Dbm dbm);
    
    public void add_document(string type_label, string doc_path) throws StorageError;
    public void remove_document(string type_label, string doc_path) throws StorageError;
    public Enumerable<string> get_documents(string type_label);
}

CategoryIndexStorage

Prefix: cat:

Key Format: cat:<category_path>:members

Value: Serialized array of document paths

public class CategoryIndexStorage : Object {
    public CategoryIndexStorage(Dbm dbm);
    
    // Single member operations
    public void add_member(string category_path, string doc_path) throws StorageError;
    public void remove_member(string category_path, string doc_path) throws StorageError;
    
    // Batch member operations
    public void add_members(string category_path, Enumerable<string> doc_paths) throws StorageError;
    public void remove_members(string category_path, Enumerable<string> doc_paths) throws StorageError;
    public void set_members(string category_path, Enumerable<string> doc_paths) throws StorageError;
    
    // Query and lifecycle
    public Enumerable<string> get_members(string category_path);
    public void clear(string category_path) throws StorageError;
}
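
One way the batch methods could save work is by doing a single read-modify-write of the members value instead of one per document. The sketch below assumes exactly that; the newline-joined value format, the MemoryDbm stand-in, and its writes counter are all hypothetical and exist only to make the effect visible.

```vala
// Hypothetical in-memory stand-in for Dbm that counts writes.
public class MemoryDbm : Object {
    private HashTable<string, string> _data =
        new HashTable<string, string> (str_hash, str_equal);
    public int writes = 0;

    public void put (string key, string value) { _data.insert (key, value); writes++; }
    public string? get_value (string key) { return _data.lookup (key); }
}

public class CategoryIndexStorageSketch : Object {
    private MemoryDbm _dbm;

    public CategoryIndexStorageSketch (MemoryDbm dbm) { _dbm = dbm; }

    private static string key_for (string category_path) {
        return "cat:%s:members".printf (category_path);
    }

    private static bool contains (string[] haystack, string needle) {
        foreach (var item in haystack) {
            if (item == needle) { return true; }
        }
        return false;
    }

    public string[] get_members (string category_path) {
        var raw = _dbm.get_value (key_for (category_path));
        if (raw == null || raw == "") { return new string[0]; }
        return raw.split ("\n");
    }

    // Batch add: one read and one write, however many paths are given.
    public void add_members (string category_path, string[] doc_paths) {
        var members = get_members (category_path);
        foreach (var doc_path in doc_paths) {
            if (!contains (members, doc_path)) { members += doc_path; }
        }
        _dbm.put (key_for (category_path), string.joinv ("\n", members));
    }
}

void main () {
    var dbm = new MemoryDbm ();
    var index = new CategoryIndexStorageSketch (dbm);

    index.add_members ("/categories/active",
                       new string[] { "/docs/a", "/docs/b", "/docs/a" });

    assert (dbm.writes == 1);                                      // single write
    assert (index.get_members ("/categories/active").length == 2); // duplicate skipped
}
```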

CatalogueIndexStorage

Prefix: catl:

Key Formats:

  • catl:<catalogue_path>:keys - List of group keys
  • catl:<catalogue_path>:group:<key> - Document paths in group

public class CatalogueIndexStorage : Object {
    public CatalogueIndexStorage(Dbm dbm);
    
    // Group operations
    public void add_to_group(string catalogue_path, string key, string doc_path) throws StorageError;
    public void remove_from_group(string catalogue_path, string key, string doc_path) throws StorageError;
    public Enumerable<string> get_group_members(string catalogue_path, string key);
    
    // Key operations
    public void add_key(string catalogue_path, string key) throws StorageError;
    public void remove_key(string catalogue_path, string key) throws StorageError;
    public Enumerable<string> get_keys(string catalogue_path);
    
    // Clear all
    public void clear(string catalogue_path) throws StorageError;
}

TextIndexStorage

Prefix: idx:

Key Formats:

  • idx:<index_path>:tri:<trigram> - Document paths containing trigram
  • idx:<index_path>:bi:<bigram> - Trigrams containing bigram
  • idx:<index_path>:uni:<unigram> - Bigrams starting with unigram
  • idx:<index_path>:doc:<doc_path> - Cached document content
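
To make the key formats above concrete, here is a small sketch that derives the trigram keys and the bigram reverse-index keys for one word. The helper names (trigrams_of, trigram_key, bigram_key) and the example index path are hypothetical, and the slicing assumes ASCII input; real content would need Unicode-aware handling.

```vala
// Hypothetical helpers illustrating the idx: key formats; not part of the proposed API.

// Every 3-byte window of `text`. Assumes ASCII input for brevity.
string[] trigrams_of (string text) {
    string[] result = {};
    for (int i = 0; i + 3 <= text.length; i++) {
        result += text.substring (i, 3);
    }
    return result;
}

string trigram_key (string index_path, string trigram) {
    return "idx:%s:tri:%s".printf (index_path, trigram);
}

string bigram_key (string index_path, string bigram) {
    return "idx:%s:bi:%s".printf (index_path, bigram);
}

void main () {
    string index_path = "/indexes/titles"; // example path, assumed

    foreach (var tri in trigrams_of ("vala")) {
        // Forward index entry: trigram -> documents containing it.
        print ("%s\n", trigram_key (index_path, tri));
        // Reverse index entries: each bigram inside the trigram -> that trigram.
        print ("  %s\n", bigram_key (index_path, tri.substring (0, 2)));
        print ("  %s\n", bigram_key (index_path, tri.substring (1, 2)));
    }
}
```

For the input "vala" the trigrams are "val" and "ala", so the forward keys are idx:/indexes/titles:tri:val and idx:/indexes/titles:tri:ala.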

public class TextIndexStorage : Object {
    public TextIndexStorage(Dbm dbm);
    
    // Trigram index
    public void add_trigram(string index_path, string trigram, string doc_path) throws StorageError;
    public void remove_trigram(string index_path, string trigram, string doc_path) throws StorageError;
    public Enumerable<string> get_documents_for_trigram(string index_path, string trigram);
    
    // Bigram reverse index
    public void add_bigram_mapping(string index_path, string bigram, string trigram) throws StorageError;
    public Enumerable<string> get_trigrams_for_bigram(string index_path, string bigram);
    
    // Unigram reverse index
    public void add_unigram_mapping(string index_path, string unigram, string bigram) throws StorageError;
    public Enumerable<string> get_bigrams_for_unigram(string index_path, string unigram);
    
    // Document content cache
    public void store_document_content(string index_path, string doc_path, string content) throws StorageError;
    public string? get_document_content(string index_path, string doc_path);
    public void remove_document_content(string index_path, string doc_path) throws StorageError;
    
    // Clear all
    public void clear(string index_path) throws StorageError;
}

High-Level Entity Facades

Facades compose prefix stores to provide entity-specific APIs.

Naming Convention: Entity facades use the Store suffix (e.g., EntityStore).
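
The composition pattern can be sketched as follows: a facade builds its prefix stores from one shared Dbm and forwards calls to them. Everything here is an assumption made for illustration: MemoryDbm stands in for Dbm, the classes carry a Sketch suffix to mark them as hypothetical, paths are plain strings, and a newline-joined string stands in for the real serialization of the type index.

```vala
// Hypothetical in-memory stand-in for the project's Dbm.
public class MemoryDbm : Object {
    private HashTable<string, string> _data =
        new HashTable<string, string> (str_hash, str_equal);

    public void put (string key, string value) { _data.insert (key, value); }
    public string? get_value (string key) { return _data.lookup (key); }
}

public class EntityMetadataStorageSketch : Object {
    private MemoryDbm _dbm;
    public EntityMetadataStorageSketch (MemoryDbm dbm) { _dbm = dbm; }

    public void store_type_label (string path, string type_label) {
        _dbm.put ("entity:" + path, type_label);
    }
    public string? get_type_label (string path) {
        return _dbm.get_value ("entity:" + path);
    }
}

public class TypeIndexStorageSketch : Object {
    private MemoryDbm _dbm;
    public TypeIndexStorageSketch (MemoryDbm dbm) { _dbm = dbm; }

    public void add_document (string type_label, string doc_path) {
        var key = "typeidx:" + type_label;
        var existing = _dbm.get_value (key);
        // Newline-joined paths stand in for the real serialization format.
        // Duplicate handling is omitted to keep the sketch short.
        _dbm.put (key, existing == null ? doc_path : existing + "\n" + doc_path);
    }
    public string[] get_documents (string type_label) {
        var existing = _dbm.get_value ("typeidx:" + type_label);
        if (existing == null) { return new string[0]; }
        return existing.split ("\n");
    }
}

// The facade owns both prefix stores and exposes one entity-level API.
public class EntityStoreSketch : Object {
    private EntityMetadataStorageSketch _metadata;
    private TypeIndexStorageSketch _type_index;

    public EntityStoreSketch (MemoryDbm dbm) {
        _metadata = new EntityMetadataStorageSketch (dbm);
        _type_index = new TypeIndexStorageSketch (dbm);
    }

    public void store_metadata (string path, string type_label) {
        _metadata.store_type_label (path, type_label);
    }
    public string? get_type_label (string path) {
        return _metadata.get_type_label (path);
    }
    public void register_document_type (string type_label, string doc_path) {
        _type_index.add_document (type_label, doc_path);
    }
    public string[] get_documents_by_type (string type_label) {
        return _type_index.get_documents (type_label);
    }
}

void main () {
    var store = new EntityStoreSketch (new MemoryDbm ());
    store.store_metadata ("/users/alice", "User");
    store.register_document_type ("User", "/users/alice");

    assert (store.get_type_label ("/users/alice") == "User");
    assert (store.get_documents_by_type ("User").length == 1);
}
```

Callers only ever hold the facade; the two prefix stores stay an internal detail, which is what removes the need for the cast shown in the problem statement.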

EntityStore

Composition: EntityMetadataStorage + TypeIndexStorage

public class EntityStore : Object {
    public EntityStore(Dbm dbm);
    
    // Metadata operations
    public void store_metadata(EntityPath path, EntityType type, string? type_label) throws StorageError;
    public EntityType? get_type(EntityPath path) throws StorageError;
    public string? get_type_label(EntityPath path) throws StorageError;
    public bool exists(EntityPath path);
    public void delete(EntityPath path) throws StorageError;
    
    // Type index operations
    public void register_document_type(string type_label, string doc_path) throws StorageError;
    public void unregister_document_type(string type_label, string doc_path) throws StorageError;
    public Enumerable<string> get_documents_by_type(string type_label);
}

DocumentStore

Composition: PropertiesStorage

public class DocumentStore : Object {
    public DocumentStore(Dbm dbm);
    
    public void store_properties(EntityPath path, Properties properties) throws StorageError;
    public Properties? load_properties(EntityPath path) throws StorageError;
    public void delete(EntityPath path) throws StorageError;
}

ContainerStore

Composition: ChildrenStorage

public class ContainerStore : Object {
    public ContainerStore(Dbm dbm);
    
    public void add_child(EntityPath parent, string child_name) throws StorageError;
    public void remove_child(EntityPath parent, string child_name) throws StorageError;
    public bool has_child(EntityPath parent, string child_name) throws StorageError;
    public Enumerable<string> get_children(EntityPath parent) throws StorageError;
}

CategoryStore

Composition: CategoryConfigStorage + CategoryIndexStorage + ChildrenStorage

public class CategoryStore : Object {
    public CategoryStore(Dbm dbm);
    
    // Configuration
    public void store_config(EntityPath path, string type_label, string expression) throws StorageError;
    public CategoryConfig? load_config(EntityPath path) throws StorageError;
    
    // Single member operations
    public void add_member(EntityPath category_path, string doc_path) throws StorageError;
    public void remove_member(EntityPath category_path, string doc_path) throws StorageError;
    
    // Batch member operations
    public void add_members(EntityPath category_path, Enumerable<string> doc_paths) throws StorageError;
    public void remove_members(EntityPath category_path, Enumerable<string> doc_paths) throws StorageError;
    public void set_members(EntityPath category_path, Enumerable<string> doc_paths) throws StorageError;
    public Enumerable<string> get_members(EntityPath category_path);
    
    // Structural children (for when entities are created inside category)
    public void add_child(EntityPath parent, string child_name) throws StorageError;
    public void remove_child(EntityPath parent, string child_name) throws StorageError;
    public Enumerable<string> get_children(EntityPath parent) throws StorageError;
    
    // Lifecycle
    public void delete(EntityPath path) throws StorageError;
}

CatalogueStore

Composition: CatalogueConfigStorage + CatalogueIndexStorage

public class CatalogueStore : Object {
    public CatalogueStore(Dbm dbm);
    
    // Configuration
    public void store_config(EntityPath path, string type_label, string expression) throws StorageError;
    public CatalogueConfig? load_config(EntityPath path) throws StorageError;
    
    // Group operations
    public void add_to_group(EntityPath catalogue_path, string key, string doc_path) throws StorageError;
    public void remove_from_group(EntityPath catalogue_path, string key, string doc_path) throws StorageError;
    public Enumerable<string> get_group_members(EntityPath catalogue_path, string key);
    public Enumerable<string> get_group_keys(EntityPath catalogue_path);
    
    // Lifecycle
    public void delete(EntityPath path) throws StorageError;
}

IndexStore

Composition: CategoryConfigStorage + TextIndexStorage

public class IndexStore : Object {
    public IndexStore(Dbm dbm);
    
    // Configuration (reuses CategoryConfigStorage with config: prefix)
    public void store_config(EntityPath path, string type_label, string expression) throws StorageError;
    public CategoryConfig? load_config(EntityPath path) throws StorageError;
    
    // Trigram index
    public void add_trigram(EntityPath index_path, string trigram, string doc_path) throws StorageError;
    public void remove_trigram(EntityPath index_path, string trigram, string doc_path) throws StorageError;
    public Enumerable<string> get_documents_for_trigram(EntityPath index_path, string trigram);
    
    // Reverse indices
    public void add_bigram_mapping(EntityPath index_path, string bigram, string trigram) throws StorageError;
    public Enumerable<string> get_trigrams_for_bigram(EntityPath index_path, string bigram);
    public void add_unigram_mapping(EntityPath index_path, string unigram, string bigram) throws StorageError;
    public Enumerable<string> get_bigrams_for_unigram(EntityPath index_path, string unigram);
    
    // Content cache
    public void store_document_content(EntityPath index_path, string doc_path, string content) throws StorageError;
    public string? get_document_content(EntityPath index_path, string doc_path);
    public void remove_document_content(EntityPath index_path, string doc_path) throws StorageError;
    
    // Lifecycle
    public void delete(EntityPath path) throws StorageError;
}

Engine Integration

The EmbeddedEngine holds references to all entity facades:

public class EmbeddedEngine : Object, Core.Engine {
    private EntityStore _entity_store;
    private DocumentStore _document_store;
    private ContainerStore _container_store;
    private CategoryStore _category_store;
    private CatalogueStore _catalogue_store;
    private IndexStore _index_store;
    
    public EmbeddedEngine.with_path(string storage_path) {
        var dbm = new FilesystemDbm(storage_path);
        
        _entity_store = new EntityStore(dbm);
        _document_store = new DocumentStore(dbm);
        _container_store = new ContainerStore(dbm);
        _category_store = new CategoryStore(dbm);
        _catalogue_store = new CatalogueStore(dbm);
        _index_store = new IndexStore(dbm);
        
        // ... rest of initialization
    }
    
    // Public access for entity classes
    public EntityStore entity_store { get { return _entity_store; } }
    public DocumentStore document_store { get { return _document_store; } }
    public ContainerStore container_store { get { return _container_store; } }
    public CategoryStore category_store { get { return _category_store; } }
    public CatalogueStore catalogue_store { get { return _catalogue_store; } }
    public IndexStore index_store { get { return _index_store; } }
}

Key Schema Summary

| Prefix | Storage Class | Description |
|---|---|---|
| entity: | EntityMetadataStorage | Entity type and type_label |
| props: | PropertiesStorage | Document properties |
| children: | ChildrenStorage | Structural child names |
| config: | CategoryConfigStorage | Category/Index configuration |
| catcfg: | CatalogueConfigStorage | Catalogue configuration |
| typeidx: | TypeIndexStorage | Global type → documents index |
| cat: | CategoryIndexStorage | Category members index |
| catl: | CatalogueIndexStorage | Catalogue groups and keys |
| idx: | TextIndexStorage | N-gram indices and content cache |

Implementation Strategy

Since this is a greenfield project, we can implement the new design directly, with no backward-compatibility concerns.

Phase 1: Create Prefix Storage Classes

  1. Create EntityMetadataStorage in src/Storage/EntityMetadataStorage.vala
  2. Create PropertiesStorage in src/Storage/PropertiesStorage.vala
  3. Create ChildrenStorage in src/Storage/ChildrenStorage.vala
  4. Create CategoryConfigStorage in src/Storage/CategoryConfigStorage.vala
  5. Create CatalogueConfigStorage in src/Storage/CatalogueConfigStorage.vala
  6. Create TypeIndexStorage in src/Storage/TypeIndexStorage.vala
  7. Create CategoryIndexStorage in src/Storage/CategoryIndexStorage.vala
  8. Create CatalogueIndexStorage in src/Storage/CatalogueIndexStorage.vala
  9. Create TextIndexStorage in src/Storage/TextIndexStorage.vala

Phase 2: Create Entity Facade Classes

  1. Create EntityStore in src/Storage/EntityStore.vala
  2. Create DocumentStore in src/Storage/DocumentStore.vala
  3. Create ContainerStore in src/Storage/ContainerStore.vala
  4. Create CategoryStore in src/Storage/CategoryStore.vala
  5. Create CatalogueStore in src/Storage/CatalogueStore.vala
  6. Create IndexStore in src/Storage/IndexStore.vala

Phase 3: Update EmbeddedEngine

  1. Add with_write_transaction() helper method
  2. Add facade references for all stores
  3. Remove old Storage and IndexManager references

Phase 4: Update Entity Classes

  1. Update Container.vala to use ContainerStore and EntityStore
  2. Update Document.vala to use DocumentStore and EntityStore
  3. Update Category.vala to use CategoryStore and EntityStore
  4. Update Catalogue.vala to use CatalogueStore and EntityStore
  5. Update Index.vala to use IndexStore and EntityStore

Phase 5: Remove Old Classes

  1. Delete src/Storage/Storage.vala
  2. Delete src/Storage/IndexManager.vala

Phase 6: Update Tests

  1. Update tests/Storage/StorageTest.vala to test new storage classes
  2. Add tests for each prefix storage class
  3. Add tests for each entity facade class

Benefits of New Design

  1. Clear Separation: Each prefix store has one responsibility
  2. Composable: Facades compose stores as needed
  3. Testable: Small, focused classes are easier to test
  4. Discoverable: engine.category_store.add_member() is self-documenting
  5. Flexible: Can use low-level stores directly if needed
  6. No Broken Abstractions: No casting to concrete types
  7. Clear Key Schema: All prefixes documented in one place

Transaction Model

The new architecture uses a transaction-per-write-request model:

Behavior

| Operation Type | Transaction Behavior |
|---|---|
| Write operations | Automatically wrapped in a transaction |
| Read operations | No transaction (no overhead) |
| Hooks | Run within the same transaction as the triggering write |

Benefits

  1. Atomicity where needed: Create document + add to container = one transaction
  2. Hooks are atomic: Update document + category membership updates = one transaction
  3. No explicit transaction management: Caller doesn't need to think about transactions
  4. Consistent behavior: Same model works for both EmbeddedEngine and RemoteEngine
  5. No read overhead: Reads don't pay transaction cost
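
The atomicity claim can be illustrated with a small sketch: if a hook fails after several writes, none of those writes should survive. The real Dbm's transaction mechanism is not described in this document, so the snapshot-and-restore approach below is only one assumed way to get that behavior; MemoryDbm, WriteAction, and SketchError are hypothetical names.

```vala
public errordomain SketchError { HOOK_FAILED }

public delegate void WriteAction () throws Error;

// Hypothetical in-memory Dbm with snapshot-based rollback.
public class MemoryDbm : Object {
    private HashTable<string, string> _data =
        new HashTable<string, string> (str_hash, str_equal);

    public void put (string key, string value) { _data.insert (key, value); }
    public bool has_key (string key) { return _data.contains (key); }

    public void with_transaction (WriteAction action) throws Error {
        // Copy the current state so it can be restored if the action fails.
        var snapshot = new HashTable<string, string> (str_hash, str_equal);
        foreach (unowned string key in _data.get_keys ()) {
            snapshot.insert (key, _data.lookup (key));
        }
        try {
            action ();
        } catch (Error e) {
            _data = snapshot; // discard every write made by the action
            throw e;
        }
    }
}

void main () {
    var dbm = new MemoryDbm ();
    try {
        dbm.with_transaction (() => {
            dbm.put ("entity:/docs/a", "DOCUMENT");
            dbm.put ("children:/docs", "a");
            // Simulate an after_create hook failing after both writes.
            throw new SketchError.HOOK_FAILED ("after_create hook failed");
        });
    } catch (Error e) {
        print ("rolled back: %s\n", e.message);
    }

    // Neither write is visible after the failed transaction.
    assert (!dbm.has_key ("entity:/docs/a"));
    assert (!dbm.has_key ("children:/docs"));
}
```

Reads such as has_key() go straight to the data and never enter with_transaction(), which matches the "no read overhead" point above.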

Implementation

EmbeddedEngine

Write operations on entities wrap their work in a transaction:

// In Container.create_child()
public Document create_child(string name) throws EntityError {
    Document? doc = null;
    _engine.with_write_transaction(() => {
        // Create entity metadata
        _engine.entity_store.store_metadata(path.child(name), EntityType.DOCUMENT, type_label);
        
        // Add to container's children
        _engine.container_store.add_child(path, name);
        
        // Create document entity
        doc = new Document(_engine, path.child(name));
        
        // Hooks run within same transaction
        _engine.hooks.run_after_create(doc);
    });
    return doc;
}

The Engine provides a helper method:

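// Assumed declaration; the delegate type is not defined elsewhere in this document.
public delegate void WriteTransactionDelegate() throws Error;

public class EmbeddedEngine : Object, Core.Engine {
    private Dbm _dbm;
    
    // Runs the given action inside a single Dbm transaction.
    // The parameter is named "action" because "delegate" is a Vala keyword.
    public void with_write_transaction(WriteTransactionDelegate action) throws Error {
        _dbm.with_transaction(() => action());
    }
}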

RemoteEngine (Server Side)

The server wraps each write request in a transaction:

// In ClientHandler
void handle_create_document(Message request) {
    _engine.with_write_transaction(() => {
        // Process the entire request in one transaction
        var path = request.get_path();
        var type_label = request.get_type_label();
        
        _engine.entity_store.store_metadata(path, EntityType.DOCUMENT, type_label);
        _engine.container_store.add_child(path.parent, path.name);
        
        // Hooks run within same transaction
        var doc = new Document(_engine, path);
        _engine.hooks.run_after_create(doc);
    });
}

Examples

Creating a Document

// Single transaction covers:
// 1. Create entity metadata
// 2. Add to container's children list
// 3. Store initial properties
// 4. Run after_create hooks (e.g., add to categories)
var doc = container.create_child("my-doc");

Updating a Document

// Single transaction covers:
// 1. Update properties
// 2. Run after_update hooks (e.g., update category memberships, update indices)
doc.set_property("status", "active");

Deleting a Document

// Single transaction covers:
// 1. Remove from container's children list
// 2. Delete entity metadata
// 3. Delete properties
// 4. Run after_delete hooks (e.g., remove from categories, remove from indices)
doc.delete();

Decisions Made

  1. Caching: Deferred - no caching layer needed in the initial implementation. Can be added later if performance profiling indicates it's needed.

  2. Batch Operations: Added add_members(), remove_members(), and set_members() batch methods to CategoryIndexStorage and CategoryStore for efficient bulk updates during reindexing operations.

  3. Transaction Model: Transaction-per-write-request model selected. Write operations are automatically wrapped in transactions, reads have no transaction overhead, and hooks run within the same transaction as the triggering write.