
Storage Layer Migration Plan

Overview

Migrate from legacy BasicStorage + IndexManager to the new HighLevel + LowLevel architecture while preserving all performance optimizations.

Current State Analysis

Legacy Components (to be removed)

  • BasicStorage (src/Storage/Storage.vala)
  • IndexManager (src/Storage/IndexManager.vala)

New Architecture (to be completed)

  • LowLevel: Single-responsibility storage classes per key prefix
  • HighLevel: Entity-specific facades composing LowLevel stores

Performance Optimizations in IndexManager (must preserve)

| Optimization | Location | Description |
|---|---|---|
| HashSet dedup on load | load_string_set() L764-770 | Uses HashSet during deserialization to deduplicate while preserving order |
| HashSet for membership checks | add_to_ngram_index() L407-408 | Creates HashSet for O(1) contains() checks instead of O(n) Vector.contains() |
| HashSet for remove | remove_from_ngram_index() L428-429 | HashSet for efficient membership test before rebuild |
| Batch add with HashSet | add_to_ngram_index_batch() L446-456 | Tracks changes, only saves if modified, uses HashSet for dedup |
| Batch remove with HashSet | remove_from_ngram_index_batch() L463-474 | HashSet for values, rebuilds vector without matches |
| Set members with dedup | set_category_members() L205-213 | Uses HashSet to deduplicate input enumerable |
| Batch reverse index | add_bigrams_reverse_batch() L564-570 | Dictionary-based batch operations |
| Trigram batch ops | add_trigrams_batch() L622-627 | Dictionary-based batch trigram operations |

Migration Plan

Phase 1: Add Batch Operations to LowLevel Classes

Add the missing batch methods and HashSet optimizations to LowLevel storage classes.

1.1 Update CategoryIndexStorage

File: src/Storage/LowLevel/CategoryIndexStorage.vala

Changes:

  • Add set_members() method with HashSet deduplication (from IndexManager L205-213)
  • Optimize add_member() to use HashSet for O(1) membership check
  • Optimize remove_member() to use HashSet for O(1) membership check
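
The set_members() deduplication listed above could look roughly like the following. This is a minimal sketch, not the IndexManager code: GLib's GenericSet stands in for Invercargill.DataStructures.HashSet and a plain string array stands in for Vector so that the example compiles on its own, and dedupe_preserving_order() is a hypothetical helper name.

```vala
// Sketch only: GLib.GenericSet replaces Invercargill's HashSet (assumption),
// and dedupe_preserving_order() is a hypothetical helper, not project code.
string[] dedupe_preserving_order (string[] input) {
    var seen = new GenericSet<string> (str_hash, str_equal);
    string[] result = {};
    foreach (unowned string item in input) {
        // Keep only the first occurrence so the original order survives
        if (!seen.contains (item)) {
            seen.add (item);
            result += item;
        }
    }
    return result;
}

void main () {
    string[] docs = { "/docs/a", "/docs/b", "/docs/a", "/docs/c", "/docs/b" };
    foreach (unowned string doc in dedupe_preserving_order (docs)) {
        print ("%s\n", doc);
    }
}
```

In the real set_members() the deduplicated list would then be handed to save_string_set() in a single write.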

Current vs Optimized:

// Current (O(n) contains check)
public void add_member(string category_path, string doc_path) throws StorageError {
    string key = members_key(category_path);
    var members = load_string_set(key);
    if (!members.contains(doc_path)) {  // O(n) operation
        members.add(doc_path);
        save_string_set(key, members);
    }
}

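// Optimized (hash-based contains check)
// Note: building the HashSet is itself O(n), so a single add_member() call is
// still O(n) overall. The O(1) lookup pays off when one set serves many checks,
// as in the batch methods.
public void add_member(string category_path, string doc_path) throws StorageError {
    string key = members_key(category_path);
    var members = load_string_set(key);
    var members_hash = new Invercargill.DataStructures.HashSet<string>();
    foreach (var m in members) members_hash.add(m);  // O(n) one-time build

    if (!members_hash.has(doc_path)) {  // O(1) lookup
        members.add(doc_path);
        save_string_set(key, members);
    }
}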

1.2 Update CatalogueIndexStorage

File: src/Storage/LowLevel/CatalogueIndexStorage.vala

Changes:

  • Add set_group_members() method with HashSet deduplication
  • Optimize add_to_group() with HashSet membership check
  • Optimize remove_from_group() with HashSet membership check

1.3 Update TextIndexStorage (Critical - most complex)

File: src/Storage/LowLevel/TextIndexStorage.vala

Changes:

  • Optimize add_trigram() with HashSet membership check (from IndexManager L404-414)
  • Optimize remove_trigram() with HashSet membership check (from IndexManager L425-440)
  • Add add_trigram_batch() method (from IndexManager L442-457)
  • Add remove_trigram_batch() method (from IndexManager L459-475)
  • Add add_bigram_mapping_batch() method (from IndexManager L564-570)
  • Add add_unigram_mapping_batch() method (from IndexManager L614-620)
  • Add add_trigrams_batch() dictionary method (from IndexManager L622-628)
  • Add remove_trigrams_batch() dictionary method (from IndexManager L630-636)
  • Update load_string_set() to use HashSet for deduplication (from IndexManager L747-779)

1.4 Update TypeIndexStorage

File: src/Storage/LowLevel/TypeIndexStorage.vala

Changes:

  • Optimize add_document() with HashSet membership check
  • Optimize remove_document() with HashSet membership check
  • Update load_string_set() to use HashSet for deduplication

Phase 2: Add Batch Methods to HighLevel Stores

Expose the new LowLevel batch methods through the HighLevel facades.

2.1 Update CategoryStore

File: src/Storage/HighLevel/CategoryStore.vala

Changes:

  • Add set_members() facade method
  • Ensure all LowLevel optimizations are properly delegated

2.2 Update CatalogueStore

File: src/Storage/HighLevel/CatalogueStore.vala

Changes:

  • Add set_group_members() facade method
  • Ensure all LowLevel optimizations are properly delegated

2.3 Update IndexStore

File: src/Storage/HighLevel/IndexStore.vala

Changes:

  • Add add_trigram_batch() facade method
  • Add remove_trigram_batch() facade method
  • Add add_bigram_mappings_batch() facade method
  • Add add_unigram_mappings_batch() facade method
  • Add add_trigrams_batch() dictionary method
  • Add remove_trigrams_batch() dictionary method

Phase 3: Migrate Entity Classes

Update Category, Catalogue, and Index entities to use HighLevel stores instead of IndexManager.

3.1 Migrate Category Entity

File: src/Entities/Category.vala

Changes:

  • Replace get_index_manager() calls with get_category_store() calls
  • Update populate_index() to use CategoryStore.set_members()
  • Update add_document() to use CategoryStore.add_member()
  • Update remove_document() to use CategoryStore.remove_member()
  • Update contains_document() to use CategoryStore.get_members()
  • Update batch_update_members() to use CategoryStore methods
  • Update clear_index() to use CategoryStore.clear_index()

Example migration:

// Before (using IndexManager)
var index_manager = get_index_manager();
if (index_manager != null) {
    ((!) index_manager).add_to_category(_path.to_string(), doc_path);
}

// After (using CategoryStore)
var store = get_category_store();
if (store != null) {
    ((!) store).add_member(_path, doc_path);
}

3.2 Migrate Catalogue Entity

File: src/Entities/Catalogue.vala

Changes:

  • Replace get_index_manager() calls with get_catalogue_store() calls
  • Update all index operations to use CatalogueStore methods
  • Update batch operations to use CatalogueStore methods

3.3 Migrate Index Entity

File: src/Entities/Index.vala

Changes:

  • Replace get_index_manager() calls with get_index_store() calls
  • Update trigram operations to use IndexStore methods
  • Update batch operations to use IndexStore batch methods
  • Update search methods to use IndexStore for lookups

Phase 4: Update Engine Configuration

4.1 Update EngineConfiguration

File: src/Engine/EngineConfiguration.vala

Changes:

  • Remove index_manager property (L171)
  • Add store accessors, or omit them if EmbeddedEngine already exposes the stores

4.2 Update Core.Engine interface

File: src/Core/Engine.vala

Changes:

  • Remove index_manager property (L215-216)

4.3 Update EmbeddedEngine

File: src/Engine/EmbeddedEngine.vala

Changes:

  • Remove _index_manager field (L67)
  • Remove IndexManager initialization (L189)
  • Remove _configuration.index_manager assignment (L190)
  • Keep HighLevel store initialization (L211-217)
  • Expose stores via properties (already done L121-146)

Phase 5: Update Remaining BasicStorage Usage

5.1 Update EngineFactory

File: src/Engine/EngineFactory.vala

Changes:

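  • Replace BasicStorage with direct Dbm + HighLevel stores
  • Alternatively, keep BasicStorage for simple cases and use the stores only for indexed entities (this conflicts with deleting Storage.vala in Phase 6.1, so the choice must be made before Phase 6)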

5.2 Update RemoteEngine

File: src/Engine/RemoteEngine.vala

Changes:

  • Remove the placeholder BasicStorage if it is no longer needed
  • Otherwise, update it to use the new architecture

5.3 Update Server

File: src/Server/Server.vala

Changes:

  • Update storage initialization to use new architecture

Phase 6: Remove Legacy Code

6.1 Remove Files

  • Delete src/Storage/Storage.vala (BasicStorage class)
  • Delete src/Storage/IndexManager.vala

6.2 Update meson.build

File: src/meson.build

Changes:

  • Remove 'Storage/Storage.vala' from storage_sources (L35)
  • Remove 'Storage/IndexManager.vala' from storage_sources (L36)
  • Remove "Legacy storage (deprecated, will be removed)" comment (L34)

Phase 7: Testing

7.1 Add Unit Tests for LowLevel Classes

File: tests/Storage/LowLevelStorageTest.vala (new)

  • Test CategoryIndexStorage with HashSet optimizations
  • Test CatalogueIndexStorage with HashSet optimizations
  • Test TextIndexStorage batch operations
  • Test TypeIndexStorage with HashSet optimizations

7.2 Add Unit Tests for HighLevel Classes

File: tests/Storage/HighLevelStorageTest.vala (new)

  • Test CategoryStore facade
  • Test CatalogueStore facade
  • Test IndexStore batch methods

7.3 Update Existing Tests

  • Update StorageTest.vala to remove BasicStorage tests
  • Run full test suite to verify no regressions

7.4 Performance Verification

  • Run performance benchmarks before migration
  • Run performance benchmarks after migration
  • Verify no performance regression
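
As a starting point for the before/after comparison, a micro-benchmark along these lines can show whether the hash-based lookup pattern is actually paying off. It is a sketch under stated assumptions: GLib's GenericSet and Timer are used instead of the project's own collections, the helper names and the 5000-item size are illustrative, and a real benchmark should exercise the actual storage classes against the same data set before and after the migration.

```vala
// Hypothetical micro-benchmark; names and sizes are illustrative only.

// O(n) scan, comparable to calling contains() on a vector
bool linear_contains (string[] items, string needle) {
    foreach (unowned string s in items) {
        if (s == needle) return true;
    }
    return false;
}

// Looks up every item with a linear scan: O(n^2) in total
int count_hits_linear (string[] items) {
    int hits = 0;
    foreach (unowned string s in items) {
        if (linear_contains (items, s)) hits++;
    }
    return hits;
}

// Builds one hash set, then looks up every item: O(n) in total
int count_hits_hashed (string[] items) {
    var lookup = new GenericSet<string> (str_hash, str_equal);
    foreach (unowned string s in items) lookup.add (s);
    int hits = 0;
    foreach (unowned string s in items) {
        if (lookup.contains (s)) hits++;
    }
    return hits;
}

void main () {
    string[] items = {};
    for (int i = 0; i < 5000; i++) items += "doc-%d".printf (i);

    var timer = new Timer ();  // a new Timer starts running immediately
    int linear_hits = count_hits_linear (items);
    double linear_s = timer.elapsed ();

    timer.start ();  // restarts the measurement
    int hashed_hits = count_hits_hashed (items);
    double hashed_s = timer.elapsed ();

    print ("linear: %d hits in %.4f s\n", linear_hits, linear_s);
    print ("hashed: %d hits in %.4f s\n", hashed_hits, hashed_s);
}
```

Both functions must report the same hit count; only the elapsed times should differ, and the gap grows with the number of lookups per set.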

Key Code Patterns to Preserve

Pattern 1: HashSet for O(1) Membership Checks

// When checking if item exists before add/remove
// Building the HashSet costs O(n) once; reuse it for every lookup on the same set
var set_hash = new Invercargill.DataStructures.HashSet<string>();
foreach (var item in set) set_hash.add(item);

if (!set_hash.has(new_item)) {  // O(1) lookup instead of an O(n) Vector.contains() scan
    set.add(new_item);
    save_string_set(key, set);
}

Pattern 2: HashSet Deduplication on Load

// When deserializing, deduplicate while preserving order
var result = new Invercargill.DataStructures.Vector<string>();
var hash_set = new Invercargill.DataStructures.HashSet<string>();

foreach (var item in array) {
    if (!item.is_null()) {
        string value = item.as<string>();
        if (!hash_set.has(value)) {  // Prevent duplicates
            hash_set.add(value);
            result.add(value);
        }
    }
}

Pattern 3: Change Tracking for Batch Operations

// Only save if changes were made
bool changed = false;
var existing_hash = new Invercargill.DataStructures.HashSet<string>();
foreach (var ex in existing) existing_hash.add(ex);

foreach (var val in values) {
    if (!existing_hash.has(val)) {
        existing_hash.add(val);
        existing.add(val);
        changed = true;
    }
}
if (changed) save_string_set(key, existing);  // Only save if modified

Pattern 4: Dictionary-based Batch Operations

// Process multiple keys in batch
public void add_trigrams_batch(string index_path, 
    Invercargill.DataStructures.Dictionary<string, Invercargill.DataStructures.Vector<string>> additions) 
    throws StorageError {
    foreach (var trigram in additions.keys) {
        Invercargill.DataStructures.Vector<string> docs;
        additions.try_get(trigram, out docs);
        add_to_ngram_index_batch(index_path, "tri", trigram, docs);
    }
}
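
The batch-remove optimization (a HashSet of the values to remove, then a rebuild of the vector without matches) has no pattern of its own in this section, so a rough sketch follows. It assumes GLib's GenericSet and string arrays in place of the Invercargill collections, and remove_all() is a hypothetical helper rather than the IndexManager method.

```vala
// Sketch only: remove_all() is a hypothetical helper; GLib.GenericSet stands in
// for Invercargill.DataStructures.HashSet so the example is self-contained.
string[] remove_all (string[] existing, string[] to_remove, out bool changed) {
    // Hash the values to remove once so each later check is O(1)
    var doomed = new GenericSet<string> (str_hash, str_equal);
    foreach (unowned string v in to_remove) doomed.add (v);

    // Rebuild the list without the matches, tracking whether anything was dropped
    string[] kept = {};
    changed = false;
    foreach (unowned string v in existing) {
        if (doomed.contains (v)) {
            changed = true;
        } else {
            kept += v;
        }
    }
    return kept;
}

void main () {
    string[] existing = { "/docs/a", "/docs/b", "/docs/c" };
    string[] to_remove = { "/docs/b", "/docs/x" };
    bool changed;
    string[] kept = remove_all (existing, to_remove, out changed);
    if (changed) {
        // In the storage class this is where save_string_set() would be called
        print ("would save %d entries\n", kept.length);
    }
}
```

As with Pattern 3, the changed flag lets the caller skip the write entirely when none of the requested values were present.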

Risk Assessment

| Risk | Impact | Mitigation |
|---|---|---|
| Performance regression | High | Benchmark before/after, preserve HashSet patterns |
| Data corruption | Critical | Same key prefixes used, no data migration needed |
| API breakage | Medium | HighLevel stores already exist, just need to use them |
| Test coverage | Medium | Add tests for LowLevel classes before migration |

Estimated Effort

| Phase | Complexity |
|---|---|
| Phase 1: LowLevel optimizations | Medium - careful pattern copying |
| Phase 2: HighLevel batch methods | Low - simple facades |
| Phase 3: Entity migration | Medium - many call sites |
| Phase 4: Engine configuration | Low - few changes |
| Phase 5: Remaining usage | Low - few files |
| Phase 6: Remove legacy | Low - delete files |
| Phase 7: Testing | Medium - comprehensive testing |

Execution Order

  1. Phase 1 first - ensures LowLevel classes have all optimizations
  2. Phase 2 second - exposes optimizations through facades
  3. Phases 7.1-7.2 - test new methods before using them
  4. Phase 3 - migrate entities to use new stores
  5. Phase 7.3 - run existing tests
  6. Phases 4-5 - update engine and remaining code
  7. Phase 7.4 - performance verification
  8. Phase 6 - remove legacy code last