# Storage Layer Migration Plan

## Overview

Migrate from legacy `BasicStorage` + `IndexManager` to the new `HighLevel` + `LowLevel` architecture while preserving all performance optimizations.

## Current State Analysis

### Legacy Components (to be removed)

- [`BasicStorage`](../src/Storage/Storage.vala) - High-level storage interface
- [`IndexManager`](../src/Storage/IndexManager.vala) - Index operations with performance optimizations

### New Architecture (to be completed)

- **LowLevel**: Single-responsibility storage classes per key prefix
- **HighLevel**: Entity-specific facades composing LowLevel stores

### Performance Optimizations in IndexManager (must preserve)

| Optimization | Location | Description |
|--------------|----------|-------------|
| HashSet dedup on load | `load_string_set()` L764-770 | Uses HashSet during deserialization to deduplicate while preserving order |
| HashSet for membership checks | `add_to_ngram_index()` L407-408 | Creates HashSet for O(1) contains() checks instead of O(n) Vector.contains() |
| HashSet for remove | `remove_from_ngram_index()` L428-429 | HashSet for efficient membership test before rebuild |
| Batch add with HashSet | `add_to_ngram_index_batch()` L446-456 | Tracks changes, only saves if modified, uses HashSet for dedup |
| Batch remove with HashSet | `remove_from_ngram_index_batch()` L463-474 | HashSet for values, rebuilds vector without matches |
| Set members with dedup | `set_category_members()` L205-213 | Uses HashSet to deduplicate input enumerable |
| Batch reverse index | `add_bigrams_reverse_batch()` L564-570 | Dictionary-based batch operations |
| Trigram batch ops | `add_trigrams_batch()` L622-627 | Dictionary-based batch trigram operations |

---

## Migration Plan

### Phase 1: Add Batch Operations to LowLevel Classes

Add the missing batch methods and HashSet optimizations to LowLevel storage classes.
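As a rough illustration of the "Batch remove with HashSet" row in the table, here is a minimal standalone sketch. It is not the `IndexManager` code: it assumes GLib's `GenericArray` and `GenericSet` as stand-ins for Invercargill's `Vector` and `HashSet` so that it compiles against GLib alone, and `remove_batch` is a hypothetical name.

```vala
// Standalone sketch: GLib containers stand in for Invercargill's Vector/HashSet.
// remove_batch is a hypothetical helper, not an existing IndexManager method.

// Returns a new array holding the items of `existing` that are not listed in
// `to_remove`, in their original order. `changed` reports whether anything was
// dropped, so the caller can skip the save when nothing matched.
GenericArray<string> remove_batch (GenericArray<string> existing,
                                   string[] to_remove,
                                   out bool changed) {
    // Build the lookup set once: O(m) for m values to remove.
    var doomed = new GenericSet<string> (str_hash, str_equal);
    foreach (unowned string value in to_remove) {
        doomed.add (value);
    }

    // Single pass over the stored members with O(1) average lookups.
    var kept = new GenericArray<string> ();
    changed = false;
    for (int i = 0; i < existing.length; i++) {
        unowned string member = existing[i];
        if (doomed.contains (member)) {
            changed = true;
        } else {
            kept.add (member);
        }
    }
    return kept;
}
```

A caller would persist `kept` only when `changed` is true, which mirrors the "only saves if modified" behaviour described for the batch add.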
#### 1.1 Update `CategoryIndexStorage`

**File**: [`src/Storage/LowLevel/CategoryIndexStorage.vala`](../src/Storage/LowLevel/CategoryIndexStorage.vala)

**Changes**:
- [ ] Add `set_members()` method with HashSet deduplication (from IndexManager L205-213)
- [ ] Optimize `add_member()` to use HashSet for O(1) membership check
- [ ] Optimize `remove_member()` to use HashSet for O(1) membership check

**Current vs Optimized**:
```vala
// Current (O(n) contains check)
public void add_member(string category_path, string doc_path) throws StorageError {
    string key = members_key(category_path);
    var members = load_string_set(key);
    if (!members.contains(doc_path)) {  // O(n) operation
        members.add(doc_path);
        save_string_set(key, members);
    }
}

// Optimized (O(1) contains check)
public void add_member(string category_path, string doc_path) throws StorageError {
    string key = members_key(category_path);
    var members = load_string_set(key);
    var members_hash = new Invercargill.DataStructures.HashSet<string>();
    foreach (var m in members) members_hash.add(m);
    if (!members_hash.has(doc_path)) {  // O(1) lookup, after an O(n) set build
        members.add(doc_path);
        save_string_set(key, members);
    }
}
```

#### 1.2 Update `CatalogueIndexStorage`

**File**: [`src/Storage/LowLevel/CatalogueIndexStorage.vala`](../src/Storage/LowLevel/CatalogueIndexStorage.vala)

**Changes**:
- [ ] Add `set_group_members()` method with HashSet deduplication
- [ ] Optimize `add_to_group()` with HashSet membership check
- [ ] Optimize `remove_from_group()` with HashSet membership check

#### 1.3 Update `TextIndexStorage` (Critical - most complex)

**File**: [`src/Storage/LowLevel/TextIndexStorage.vala`](../src/Storage/LowLevel/TextIndexStorage.vala)

**Changes**:
- [ ] Optimize `add_trigram()` with HashSet membership check (from IndexManager L404-414)
- [ ] Optimize `remove_trigram()` with HashSet membership check (from IndexManager L425-440)
- [ ] Add `add_trigram_batch()` method (from IndexManager L442-457)
- [ ] Add `remove_trigram_batch()` method (from IndexManager L459-475)
- [ ] Add `add_bigram_mapping_batch()` method (from IndexManager L564-570)
- [ ] Add `add_unigram_mapping_batch()` method (from IndexManager L614-620)
- [ ] Add `add_trigrams_batch()` dictionary method (from IndexManager L622-628)
- [ ] Add `remove_trigrams_batch()` dictionary method (from IndexManager L630-636)
- [ ] Update `load_string_set()` to use HashSet for deduplication (from IndexManager L747-779)

#### 1.4 Update `TypeIndexStorage`

**File**: [`src/Storage/LowLevel/TypeIndexStorage.vala`](../src/Storage/LowLevel/TypeIndexStorage.vala)

**Changes**:
- [ ] Optimize `add_document()` with HashSet membership check
- [ ] Optimize `remove_document()` with HashSet membership check
- [ ] Update `load_string_set()` to use HashSet for deduplication

---

### Phase 2: Add Batch Methods to HighLevel Stores

Expose the new LowLevel batch methods through the HighLevel facades.

#### 2.1 Update `CategoryStore`

**File**: [`src/Storage/HighLevel/CategoryStore.vala`](../src/Storage/HighLevel/CategoryStore.vala)

**Changes**:
- [ ] Add `set_members()` facade method
- [ ] Ensure all LowLevel optimizations are properly delegated

#### 2.2 Update `CatalogueStore`

**File**: [`src/Storage/HighLevel/CatalogueStore.vala`](../src/Storage/HighLevel/CatalogueStore.vala)

**Changes**:
- [ ] Add `set_group_members()` facade method
- [ ] Ensure all LowLevel optimizations are properly delegated

#### 2.3 Update `IndexStore`

**File**: [`src/Storage/HighLevel/IndexStore.vala`](../src/Storage/HighLevel/IndexStore.vala)

**Changes**:
- [ ] Add `add_trigram_batch()` facade method
- [ ] Add `remove_trigram_batch()` facade method
- [ ] Add `add_bigram_mappings_batch()` facade method
- [ ] Add `add_unigram_mappings_batch()` facade method
- [ ] Add `add_trigrams_batch()` dictionary method
- [ ] Add `remove_trigrams_batch()` dictionary method

---

### Phase 3: Migrate Entity Classes

Update Category, Catalogue, and Index entities to use HighLevel stores instead of IndexManager.
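The layering this phase relies on (an entity calls a HighLevel facade, which only delegates to a LowLevel store) might look roughly like the sketch below. `MemberIndex` and `MemberStore` are hypothetical names, the storage is an in-memory `HashTable` rather than the real Dbm backend, and GLib containers are assumed in place of Invercargill's so the snippet stands alone.

```vala
// Hypothetical low-level store: an in-memory stand-in for a per-prefix storage class.
class MemberIndex {
    private HashTable<string, GenericArray<string>> _data =
        new HashTable<string, GenericArray<string>> (str_hash, str_equal);

    // Replaces the member list for `key`, dropping duplicates but keeping
    // first-seen order.
    public void set_members (string key, string[] members) {
        var seen = new GenericSet<string> (str_hash, str_equal);
        var unique = new GenericArray<string> ();
        foreach (unowned string m in members) {
            if (!seen.contains (m)) {
                seen.add (m);
                unique.add (m);
            }
        }
        _data.insert (key, unique);
    }

    // Number of stored members for `key`, or 0 when the key is unknown.
    public uint count (string key) {
        unowned GenericArray<string>? members = _data.lookup (key);
        return members == null ? 0 : members.length;
    }
}

// Hypothetical high-level facade: owns the low-level store and only delegates,
// so the deduplication logic lives in exactly one place.
class MemberStore {
    private MemberIndex _index = new MemberIndex ();

    public void set_members (string path, string[] members) {
        _index.set_members (path, members);
    }

    public uint member_count (string path) {
        return _index.count (path);
    }
}
```

Keeping the facade this thin means an entity migrated in this phase picks up the Phase 1 optimizations without having to know about them.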
#### 3.1 Migrate `Category` Entity

**File**: [`src/Entities/Category.vala`](../src/Entities/Category.vala)

**Changes**:
- [ ] Replace `get_index_manager()` calls with `get_category_store()` calls
- [ ] Update `populate_index()` to use `CategoryStore.set_members()`
- [ ] Update `add_document()` to use `CategoryStore.add_member()`
- [ ] Update `remove_document()` to use `CategoryStore.remove_member()`
- [ ] Update `contains_document()` to use `CategoryStore.get_members()`
- [ ] Update `batch_update_members()` to use CategoryStore methods
- [ ] Update `clear_index()` to use `CategoryStore.clear_index()`

**Example migration**:
```vala
// Before (using IndexManager)
var index_manager = get_index_manager();
if (index_manager != null) {
    ((!) index_manager).add_to_category(_path.to_string(), doc_path);
}

// After (using CategoryStore)
var store = get_category_store();
if (store != null) {
    ((!) store).add_member(_path, doc_path);
}
```

#### 3.2 Migrate `Catalogue` Entity

**File**: [`src/Entities/Catalogue.vala`](../src/Entities/Catalogue.vala)

**Changes**:
- [ ] Replace `get_index_manager()` calls with `get_catalogue_store()` calls
- [ ] Update all index operations to use CatalogueStore methods
- [ ] Update batch operations to use CatalogueStore methods

#### 3.3 Migrate `Index` Entity

**File**: [`src/Entities/Index.vala`](../src/Entities/Index.vala)

**Changes**:
- [ ] Replace `get_index_manager()` calls with `get_index_store()` calls
- [ ] Update trigram operations to use IndexStore methods
- [ ] Update batch operations to use IndexStore batch methods
- [ ] Update search methods to use IndexStore for lookups

---

### Phase 4: Update Engine Configuration

#### 4.1 Update `EngineConfiguration`

**File**: [`src/Engine/EngineConfiguration.vala`](../src/Engine/EngineConfiguration.vala)

**Changes**:
- [ ] Remove `index_manager` property (L171)
- [ ] Add store accessors or remove if handled by EmbeddedEngine

#### 4.2 Update `Core.Engine` interface

**File**: [`src/Core/Engine.vala`](../src/Core/Engine.vala)

**Changes**:
- [ ] Remove `index_manager` property (L215-216)

#### 4.3 Update `EmbeddedEngine`

**File**: [`src/Engine/EmbeddedEngine.vala`](../src/Engine/EmbeddedEngine.vala)

**Changes**:
- [ ] Remove `_index_manager` field (L67)
- [ ] Remove IndexManager initialization (L189)
- [ ] Remove `_configuration.index_manager` assignment (L190)
- [ ] Keep HighLevel store initialization (L211-217)
- [ ] Expose stores via properties (already done L121-146)

---

### Phase 5: Update Remaining BasicStorage Usage

#### 5.1 Update `EngineFactory`

**File**: [`src/Engine/EngineFactory.vala`](../src/Engine/EngineFactory.vala)

**Changes**:
- [ ] Replace `BasicStorage` with direct Dbm + HighLevel stores
- [ ] Or keep BasicStorage for simple cases, use stores for indexed entities

#### 5.2 Update `RemoteEngine`

**File**: [`src/Engine/RemoteEngine.vala`](../src/Engine/RemoteEngine.vala)

**Changes**:
- [ ] Remove placeholder BasicStorage if no longer needed
- [ ] Or update to use new architecture

#### 5.3 Update `Server`

**File**: [`src/Server/Server.vala`](../src/Server/Server.vala)

**Changes**:
- [ ] Update storage initialization to use new architecture

---

### Phase 6: Remove Legacy Code

#### 6.1 Remove Files

- [ ] Delete `src/Storage/Storage.vala` (BasicStorage class)
- [ ] Delete `src/Storage/IndexManager.vala`

#### 6.2 Update meson.build

**File**: [`src/meson.build`](../src/meson.build)

**Changes**:
- [ ] Remove `'Storage/Storage.vala'` from storage_sources (L35)
- [ ] Remove `'Storage/IndexManager.vala'` from storage_sources (L36)
- [ ] Remove "Legacy storage (deprecated, will be removed)" comment (L34)

---

### Phase 7: Testing

#### 7.1 Add Unit Tests for LowLevel Classes

**File**: `tests/Storage/LowLevelStorageTest.vala` (new)

- [ ] Test `CategoryIndexStorage` with HashSet optimizations
- [ ] Test `CatalogueIndexStorage` with HashSet optimizations
- [ ] Test `TextIndexStorage` batch operations
- [ ] Test `TypeIndexStorage` with HashSet optimizations

#### 7.2 Add Unit Tests for HighLevel Classes

**File**: `tests/Storage/HighLevelStorageTest.vala` (new)

- [ ] Test `CategoryStore` facade
- [ ] Test `CatalogueStore` facade
- [ ] Test `IndexStore` batch methods

#### 7.3 Update Existing Tests

- [ ] Update `StorageTest.vala` to remove BasicStorage tests
- [ ] Run full test suite to verify no regressions

#### 7.4 Performance Verification

- [ ] Run performance benchmarks before migration
- [ ] Run performance benchmarks after migration
- [ ] Verify no performance regression

---

## Key Code Patterns to Preserve

### Pattern 1: HashSet for O(1) Membership Checks

```vala
// When checking if item exists before add/remove
var set_hash = new Invercargill.DataStructures.HashSet<string>();
foreach (var item in set) set_hash.add(item);
if (!set_hash.has(new_item)) {  // O(1) lookup, after an O(n) set build
    set.add(new_item);
    save_string_set(key, set);
}
```

### Pattern 2: HashSet Deduplication on Load

```vala
// When deserializing, deduplicate while preserving order
var result = new Invercargill.DataStructures.Vector<string>();
var hash_set = new Invercargill.DataStructures.HashSet<string>();
foreach (var item in array) {
    if (!item.is_null()) {
        string value = item.as<string>();
        if (!hash_set.has(value)) {  // Prevent duplicates
            hash_set.add(value);
            result.add(value);
        }
    }
}
```

### Pattern 3: Change Tracking for Batch Operations

```vala
// Only save if changes were made
bool changed = false;
var existing_hash = new Invercargill.DataStructures.HashSet<string>();
foreach (var ex in existing) existing_hash.add(ex);

foreach (var val in values) {
    if (!existing_hash.has(val)) {
        existing_hash.add(val);
        existing.add(val);
        changed = true;
    }
}

if (changed) save_string_set(key, existing);  // Only save if modified
```

### Pattern 4: Dictionary-based Batch Operations

```vala
// Process multiple keys in batch
public void add_trigrams_batch(string index_path, Invercargill.DataStructures.Dictionary<string, Invercargill.DataStructures.Vector<string>> additions) throws StorageError {
    foreach (var trigram in additions.keys) {
        Invercargill.DataStructures.Vector<string> docs;
        additions.try_get(trigram, out docs);
        add_to_ngram_index_batch(index_path, "tri", trigram, docs);
    }
}
```

---

## Risk Assessment

| Risk | Impact | Mitigation |
|------|--------|------------|
| Performance regression | High | Benchmark before/after, preserve HashSet patterns |
| Data corruption | Critical | Same key prefixes used, no data migration needed |
| API breakage | Medium | HighLevel stores already exist, just need to use them |
| Test coverage | Medium | Add tests for LowLevel classes before migration |

---

## Estimated Effort

| Phase | Complexity |
|-------|------------|
| Phase 1: LowLevel optimizations | Medium - careful pattern copying |
| Phase 2: HighLevel batch methods | Low - simple facades |
| Phase 3: Entity migration | Medium - many call sites |
| Phase 4: Engine configuration | Low - few changes |
| Phase 5: Remaining usage | Low - few files |
| Phase 6: Remove legacy | Low - delete files |
| Phase 7: Testing | Medium - comprehensive testing |

---

## Execution Order

1. **Phase 1** first - ensures LowLevel classes have all optimizations
2. **Phase 2** second - exposes optimizations through facades
3. **Phase 7.1-7.2** - test new methods before using them
4. **Phase 3** - migrate entities to use new stores
5. **Phase 7.3** - run existing tests
6. **Phase 4-5** - update engine and remaining code
7. **Phase 7.4** - performance verification
8. **Phase 6** - remove legacy code last
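
For the performance verification step (Phase 7.4), a before/after comparison could start from something like the timing sketch below. It is only an assumption about how such a benchmark might be shaped: the function names are hypothetical, GLib's `GenericArray`, `GenericSet`, and `Timer` are used instead of Invercargill types, and it measures in-memory membership checks only, not the storage round trip.

```vala
// Signature shared by both strategies so they can be timed the same way.
delegate int HitCounter (GenericArray<string> members, string[] probes);

// Baseline: linear scan per probe, O(n) each.
int count_hits_linear (GenericArray<string> members, string[] probes) {
    int hits = 0;
    foreach (unowned string probe in probes) {
        for (int i = 0; i < members.length; i++) {
            if (members[i] == probe) {
                hits++;
                break;
            }
        }
    }
    return hits;
}

// Candidate: build a set once, then O(1) average lookups per probe.
int count_hits_hashed (GenericArray<string> members, string[] probes) {
    var lookup = new GenericSet<string> (str_hash, str_equal);
    for (int i = 0; i < members.length; i++) {
        lookup.add (members[i]);
    }
    int hits = 0;
    foreach (unowned string probe in probes) {
        if (lookup.contains (probe)) {
            hits++;
        }
    }
    return hits;
}

// Runs one strategy and returns the wall-clock seconds it took.
double time_seconds (HitCounter counter, GenericArray<string> members,
                     string[] probes, out int hits) {
    var timer = new Timer ();  // a new GLib.Timer starts running immediately
    hits = counter (members, probes);
    return timer.elapsed ();
}
```

Both strategies should report the same hit count for the same input. Comparing the two timings on realistic index sizes, and repeating each run several times, gives a first signal before running the full benchmark suite.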