HookManager Batch Optimization Plan

Problem

Even after fixing the double-processing bug, batched inserts are still slower per document than individual inserts:

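| Operation             | Individual | Batched                           | Ratio        |
|-----------------------|------------|-----------------------------------|--------------|
| create_document_small | 4.19ms     | 238.51ms per batch (23.85ms/doc)  | 5.7× slower  |
| create_document_large | 42.92ms    | 452.75ms per batch (45.27ms/doc)  | 1.05× slower |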
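
The ratio column can be reproduced with a small calculation. This is only a sketch: the helper name `slowdown` is hypothetical, and the batch size of 10 is an assumption inferred from the per-batch and per-document figures above.

```vala
// Per-document slowdown of a batch relative to individual inserts.
// `slowdown` is a hypothetical helper, not part of HookManager.
double slowdown (double individual_ms, double batch_ms, int batch_size) {
    // Average cost of one document inside the batch
    double per_doc_ms = batch_ms / batch_size;
    return per_doc_ms / individual_ms;
}

void main () {
    // Assumption: each benchmark batch contains 10 documents
    int batch_size = 10;
    print ("create_document_small: %.2fx\n", slowdown (4.19, 238.51, batch_size));
    print ("create_document_large: %.2fx\n", slowdown (42.92, 452.75, batch_size));
}
```

A value above 1.0 means the batch costs more per document than an individual insert, which is the opposite of what batching is meant to achieve.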

Root Cause Analysis

In commit_batch(), even when ALL handlers are batched handlers (Category, Catalogue, Index all implement BatchedHookHandler with supports_batch = true), the code still calls batch.execute():

public void commit_batch() {
    // Execute batch for batched handlers
    execute_batch_for_handlers((!) _current_batch);  // ← Correct: calls on_batch_change()
    
    // Also execute individual events for non-batched handlers
    ((!) _current_batch).execute(this);  // ← WASTEFUL when no non-batched handlers!
    
    _current_batch = null;
    _batch_mode = false;
}

What batch.execute() does (unnecessarily when all handlers are batched):

  1. get_consolidated_events() - Creates new Vector, Dictionary, iterates all events
  2. For each consolidated event:
    • Calls engine.get_entity_or_null() - Storage lookup!
    • Calls notify_entity_change_from_event() → notify_entity_change_immediate()
    • Iterates ALL handlers just to skip them (they're all batched)
  3. execute_property_changes() - Iterates property changes, calls handlers that skip

Why This is Expensive

For 10 documents with 2 properties each:

  • 30 events recorded
  • 10 entity lookups from storage (expensive!)
  • 30 handler iterations (all skipped, but still iterated)
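
The counts above can be derived as follows. This is a rough sketch that assumes each document records one creation event plus one event per property, and that consolidation leaves one storage lookup per document; the function names are hypothetical.

```vala
// Hypothetical cost model for the redundant batch.execute() pass.
// Assumption: one creation event plus one event per property, per document.
int events_recorded (int docs, int props_per_doc) {
    return docs * (1 + props_per_doc);
}

// Assumption: events are consolidated per entity, so one lookup per document.
int entity_lookups (int docs) {
    return docs;
}

void main () {
    int docs = 10;
    int props = 2;
    print ("events recorded: %d\n", events_recorded (docs, props));
    print ("entity lookups:  %d\n", entity_lookups (docs));
}
```

Both numbers grow linearly with batch size, so the wasted work scales with exactly the workloads that batching is supposed to speed up.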

Solution

Modify commit_batch() to check if there are any non-batched handlers before calling batch.execute():

public void commit_batch() {
    if (_current_batch == null) {
        return;
    }
    
    // Execute batch for batched handlers
    execute_batch_for_handlers((!) _current_batch);
    
    // Only execute individual events if there are non-batched handlers
    if (has_non_batched_handlers()) {
        ((!) _current_batch).execute(this);
    }
    
    _current_batch = null;
    _batch_mode = false;
}

private bool has_non_batched_handlers() {
    foreach (var handler in _handlers) {
        if (!(handler is BatchedHookHandler)) {
            return true;
        }
        var batched = (BatchedHookHandler) handler;
        if (!batched.supports_batch) {
            return true;
        }
    }
    return false;
}
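
To see the check in isolation, the sketch below runs the same logic against stand-in handler types. The `HookHandler` and `BatchedHookHandler` declarations, the three `Stub*` classes, and passing the handler list as an array parameter are all assumptions made for illustration; the real interfaces are not shown in this plan.

```vala
// Stand-in interfaces (assumed shapes, not the real declarations).
public interface HookHandler : Object {
}

public interface BatchedHookHandler : Object, HookHandler {
    public abstract bool supports_batch { get; }
}

// Batched handler that opts in to batch delivery.
public class StubBatchedHandler : Object, HookHandler, BatchedHookHandler {
    public bool supports_batch { get { return true; } }
}

// Implements the batched interface but opts out at runtime.
public class StubOptOutHandler : Object, HookHandler, BatchedHookHandler {
    public bool supports_batch { get { return false; } }
}

// Plain handler with no batch support at all.
public class StubLegacyHandler : Object, HookHandler {
}

// Same logic as the proposed method, taking the handlers as a parameter.
bool has_non_batched_handlers (HookHandler[] handlers) {
    foreach (var handler in handlers) {
        if (!(handler is BatchedHookHandler)) {
            return true;
        }
        var batched = (BatchedHookHandler) handler;
        if (!batched.supports_batch) {
            return true;
        }
    }
    return false;
}

void main () {
    HookHandler[] all_batched = { new StubBatchedHandler (), new StubBatchedHandler () };
    HookHandler[] with_legacy = { new StubBatchedHandler (), new StubLegacyHandler () };
    HookHandler[] with_opt_out = { new StubBatchedHandler (), new StubOptOutHandler () };

    // Only the all-batched list lets commit_batch() skip batch.execute().
    print ("all batched:  %s\n", has_non_batched_handlers (all_batched).to_string ());
    print ("with legacy:  %s\n", has_non_batched_handlers (with_legacy).to_string ());
    print ("with opt-out: %s\n", has_non_batched_handlers (with_opt_out).to_string ());
}
```

Note that an empty handler list also yields false, so batch.execute() would be skipped when nothing is registered. The check itself is a single pass over the handler list, which is cheap compared with the per-event storage lookups it avoids.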

Expected Outcome

After fix:

  • batch.execute() is skipped entirely when all handlers support batching
  • No unnecessary entity lookups
  • No unnecessary handler iterations
  • Batched inserts should be faster than individual inserts (single transaction vs N transactions)

Verification

  1. Run tests: meson test -C builddir
  2. Run benchmarks: builddir/tools/implexus-perf/implexus-perf gdbm:///tmp/perf-test
  3. Compare batched vs individual insert times