Optimize Batch Creation Performance

The performance regression in create_documents_batch_small relative to create_document_small comes from entity creation, property updates, and child additions writing directly to storage, bypassing the EmbeddedTransaction's _operations queue entirely. Each add_child call synchronously updates and re-serializes the container's entire children array, so inserting n documents into a single container costs O(n^2) time overall.
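The quadratic cost can be illustrated with a small language-neutral model. The sketch below is written in Python rather than Vala and is not the project's code: DirectContainer, QueuedContainer, and bytes_for are hypothetical names, and JSON stands in for whatever serialization the storage layer actually uses. It only counts how many bytes get serialized under each strategy.

```python
import json


class DirectContainer:
    """Models the current behaviour: every add_child re-serializes the whole list."""

    def __init__(self):
        self.children = []
        self.bytes_written = 0

    def add_child(self, name):
        self.children.append(name)
        # Assumption: the full children array is rewritten on each call.
        self.bytes_written += len(json.dumps(self.children))


class QueuedContainer:
    """Models the proposed behaviour: additions are queued and written once."""

    def __init__(self):
        self.children = []
        self.pending = []
        self.bytes_written = 0

    def add_child(self, name):
        self.pending.append(name)  # no storage write yet

    def commit(self):
        self.children.extend(self.pending)
        self.pending.clear()
        self.bytes_written += len(json.dumps(self.children))


def bytes_for(container_cls, n):
    """Insert n fixed-width child names and return the total bytes serialized."""
    container = container_cls()
    for i in range(n):
        container.add_child(f"doc{i:06d}")
    if hasattr(container, "commit"):
        container.commit()
    return container.bytes_written
```

In this model, doubling the batch size roughly quadruples the bytes serialized by DirectContainer, while QueuedContainer grows linearly.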

Proposed Changes

1. src/Engine/EmbeddedEngine.vala

  • Add an internal accessor for current_transaction so that entities can queue operations:

    internal EmbeddedTransaction? current_transaction { get { return _current_transaction; } }
    

2. src/Engine/EmbeddedTransaction.vala

  • Implement apply_operation for OperationType.SET_PROPERTY so that property updates are actually applied, or refactor property updates to queue a single full property save (OperationType.SAVE_PROPERTIES).
  • Document modifies _properties in memory, and each set_entity_property call would record its own operation, so EmbeddedTransaction should merge property updates rather than apply them one by one.
  • During commit(), merge operations, or otherwise ensure that save_properties is called only once per document.
  • Add creation records for Category, Catalogue, and Index if needed, or extend OperationType.CREATE_ENTITY to handle them. (CREATE_ENTITY metadata storage may currently require expressions for categories.)
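One way the merge step in commit() could work is sketched below, again as a Python model rather than the Vala implementation. Transaction, record_set_property, and CountingStorage are hypothetical stand-ins, and the sketch only merges the queued updates; a real commit would presumably save each document's full _properties.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    # Queued operations as (op_type, doc_id, key, value) tuples.
    operations: list = field(default_factory=list)

    def record_set_property(self, doc_id, key, value):
        self.operations.append(("SET_PROPERTY", doc_id, key, value))

    def commit(self, storage):
        # Merge all property updates per document; later values win.
        merged = {}
        for op, doc_id, key, value in self.operations:
            if op == "SET_PROPERTY":
                merged.setdefault(doc_id, {})[key] = value
        # One save per document, regardless of how many updates were queued.
        for doc_id, props in merged.items():
            storage.save_properties(doc_id, props)
        self.operations.clear()


class CountingStorage:
    """Hypothetical storage stub that records every save it receives."""

    def __init__(self):
        self.saves = []

    def save_properties(self, doc_id, props):
        self.saves.append((doc_id, dict(props)))
```

With three updates queued for one document and one for another, commit() issues two saves in total, one per document.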

3. src/Entities/Container.vala

  • Modify create_container, create_document, create_category, create_catalogue, create_index, and delete_child to check if _engine is in a transaction.
  • If in a transaction, call record_create_entity and record_add_child instead of writing to storage directly.
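The branch described above might look roughly like the following Python model. It is a sketch under assumptions, not the Vala code: Storage, Engine, and the method bodies are hypothetical, and only the names record_create_entity, record_add_child, and current_transaction come from this plan. The storage stub counts writes so the effect of deferral is visible.

```python
class Storage:
    """Hypothetical storage stub that counts writes."""

    def __init__(self):
        self.entities = set()
        self.children = {}
        self.writes = 0

    def store_entity(self, path):
        self.entities.add(path)
        self.writes += 1

    def store_children(self, parent, names):
        self.children[parent] = list(names)
        self.writes += 1


class Transaction:
    def __init__(self):
        self.operations = []

    def record_create_entity(self, path):
        self.operations.append(("CREATE_ENTITY", path, None))

    def record_add_child(self, parent, name):
        self.operations.append(("ADD_CHILD", parent, name))

    def commit(self, storage):
        added = {}
        for op, target, name in self.operations:
            if op == "CREATE_ENTITY":
                storage.store_entity(target)
            elif op == "ADD_CHILD":
                added.setdefault(target, []).append(name)
        # Write each parent's children list once, however many were added.
        for parent, names in added.items():
            storage.store_children(parent, storage.children.get(parent, []) + names)
        self.operations.clear()


class Engine:
    def __init__(self):
        self.storage = Storage()
        self.current_transaction = None


class Container:
    def __init__(self, engine, path):
        self.engine = engine
        self.path = path

    def create_document(self, name):
        path = f"{self.path}/{name}"
        tx = self.engine.current_transaction
        if tx is not None:
            # Inside a transaction: queue the work instead of writing.
            tx.record_create_entity(path)
            tx.record_add_child(self.path, name)
        else:
            # No transaction: keep the existing direct-write path.
            storage = self.engine.storage
            storage.store_entity(path)
            storage.store_children(self.path, storage.children.get(self.path, []) + [name])
        return path
```

In this model, creating three documents outside a transaction costs six writes, while the same three inside a transaction cost none until commit and four afterwards (three entities plus one children list).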

4. src/Entities/Document.vala

  • Modify save_properties to check for an active transaction.
  • If active, defer the property save to the transaction rather than saving immediately.

5. src/Entities/AbstractEntity.vala

  • Modify delete() to use record_delete_entity and record_remove_child when inside a transaction.

Verification Plan

  1. Run the implexus-perf benchmark for Document only and verify that create_documents_batch_small drops from ~58ms to roughly 1-2ms per batch (the DB write is now deferred and batched).
  2. The benchmark will validate both performance improvements and correctness (reads/writes in the benchmark must still function correctly).