This document describes the design for indexed entity types in Implexus: Category, Catalogue, and Index. These entities use pre-built indices for fast lookups instead of scanning all documents.
The current implementation of virtual entities has performance issues:
| Entity | Current Behavior | Problem |
|---|---|---|
| Category | Scans all documents, evaluates expression per document | O(n) per query |
| Index | Scans all documents, does substring matching | O(n) per query |
| Catalogue | Does not exist | Need key-based groupings |
All indexed entities will:
graph TB
subgraph Entity Types
Container[Container - Arbitrary children]
Document[Document - Properties]
Category[Category - Predicate filter]
Catalogue[Catalogue - Key groupings]
Index[Index - Text search]
end
subgraph Category Behavior
Category -->|contains| DocCat[Documents matching predicate]
end
subgraph Catalogue Behavior
Catalogue -->|has child| Group1[Key Group: john]
Catalogue -->|has child| Group2[Key Group: jane]
Group1 -->|contains| Doc1[Post by john]
Group2 -->|contains| Doc2[Post by jane]
end
subgraph Index Behavior
Index -->|search| Result[Search Result Container]
Result -->|contains| Doc3[Matching documents]
end
!draft, status == "active", age > 18)true/posts/active with expression !draft returns all posts where draft == falseauthor, category, status)/posts/by-author with expression author:
/posts/by-author/john → all posts where author == "john"/posts/by-author/jane → all posts where author == "jane"Dictionary<TKey, List<TItem>> or GROUP BY in SQLcontent, title)*world* - contains "world"hello* - starts with "hello"*goodbye - ends with "goodbye"All index data uses scoped keys to prevent collisions between entities:
index:<entity_type>:<entity_path>:<data_type>:<key>
To efficiently find all documents of a given type, maintain a global type index:
typeidx:<type_label> → [doc_path1, doc_path2, ...]
Storage: typeidx:User → ["/users/john", "/users/jane", ...]
Category stores document paths that match the predicate:
cat:<entity_path>:members → [doc_path1, doc_path2, ...]
cat:<entity_path>:config → {type_label: "...", expression: "..."}
Example:
cat:/posts/active:members → ["/posts/post1", "/posts/post3"]
cat:/posts/active:config → {type_label: "Post", expression: "!draft"}
Catalogue stores groupings keyed by expression result:
catl:<entity_path>:group:<key_value> → [doc_path1, doc_path2, ...]
catl:<entity_path>:keys → ["key1", "key2", ...]
catl:<entity_path>:config → {type_label: "...", expression: "..."}
Example:
catl:/posts/by-author:group:john → ["/posts/post1", "/posts/post3"]
catl:/posts/by-author:group:jane → ["/posts/post2"]
catl:/posts/by-author:keys → ["john", "jane"]
catl:/posts/by-author:config → {type_label: "Post", expression: "author"}
The Index uses a hierarchical n-gram structure for arbitrary-length pattern matching:
idx:<entity_path>:tri:<trigram> → [doc_path1, doc_path2, ...]
Example: Text "hello world" produces trigrams: hel, ell, llo, lo, o w, wo, wor, orl, rld
idx:/posts/search:tri:hel → ["/posts/post1", "/posts/post2"]
idx:/posts/search:tri:ell → ["/posts/post1"]
...
Maps bigrams to trigrams containing them (for 2-character queries):
idx:<entity_path>:bi:<bigram> → [trigram1, trigram2, ...]
Example:
idx:/posts/search:bi:he → ["hel"]
idx:/posts/search:bi:el → ["hel", "ell"]
idx:/posts/search:bi:ll → ["ell", "llo"]
Maps characters to bigrams starting with them (for 1-character queries):
idx:<entity_path>:uni:<char> → [bigram1, bigram2, ...]
Example:
idx:/posts/search:uni:h → ["he"]
idx:/posts/search:uni:e → ["el"]
Stores the indexed field value for each document (used for verification and rebuilds):
idx:<entity_path>:doc:<doc_path> → "indexed field value"
idx:<entity_path>:config → {type_label: "...", expression: "...", case_sensitive: false}
| Index Type | Entries Per Doc | Stores | Relative Size |
|---|---|---|---|
| Trigram→Docs | O(L) where L = text length | Doc paths | Primary cost |
| Bigram→Trigrams | O(L) | Strings only | Small |
| Unigram→Bigrams | O(unique chars) | Strings only | Tiny |
| Doc Content | 1 | Field value | Moderate |
Example: For 10,000 documents averaging 1KB text:
The reverse indexes have minimal overhead since they store short strings, not document paths.
INPUT: Category at path P
OUTPUT: All documents matching predicate
1. Load config to get type_label and expression
2. Lookup cat:P:members → [doc_paths]
3. For each doc_path, instantiate Document entity
4. Return documents
Complexity: O(k) where k = number of matching documents
INPUT: Catalogue at path P, optional key K
OUTPUT: If K provided: documents with that key value
Otherwise: list of available keys
# Get specific group
get_child(K):
1. Lookup catl:P:group:K → [doc_paths]
2. Return VirtualContainer with documents
# List available keys
child_names:
1. Lookup catl:P:keys → [key1, key2, ...]
2. Return keys
Complexity: O(1) for key lookup, O(k) for listing groups
INPUT: Index at path P, search pattern S
OUTPUT: Documents matching pattern
parse_pattern(S):
if S starts with * and ends with *:
return CONTAINS(S[1:-1])
if S starts with *:
return ENDS_WITH(S[1:])
if S ends with *:
return STARTS_WITH(S[:-1])
return EXACT(S)
search(pattern):
terms = parse_pattern(pattern)
if terms.type == CONTAINS and len(terms.value) >= 3:
# Use trigram index
trigrams = extract_trigrams(terms.value)
doc_sets = [lookup idx:P:tri:t for t in trigrams]
candidates = intersect(doc_sets)
return [d for d in candidates if d contains terms.value]
if terms.type == CONTAINS and len(terms.value) == 2:
# Use bigram→trigram→docs
bigram = terms.value
trigrams = lookup idx:P:bi:bigram
doc_sets = [lookup idx:P:tri:t for t in trigrams]
candidates = union(doc_sets)
return [d for d in candidates if d contains terms.value]
if terms.type == CONTAINS and len(terms.value) == 1:
# Use unigram→bigram→trigram→docs
char = terms.value
bigrams = lookup idx:P:uni:char
trigrams = union([lookup idx:P:bi:b for b in bigrams])
doc_sets = [lookup idx:P:tri:t for t in trigrams]
candidates = union(doc_sets)
return [d for d in candidates if d contains terms.value]
if terms.type == STARTS_WITH:
# Prefix search using first trigram
prefix = terms.value
first_tri = prefix[0:3]
candidates = lookup idx:P:tri:first_tri
return [d for d in candidates if d starts_with prefix]
if terms.type == ENDS_WITH:
# Suffix search using last trigram
suffix = terms.value
last_tri = suffix[-3:]
candidates = lookup idx:P:tri:last_tri
return [d for d in candidates if d ends_with suffix]
Complexity:
The hook system notifies indexed entities when documents are created, modified, or deleted so they can update their indices.
sequenceDiagram
participant App as Application
participant Engine as Engine
participant Doc as Document
participant Hook as HookManager
participant Cat as Category
participant Catl as Catalogue
participant Idx as Index
App->>Engine: create_document with type_label
Engine->>Doc: create
Engine->>Hook: entity_created(doc)
Hook->>Hook: find_interested_entities(type_label)
loop For each interested entity
Hook->>Cat: on_document_created(doc)
Hook->>Catl: on_document_created(doc)
Hook->>Idx: on_document_created(doc)
end
Indexed entities register interest in specific type labels:
interface IndexHook {
// The type label this entity is interested in
public abstract string type_label { get; }
// The entity path for index scoping
public abstract EntityPath entity_path { get; }
// Called when a document is created
public abstract void on_document_created(Entity doc) throws Error;
// Called when a document is modified
public abstract void on_document_modified(Entity doc, Set<string> changed_properties) throws Error;
// Called when a document is deleted
public abstract void on_document_deleted(EntityPath doc_path) throws Error;
}
class HookManager {
// type_label → list of registered hooks
private Dictionary<string, List<IndexHook>> _hooks_by_type;
// Register a hook for a type label
public void register_hook(IndexHook hook) {
var hooks = _hooks_by_type.get_or_default(hook.type_label, new List());
hooks.add(hook);
}
// Unregister a hook
public void unregister_hook(IndexHook hook) {
var hooks = _hooks_by_type.get(hook.type_label);
if (hooks != null) {
hooks.remove(hook);
}
}
// Notify hooks of document creation
public void notify_created(Entity doc) {
var hooks = _hooks_by_type.get(doc.type_label);
if (hooks != null) {
foreach (var hook in hooks) {
hook.on_document_created(doc);
}
}
}
// Notify hooks of document modification
public void notify_modified(Entity doc, Set<string> changed_properties) {
var hooks = _hooks_by_type.get(doc.type_label);
if (hooks != null) {
foreach (var hook in hooks) {
hook.on_document_modified(doc, changed_properties);
}
}
}
// Notify hooks of document deletion
public void notify_deleted(EntityPath doc_path, string type_label) {
var hooks = _hooks_by_type.get(type_label);
if (hooks != null) {
foreach (var hook in hooks) {
hook.on_document_deleted(doc_path);
}
}
}
}
The Engine emits signals that the HookManager listens to:
class EmbeddedEngine : Object, Engine {
private HookManager _hooks;
// Connect signals to hook manager
construct {
_hooks = new HookManager();
this.entity_created.connect((entity) => {
if (entity.entity_type == EntityType.DOCUMENT) {
_hooks.notify_created(entity);
}
});
this.entity_modified.connect((entity) => {
if (entity.entity_type == EntityType.DOCUMENT) {
_hooks.notify_modified(entity, entity.modified_properties);
}
});
this.entity_deleted.connect((path, type_label) => {
_hooks.notify_deleted(path, type_label);
});
}
}
class Category : AbstractEntity, IndexHook {
public void on_document_created(Entity doc) {
update_index_for_document(doc, true);
}
public void on_document_modified(Entity doc, Set<string> changed_properties) {
// Re-evaluate predicate and update index
update_index_for_document(doc, evaluate_predicate(doc));
}
public void on_document_deleted(EntityPath doc_path) {
remove_from_index(doc_path);
}
private void update_index_for_document(Entity doc, bool should_include) {
var members_key = "cat:%s:members".printf(_path.to_string());
var current_members = load_member_set(members_key);
if (should_include) {
current_members.add(doc.path.to_string());
} else {
current_members.remove(doc.path.to_string());
}
save_member_set(members_key, current_members);
}
private bool evaluate_predicate(Entity doc) {
// Evaluate the boolean expression against the document
return ExpressionEvaluator.evaluate_boolean(_expression, doc.properties);
}
}
class Catalogue : AbstractEntity, IndexHook {
public void on_document_created(Entity doc) {
update_index_for_document(doc);
}
public void on_document_modified(Entity doc, Set<string> changed_properties) {
// Check if the grouped expression changed
if (changed_properties.contains(_expression)) {
// Get old value and remove from old group
var old_value = get_old_value(doc);
remove_from_group(old_value, doc.path);
// Add to new group
update_index_for_document(doc);
}
}
public void on_document_deleted(EntityPath doc_path) {
// Need to look up the value before removing
var value = get_indexed_value(doc_path);
remove_from_group(value, doc_path);
}
private void update_index_for_document(Entity doc) {
var value = evaluate_expression(doc);
if (value != null) {
add_to_group(value, doc.path);
}
}
private void add_to_group(string key, EntityPath doc_path) {
var group_key = "catl:%s:group:%s".printf(_path.to_string(), key);
var members = load_member_set(group_key);
members.add(doc_path.to_string());
save_member_set(group_key, members);
// Also update keys list if this is a new key
var keys_key = "catl:%s:keys".printf(_path.to_string());
var keys = load_key_set(keys_key);
if (!keys.contains(key)) {
keys.add(key);
save_key_set(keys_key, keys);
}
}
}
class Index : AbstractEntity, IndexHook {
private bool _case_sensitive;
public void on_document_created(Entity doc) {
index_document(doc);
}
public void on_document_modified(Entity doc, Set<string> changed_properties) {
if (changed_properties.contains(_expression)) {
// Re-index with new content
reindex_document(doc);
}
}
public void on_document_deleted(EntityPath doc_path) {
unindex_document(doc_path);
}
private void index_document(Entity doc) {
var content = get_field_value(doc);
if (content == null) return;
var normalized = _case_sensitive ? content : content.down();
var doc_path_str = doc.path.to_string();
// Store content cache
var cache_key = "idx:%s:doc:%s".printf(_path.to_string(), doc_path_str);
storage.set(cache_key, normalized);
// Index trigrams
var trigrams = extract_trigrams(normalized);
foreach (var tri in trigrams) {
add_to_trigram_index(tri, doc_path_str);
}
// Update bigram reverse index
var bigrams = extract_bigrams(normalized);
foreach (var bi in bigrams) {
add_to_bigram_reverse(bi, trigrams);
}
// Update unigram reverse index
var unigrams = extract_unique_chars(normalized);
foreach (var uni in unigrams) {
add_to_unigram_reverse(uni, bigrams);
}
}
private void unindex_document(EntityPath doc_path) {
var doc_path_str = doc_path.to_string();
var cache_key = "idx:%s:doc:%s".printf(_path.to_string(), doc_path_str);
// Get content to unindex
var content = storage.get(cache_key);
if (content == null) return;
var trigrams = extract_trigrams(content);
foreach (var tri in trigrams) {
remove_from_trigram_index(tri, doc_path_str);
}
// Note: We don't remove from reverse indexes as they're just string mappings
// and don't reference specific documents
storage.delete(cache_key);
}
private void reindex_document(Entity doc) {
unindex_document(doc.path);
index_document(doc);
}
}
Add CATALOGUE to the entity type enumeration:
public enum EntityType {
CONTAINER,
DOCUMENT,
CATEGORY,
CATALOGUE, // NEW
INDEX;
public bool is_indexed() {
return this == CATEGORY || this == CATALOGUE || this == INDEX;
}
}
public interface Entity : Object {
// ... existing members ...
// New method for indexed entities
public abstract void rebuild_index() throws EngineError;
// Configuration for case sensitivity (Index only)
public abstract bool case_sensitive { get; }
}
Add method to create Catalogue:
public interface Entity : Object {
// ... existing methods ...
// New factory method for Catalogue
public abstract Entity? create_catalogue(
string name,
string type_label,
string expression
) throws EngineError;
// Index creation with options
public abstract Entity? create_index_with_options(
string name,
string type_label,
string expression,
IndexOptions options
) throws EngineError;
}
public class IndexOptions : Object {
public bool case_sensitive { get; set; default = false; }
}
public interface Engine : Object {
// ... existing members ...
// Get all indexed entities for a type label
public abstract Invercargill.Enumerable<Entity> get_indexes_for_type(string type_label);
// Rebuild all indices (maintenance operation)
public abstract void rebuild_all_indices() throws EngineError;
// New signal with type_label for deleted entities
public signal void entity_deleted(EntityPath path, string type_label);
}
public interface Storage : Object {
// ... existing methods ...
// Index operations
public abstract void add_to_index(string index_key, string doc_path) throws StorageError;
public abstract void remove_from_index(string index_key, string doc_path) throws StorageError;
public abstract Invercargill.Enumerable<string> get_index_members(string index_key) throws StorageError;
// Type index operations
public abstract void register_document_type(EntityPath doc_path, string type_label) throws StorageError;
public abstract void unregister_document_type(EntityPath doc_path, string type_label) throws StorageError;
public abstract Invercargill.Enumerable<EntityPath> get_documents_by_type(string type_label) throws StorageError;
// Catalogue operations
public abstract void catalogue_add_to_group(EntityPath catalogue_path, string key, EntityPath doc_path) throws StorageError;
public abstract void catalogue_remove_from_group(EntityPath catalogue_path, string key, EntityPath doc_path) throws StorageError;
public abstract Invercargill.Enumerable<string> catalogue_get_keys(EntityPath catalogue_path) throws StorageError;
public abstract Invercargill.Enumerable<EntityPath> catalogue_get_group(EntityPath catalogue_path, string key) throws StorageError;
// N-gram index operations
public abstract void ngram_add_trigram(EntityPath index_path, string trigram, EntityPath doc_path) throws StorageError;
public abstract void ngram_remove_trigram(EntityPath index_path, string trigram, EntityPath doc_path) throws StorageError;
public abstract Invercargill.Enumerable<EntityPath> ngram_get_documents(EntityPath index_path, string trigram) throws StorageError;
public abstract void ngram_add_bigram_reverse(EntityPath index_path, string bigram, string trigram) throws StorageError;
public abstract Invercargill.Enumerable<string> ngram_get_trigrams_for_bigram(EntityPath index_path, string bigram) throws StorageError;
public abstract void ngram_add_unigram_reverse(EntityPath index_path, char unigram, string bigram) throws StorageError;
public abstract Invercargill.Enumerable<string> ngram_get_bigrams_for_unigram(EntityPath index_path, char unigram) throws StorageError;
}
When an indexed entity is created, it must build its index from existing documents:
public override Entity? create_category(string name, string type_label, string expression) throws EngineError {
// ... create entity metadata ...
var category = new Category(_engine, child_path, type_label, expression);
// Build initial index
category.build_initial_index();
// Register with hook manager
_engine.hooks.register_hook(category);
_engine.entity_created(category);
return category;
}
The expression system needs to support:
!draft, status == "active", age > 18author, category, status.yearcontent, titleConsider using or extending Invercargill's expression system if available.
Index updates should be atomic with document changes:
public void set_entity_property(string name, Element value) throws EngineError {
var tx = _engine.begin_transaction();
try {
// Update property
_properties.set(name, value);
save_properties();
// Notify hooks (which update indices)
_engine.notify_modified(this, new Set.from_values(name));
tx.commit();
} catch (Error e) {
tx.rollback();
throw e;
}
}
public enum IndexError {
// Expression parsing failed
EXPRESSION_ERROR,
// Index corruption detected
CORRUPT_INDEX,
// Index build failed
BUILD_FAILED,
// Hook registration failed
HOOK_ERROR;
}
| Entity | Purpose | Index Type | Query Complexity |
|---|---|---|---|
| Container | Arbitrary children | None (explicit) | O(k) |
| Document | Properties | None | O(1) |
| Category | Boolean filter | Member set | O(k) |
| Catalogue | Key groupings | Key→members map | O(1) per key |
| Index | Text search | Hierarchical n-gram | O(k) |
Where k = number of results/candidates, not total documents.