phase-4-projections-and-joins.md 20 KB

Phase 4: Projections and Joins

Overview

This phase introduces Projections - a new concept for read-only, composable query shapes that can join multiple entities and nested projections together.

Key Concepts

Concept Description
Entity Read/write, table-mapped, full CRUD operations
Projection Read-only, composable, join-aware query shape

Important: Invercargill Expressions

All expression parameters throughout this API use Invercargill.Expressions syntax, NOT raw SQL. These are expression strings that the system parses, analyzes, and translates to SQL. This allows:

  • Type-safe analysis of expression structure
  • Variable scoping and translation
  • Aggregate detection for WHERE/HAVING splitting
  • Cross-dialect compatibility

    // ✅ Correct: Invercargill expression syntax
    .select<int64>("user_id", "r.id", ...)
    .where("user_id > 100 && user_name contains 'admin'")
    
    // ❌ Incorrect: Raw SQL (not supported)
    .select<int64>("user_id", "users.id AS user_id", ...)
    .where("user_id > 100 AND user_name LIKE '%admin%'")
    

API Design

Projection Registration

// Register a projection with the session
session.register_projection<UserOrderStats>(p => p
    .source<User>("r")
    .join<Order>("o", "r.id == o.user_id")
    .group_by("r.id")
    .select<int64>("user_id", "r.id", (x, v) => x.user_id = v)
    .select<string>("user_name", "r.name", (x, v) => x.user_name = v)
    .select<int64>("order_count", "COUNT(o.id)", (x, v) => x.order_count = v)
    .select<double>("total_spent", "SUM(o.total)", (x, v) => x.total_spent = v)
);

Projection Query

// Query projections using the same API as entities
var results = session.query<UserOrderStats>()
    .where("user_id > 100")
    .or_where("order_count >= 5")
    .order_by_desc("total_spent")
    .limit(10)
    .materialise_async();

Builder Methods

source<T>(string variable_name)

Defines the primary entity source and declares a variable for use in expressions.

.source<User>("r")  // r refers to User entity

join<T>(string variable_name, string join_expression)

Joins another entity with a condition expression.

.join<Order>("o", "r.id == o.user_id")

Multiple joins to same entity type are allowed:

.join<User>("sender", "t.sender_id == sender.id")
.join<User>("receiver", "t.receiver_id == receiver.id")

select<T>(string friendly_name, string expression, PropertySetter setter)

Selects a scalar value with a friendly name for use in where/order_by expressions.

Parameter Description
friendly_name Name used in where() and order_by() expressions
expression Invercargill expression using declared variables (e.g., "r.id", "COUNT(o.id)")
setter Function to set the value on the result object
.select<int64>("user_id", "r.id", (x, v) => x.user_id = v)
.select<string>("user_name", "r.name", (x, v) => x.user_name = v)
.select<int64>("order_count", "COUNT(o.id)", (x, v) => x.order_count = v)

select<TProjection>(string friendly_name, string entry_point_expression, PropertySetter setter)

Selects a nested projection for 1:1 relationships.

Parameter Description
friendly_name Name used for nested property navigation in where() expressions
entry_point_expression Invercargill expression selecting the variable that provides the source type for the nested projection (e.g., "pr" if pr is a Profile and the nested projection's source is Profile)
setter Function to set the nested projection on the result object
session.register_projection<ProfileSummary>(p => p
    .source<Profile>("p")
    .select<int64>("id", "p.id", (x, v) => x.id = v)
    .select<string>("bio", "p.bio", (x, v) => x.bio = v)
);

session.register_projection<UserWithProfile>(p => p
    .source<User>("u")
    .join<Profile>("pr", "u.profile_id == pr.id")
    .select<int64>("id", "u.id", (x, v) => x.id = v)
    .select<ProfileSummary>("profile", "pr", (x, v) => x.profile = v)
    // entry_point "pr" must be of type Profile (matches ProfileSummary's source)
);

// Query with nested navigation
var results = session.query<UserWithProfile>()
    .where("profile.id == 5")  // Navigate into nested projection
    .materialise_async();

select_many<TProjection>(string friendly_name, string entry_point_expression, PropertySetter setter)

Selects a collection of nested projections for 1:N relationships.

Parameter Description
friendly_name Name used for nested collection property navigation
entry_point_expression Invercargill expression selecting the variable that provides the source type for the nested projection (e.g., "o" if o is an Order and the nested projection's source is Order)
setter Function to set the collection on the result object
// Child projection
session.register_projection<OrderSummary>(p => p
    .source<Order>("r")
    .join<OrderItem>("oi", "r.id == oi.order_id")
    .group_by("r.id")
    .select<int64>("order_id", "r.id", (x, v) => x.order_id = v)
    .select<double>("total", "SUM(oi.price)", (x, v) => x.total = v)
);

// Parent projection with collection
session.register_projection<UserWithOrders>(p => p
    .source<User>("r")
    .join<Order>("o", "r.id == o.user_id")
    .select<int64>("user_id", "r.id", (x, v) => x.user_id = v)
    .select_many<OrderSummary>("orders", "o", (x, v) => x.orders = v)
    // entry_point "o" must be of type Order (matches OrderSummary's source)
);

// Query
var results = session.query<UserWithOrders>()
    .where("user_id == 55")
    .materialise_async();

group_by(params string[] expressions)

Explicitly declares GROUP BY columns. Required for aggregate queries.

.group_by("r.id", "r.name")

Variable Scoping and Translation

Per-Projection Scope

Variable names are scoped to each projection. The same variable name can map to different types in different projections.

// OrderSummary: r → Order, o → OrderItem
session.register_projection<OrderSummary>(p => p
    .source<Order>("r")
    .join<OrderItem>("o", "r.id == o.order_id")
    .select<int64>("order_id", "r.id", ...)
);

// UserWithOrders: r → User, o → Order
session.register_projection<UserWithOrders>(p => p
    .source<User>("r")
    .join<Order>("o", "r.id == o.user_id")
    .select_many<OrderSummary>("o", ...)  // Entry point type must match child's source
);

Type-Based Translation

When nesting projections, translation happens by TYPE lookup, not variable name:

flowchart LR
    subgraph Child - OrderSummary
        C_Map["Order → r<br/>OrderItem → o"]
    end
    
    subgraph Parent - UserWithOrders
        P_Map["User → r<br/>Order → o"]
    end
    
    subgraph Translation
        T["Child needs Order<br/>Parent's Order is o<br/>Translate: r → o"]
    end
    
    C_Map --> T
    P_Map --> T

Generated SQL Aliases

SQL uses auto-generated aliases to prevent collisions. The format includes the type name for debugging clarity:

Format: val_N_TypeName where N is incremental and TypeName is typeof(T).name()

// UserWithOrders projection
session.register_projection<UserWithOrders>(p => p
    .source<User>("r")
    .join<Order>("o", "r.id == o.user_id")
    .select<int64>("user_id", "r.id", ...)
    .select_many<OrderSummary>("o", ...)
);

// Generated SQL (with debug-friendly aliases):
// SELECT val_1_User.id, val_2_Order.id, SUM(val_4_OrderItem.price)
// FROM users val_1_User
// JOIN orders val_2_Order ON val_1_User.id = val_2_Order.user_id
// JOIN orders val_3_Order ON val_2_Order.id = val_3_Order.id
// JOIN order_items val_4_OrderItem ON val_3_Order.id = val_4_OrderItem.order_id
// GROUP BY val_1_User.id, val_3_Order.id

This makes debugging queries much easier - you can immediately see which table each alias refers to.

Auto-Join for Nested Projections

When using select_many, the child projection's additional joins are automatically added:

flowchart TB
    subgraph Parent
        PSrc["source User - r"]
        PJoin["join Order - o"]
        PSM["select_many OrderSummary - o"]
    end
    
    subgraph Child - OrderSummary
        CSrc["source Order - r"]
        CJoin["join OrderItem - o"]
    end
    
    subgraph Generated SQL
        V1["val1 = users"]
        V2["val2 = orders"]
        V3["val3 = orders - for child"]
        V4["val4 = order_items - auto-added"]
    end
    
    PSrc --> V1
    PJoin --> V2
    PSM --> V3
    CJoin --> V4

Entry Point Type Check

The entry point variable type must match the child projection's source type:

// ✅ Valid: "o" is Order, child's source is Order
.select_many<OrderSummary>("o", ...)

// ❌ Invalid: "r" is User, child's source is Order
.select_many<OrderSummary>("r", ...)
// Error: ProjectionEntryTypeMismatchError
// "select_many<OrderSummary> requires source of type Order, but 'r' is type User"

Expression Translation

Friendly Name Resolution Chain

flowchart LR
    subgraph Input
        I["where - user_id == 55"]
    end
    
    subgraph Step 1 - Friendly Name Lookup
        S1["user_id → r.id"]
    end
    
    subgraph Step 2 - Variable Translation
        S2["r.id → val1.id"]
    end
    
    subgraph SQL
        SQL["WHERE val1.id = 55"]
    end
    
    I --> S1 --> S2 --> SQL

Nested Property Navigation

// Navigate into nested projections
.where("profile.id == 5")
.where("profile.bio contains 'developer'")

Translation: profile.id → lookup profile → find nested ProfileSummary → lookup id in ProfileSummary → pr.idval2.id

WHERE vs HAVING Detection

Aggregate expressions are automatically placed in HAVING clause:

.where("user_id > 100")        // Non-aggregate → WHERE
.where("order_count >= 5")     // Aggregate → HAVING
.where("user_id > 100 && order_count >= 5")  // Split: WHERE + HAVING

Mixed Aggregate/Non-Aggregate OR - Auto Subquery

When OR combines aggregate and non-aggregate expressions, the query is automatically wrapped in a subquery:

.where("user_id > 100 || order_count >= 5")

// Generated SQL:
// SELECT * FROM (
//     SELECT u.id as user_id, COUNT(o.id) as order_count
//     FROM users u
//     LEFT JOIN orders o ON u.id = o.user_id
//     GROUP BY u.id
// ) subq
// WHERE user_id > 100 OR order_count >= 5

Complete Example

// Result types
public class OrderSummary : Object {
    public int64 order_id { get; set; }
    public double total { get; set; }
}

public class UserWithOrderStats : Object {
    public int64 user_id { get; set; }
    public string user_name { get; set; }
    public int64 order_count { get; set; }
    public double total_spent { get; set; }
    public Enumerable<OrderSummary> orders { get; set; }
}

// Register projections
session.register_projection<OrderSummary>(p => p
    .source<Order>("r")
    .join<OrderItem>("oi", "r.id == oi.order_id")
    .group_by("r.id")
    .select<int64>("order_id", "r.id", (x, v) => x.order_id = v)
    .select<double>("total", "SUM(oi.price)", (x, v) => x.total = v)
);

session.register_projection<UserWithOrderStats>(p => p
    .source<User>("r")
    .join<Order>("o", "r.id == o.user_id")
    .group_by("r.id")
    .select<int64>("user_id", "r.id", (x, v) => x.user_id = v)
    .select<string>("user_name", "r.name", (x, v) => x.user_name = v)
    .select<int64>("order_count", "COUNT(o.id)", (x, v) => x.order_count = v)
    .select<double>("total_spent", "SUM(o.total)", (x, v) => x.total_spent = v)
    .select_many<OrderSummary>("orders", "o", (x, v) => x.orders = v)
);

// Query
var results = session.query<UserWithOrderStats>()
    .where("user_id > 100")
    .or_where("order_count >= 5")
    .order_by_desc("total_spent")
    .limit(10)
    .materialise_async();

// Generated SQL:
// SELECT * FROM (
//     SELECT 
//         val1.id as user_id,
//         val1.name as user_name,
//         COUNT(val2.id) as order_count,
//         SUM(val2.total) as total_spent,
//         val3.id as order_id,
//         SUM(val4.price) as total
//     FROM users val1
//     JOIN orders val2 ON val1.id = val2.user_id
//     JOIN orders val3 ON val1.id = val3.user_id
//     JOIN order_items val4 ON val3.id = val4.order_id
//     GROUP BY val1.id, val3.id
// ) subq
// WHERE user_id > 100 OR order_count >= 5
// ORDER BY total_spent DESC
// LIMIT 10

Implementation Components

New Classes

Class Responsibility
ProjectionDefinition Stores source, joins, selections, group_by for a projection
ProjectionBuilder<T> Fluent builder for registering projections
ProjectionQuery<T> Query builder for projections (extends existing Query API)
ProjectionMapper<T> Materializes projection results from database rows
VariableTranslator Translates variables to SQL aliases with type information
FriendlyNameResolver Resolves friendly names to SQL expressions
AggregateAnalyzer Detects aggregates in expressions for WHERE/HAVING split
ProjectionSqlBuilder Builds complete SELECT queries from ProjectionDefinition

Modified Classes

Class Changes
OrmSession Add register_projection<T>(), modify query<T>() to support projections
Query<T> Support friendly name resolution, WHERE/HAVING split
ExpressionToSqlVisitor Support variable translation, aggregate detection

SqlDialect Interface Extensions

The SqlDialect interface needs new methods for projection support:

public interface SqlDialect : Object {
    // ... existing methods ...
    
    /**
     * Builds a SELECT statement with JOINs for a projection query.
     *
     * @param sources The source and join definitions
     * @param selections The column selections with expressions
     * @param group_by The GROUP BY expressions
     * @param where_clause The WHERE clause expression (already translated)
     * @param having_clause The HAVING clause expression (already translated)
     * @param order_by The ORDER BY expressions
     * @param limit Optional limit
     * @param offset Optional offset
     * @return The complete SQL SELECT statement
     */
    public abstract string build_projection_select(
        Vector<SourceDefinition> sources,
        Vector<SelectionDefinition> selections,
        Vector<string> group_by,
        string? where_clause,
        string? having_clause,
        Vector<OrderByDefinition> order_by,
        int64? limit,
        int64? offset
    );
    
    /**
     * Builds a subquery wrapper for mixed aggregate/non-aggregate OR conditions.
     *
     * @param inner_query The inner query with GROUP BY
     * @param combined_where The combined WHERE condition to apply to the subquery result
     * @return The wrapped SQL query
     */
    public abstract string wrap_subquery_for_mixed_or(
        string inner_query,
        string combined_where
    );
    
    /**
     * Generates a table alias with type information for debugging.
     *
     * @param index The alias index (1-based)
     * @param type_name The entity type name
     * @return The formatted alias (e.g., "val_1_User")
     */
    public abstract string generate_table_alias(int index, string type_name);
}

SQLite-Specific Implementation (SqliteDialect)

public class SqliteDialect : Object, SqlDialect {
    
    // ... existing methods ...
    
    public string build_projection_select(
        Vector<SourceDefinition> sources,
        Vector<SelectionDefinition> selections,
        Vector<string> group_by,
        string? where_clause,
        string? having_clause,
        Vector<OrderByDefinition> order_by,
        int64? limit,
        int64? offset
    ) {
        var sql = new StringBuilder();
        
        // SELECT clause
        sql.append("SELECT ");
        var select_parts = new Vector<string>();
        foreach (var sel in selections) {
            select_parts.add(@"$(sel.expression) AS $(sel.friendly_name)");
        }
        sql.append(string.joinv(", ", select_parts.to_array()));
        
        // FROM clause
        var primary = sources[0];
        sql.append(@" FROM $(primary.table_name) $(primary.alias)");
        
        // JOIN clauses
        for (int i = 1; i < sources.size; i++) {
            var join = sources[i];
            sql.append(@" JOIN $(join.table_name) $(join.alias) ON $(join.condition)");
        }
        
        // WHERE clause
        if (where_clause != null) {
            sql.append(@" WHERE $(where_clause)");
        }
        
        // GROUP BY clause
        if (group_by.size > 0) {
            sql.append(" GROUP BY ");
            sql.append(string.joinv(", ", group_by.to_array()));
        }
        
        // HAVING clause
        if (having_clause != null) {
            sql.append(@" HAVING $(having_clause)");
        }
        
        // ORDER BY clause
        if (order_by.size > 0) {
            sql.append(" ORDER BY ");
            var order_parts = new Vector<string>();
            foreach (var ob in order_by) {
                order_parts.add(ob.descending ? @"$(ob.expression) DESC" : ob.expression);
            }
            sql.append(string.joinv(", ", order_parts.to_array()));
        }
        
        // LIMIT clause
        if (limit != null) {
            sql.append(@" LIMIT $(limit)");
        }
        
        // OFFSET clause
        if (offset != null) {
            sql.append(@" OFFSET $(offset)");
        }
        
        return sql.str;
    }
    
    public string wrap_subquery_for_mixed_or(string inner_query, string combined_where) {
        return @"SELECT * FROM ($inner_query) subq WHERE $combined_where";
    }
    
    public string generate_table_alias(int index, string type_name) {
        return @"val_$(index)_$(type_name)";
    }
}

ExpressionToSqlVisitor Extensions

The existing ExpressionToSqlVisitor needs to support:

  1. Variable translation: Replace user-defined variables with generated aliases
  2. Aggregate function detection: Identify COUNT, SUM, AVG, MIN, MAX calls
  3. Friendly name resolution: Look up friendly names in the projection context

    // New constructor parameter for projection context
    public ExpressionToSqlVisitor(
    EntityMapper? mapper,
    ProjectionDefinition? projection = null,  // NEW
    Map<string, string>? variable_aliases = null  // NEW: user_var -> sql_alias
    ) {
    // ...
    }
    
    // New method to check if expression contains aggregates
    public bool contains_aggregate(Expression expr) {
    // Visit expression and return true if any aggregate function found
    }
    
    // New method to translate variable references
    protected override void visit_property_access(PropertyAccess expr) {
    if (variable_aliases != null && variable_aliases.has_key(expr.property_name)) {
        // Translate user variable to SQL alias
        builder.append(variable_aliases[expr.property_name]);
    } else {
        // Existing logic
    }
    }
    

Error Types

Error When Thrown
ProjectionEntryTypeMismatchError Entry point type doesn't match child's source type
UndefinedFriendlyNameError Friendly name not found in projection
DuplicateVariableError Variable name already defined in projection
MissingGroupByError Aggregate query without group_by
ProjectionNotRegisteredError Query requested for unregistered projection type

Constraints

  1. Unique variable names - Each source/join must have a unique variable name within a projection
  2. Entry point type match - select_many<Child>(entry_point, ...) requires entry_point type to match child's source type
  3. Explicit GROUP BY - Aggregate queries require explicit .group_by() declaration
  4. Client-side grouping - 1:N collections are grouped client-side (no JSON aggregation)
  5. Invercargill Expressions only - All expression parameters must use Invercargill.Expressions syntax, not raw SQL