SafePath API Design

Overview

This document describes the design for a SafePath API that provides a succinct, variadic constructor for creating URL-encoded entity paths in Implexus.

Analysis of Current Path System

Current Implementation: EntityPath

The existing EntityPath class provides:

  1. Multiple constructors:

    • EntityPath(string path_string) - parses a path string
    • EntityPath.root() - creates the root path
    • EntityPath.from_segments(Enumerable<string> segments) - creates from segment collection
    • EntityPath.with_child(EntityPath parent, string name) - creates child path
  2. Current escaping mechanism (tilde-based, similar to RFC 6901 JSON Pointer):

    ~  → ~7e
    /  → ~2f
    \  → ~5c
    \0 → ~00
    
  3. Limitations:

    • No variadic constructor for inline path building
    • Requires string concatenation or multiple method calls for multi-segment paths
    • Tilde escaping is non-standard and may confuse users
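
For reference, the escaping table in item 2 can be sketched as a small helper. This is only an illustration: tilde_escape is a hypothetical name, and the actual EntityPath implementation is not reproduced in this document. The NUL mapping is omitted because a Vala string cannot hold an embedded NUL byte.

```vala
// Hypothetical helper mirroring the tilde table above; not the real EntityPath code.
string tilde_escape (string segment) {
    // Escape the tilde first so the tildes introduced by the later
    // replacements are not escaped a second time.
    return segment.replace ("~", "~7e")
                  .replace ("/", "~2f")
                  .replace ("\\", "~5c");
}
```

Under these rules, a segment such as a/b~c would become a~2fb~7ec.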

Usage Patterns from Tests

From EntityPathTest.vala, typical usage includes:

// Current verbose patterns
var root = new EntityPath.root();
var users = root.append_child("users");
var john = users.append_child("john");

// Or string-based
var path = new EntityPath("/users/john/profile");

Proposed SafePath API

Design Goals

  1. Succinct API: Single constructor call with variadic segments
  2. URL Encoding: Automatic percent-encoding of each segment using standard RFC 3986
  3. Integration: Produces EntityPath instances for seamless integration
  4. Safety: Handles special characters, empty segments, and edge cases

API Design

namespace Implexus.Core {

/**
 * SafePath provides a convenient factory for creating EntityPath instances
 * with automatic URL encoding of path segments.
 * 
 * Example usage:
 * {{{
 * var path = SafePath.path("users", "john doe", "profile", null);
 * // Creates EntityPath for /users/john%20doe/profile
 * 
 * var root = SafePath.path(null);  // Root path
 * var simple = SafePath.path("catalogue", null);  // Single segment
 * }}}
 */
public class SafePath : Object {

    /**
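     * Characters that MUST be encoded in path segments.
     * Based on RFC 3986 with additional safety characters.
     * Reference only: the second argument of Uri.escape_string lists
     * characters to leave unescaped, so this set must not be passed there.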
     */
    private const string RESERVED_CHARS = "!*'();:@&=+$,/?#[]%\"\\<>^`{|}~";
    
    /**
     * Creates an EntityPath from variadic segments, URL-encoding each segment.
     * 
     * @param first_segment The first path segment, or null for the root path
     *                      (a fixed parameter is required before variadic args)
     * @param ... Additional segments, terminated by null
     * @return A new EntityPath with encoded segments
     * 
     * Example:
     * {{{
     * var path = SafePath.path("users", "john", "profile", null);
     * // Result: /users/john/profile
     * 
     * var encoded = SafePath.path("data", "2024/01", "file name", null);
     * // Result: /data/2024%2F01/file%20name
     * }}}
     */
    public static EntityPath path(string? first_segment, ...) {
        var segments = new Invercargill.DataStructures.Vector<string>();
        
        if (first_segment == null) {
            return new EntityPath.root();
        }
        
        // Add first segment
        segments.add(encode_segment(first_segment));
        
        // Process variadic arguments
        va_list args = va_list();
        while (true) {
            string? segment = args.arg();
            if (segment == null) {
                break;
            }
            segments.add(encode_segment(segment));
        }
        
        return new EntityPath.from_segments(segments.as_enumerable());
    }
    
    /**
     * Creates an EntityPath from an array of segments.
     * Alternative API for when array-based construction is preferred.
     * 
     * @param segments Array of path segments
     * @return A new EntityPath with encoded segments
     */
    public static EntityPath from_array(string[] segments) {
        if (segments.length == 0) {
            return new EntityPath.root();
        }
        
        var encoded_segments = new Invercargill.DataStructures.Vector<string>();
        foreach (var segment in segments) {
            encoded_segments.add(encode_segment(segment));
        }
        
        return new EntityPath.from_segments(encoded_segments.as_enumerable());
    }
    
    /**
     * URL-encodes a path segment according to RFC 3986.
     * 
     * Encodes:
     * - All reserved URI characters
     * - Space (as %20, not +)
     * - Non-ASCII characters (as percent-encoded UTF-8)
     * - Control characters
     * 
     * @param segment The raw segment to encode
     * @return The URL-encoded segment
     */
    public static string encode_segment(string segment) {
        if (segment.length == 0) {
            return "";
        }
        
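        // The second argument of Uri.escape_string lists reserved characters
        // to leave UNESCAPED, so RESERVED_CHARS must not be passed here.
        // With null and allow_utf8 = false, everything outside the RFC 3986
        // unreserved set (A-Z a-z 0-9 - . _ ~) is encoded; space becomes %20.
        return Uri.escape_string(segment, null, false);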
    }
    
    /**
     * Decodes a URL-encoded path segment.
     * 
     * @param encoded The encoded segment
     * @return The decoded segment
     * @throws EntityError.INVALID_PATH if the segment contains invalid percent-encoding
     */
    public static string decode_segment(string encoded) throws EntityError {
        string? decoded = Uri.unescape_string(encoded);
        if (decoded == null) {
            throw new EntityError.INVALID_PATH(
                "Invalid percent-encoding in path segment: %s".printf(encoded)
            );
        }
        return decoded;
    }
}

} // namespace Implexus.Core
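
The null-terminated variadic pattern used by path() can be tried in isolation. The sketch below is a simplified stand-in, not the proposed class: build_path is a hypothetical function that returns a plain string instead of an EntityPath, so it depends only on GLib, and the "/" result for the root case is an assumption. It calls Uri.escape_string with null for reserved_chars_allowed and false for allow_utf8, which percent-encodes everything outside the RFC 3986 unreserved set.

```vala
// Hypothetical stand-in for SafePath.path(): returns a string, not an EntityPath.
string build_path (string? first_segment, ...) {
    // No segments at all: treat as the root path (assumed to print as "/").
    if (first_segment == null) {
        return "/";
    }

    var builder = new StringBuilder ();
    builder.append ("/");
    builder.append (Uri.escape_string (first_segment, null, false));

    // Walk the remaining arguments until the null terminator.
    var args = va_list ();
    while (true) {
        string? segment = args.arg ();
        if (segment == null) {
            break;
        }
        builder.append ("/");
        builder.append (Uri.escape_string (segment, null, false));
    }
    return builder.str;
}
```

Omitting the trailing null is undefined behavior in this pattern, because the loop has no other way to find the end of the argument list.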

Alternative: EntityPath Extension

An alternative design extends EntityPath directly with static factory methods:

// Add to the existing EntityPath class body (Vala has no partial classes)
public class EntityPath {
    
    /**
     * Creates an EntityPath from variadic segments with automatic URL encoding.
     * Terminate with null.
     * 
     * Example:
     * {{{
     * var path = EntityPath.from_parts("users", "john doe", null);
     * }}}
     */
    public static EntityPath from_parts(string? first_segment, ...) {
        var segments = new Invercargill.DataStructures.Vector<string>();
        
        if (first_segment == null) {
            return new EntityPath.root();
        }
        
        segments.add(SafePath.encode_segment(first_segment));
        
        va_list args = va_list();
        while (true) {
            string? segment = args.arg();
            if (segment == null) break;
            segments.add(SafePath.encode_segment(segment));
        }
        
        return new EntityPath.from_segments(segments.as_enumerable());
    }
}

URL Encoding Strategy

Characters to Encode

Following RFC 3986 with additional safety considerations:

Category    Characters                             Encoding Example
Space       (space)                                %20
Reserved    ! * ' ( ) ; : @ & = + $ , / ? # [ ]    %21, %2A, etc.
Percent     %                                      %25
Control     \x00-\x1F                              %00-%1F
Non-ASCII   Unicode chars                          UTF-8 percent-encoded

GLib.Uri Methods

Use these GLib methods for encoding/decoding:

// Encoding
string encoded = Uri.escape_string(segment, null, false);

// Decoding  
string? decoded = Uri.unescape_string(encoded);

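Note: The second parameter of Uri.escape_string() is reserved_chars_allowed, the set of reserved characters to leave unescaped, and the third is allow_utf8. Passing null and false percent-encodes every character outside the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~), including non-ASCII bytes. Passing RESERVED_CHARS as the second argument would leave exactly those characters unencoded.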
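
The effect of the two optional parameters is easiest to see side by side. In this sketch, encode_strict and encode_allowing are hypothetical helper names introduced only for illustration; the GLib call itself is the real Uri.escape_string.

```vala
// Everything outside A-Z a-z 0-9 - . _ ~ is percent-encoded,
// including the bytes of non-ASCII UTF-8 characters.
string encode_strict (string segment) {
    return Uri.escape_string (segment, null, false);
}

// Same as above, except that the characters listed in `allowed`
// are left untouched (hypothetical helper, shown for contrast).
string encode_allowing (string segment, string allowed) {
    return Uri.escape_string (segment, allowed, false);
}
```

For the input a/b c, encode_strict yields a%2Fb%20c, while encode_allowing with "/" as the allowed set yields a/b%20c. This is why a path segment encoder should not list "/" or "%" as allowed characters.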

Why Not Tilde Escaping?

The current EntityPath uses tilde escaping (~2f for /). SafePath uses standard URL encoding (%2F for /) because:

  1. Standard: RFC 3986 is universally understood
  2. Tool Support: All HTTP tools, debuggers, and libraries handle it
  3. Debugging: %XX format is immediately recognizable
  4. Interoperability: Works with web APIs and external systems

Edge Cases and Error Handling

Empty Segments

// Empty string segment - allowed but produces empty encoded result
var path = SafePath.path("users", "", "profile", null);
// Result: /users//profile (double slash normalized by EntityPath parsing)

// Recommendation: Validate segments before calling SafePath
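
One way to follow that recommendation is a pre-check run before the segments reach SafePath. has_empty_segment is a hypothetical helper, and whether empty segments should be rejected at all is still listed under Open Questions.

```vala
// Hypothetical pre-check: returns true if any segment is the empty string.
bool has_empty_segment (string[] segments) {
    foreach (unowned string segment in segments) {
        if (segment.length == 0) {
            return true;
        }
    }
    return false;
}
```

A caller could run this check on its input array and raise its own error before calling SafePath.from_array().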

Null Handling

// null terminates the variadic list
var path = SafePath.path("a", "b", null, "c", null);
// Result: /a/b (stops at first null)

Special Characters

// Slashes in segment names are encoded
var path = SafePath.path("data", "2024/01/15", "log", null);
// Result: /data/2024%2F01%2F15/log

// Literal percent signs are encoded as %25, so they survive a decode round trip
var path = SafePath.path("query", "100%", null);
// Result: /query/100%25

Unicode Handling

// Unicode characters are UTF-8 percent-encoded
var path = SafePath.path("users", "日本語", null);
// Result: /users/%E6%97%A5%E6%9C%AC%E8%AA%9E
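
The special-character and Unicode cases above can be checked with a small round trip through GLib. round_trip is a hypothetical helper; it assumes encoding uses Uri.escape_string with null and false, so that non-ASCII bytes are percent-encoded.

```vala
// Hypothetical helper: encode a segment, decode it again, return the result.
// Returns null if the encoded form cannot be decoded.
string? round_trip (string segment) {
    string encoded = Uri.escape_string (segment, null, false);
    return Uri.unescape_string (encoded);
}
```

Decoding is the step that can fail: Uri.unescape_string returns null for malformed input such as a bare trailing percent sign, which is the case decode_segment() turns into EntityError.INVALID_PATH.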

Integration with EntityPath

Conversion Flow

flowchart LR
    A[Raw segments] --> B[SafePath.path]
    B --> C[URL encode each segment]
    C --> D[EntityPath.from_segments]
    D --> E[EntityPath instance]
    
    E --> F[to_string: escaped display]
    E --> G[to_key: raw storage key]

Storage Considerations

EntityPath stores the segments it is given and applies escaping only in to_string(). If SafePath passed raw (unencoded) segments through and encoding happened only on output, the result would be:

// Input: "john doe" (contains space)
var path = SafePath.path("users", "john doe", null);

// Internal storage: segments = ["users", "john doe"]
// to_string(): "/users/john%20doe" (URL encoded for display)
// to_key(): "users/john doe" (raw for storage keys)

Important: This variant stores raw segments, not encoded ones. This matches the current EntityPath behavior where escaping is only applied in to_string(), but it lets spaces and other special characters reach storage keys.

Revised Design: Store Encoded Segments

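For true safety, we should store the encoded segments. This is the path() implementation from API Design, which encodes each segment before passing it to EntityPath.from_segments: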

public static EntityPath path(string? first_segment, ...) {
    var segments = new Invercargill.DataStructures.Vector<string>();
    
    if (first_segment == null) {
        return new EntityPath.root();
    }
    
    // Store ENCODED segments
    segments.add(encode_segment(first_segment));
    
    va_list args = va_list();
    while (true) {
        string? segment = args.arg();
        if (segment == null) break;
        segments.add(encode_segment(segment));
    }
    
    return new EntityPath.from_segments(segments.as_enumerable());
}

With this approach:

  • to_string(): /users/john%20doe (segments already encoded)
  • to_key(): users/john%20doe (encoded in storage too)

This ensures special characters never appear in storage keys.
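
To make the effect on keys concrete, here is a sketch of a key builder over encoded segments. storage_key is a hypothetical function, and the join-with-slash key format is an assumption inferred from the to_key() examples above, not the actual EntityPath code.

```vala
// Hypothetical key builder: encodes each raw segment, then joins with "/".
// Because "/" inside a segment becomes %2F, every "/" in the resulting key
// is a true segment separator.
string storage_key (string[] raw_segments) {
    string[] encoded = {};
    foreach (unowned string segment in raw_segments) {
        encoded += Uri.escape_string (segment, null, false);
    }
    return string.joinv ("/", encoded);
}
```

With encoded storage, a key can be split on "/" without ambiguity, which is not the case when raw segments may themselves contain slashes.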

Example Usage Patterns

Basic Path Construction

// Simple path
var catalogue = SafePath.path("catalogue", null);
// EntityPath: /catalogue

// Nested path
var document = SafePath.path("catalogue", "category", "document", null);
// EntityPath: /catalogue/category/document

Paths with Special Characters

// Spaces
var user_path = SafePath.path("users", "John Smith", null);
// EntityPath: /users/John%20Smith

// Slashes in names
var date_path = SafePath.path("logs", "2024/01/15", null);
// EntityPath: /logs/2024%2F01%2F15

// Query strings (common in document IDs)
var doc = SafePath.path("docs", "id=123&type=pdf", null);
// EntityPath: /docs/id%3D123%26type%3Dpdf

Array-Based Construction

// When segments are already in an array
string[] parts = { "users", user_id, "settings" };
var settings_path = SafePath.from_array(parts);

Integration with Engine Operations

// Creating documents with safe paths
public async Document create_document(Engine engine, string catalogue, 
                                       string category, string doc_name) throws Error {
    var path = SafePath.path(catalogue, category, doc_name, null);
    return yield engine.create_document(path);
}

// Building index paths
var index_path = SafePath.path("catalogue", "products", "indexes", "price", null);

Implementation Checklist

When implementing this design:

  1. Create SafePath class in src/Core/SafePath.vala
  2. Implement path() variadic method with null terminator
  3. Implement from_array() array-based method
  4. Implement encode_segment() using GLib.Uri.escape_string
  5. Implement decode_segment() using GLib.Uri.unescape_string
  6. Add unit tests in tests/Core/SafePathTest.vala:
    • Basic path construction
    • URL encoding verification
    • Special character handling
    • Unicode handling
    • Empty segment handling
    • Null terminator handling
    • Round-trip encode/decode
  7. Update src/meson.build to include new file
  8. Add integration examples to documentation

Open Questions

  1. Segment Validation: Should SafePath reject empty segments, or pass them through?

    • Recommendation: Pass through, let EntityPath handle normalization
  2. Encoding Storage: Should segments be stored encoded or raw?

    • Recommendation: Store encoded for consistency and safety
  3. Error on Invalid Input: What should happen with null bytes in segments?

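    • Recommendation: No special handling. A Vala string is NUL-terminated and cannot carry an embedded null byte, so %00 never needs to be produced; on decode, Uri.unescape_string already rejects %00 by returning null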
  4. API Style: Static factory class vs. EntityPath extension method?

    • Recommendation: Start with SafePath class, add EntityPath.from_parts() as convenience alias

Summary

The SafePath API provides:

  • Succinct variadic construction: SafePath.path("a", "b", "c", null)
  • Automatic URL encoding: Standard RFC 3986 percent-encoding
  • Seamless integration: Returns EntityPath instances
  • Edge case handling: Proper handling of special characters, unicode, and empty segments

This design enables safer, more readable path construction while maintaining full compatibility with the existing EntityPath system.