# SafePath API Design

## Overview

This document describes the design for a `SafePath` API that provides a succinct, variadic constructor for creating URL-encoded entity paths in Implexus.

## Analysis of Current Path System

### Current Implementation: EntityPath

The existing [`EntityPath`](../src/Core/EntityPath.vala) class provides:

1. **Multiple constructors**:
   - `EntityPath(string path_string)` - parses a path string
   - `EntityPath.root()` - creates the root path
   - `EntityPath.from_segments(Enumerable<string> segments)` - creates from segment collection
   - `EntityPath.with_child(EntityPath parent, string name)` - creates child path

2. **Current escaping mechanism** (tilde-based, similar to RFC 6901 JSON Pointer):
   ```
   ~  → ~7e
   /  → ~2f
   \  → ~5c
   \0 → ~00
   ```

3. **Limitations**:
   - No variadic constructor for inline path building
   - Requires string concatenation or multiple method calls for multi-segment paths
   - Tilde escaping is non-standard and may confuse users

### Usage Patterns from Tests

From [`EntityPathTest.vala`](../tests/Core/EntityPathTest.vala), typical usage includes:

```vala
// Current verbose patterns
var root = new EntityPath.root();
var users = root.append_child("users");
var john = users.append_child("john");

// Or string-based
var path = new EntityPath("/users/john/profile");
```

## Proposed SafePath API

### Design Goals

1. **Succinct API**: Single constructor call with variadic segments
2. **URL Encoding**: Automatic percent-encoding of each segment using standard RFC 3986
3. **Integration**: Produces `EntityPath` instances for seamless integration
4. **Safety**: Handles special characters, empty segments, and edge cases

### API Design

```vala
namespace Implexus.Core {

/**
 * SafePath provides a convenient factory for creating EntityPath instances
 * with automatic URL encoding of path segments.
 *
 * Example usage:
 * {{{
 * var path = SafePath.path("users", "john doe", "profile", null);
 * // Creates EntityPath for /users/john%20doe/profile
 *
 * var root = SafePath.path(null);                 // Root path
 * var simple = SafePath.path("catalogue", null);  // Single segment
 * }}}
 */
public class SafePath : Object {

    // No reserved-character constant is needed: Uri.escape_string() already
    // escapes every character outside the RFC 3986 unreserved set. Note that
    // '~' is unreserved and is therefore left unescaped.

    /**
     * Creates an EntityPath from variadic segments, URL-encoding each segment.
     *
     * @param first_segment The first path segment (required to start variadic args)
     * @param ... Additional segments, terminated by null
     * @return A new EntityPath with encoded segments
     *
     * Example:
     * {{{
     * var path = SafePath.path("users", "john", "profile", null);
     * // Result: /users/john/profile
     *
     * var encoded = SafePath.path("data", "2024/01", "file name", null);
     * // Result: /data/2024%2F01/file%20name
     * }}}
     */
    public static EntityPath path(string? first_segment, ...) {
        var segments = new Invercargill.DataStructures.Vector<string>();

        if (first_segment == null) {
            return new EntityPath.root();
        }

        // Add first segment
        segments.add(encode_segment(first_segment));

        // Process variadic arguments until the null terminator
        va_list args = va_list();
        while (true) {
            string? segment = args.arg();
            if (segment == null) {
                break;
            }
            segments.add(encode_segment(segment));
        }

        return new EntityPath.from_segments(segments.as_enumerable());
    }

    /**
     * Creates an EntityPath from an array of segments.
     * Alternative API for when array-based construction is preferred.
     *
     * @param segments Array of path segments
     * @return A new EntityPath with encoded segments
     */
    public static EntityPath from_array(string[] segments) {
        if (segments.length == 0) {
            return new EntityPath.root();
        }

        var encoded_segments = new Invercargill.DataStructures.Vector<string>();
        foreach (var segment in segments) {
            encoded_segments.add(encode_segment(segment));
        }

        return new EntityPath.from_segments(encoded_segments.as_enumerable());
    }

    /**
     * URL-encodes a path segment according to RFC 3986.
     *
     * Encodes:
     * - All reserved URI characters
     * - Space (as %20, not +)
     * - Non-ASCII characters (as percent-encoded UTF-8)
     * - Control characters
     *
     * @param segment The raw segment to encode
     * @return The URL-encoded segment
     */
    public static string encode_segment(string segment) {
        if (segment.length == 0) {
            return "";
        }

        // reserved_chars_allowed = null: no reserved character is exempt from escaping.
        // allow_utf8 = false: non-ASCII characters are percent-encoded as UTF-8 bytes.
        // Space is always encoded as %20.
        return Uri.escape_string(segment, null, false);
    }

    /**
     * Decodes a URL-encoded path segment.
     *
     * @param encoded The encoded segment
     * @return The decoded segment
     * @throws EntityError.INVALID_PATH if the segment contains invalid percent-encoding
     */
    public static string decode_segment(string encoded) throws EntityError {
        string? decoded = Uri.unescape_string(encoded);
        if (decoded == null) {
            throw new EntityError.INVALID_PATH(
                "Invalid percent-encoding in path segment: %s".printf(encoded)
            );
        }
        return decoded;
    }
}

} // namespace Implexus.Core
```

### Alternative: EntityPath Extension

An alternative design extends EntityPath directly with static factory methods:

```vala
// Add to the existing EntityPath class (Vala has no partial classes)
public class EntityPath {

    /**
     * Creates an EntityPath from variadic segments with automatic URL encoding.
     * Terminate with null.
     *
     * Example:
     * {{{
     * var path = EntityPath.from_parts("users", "john doe", null);
     * }}}
     */
    public static EntityPath from_parts(string? first_segment, ...) {
        var segments = new Invercargill.DataStructures.Vector<string>();

        if (first_segment == null) {
            return new EntityPath.root();
        }

        segments.add(SafePath.encode_segment(first_segment));

        va_list args = va_list();
        while (true) {
            string? segment = args.arg();
            if (segment == null) break;
            segments.add(SafePath.encode_segment(segment));
        }

        return new EntityPath.from_segments(segments.as_enumerable());
    }
}
```

## URL Encoding Strategy

### Characters to Encode

Following RFC 3986 with additional safety considerations:

| Category | Characters | Encoding Example |
|----------|------------|------------------|
| Space | ` ` | `%20` |
| Reserved | `! * ' ( ) ; : @ & = + $ , / ? # [ ]` | `%21`, `%2A`, etc. |
| Percent | `%` | `%25` |
| Control | `\x00-\x1F` | `%00`-`%1F` |
| Non-ASCII | Unicode chars | UTF-8 percent-encoded |

### GLib.Uri Methods

Use these GLib methods for encoding/decoding:

```vala
// Encoding
string encoded = Uri.escape_string(segment, null, false);

// Decoding
string? decoded = Uri.unescape_string(encoded);
```

**Note**: The second argument of `Uri.escape_string()` is `reserved_chars_allowed`, the set of reserved characters to leave *unescaped*, so passing `null` encodes every reserved character. The third argument is `allow_utf8`; it must be `false` for non-ASCII characters to be percent-encoded. Unreserved characters (letters, digits, `-`, `.`, `_`, `~`) are never escaped.

### Why Not Tilde Escaping?

The current EntityPath uses tilde escaping (`~2f` for `/`). SafePath uses standard URL encoding (`%2F` for `/`) because:

1. **Standard**: RFC 3986 is universally understood
2. **Tool Support**: All HTTP tools, debuggers, and libraries handle it
3. **Debugging**: `%XX` format is immediately recognizable
4. **Interoperability**: Works with web APIs and external systems

## Edge Cases and Error Handling

### Empty Segments

```vala
// Empty string segment - allowed but produces empty encoded result
var path = SafePath.path("users", "", "profile", null);
// Result: /users//profile (double slash normalized by EntityPath parsing)
// Recommendation: Validate segments before calling SafePath
```

### Null Handling

```vala
// null terminates the variadic list
var path = SafePath.path("a", "b", null, "c", null);
// Result: /a/b (stops at first null)
```

### Special Characters

```vala
// Slashes in segment names are encoded
var path = SafePath.path("data", "2024/01/15", "log", null);
// Result: /data/2024%2F01%2F15/log

// Literal percent signs are encoded, so they are never mistaken for an escape
var percent_path = SafePath.path("query", "100%", null);
// Result: /query/100%25
```

### Unicode Handling

```vala
// Unicode characters are UTF-8 percent-encoded
var path = SafePath.path("users", "日本語", null);
// Result: /users/%E6%97%A5%E6%9C%AC%E8%AA%9E
```

## Integration with EntityPath

### Conversion Flow

```mermaid
flowchart LR
    A[Raw segments] --> B[SafePath.path]
    B --> C[URL encode each segment]
    C --> D[EntityPath.from_segments]
    D --> E[EntityPath instance]
    E --> F[to_string: escaped display]
    E --> G[to_key: raw storage key]
```

### Storage Considerations

A first option is for EntityPath to keep raw (unencoded) segments internally and apply encoding only when the path is rendered:

```vala
// Input: "john doe" (contains space)
var path = SafePath.path("users", "john doe", null);

// Internal storage: segments = ["users", "john doe"]
// to_string(): "/users/john%20doe" (URL encoded for display)
// to_key(): "users/john doe" (raw for storage keys)
```

**Important**: This option stores raw segments, not encoded ones. It matches the current EntityPath behavior, where escaping is applied only in `to_string()`.

### Revised Design: Store Encoded Segments

To keep special characters out of storage keys, store the encoded segments instead. This is what `path()` does:

```vala
public static EntityPath path(string? first_segment, ...) {
    var segments = new Invercargill.DataStructures.Vector<string>();

    if (first_segment == null) {
        return new EntityPath.root();
    }

    // Store ENCODED segments
    segments.add(encode_segment(first_segment));

    va_list args = va_list();
    while (true) {
        string? segment = args.arg();
        if (segment == null) break;
        segments.add(encode_segment(segment));
    }

    return new EntityPath.from_segments(segments.as_enumerable());
}
```

With this approach:

- `to_string()`: `/users/john%20doe` (segments already encoded)
- `to_key()`: `users/john%20doe` (encoded in storage too)

This ensures special characters never appear in storage keys.

## Example Usage Patterns

### Basic Path Construction

```vala
// Simple path
var catalogue = SafePath.path("catalogue", null);
// EntityPath: /catalogue

// Nested path
var document = SafePath.path("catalogue", "category", "document", null);
// EntityPath: /catalogue/category/document
```

### Paths with Special Characters

```vala
// Spaces
var user_path = SafePath.path("users", "John Smith", null);
// EntityPath: /users/John%20Smith

// Slashes in names
var date_path = SafePath.path("logs", "2024/01/15", null);
// EntityPath: /logs/2024%2F01%2F15

// Query strings (common in document IDs)
var doc = SafePath.path("docs", "id=123&type=pdf", null);
// EntityPath: /docs/id%3D123%26type%3Dpdf
```

### Array-Based Construction

```vala
// When segments are already in an array
string[] parts = { "users", user_id, "settings" };
var settings_path = SafePath.from_array(parts);
```

### Integration with Engine Operations

```vala
// Creating documents with safe paths
public async Document create_document(Engine engine, string catalogue, string category, string doc_name) throws Error {
    var path = SafePath.path(catalogue, category, doc_name, null);
    return yield engine.create_document(path);
}

// Building index paths
var index_path = SafePath.path("catalogue", "products", "indexes", "price", null);
```

## Implementation Checklist

When implementing this design:

1. [ ] Create `SafePath` class in `src/Core/SafePath.vala`
2. [ ] Implement `path()` variadic method with null terminator
3. [ ] Implement `from_array()` array-based method
4. [ ] Implement `encode_segment()` using `GLib.Uri.escape_string`
5. [ ] Implement `decode_segment()` using `GLib.Uri.unescape_string`
6. [ ] Add unit tests in `tests/Core/SafePathTest.vala`:
   - [ ] Basic path construction
   - [ ] URL encoding verification
   - [ ] Special character handling
   - [ ] Unicode handling
   - [ ] Empty segment handling
   - [ ] Null terminator handling
   - [ ] Round-trip encode/decode
7. [ ] Update `src/meson.build` to include new file
8. [ ] Add integration examples to documentation

## Open Questions

1. **Segment Validation**: Should SafePath reject empty segments, or pass them through?
   - Recommendation: Pass through, let EntityPath handle normalization

2. **Encoding Storage**: Should segments be stored encoded or raw?
   - Recommendation: Store encoded for consistency and safety

3. **Error on Invalid Input**: What should happen with null bytes in segments?
   - Recommendation: No special handling on the encode side. A Vala `string` is NUL-terminated, so an embedded null byte cannot reach `encode_segment()`. On decode, `Uri.unescape_string` returns null for `%00`, which `decode_segment()` reports as `INVALID_PATH`.

4. **API Style**: Static factory class vs. EntityPath extension method?
   - Recommendation: Start with `SafePath` class, add `EntityPath.from_parts()` as convenience alias

## Summary

The `SafePath` API provides:

- **Succinct variadic construction**: `SafePath.path("a", "b", "c", null)`
- **Automatic URL encoding**: Standard RFC 3986 percent-encoding
- **Seamless integration**: Returns `EntityPath` instances
- **Edge case handling**: Proper handling of special characters, unicode, and empty segments

This design enables safer, more readable path construction while maintaining full compatibility with the existing EntityPath system.
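
### Appendix: Round-Trip Sketch

The "Round-trip encode/decode" checklist item can be explored without `EntityPath` at all. The following is a minimal standalone sketch, not the final implementation. It assumes the segment encoder calls `Uri.escape_string` with `reserved_chars_allowed = null` and `allow_utf8 = false`; the helper names `encode_segment_sketch` and `join_encoded` are hypothetical and exist only for illustration.

```vala
// Standalone sketch: only Uri.escape_string and Uri.unescape_string are GLib calls.
// encode_segment_sketch and join_encoded are hypothetical helpers for this example.

string encode_segment_sketch (string segment) {
    // null: no reserved character is exempt from escaping.
    // false: non-ASCII characters are percent-encoded as UTF-8 bytes.
    return Uri.escape_string (segment, null, false);
}

string join_encoded (string[] raw_segments) {
    // Zero segments is treated as the root path.
    if (raw_segments.length == 0) {
        return "/";
    }
    var builder = new StringBuilder ();
    foreach (unowned string segment in raw_segments) {
        builder.append_c ('/');
        builder.append (encode_segment_sketch (segment));
    }
    return builder.str;
}

void main () {
    string[] parts = { "data", "2024/01", "file name", "100%" };

    // Each segment is encoded on its own, so the only literal '/' characters
    // in the result are the separators added by join_encoded.
    print ("%s\n", join_encoded (parts));
    // prints: /data/2024%2F01/file%20name/100%25

    // Round trip: decoding an encoded segment yields the original text.
    foreach (unowned string part in parts) {
        string? decoded = Uri.unescape_string (encode_segment_sketch (part));
        assert (decoded == part);
    }

    // Malformed escapes are reported as null rather than passed through.
    assert (Uri.unescape_string ("%zz") == null);
}
```

Encoding per segment before joining is the design choice that matters here: a slash inside a segment becomes `%2F` and can no longer be confused with a path separator.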