Data Structure
OstrichDB uses a hierarchical data structure that provides intuitive organization, efficient querying, and natural user isolation. This document explains how data is organized, stored, and accessed.
Hierarchical Organization
Overview
The OstrichDB hierarchy consists of four levels:
Projects (User isolation & top-level organization)
└── Collections (Logical data groupings, can be encrypted)
    └── Clusters (Record groupings for organization)
        └── Records (Individual data items with types & values)
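The four levels can be pictured as nested containers. The following is a minimal sketch in Python; the class and field names are illustrative assumptions and do not reflect OstrichDB's internal types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Record:
    name: str
    type: str  # e.g. "STRING", "INTEGER"
    value: Any


@dataclass
class Cluster:
    name: str
    records: Dict[str, Record] = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    encrypted: bool = False
    clusters: Dict[str, Cluster] = field(default_factory=dict)


@dataclass
class Project:
    name: str
    owner: str
    collections: Dict[str, Collection] = field(default_factory=dict)


# Build project -> collection -> cluster -> record, top down.
project = Project(name="my-project", owner="user123")
users = project.collections.setdefault("users", Collection("users", encrypted=True))
active = users.clusters.setdefault("active_users", Cluster("active_users"))
active.records["username"] = Record("username", "STRING", "john_doe")
```

Each level owns a dictionary keyed by name, which mirrors the way every level is addressed by name in the hierarchy.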
Level Purposes
Projects
- User isolation: Each user has their own project namespace
- Top-level organization: Logical separation of different applications or use cases
- Access control: Projects define security boundaries
- Resource management: Quotas and limits applied at project level
Collections
- Data grouping: Related data organized together (e.g., “users”, “products”, “orders”)
- Encryption boundary: Collections can be encrypted independently
- Schema flexibility: Each collection can have different data organization
- Logical separation: Different data types or business entities
Clusters
- Record organization: Group related records together (e.g., “active_users”, “archived_orders”)
- Query optimization: Searches can target specific clusters
- Performance tuning: Data locality for related records
- Logical subsets: Further categorization within collections
Records
- Data storage: Individual data items with names, types, and values
- Type enforcement: Each record has a specific data type
- Atomic operations: Records are the smallest unit of data manipulation
- Metadata: Each record includes creation/modification timestamps
File System Structure
Physical Layout
data/
├── projects/
│   ├── project1/
│   │   ├── metadata.json
│   │   └── collections/
│   │       ├── users/
│   │       │   ├── metadata.json
│   │       │   └── clusters/
│   │       │       ├── active_users/
│   │       │       │   ├── metadata.json
│   │       │       │   └── records/
│   │       │       │       ├── username.json
│   │       │       │       ├── email.json
│   │       │       │       └── age.json
│   │       │       └── archived_users/
│   │       │           ├── metadata.json
│   │       │           └── records/
│   │       └── products/
│   │           ├── metadata.json
│   │           └── clusters/
│   └── project2/
└── system/
    ├── users.json
    └── config.json
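Given the layout above, the location of any record file follows from the four names in its hierarchy. The sketch below shows one way to compute it; `record_path` is a hypothetical helper, and the name validation it performs is an assumption rather than documented OstrichDB behavior.

```python
from pathlib import Path


def record_path(data_root, project, collection, cluster, record):
    """Return the on-disk path of a record file under the layout shown above."""
    for part in (project, collection, cluster, record):
        # Reject empty names and anything that could escape the hierarchy.
        if not part or "/" in part or "\\" in part or part in (".", ".."):
            raise ValueError(f"invalid name: {part!r}")
    return (
        Path(data_root) / "projects" / project
        / "collections" / collection
        / "clusters" / cluster
        / "records" / f"{record}.json"
    )


path = record_path("data", "project1", "users", "active_users", "email")
print(path.as_posix())
# data/projects/project1/collections/users/clusters/active_users/records/email.json
```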
Metadata Files
Project Metadata
{
  "name": "my-project",
  "owner": "user123",
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "description": "Project description",
  "settings": {
    "default_encryption": false,
    "backup_enabled": true
  }
}
Collection Metadata
{
  "name": "users",
  "encrypted": true,
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "record_count": 150,
  "cluster_count": 3,
  "encryption": {
    "algorithm": "AES-256",
    "key_id": "user123_master"
  }
}
Cluster Metadata
{
  "name": "active_users",
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "record_count": 45,
  "size_bytes": 2048,
  "last_accessed": "2024-01-15T15:30:00Z"
}
Record File Structure
{
  "name": "username",
  "type": "STRING",
  "value": "john_doe",
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "metadata": {
    "id": "rec_123456",
    "size_bytes": 64
  }
}
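To make the record file format concrete, here is a minimal sketch that writes and reads a record in this shape. The helper names are hypothetical, the nested `metadata` object (`id`, `size_bytes`) is omitted for brevity, and the rule that `created` is preserved across updates is an assumption.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_record(records_dir, name, type_name, value):
    """Write one record as <name>.json, keeping `created` if the file exists."""
    path = Path(records_dir) / f"{name}.json"
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    created = now
    if path.exists():
        created = json.loads(path.read_text(encoding="utf-8"))["created"]
    doc = {
        "name": name,
        "type": type_name,
        "value": value,
        "created": created,
        "modified": now,
    }
    path.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return doc


def read_record(records_dir, name):
    """Load a single record file and return it as a dict."""
    path = Path(records_dir) / f"{name}.json"
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    write_record(tmp, "username", "STRING", "john_doe")
    loaded = read_record(tmp, "username")
```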
Data Types and Storage
Supported Types
Basic Types
- STRING/STR/CHAR: Text data, stored as UTF-8
- INTEGER/INT: 64-bit signed integers
- FLOAT/FLT: 64-bit floating-point numbers
- BOOLEAN/BOOL: True/false values
Temporal Types
- DATE: Date values (YYYY-MM-DD format)
- TIME: Time values (HH:MM:SS format)
- DATETIME: Combined date and time (ISO 8601 format)
Special Types
- UUID: Universally unique identifiers
- NULL: Null/empty values
- CREDENTIAL: Encrypted credential storage
Array Types
Arrays of any basic or temporal type, and of UUIDs:
- []STRING, []INTEGER, []FLOAT, []BOOLEAN
- []DATE, []TIME, []DATETIME, []UUID
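Because several types have short aliases, it can help to normalize a type name before comparing or storing it. The sketch below maps the aliases listed above to their long forms. Treating names as case-insensitive and accepting aliases inside array types (such as `[]INT`) are assumptions, and `normalize_type` is a hypothetical helper.

```python
ALIASES = {
    "STR": "STRING",
    "CHAR": "STRING",
    "INT": "INTEGER",
    "FLT": "FLOAT",
    "BOOL": "BOOLEAN",
}
ARRAY_ELEMENTS = {
    "STRING", "INTEGER", "FLOAT", "BOOLEAN",
    "DATE", "TIME", "DATETIME", "UUID",
}
SCALARS = ARRAY_ELEMENTS | {"NULL", "CREDENTIAL"}


def normalize_type(name):
    """Return the canonical type name, e.g. 'str' -> 'STRING'."""
    raw = name.strip().upper()
    is_array = raw.startswith("[]")
    base = raw[2:] if is_array else raw
    base = ALIASES.get(base, base)
    allowed = ARRAY_ELEMENTS if is_array else SCALARS
    if base not in allowed:
        raise ValueError(f"unsupported type: {name!r}")
    return "[]" + base if is_array else base


print(normalize_type("str"))    # STRING
print(normalize_type("[]INT"))  # []INTEGER
```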
Type Storage Examples
// String record
{
  "name": "username",
  "type": "STRING",
  "value": "alice_smith"
}

// Integer record
{
  "name": "age",
  "type": "INTEGER",
  "value": 28
}

// Array record
{
  "name": "tags",
  "type": "[]STRING",
  "value": ["admin", "active", "verified"]
}

// Datetime record
{
  "name": "created_date",
  "type": "DATETIME",
  "value": "2024-01-15T10:30:00Z"
}
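Type enforcement means a record's value has to agree with its declared type. The following sketch checks a JSON value against a canonical type name. The exact rules OstrichDB applies are not specified here, so these checks (including the 64-bit range test and the parsing rules for temporal values) are assumptions; NULL and CREDENTIAL are left out.

```python
import uuid
from datetime import datetime

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def _parses(value, parser):
    """Return True if `parser` accepts `value` without raising."""
    try:
        parser(value)
        return True
    except (ValueError, TypeError, AttributeError):
        return False


def _is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


CHECKS = {
    "STRING": lambda v: isinstance(v, str),
    "INTEGER": lambda v: _is_int(v) and INT64_MIN <= v <= INT64_MAX,
    "FLOAT": lambda v: _is_int(v) or isinstance(v, float),
    "BOOLEAN": lambda v: isinstance(v, bool),
    "DATE": lambda v: isinstance(v, str)
    and _parses(v, lambda s: datetime.strptime(s, "%Y-%m-%d")),
    "TIME": lambda v: isinstance(v, str)
    and _parses(v, lambda s: datetime.strptime(s, "%H:%M:%S")),
    "DATETIME": lambda v: isinstance(v, str)
    and _parses(v, lambda s: datetime.fromisoformat(s.replace("Z", "+00:00"))),
    "UUID": lambda v: isinstance(v, str) and _parses(v, uuid.UUID),
}


def is_valid(type_name, value):
    """Check a value against a canonical type name such as '[]STRING'."""
    if type_name.startswith("[]"):
        check = CHECKS[type_name[2:]]
        return isinstance(value, list) and all(check(item) for item in value)
    return CHECKS[type_name](value)
```

For example, `is_valid("[]STRING", ["admin", "active", "verified"])` holds, while `is_valid("INTEGER", True)` does not.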
Encryption and Security
Collection-Level Encryption
Collections can be encrypted using user-specific master keys:
Encrypted Collection Structure
encrypted_collection/
├── metadata.json (unencrypted metadata)
└── clusters/
    └── cluster_name/
        ├── metadata.json (unencrypted)
        └── records/
            ├── record1.enc (encrypted record data)
            └── record2.enc (encrypted record data)
Encryption Process
- Key derivation: Master key derived from user credentials
- Data encryption: Record data encrypted with AES-256
- Metadata preservation: Structure metadata remains unencrypted
- Transparent operations: Automatic encrypt/decrypt during operations
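As an illustration of the key derivation step only, the sketch below derives a 32-byte key (the size AES-256 requires) from a password. This page does not say which key derivation function OstrichDB uses, so PBKDF2-HMAC-SHA256, the salt handling, and the iteration count are all assumptions. The AES encryption itself is not shown because it needs a third-party library.

```python
import hashlib
import os


def derive_master_key(password, salt, iterations=600_000):
    """Derive a 32-byte key from a password (assumed KDF: PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random per user and must be stored so the key can be re-derived.
salt = os.urandom(16)
key = derive_master_key("correct horse battery staple", salt)
```

The same password and salt always yield the same key, which is what allows transparent decryption on later operations without storing the key itself.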
Access Control
- Project ownership: Users can only access their own projects
- Collection access: Encrypted collections require proper keys
- Record-level security: All operations validate user permissions
Query and Access Patterns
Hierarchical Queries
Queries follow the hierarchical structure:
/projects/{project}/collections/{collection}/clusters/{cluster}/records
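A path in this form can be split back into its hierarchy levels. The sketch below is a hypothetical parser that accepts any prefix of the path; allowing a record name after the final `records` segment is an assumption.

```python
LEVELS = ("projects", "collections", "clusters", "records")


def parse_path(path):
    """Split a hierarchical path into {'project': ..., 'collection': ..., ...}."""
    parts = [p for p in path.split("/") if p]
    result = {}
    i = 0
    for level in LEVELS:
        if i >= len(parts):
            break
        if parts[i] != level:
            raise ValueError(f"expected {level!r}, got {parts[i]!r}")
        i += 1
        if i < len(parts):
            # "projects" -> "project", "collections" -> "collection", ...
            result[level[:-1]] = parts[i]
            i += 1
    if i < len(parts):
        raise ValueError(f"unexpected trailing segment: {parts[i]!r}")
    return result


target = parse_path(
    "/projects/project1/collections/users/clusters/active_users/records"
)
print(target)
# {'project': 'project1', 'collection': 'users', 'cluster': 'active_users'}
```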
Efficient Access Patterns
- Direct access: Fast lookup by exact path
- Hierarchical browsing: Navigate structure level by level
- Filtered queries: Search within specific levels
- Bulk operations: Operate on entire clusters or collections
Indexing Strategy
- Path-based indexing: Fast hierarchical lookups
- Type indexing: Quick filtering by data type
- Name indexing: Efficient record name searches
- Metadata indexing: Query by creation date, size, etc.
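One way to picture these indexes is as a set of in-memory lookup tables keyed by path, type, and name. The class below is an illustrative sketch, not OstrichDB's index implementation, and all names in it are hypothetical.

```python
from collections import defaultdict


class RecordIndex:
    """Toy index: exact path lookup plus filtering by type and name."""

    def __init__(self):
        self.by_path = {}
        self.by_type = defaultdict(set)
        self.by_name = defaultdict(set)

    def add(self, path, record):
        self.by_path[path] = record
        self.by_type[record["type"]].add(path)
        self.by_name[record["name"]].add(path)

    def find(self, type_name=None, name=None):
        """Return sorted paths matching every filter that was given."""
        candidates = set(self.by_path)
        if type_name is not None:
            candidates &= self.by_type.get(type_name, set())
        if name is not None:
            candidates &= self.by_name.get(name, set())
        return sorted(candidates)


index = RecordIndex()
index.add("users/active_users/username", {"name": "username", "type": "STRING"})
index.add("users/active_users/age", {"name": "age", "type": "INTEGER"})
index.add("users/archived_users/username", {"name": "username", "type": "STRING"})

strings = index.find(type_name="STRING")
ages = index.find(type_name="INTEGER", name="age")
```

Filters are combined by set intersection, so adding a filter can only narrow the result.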
Performance Considerations
Storage Efficiency
- Separate metadata: Metadata separated from data for faster queries
- File-per-record: Individual record files for atomic operations
- Hierarchical caching: Cache frequently accessed metadata
- Lazy loading: Load data only when accessed
Query Performance
- Path optimization: Direct path resolution without scanning
- Metadata queries: Fast listing without loading record data
- Filtered scans: Early termination when filters don’t match
- Concurrent access: Multiple readers, controlled writers
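With one file per record, a cluster's records can be listed from directory entries alone, without opening any record file. The sketch below illustrates that idea; `list_record_names` is a hypothetical helper and assumes unencrypted `.json` record files.

```python
import os
import tempfile
from pathlib import Path


def list_record_names(records_dir):
    """List record names from file names only; no record file is opened."""
    with os.scandir(records_dir) as entries:
        return sorted(
            entry.name[: -len(".json")]
            for entry in entries
            if entry.is_file() and entry.name.endswith(".json")
        )


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("username.json", "email.json", "notes.txt"):
        (Path(tmp) / filename).write_text("{}", encoding="utf-8")
    names = list_record_names(tmp)
```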
Memory Management
- Stream processing: Large results streamed rather than loaded entirely
- Resource cleanup: Automatic cleanup with defer patterns
- Memory pools: Reuse allocated memory for common operations
- Garbage avoidance: Manual memory management eliminates GC overhead
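Stream processing can be illustrated with a generator that yields one record at a time, so a consumer that stops early never loads the remaining files. This is a Python sketch of the idea with hypothetical names; it says nothing about how OstrichDB itself manages memory.

```python
import json
import tempfile
from pathlib import Path


def iter_records(records_dir):
    """Yield records one at a time, in file name order."""
    for path in sorted(Path(records_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            yield json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    samples = [
        {"name": "age", "type": "INTEGER", "value": 28},
        {"name": "email", "type": "STRING", "value": "a@example.com"},
        {"name": "username", "type": "STRING", "value": "alice_smith"},
    ]
    for rec in samples:
        (Path(tmp) / f"{rec['name']}.json").write_text(json.dumps(rec), encoding="utf-8")

    # Stops reading as soon as the first STRING record is found.
    first_string = next(r for r in iter_records(tmp) if r["type"] == "STRING")
```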
Backup and Recovery
Backup Strategy
- Hierarchical backups: Backup at any level of hierarchy
- Incremental backups: Only changed data since last backup
- Metadata preservation: Backup includes all metadata
- Encryption aware: Encrypted data backed up encrypted
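The incremental idea can be sketched as copying only the files modified since the previous backup. The function below is a hypothetical illustration based on file modification times, not OstrichDB's backup tool. Because it copies bytes unchanged, encrypted record files stay encrypted in the backup, and starting from a subdirectory backs up just that level of the hierarchy.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path


def incremental_backup(source_root, backup_root, since_epoch):
    """Copy files modified after `since_epoch`, preserving relative paths."""
    source_root, backup_root = Path(source_root), Path(backup_root)
    copied = []
    for path in sorted(source_root.rglob("*")):
        if path.is_file() and path.stat().st_mtime > since_epoch:
            relative = path.relative_to(source_root)
            target = backup_root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps timestamps
            copied.append(relative.as_posix())
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "data", Path(tmp) / "backup"
    project_dir = src / "projects" / "project1"
    project_dir.mkdir(parents=True)
    (project_dir / "old.json").write_text("{}", encoding="utf-8")
    (project_dir / "new.json").write_text("{}", encoding="utf-8")

    last_backup = time.time() - 3600
    # Pretend old.json was last modified before the previous backup.
    two_hours_ago = time.time() - 7200
    os.utime(project_dir / "old.json", (two_hours_ago, two_hours_ago))

    copied = incremental_backup(src, dst, last_backup)
    backup_exists = (dst / "projects" / "project1" / "new.json").is_file()
```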
Recovery Process
- Point-in-time recovery: Restore to specific timestamp
- Partial recovery: Restore specific projects or collections
- Consistency checks: Verify data integrity after recovery
- Rollback capability: Revert changes if needed
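A post-recovery consistency check could compare what the cluster metadata claims with what is actually on disk. The sketch below checks `record_count` and that each record file's `name` matches its file name; these specific checks and the helper name are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def check_cluster(cluster_dir):
    """Return a list of human-readable problems found in one cluster."""
    cluster_dir = Path(cluster_dir)
    problems = []
    meta = json.loads((cluster_dir / "metadata.json").read_text(encoding="utf-8"))
    files = sorted((cluster_dir / "records").glob("*.json"))

    if meta.get("record_count") != len(files):
        problems.append(
            f"record_count is {meta.get('record_count')} "
            f"but {len(files)} record files found"
        )
    for path in files:
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"{path.name}: invalid JSON")
            continue
        if record.get("name") != path.stem:
            problems.append(f"{path.name}: name field does not match file name")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    cluster = Path(tmp) / "active_users"
    (cluster / "records").mkdir(parents=True)
    (cluster / "metadata.json").write_text(
        json.dumps({"name": "active_users", "record_count": 2}), encoding="utf-8"
    )
    (cluster / "records" / "username.json").write_text(
        json.dumps({"name": "username", "type": "STRING", "value": "john_doe"}),
        encoding="utf-8",
    )
    problems = check_cluster(cluster)
```

In this example the metadata claims two records but only one file exists, so one problem is reported.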
Best Practices
Data Organization
- Logical grouping: Use projects for different applications
- Collection design: Group related data in collections
- Cluster strategy: Use clusters for natural data subsets
- Naming conventions: Use consistent, descriptive names
Performance Optimization
- Access patterns: Design hierarchy around query patterns
- Batch operations: Group related operations together
- Encryption planning: Consider encryption overhead for sensitive data
- Monitoring: Track performance metrics and query patterns
Security Guidelines
- Encryption boundaries: Use collection-level encryption appropriately
- Access control: Implement proper user isolation
- Key management: Secure handling of encryption keys
- Audit trails: Log all data access and modifications
Next Steps
Learn more about working with the data structure:
- API Reference - How to interact with the hierarchy via REST API
- Security - Encryption and access control details
- Configuration - Database configuration options