
Data Structure

OstrichDB uses a hierarchical data structure that provides intuitive organization, efficient querying, and natural user isolation. This document explains how data is organized, stored, and accessed.

The OstrichDB hierarchy consists of four levels:

Projects (User isolation & top-level organization)
└── Collections (Logical data groupings, can be encrypted)
    └── Clusters (Record groupings for organization)
        └── Records (Individual data items with types & values)
Projects:

  • User isolation: Each user has their own project namespace
  • Top-level organization: Logical separation of different applications or use cases
  • Access control: Projects define security boundaries
  • Resource management: Quotas and limits applied at the project level

Collections:

  • Data grouping: Related data organized together (e.g., “users”, “products”, “orders”)
  • Encryption boundary: Collections can be encrypted independently
  • Schema flexibility: Each collection can have a different data organization
  • Logical separation: Different data types or business entities

Clusters:

  • Record organization: Group related records together (e.g., “active_users”, “archived_orders”)
  • Query optimization: Searches can target specific clusters
  • Performance tuning: Data locality for related records
  • Logical subsets: Further categorization within collections

Records:

  • Data storage: Individual data items with names, types, and values
  • Type enforcement: Each record has a specific data type
  • Atomic operations: Records are the smallest unit of data manipulation
  • Metadata: Each record includes creation/modification timestamps
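The four levels can be sketched as nested in-memory types. The class and field names below are illustrative only, not part of OstrichDB's API:

```python
from dataclasses import dataclass, field

# Illustrative model of the Project → Collection → Cluster → Record
# hierarchy; these type names are not OstrichDB's actual API.
@dataclass
class Record:
    name: str
    type: str          # e.g. "STRING", "INTEGER", "[]STRING"
    value: object

@dataclass
class Cluster:
    name: str
    records: dict = field(default_factory=dict)

@dataclass
class Collection:
    name: str
    encrypted: bool = False
    clusters: dict = field(default_factory=dict)

@dataclass
class Project:
    name: str
    owner: str
    collections: dict = field(default_factory=dict)

# Build the "users" example used throughout this document
project = Project(name="my-project", owner="user123")
users = Collection(name="users", encrypted=True)
active = Cluster(name="active_users")
active.records["username"] = Record("username", "STRING", "john_doe")
users.clusters["active_users"] = active
project.collections["users"] = users
```

Navigating the structure is then a chain of lookups, mirroring the query paths shown later in this document.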
On disk, the hierarchy maps directly to the filesystem:

data/
├── projects/
│   ├── project1/
│   │   ├── metadata.json
│   │   └── collections/
│   │       ├── users/
│   │       │   ├── metadata.json
│   │       │   └── clusters/
│   │       │       ├── active_users/
│   │       │       │   ├── metadata.json
│   │       │       │   └── records/
│   │       │       │       ├── username.json
│   │       │       │       ├── email.json
│   │       │       │       └── age.json
│   │       │       └── archived_users/
│   │       │           ├── metadata.json
│   │       │           └── records/
│   │       └── products/
│   │           ├── metadata.json
│   │           └── clusters/
│   └── project2/
└── system/
    ├── users.json
    └── config.json
Project metadata (metadata.json):

{
  "name": "my-project",
  "owner": "user123",
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "description": "Project description",
  "settings": {
    "default_encryption": false,
    "backup_enabled": true
  }
}
Collection metadata (metadata.json):

{
  "name": "users",
  "encrypted": true,
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "record_count": 150,
  "cluster_count": 3,
  "encryption": {
    "algorithm": "AES-256",
    "key_id": "user123_master"
  }
}
Cluster metadata (metadata.json):

{
  "name": "active_users",
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "record_count": 45,
  "size_bytes": 2048,
  "last_accessed": "2024-01-15T15:30:00Z"
}
Record file (e.g., username.json):

{
  "name": "username",
  "type": "STRING",
  "value": "john_doe",
  "created": "2024-01-15T10:00:00Z",
  "modified": "2024-01-15T10:00:00Z",
  "metadata": {
    "id": "rec_123456",
    "size_bytes": 64
  }
}
Supported basic types:

  • STRING/STR/CHAR: Text data, stored as UTF-8
  • INTEGER/INT: 64-bit signed integers
  • FLOAT/FLT: 64-bit floating-point numbers
  • BOOLEAN/BOOL: True/false values
  • DATE: Date values (YYYY-MM-DD format)
  • TIME: Time values (HH:MM:SS format)
  • DATETIME: Combined date and time (ISO 8601 format)
  • UUID: Universally unique identifiers
  • NULL: Null/empty values
  • CREDENTIAL: Encrypted credential storage

Arrays of any basic type:

  • []STRING, []INTEGER, []FLOAT, []BOOLEAN
  • []DATE, []TIME, []DATETIME, []UUID
// String record
{
  "name": "username",
  "type": "STRING",
  "value": "alice_smith"
}

// Integer record
{
  "name": "age",
  "type": "INTEGER",
  "value": 28
}

// Array record
{
  "name": "tags",
  "type": "[]STRING",
  "value": ["admin", "active", "verified"]
}

// Datetime record
{
  "name": "created_date",
  "type": "DATETIME",
  "value": "2024-01-15T10:30:00Z"
}

Collections can be encrypted using user-specific master keys:

encrypted_collection/
├── metadata.json              (unencrypted metadata)
└── clusters/
    └── cluster_name/
        ├── metadata.json      (unencrypted)
        └── records/
            ├── record1.enc    (encrypted record data)
            └── record2.enc    (encrypted record data)
The encryption process:

  1. Key derivation: Master key derived from user credentials
  2. Data encryption: Record data encrypted with AES-256
  3. Metadata preservation: Structure metadata remains unencrypted
  4. Transparent operations: Automatic encrypt/decrypt during operations
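Step 1 above is commonly implemented with a password-based key derivation function. The sketch below uses PBKDF2-HMAC-SHA256 to produce a 256-bit key suitable for AES-256; the iteration count and salt handling are assumptions for illustration, not OstrichDB's actual scheme:

```python
import hashlib
import os

# Illustrative key derivation (step 1): PBKDF2-HMAC-SHA256 over the user's
# credentials, yielding a 32-byte (256-bit) key suitable for AES-256.
# The salt policy and iteration count are assumptions, not OstrichDB's scheme.
def derive_master_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

salt = os.urandom(16)                      # stored alongside the user account
key = derive_master_key("user-secret", salt)
# `key` would then feed an AES-256 cipher for record encryption (step 2).
```

The same credentials and salt always yield the same key, which is what lets decryption happen transparently on later operations (step 4) without the key ever being stored.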
Security guarantees:

  • Project ownership: Users can only access their own projects
  • Collection access: Encrypted collections require proper keys
  • Record-level security: All operations validate user permissions

Queries follow the hierarchical structure:

/projects/{project}/collections/{collection}/clusters/{cluster}/records
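A client can address any record set by filling in this path template. The helper below is an illustrative sketch; percent-encoding of segment names is an assumption, not documented OstrichDB behavior:

```python
from urllib.parse import quote

# Illustrative helper that builds the hierarchical query path shown above.
# Percent-encoding each segment is an assumption, not documented behavior.
def records_path(project: str, collection: str, cluster: str) -> str:
    segments = ["projects", project, "collections", collection,
                "clusters", cluster, "records"]
    return "/" + "/".join(quote(s, safe="") for s in segments)

print(records_path("my-project", "users", "active_users"))
# → /projects/my-project/collections/users/clusters/active_users/records
```

Encoding each segment keeps names containing `/` or spaces from being misread as extra path levels.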
Supported access patterns:

  • Direct access: Fast lookup by exact path
  • Hierarchical browsing: Navigate structure level by level
  • Filtered queries: Search within specific levels
  • Bulk operations: Operate on entire clusters or collections
Indexing strategies:

  • Path-based indexing: Fast hierarchical lookups
  • Type indexing: Quick filtering by data type
  • Name indexing: Efficient record name searches
  • Metadata indexing: Query by creation date, size, etc.
Storage optimizations:

  • Separate metadata: Metadata separated from data for faster queries
  • File-per-record: Individual record files for atomic operations
  • Hierarchical caching: Cache frequently accessed metadata
  • Lazy loading: Load data only when accessed
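The file-per-record point above is what makes atomic record updates possible. A common sketch of that pattern (not OstrichDB's actual code) writes the record to a temporary file in the same directory, then renames it into place:

```python
import json
import os
import tempfile

# Illustrative atomic write for a file-per-record layout (not OstrichDB's code).
# os.replace() is an atomic rename on both POSIX and Windows, so concurrent
# readers always see either the old record or the new one, never a partial file.
def write_record(records_dir: str, record: dict) -> str:
    path = os.path.join(records_dir, f"{record['name']}.json")
    fd, tmp = tempfile.mkstemp(dir=records_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(record, f)
            f.flush()
            os.fsync(f.fileno())    # ensure bytes hit disk before the rename
        os.replace(tmp, path)       # atomic swap into place
    except BaseException:
        os.unlink(tmp)              # clean up the temp file on failure
        raise
    return path
```

Creating the temp file in the same directory matters: `os.replace` is only atomic within a single filesystem.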
Query execution optimizations:

  • Path optimization: Direct path resolution without scanning
  • Metadata queries: Fast listing without loading record data
  • Filtered scans: Early termination when filters don’t match
  • Concurrent access: Multiple readers, controlled writers
  • Stream processing: Large results streamed rather than loaded entirely
Memory management:

  • Resource cleanup: Automatic cleanup with defer patterns
  • Memory pools: Reuse allocated memory for common operations
  • Garbage avoidance: Manual memory management eliminates GC overhead
Backup capabilities:

  • Hierarchical backups: Backup at any level of the hierarchy
  • Incremental backups: Only changed data since last backup
  • Metadata preservation: Backup includes all metadata
  • Encryption aware: Encrypted data backed up encrypted
Recovery options:

  • Point-in-time recovery: Restore to a specific timestamp
  • Partial recovery: Restore specific projects or collections
  • Consistency checks: Verify data integrity after recovery
  • Rollback capability: Revert changes if needed
Best practices:

  • Logical grouping: Use projects for different applications
  • Collection design: Group related data in collections
  • Cluster strategy: Use clusters for natural data subsets
  • Naming conventions: Use consistent, descriptive names
  • Access patterns: Design hierarchy around query patterns
  • Batch operations: Group related operations together
  • Encryption planning: Consider encryption overhead for sensitive data
  • Monitoring: Track performance metrics and query patterns
  • Encryption boundaries: Use collection-level encryption appropriately
  • Access control: Implement proper user isolation
  • Key management: Secure handling of encryption keys
  • Audit trails: Log all data access and modifications

Learn more about working with the data structure: