Skip to main content

Schema Specification

This document describes how to define table schemas in DataCovey using JSON format.

Overview

A schema defines the structure of your table, including column names, data types, and privacy settings. The schema is specified using a JSON array where each element describes a column in your table.

JSON Schema Format

Each column in the schema is represented by a JSON object with the following properties:

  • name: The name of the column (required)
  • type: The data type of the column (required)
  • mode: Whether the column is NULLABLE, REQUIRED, or REPEATED (optional, defaults to NULLABLE)
  • description: A description of the column (optional)
  • defaultValueExpression: The default value for the column (optional)
  • private: Whether the column should be private to the tenant (optional, defaults to false)

Example schema:

[
{
"name": "user_id",
"type": "STRING",
"mode": "REQUIRED",
"description": "Unique identifier for the user"
},
{
"name": "credit_card_token",
"type": "STRING",
"mode": "REQUIRED",
"private": true,
"description": "Tokenized credit card number - only visible to tenant"
},
{
"name": "transaction_amount",
"type": "NUMERIC",
"mode": "NULLABLE",
"defaultValueExpression": "0.00",
"description": "Amount of the transaction"
}
]

Column Names

Column names must follow these rules:

  • Must start with a letter (a-z, A-Z) or underscore (_)
  • Can contain letters, numbers, and underscores
  • Maximum length of 300 characters
  • Cannot use reserved prefixes
  • Case-sensitive

Data Types

DataCovey supports the following data types:

  • STRING: Variable-length character data
  • BYTES: Variable-length binary data
  • INT64: 64-bit integer
  • FLOAT64: Double precision floating point
  • NUMERIC: Exact numeric values with 38 digits of precision and 9 decimal digits
  • BIGNUMERIC: Exact numeric values with 76.76 digits of precision and 38 decimal digits
  • BOOL: TRUE or FALSE
  • DATE: Calendar date (year, month, day)
  • DATETIME: Date and time
  • TIME: Time of day
  • TIMESTAMP: Absolute point in time
  • STRUCT: Container of ordered fields
  • GEOGRAPHY: Geography data type
  • JSON: JSON data type

Column Modes

  • NULLABLE: Column can contain NULL values (default)
  • REQUIRED: Column cannot contain NULL values
  • REPEATED: Column contains an array of values

Privacy Settings

DataCovey allows columns to be marked as private using the private property:

{
"name": "sensitive_data",
"type": "STRING",
"private": true,
"description": "This column is only visible to the tenant who owns the data"
}

When a column is marked as private:

  • Only users from the tenant who owns the data can see the actual values
  • Other tenants can use the column in computations but cannot see the raw values
  • The privacy setting cannot be changed after table creation

Schema Change Constraints

Once a table is created, there are certain limitations on schema modifications:

Allowed Changes

  • Adding new columns
  • Making a required column nullable
  • Relaxing length limits on STRING or BYTES columns
  • Adding or updating column descriptions

Prohibited Changes

  • Renaming existing columns
  • Changing column data types
  • Making a nullable column required
  • Removing columns
  • Changing privacy settings of columns
  • Modifying the precision or scale of NUMERIC columns

Best Practices

  1. Plan Your Schema Carefully: Since many schema changes are restricted after table creation, it's important to plan your schema carefully.

  2. Use Descriptions: Add clear descriptions to your columns to help other users understand the data.

  3. Consider Privacy Requirements: Identify sensitive columns that should be marked as private during schema creation.

  4. Default Values: Use default values when appropriate to handle missing data consistently.

  5. Choose Appropriate Data Types: Select the most appropriate data type for your data to ensure optimal storage and query performance.