Loading Data to Tables
This tutorial will guide you through uploading data files and loading them into your DataCovey tables using batch load jobs. You'll learn how to prepare your data, configure load options, monitor job progress, and handle errors.
Prerequisites
- A DataCovey account with appropriate permissions
- A table created in DataCovey (see Getting Started if you need to create one)
- A data file in a supported format (CSV, JSON, Avro, Parquet, or ORC)
Introduction
Load jobs allow you to upload data files and import them into DataCovey tables efficiently. This is the recommended way to:
- Initial data import: Load your first dataset into a new table
- Bulk updates: Add large amounts of data to existing tables
- Data backfills: Import historical data in batches
- Regular data refreshes: Update tables with new data files
Load jobs are processed asynchronously by DataCovey, making them ideal for large files. You can monitor job progress in real-time and receive notifications when jobs complete.
Supported File Formats
DataCovey supports the following file formats for load jobs:
- CSV: Comma-separated values (.csv)
- JSON: Newline-delimited JSON (.json, .ndjson, .jsonl)
- AVRO: Apache Avro format (.avro)
- PARQUET: Apache Parquet format (.parquet, .pqt)
- ORC: Optimized Row Columnar format (.orc)
File Size Limits
- Maximum file size: 100MB for direct upload
- For larger files, consider using the presigned URL upload method (available via API)
Step 1: Prepare Your Data File
Before uploading, ensure your data file meets these requirements:
Format Requirements
CSV files:
- First row should contain column headers (optional but recommended)
- Use commas as delimiters
- Enclose values containing commas in quotes
- Use UTF-8 encoding
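For example, a small CSV file that follows these rules might look like this (columns are illustrative; note the quoted value containing a comma):
```
user_id,email,amount,notes
123,user@example.com,100.50,first purchase
456,user2@example.com,200.75,"repeat customer, priority"
```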
JSON files (newline-delimited):
- Each line must be a valid JSON object
- Each object should represent one row
- Example format:
{"user_id": "123", "email": "user@example.com", "amount": 100.50}
{"user_id": "456", "email": "user2@example.com", "amount": 200.75}
Avro, Parquet, ORC files:
- Must be valid files in their respective formats
- Schema should match or be compatible with the target table schema
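If you need to produce one of these files yourself, the sketch below shows one way to write a Parquet file using the third-party pyarrow library (not part of DataCovey); the column names are illustrative and should match your target table's schema:
```python
# Minimal sketch: write rows to a Parquet file with pyarrow (illustrative columns).
import pyarrow as pa
import pyarrow.parquet as pq

rows = [
    {"user_id": "123", "email": "user@example.com", "amount": 100.50},
    {"user_id": "456", "email": "user2@example.com", "amount": 200.75},
]

# Build an Arrow table; the inferred schema should be compatible with the target table.
table = pa.Table.from_pylist(rows)
pq.write_table(table, "transactions.parquet")
```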
Data Validation
- Ensure data types match the table schema
- Required fields (REQUIRED mode) must have values
- Date and timestamp formats should match DataCovey expectations
- Numeric values should be valid numbers
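A quick local check before uploading can catch many of these issues. Here is a minimal Python sketch that validates a newline-delimited JSON file; the file name, required fields, and numeric field are assumptions based on the example above:
```python
import json

# Illustrative pre-upload validation for a newline-delimited JSON file.
REQUIRED_FIELDS = ("user_id", "email")

def validate_ndjson(path):
    """Return a list of problems found in the file (an empty list means it looks OK)."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            for field in REQUIRED_FIELDS:
                if row.get(field) in (None, ""):
                    problems.append(f"line {lineno}: missing required field '{field}'")
            try:
                float(row.get("amount", 0))
            except (TypeError, ValueError):
                problems.append(f"line {lineno}: 'amount' is not a valid number")
    return problems

for problem in validate_ndjson("transactions.jsonl"):
    print(problem)
```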
Step 2: Upload Data to a Table
There are two ways to upload data:
Method 1: Direct Upload (Recommended for files < 100MB)
- Navigate to the Tables page and select the table you want to load data into
- Click the "Upload Data" button on the table details page
- In the upload dialog:
  - Click "Choose File" and select your data file
  - The file format will be auto-detected from the file extension
  - Configure load options (see Step 3)
- Click "Upload & Start Load Job"
- You'll be redirected to the load job details page to monitor progress
Method 2: Presigned URL Upload (For large files or API integration)
For files larger than 100MB or programmatic uploads:
- Use the API endpoint POST /load_job/ without a file to get a presigned URL
- Upload your file directly to the provided GCS URL
- Complete the job using POST /load_job/{job_id}/upload
See the API documentation for details on presigned URL uploads.
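As a rough illustration, the presigned URL flow might look like the Python sketch below. The base URL, authentication header, request and response field names, and the assumption that the presigned URL accepts an HTTP PUT are all illustrative; consult the API documentation for the exact contract:
```python
import requests

BASE_URL = "https://api.datacovey.example/v1"          # illustrative base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}    # illustrative auth scheme

# 1. Create a load job without a file to receive a presigned upload URL.
#    The payload field names here are assumptions.
resp = requests.post(
    f"{BASE_URL}/load_job/",
    headers=HEADERS,
    json={
        "table_id": "my_table",
        "file_format": "CSV",
        "write_disposition": "WRITE_APPEND",
    },
)
resp.raise_for_status()
job = resp.json()

# 2. Upload the file directly to the presigned GCS URL
#    (assumed here to accept an HTTP PUT of the raw bytes).
with open("large_dataset.csv", "rb") as f:
    requests.put(job["upload_url"], data=f).raise_for_status()

# 3. Tell DataCovey the upload is finished so the load job can start.
requests.post(
    f"{BASE_URL}/load_job/{job['job_id']}/upload",
    headers=HEADERS,
).raise_for_status()
```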
Step 3: Configure Load Options
When uploading a file, you can configure several options:
Write Disposition
Controls how existing data in the table is handled:
- Append (WRITE_APPEND): Adds new rows to existing data (default)
  - Use this for regular data updates
  - Best for incremental data loads
- Overwrite (WRITE_TRUNCATE): Replaces all existing data
  - Use this to completely refresh a table
  - Warning: This will delete all existing data in the table
- Write Empty (WRITE_EMPTY): Only loads if the table is empty
  - Use this to ensure you don't accidentally overwrite existing data
  - Job will fail if the table already contains data
Schema Auto-detect
When enabled, DataCovey will automatically detect the schema from your file and replace the existing table schema.
When to use:
- Loading data into a new, empty table
- You want DataCovey to infer column types from the data
- The file structure matches your desired schema
When NOT to use:
- Table already has a defined schema you want to keep
- You need specific data types or modes (REQUIRED, NULLABLE)
- You want to preserve private column settings
Important: Schema auto-detect will replace the entire table schema. Any existing schema definition will be lost.
File Format
The file format is usually auto-detected from the file extension. You can manually select a format if needed:
- CSV
- JSON (Newline-delimited)
- AVRO
- PARQUET
- ORC
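Putting the options together, a load job configuration might conceptually look like the dictionary below. The option values come from this page, but the exact field names in the API request are illustrative:
```python
# Conceptual load options; values are from this page, field names are assumptions.
load_options = {
    "file_format": "CSV",                  # CSV, JSON, AVRO, PARQUET, or ORC
    "write_disposition": "WRITE_APPEND",   # WRITE_APPEND, WRITE_TRUNCATE, or WRITE_EMPTY
    "schema_autodetect": False,            # leave False to keep an existing table schema
}
```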
Step 4: Monitor Load Job Status
After starting a load job, you can monitor its progress in several ways:
On the Load Job Details Page
- After uploading, you'll be automatically redirected to the load job details page
- The page shows:
- Current Status: Pending, Running, Completed, or Failed
- Job Information: Job ID and target table
- File Information: Source URI, file format
- Configuration: Write disposition, schema auto-detect setting
- Statistics: Rows loaded, bytes processed, duration
- Error Details: If the job failed
Understanding Job Statuses
- Pending: Job created, waiting to start
- Running: DataCovey is processing the load job
- Completed: Job finished successfully
- Failed: Job encountered an error
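If you prefer to monitor jobs programmatically, a simple polling loop might look like the sketch below. It assumes a GET /load_job/{job_id} endpoint that returns the status values listed above; verify the exact endpoint and response shape in the API documentation:
```python
import time
import requests

BASE_URL = "https://api.datacovey.example/v1"          # illustrative base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}    # illustrative auth scheme

def wait_for_load_job(job_id, poll_seconds=5, timeout_seconds=3600):
    """Poll a load job until it reaches a terminal status (Completed or Failed)."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # Assumption: the API exposes GET /load_job/{job_id}; check the API reference.
        resp = requests.get(f"{BASE_URL}/load_job/{job_id}", headers=HEADERS)
        resp.raise_for_status()
        status = resp.json().get("status")
        if status in ("Completed", "Failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Load job {job_id} is still running after {timeout_seconds} seconds")
```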
Real-time Updates
- The job details page automatically refreshes every 5 seconds while the job is active
- You'll receive a browser notification when the job completes
- The page will update automatically when the status changes
Viewing All Load Jobs
- Navigate to Load Jobs in the Data section of the sidebar
- View all load jobs across all your tables
- Filter and search to find specific jobs
- Click on any job to view details
Viewing Jobs for a Specific Table
- Navigate to the table details page
- Scroll to the "Load Jobs" section
- Click to expand and view recent load jobs for that table
- Click on any job to view full details
Step 5: Handle Errors
If a load job fails, here's how to diagnose and fix the issue:
Common Error Scenarios
Schema Mismatch Errors
Error: "Schema mismatch" or "Field type mismatch"
Cause: Data types in your file don't match the table schema
Solution:
- Review the error details on the job details page
- Check which columns have type mismatches
- Update your data file to match the expected types:
- Convert strings to numbers if the column expects NUMERIC
- Ensure dates are in the correct format
- Check that required fields have values
- Retry the load job
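For example, a small script can coerce values to the types the table expects before you retry. The column names and target types below are illustrative (taken from the earlier JSON example):
```python
import json

# Illustrative fix: coerce columns to the expected types, then retry the load.
with open("transactions.jsonl", encoding="utf-8") as src, \
        open("transactions_fixed.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        if not line.strip():
            continue
        row = json.loads(line)
        row["user_id"] = str(row["user_id"])   # e.g. the table expects STRING
        row["amount"] = float(row["amount"])   # e.g. the table expects NUMERIC
        dst.write(json.dumps(row) + "\n")
```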
Invalid Data Format Errors
Error: "Invalid JSON" or "CSV parsing error"
Cause: File format is incorrect or corrupted
Solution:
- Verify your file is in the correct format
- For JSON files, ensure each line is valid JSON
- For CSV files, check for unclosed quotes or special characters
- Re-export or regenerate your file if needed
Concurrent Job Conflicts
Error: "Another load job is already running for this table"
Cause: Only one load job can run per table at a time
Solution:
- Wait for the current job to complete
- Or cancel the existing job if needed
- Then retry your upload
File Size Errors
Error: "File size exceeds limit"
Cause: File is larger than 100MB (for direct upload)
Solution:
- Split your file into smaller chunks
- Or use the presigned URL upload method via API for larger files
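If you choose to split the file, a sketch like the following keeps each chunk under the limit while preserving whole rows (shown for a newline-delimited file; the file name is illustrative):
```python
MAX_BYTES = 100 * 1024 * 1024  # 100MB direct-upload limit

def split_ndjson(path, max_bytes=MAX_BYTES):
    """Split a newline-delimited file into chunks under the limit, keeping lines whole."""
    part, size, outfile = 0, 0, None
    with open(path, "rb") as src:
        for line in src:
            if outfile is None or size + len(line) > max_bytes:
                if outfile:
                    outfile.close()
                part += 1
                size = 0
                outfile = open(f"{path}.part{part:03d}", "wb")
            outfile.write(line)
            size += len(line)
    if outfile:
        outfile.close()
    return part  # number of chunk files written

# Example: split_ndjson("big_dataset.jsonl") writes big_dataset.jsonl.part001, .part002, ...
```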
How to Fix and Retry
- Review Error Details:
  - Go to the load job details page
  - Scroll to the error section
  - Expand error details to see the full error message
- Fix the Issue:
  - Correct your data file based on the error message
  - Ensure data matches the table schema
  - Verify file format is correct
- Create a New Load Job:
  - Navigate back to the table details page
  - Click "Upload Data" again
  - Upload the corrected file
Best Practices
File Size Recommendations
- Small files (< 10MB): Direct upload works well
- Medium files (10-100MB): Direct upload is fine, but may take a few minutes
- Large files (> 100MB): Use presigned URL upload via API or split into multiple files
Schema Design Tips
- Define schemas explicitly: Don't rely on auto-detect for production tables
- Use appropriate data types: Choose the most specific type (e.g., INT64 vs NUMERIC)
- Set required fields carefully: Only mark fields as REQUIRED if they always have values
- Document your schema: Add descriptions to help others understand your data
Performance Considerations
- Batch multiple files: If you have many small files, consider combining them
- Use compressed formats: Avro and Parquet are more efficient for large datasets
- Schedule regular loads: Set up a process to load data on a schedule
- Monitor job history: Review past jobs to identify patterns or issues
Data Quality
- Validate before upload: Check your data file for common issues
- Test with small samples: Upload a small test file first
- Review statistics: Check rows loaded and bytes processed match expectations
- Verify data after load: Run a query to confirm data loaded correctly
Troubleshooting
Job Stuck in "Running" State
If a job appears stuck:
- Wait a bit longer: Large files can take time to process
- Query the table directly: The job may have completed even if the status hasn't updated yet, so a quick query can confirm whether the data has arrived
- Refresh the page: Sometimes the UI needs a manual refresh
- Contact support: If a job has been running for more than an hour, contact support
Failed Jobs
If a job fails:
- Read the error message: The job details page shows the specific error
- Check the error details section: Expand it to see full error information
- Review your data file: Look for the issues mentioned in the error
- Fix and retry: Correct the issue and create a new load job
Concurrent Job Conflicts
If you see a conflict error:
- Check existing jobs: Go to the Load Jobs page and filter by your table
- Wait for completion: Let the current job finish
- Or cancel if needed: If the existing job is stuck, you may need to contact support
Next Steps
After successfully loading data:
- Query your data: Use the Queries page to run SQL queries on your loaded data
- Export results: Export query results in various formats
- Set up regular loads: Consider automating data loads for regular updates
- Explore the API: Use the API for programmatic data loading
Summary
In this tutorial, you learned how to:
- ✅ Prepare data files in supported formats
- ✅ Upload files directly through the web interface
- ✅ Configure load options (write disposition, schema auto-detect)
- ✅ Monitor load job status in real-time
- ✅ Handle common errors and retry failed jobs
- ✅ Follow best practices for data loading
You're now ready to load data into your DataCovey tables efficiently!