Loading Data to Tables
This tutorial will guide you through uploading data files and loading them into your DataCovey tables using batch load jobs. You'll learn how to prepare your data, configure load options, monitor job progress, and handle errors.
Prerequisites
- A DataCovey account with appropriate permissions
- A table created in DataCovey (see Getting Started if you need to create one)
- A data file in a supported format (CSV, JSON, Avro, Parquet, or ORC)
Introduction
Load jobs allow you to upload data files and import them into DataCovey tables efficiently. This is the recommended way to:
- Initial data import: Load your first dataset into a new table
- Bulk updates: Add large amounts of data to existing tables
- Data backfills: Import historical data in batches
- Regular data refreshes: Update tables with new data files
Load jobs are processed asynchronously by DataCovey, making them ideal for large files. You can monitor job progress in real-time and receive notifications when jobs complete.
Supported File Formats
DataCovey supports the following file formats for load jobs:
- CSV: Comma-separated values (.csv)
- JSON: Newline-delimited JSON (.json, .ndjson, .jsonl)
- AVRO: Apache Avro format (.avro)
- PARQUET: Apache Parquet format (.parquet, .pqt)
- ORC: Optimized Row Columnar format (.orc)
File Size Limits
- Maximum file size: 100MB for direct upload
- For larger files, consider using the presigned URL upload method (available via API)
Step 1: Prepare Your Data File
Before uploading, ensure your data file meets these requirements:
Format Requirements
CSV files:
- First row should contain column headers (optional but recommended)
- Use commas as delimiters
- Enclose values containing commas in quotes
- Use UTF-8 encoding
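For example, a small CSV file that follows these rules might look like this (columns are illustrative; note the quoted value containing a comma):
```
user_id,email,amount,notes
123,user@example.com,100.50,first purchase
456,user2@example.com,200.75,"repeat customer, priority"
```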
JSON files (newline-delimited):
- Each line must be a valid JSON object
- Each object should represent one row
- Example format:
{"user_id": "123", "email": "user@example.com", "amount": 100.50}
{"user_id": "456", "email": "user2@example.com", "amount": 200.75}
Avro, Parquet, ORC files:
- Must be valid files in their respective formats
- Schema should match or be compatible with the target table schema
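If you need to produce one of these files yourself, the sketch below shows one way to write a Parquet file using the third-party pyarrow library (not part of DataCovey); the column names are illustrative and should match your target table's schema:
```python
# Minimal sketch: write rows to a Parquet file with pyarrow (illustrative columns).
import pyarrow as pa
import pyarrow.parquet as pq

rows = [
    {"user_id": "123", "email": "user@example.com", "amount": 100.50},
    {"user_id": "456", "email": "user2@example.com", "amount": 200.75},
]

# Build an Arrow table; the inferred schema should be compatible with the target table.
table = pa.Table.from_pylist(rows)
pq.write_table(table, "transactions.parquet")
```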
Data Validation
- Ensure data types match the table schema
- Required fields (REQUIRED mode) must have values
- Date and timestamp formats should match DataCovey expectations
- Numeric values should be valid numbers
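A quick local check before uploading can catch many of these issues. Here is a minimal Python sketch that validates a newline-delimited JSON file; the file name, required fields, and numeric field are assumptions based on the example above:
```python
import json

# Illustrative pre-upload validation for a newline-delimited JSON file.
REQUIRED_FIELDS = ("user_id", "email")

def validate_ndjson(path):
    """Return a list of problems found in the file (an empty list means it looks OK)."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc})")
                continue
            for field in REQUIRED_FIELDS:
                if row.get(field) in (None, ""):
                    problems.append(f"line {lineno}: missing required field '{field}'")
            try:
                float(row.get("amount", 0))
            except (TypeError, ValueError):
                problems.append(f"line {lineno}: 'amount' is not a valid number")
    return problems

for problem in validate_ndjson("transactions.jsonl"):
    print(problem)
```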
Step 2: Upload Data to a Table
There are two ways to upload data:
Method 1: Direct Upload (Recommended for files < 100MB)
- Navigate to the Tables page and select the table you want to load data into
- Click the "Upload Data" button on the table details page
- In the upload dialog:
  - Click "Choose File" and select your data file
  - The file format will be auto-detected from the file extension
  - Configure load options (see Step 3)
- Click "Upload & Start Load Job"
- You'll be redirected to the load job details page to monitor progress
Method 2: Presigned URL Upload (For large files or API integration)
For files larger than 100MB or programmatic uploads:
- Use the API endpoint POST /load_job/ without a file to get a presigned URL
- Upload your file directly to the provided GCS URL
- Complete the job using POST /load_job/{job_id}/upload
See the API documentation for details on presigned URL uploads.
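As a rough illustration, the presigned URL flow might look like the Python sketch below. The base URL, authentication header, request and response field names, and the assumption that the presigned URL accepts an HTTP PUT are all illustrative; consult the API documentation for the exact contract:
```python
import requests

BASE_URL = "https://api.datacovey.example/v1"          # illustrative base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}    # illustrative auth scheme

# 1. Create a load job without a file to receive a presigned upload URL.
#    The payload field names here are assumptions.
resp = requests.post(
    f"{BASE_URL}/load_job/",
    headers=HEADERS,
    json={
        "table_id": "my_table",
        "file_format": "CSV",
        "write_disposition": "WRITE_APPEND",
    },
)
resp.raise_for_status()
job = resp.json()

# 2. Upload the file directly to the presigned GCS URL
#    (assumed here to accept an HTTP PUT of the raw bytes).
with open("large_dataset.csv", "rb") as f:
    requests.put(job["upload_url"], data=f).raise_for_status()

# 3. Tell DataCovey the upload is finished so the load job can start.
requests.post(
    f"{BASE_URL}/load_job/{job['job_id']}/upload",
    headers=HEADERS,
).raise_for_status()
```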
Step 3: Configure Load Options
When uploading a file, you can configure several options:
Write Disposition
Controls how existing data in the table is handled:
- Append (WRITE_APPEND): Adds new rows to existing data (default)
  - Use this for regular data updates
  - Best for incremental data loads
- Overwrite (WRITE_TRUNCATE): Replaces all existing data
  - Use this to completely refresh a table
  - Warning: This will delete all existing data in the table
- Write Empty (WRITE_EMPTY): Only loads if the table is empty
  - Use this to ensure you don't accidentally overwrite existing data
  - Job will fail if the table already contains data
Schema Auto-detect
When enabled, DataCovey will automatically detect the schema from your file and replace the existing table schema.
When to use:
- Loading data into a new, empty table
- You want DataCovey to infer column types from the data
- The file structure matches your desired schema
When NOT to use:
- Table already has a defined schema you want to keep
- You need specific data types or modes (REQUIRED, NULLABLE)
- You want to preserve private column settings
Important: Schema auto-detect will replace the entire table schema. Any existing schema definition will be lost.
File Format
The file format is usually auto-detected from the file extension. You can manually select a format if needed:
- CSV
- JSON (Newline-delimited)
- AVRO
- PARQUET
- ORC
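Putting the options together, a load job configuration might conceptually look like the dictionary below. The option values come from this page, but the exact field names in the API request are illustrative:
```python
# Conceptual load options; values are from this page, field names are assumptions.
load_options = {
    "file_format": "CSV",                  # CSV, JSON, AVRO, PARQUET, or ORC
    "write_disposition": "WRITE_APPEND",   # WRITE_APPEND, WRITE_TRUNCATE, or WRITE_EMPTY
    "schema_autodetect": False,            # leave False to keep an existing table schema
}
```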
Step 4: Monitor Load Job Status
After starting a load job, you can monitor its progress in several ways:
On the Load Job Details Page
- After uploading, you'll be automatically redirected to the load job details page
- The page shows:
- Current Status: Pending, Running, Completed, or Failed
- Job Information: Job ID and target table
- File Information: Source URI, file format
- Configuration: Write disposition, schema auto-detect setting
- Statistics: Rows loaded, bytes processed, duration
- Error Details: If the job failed
Understanding Job Statuses
- Pending: Job created, waiting to start
- Running: DataCovey is processing the load job
- Completed: Job finished successfully
- Failed: Job encountered an error
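If you prefer to monitor jobs programmatically, a simple polling loop might look like the sketch below. It assumes a GET /load_job/{job_id} endpoint that returns the status values listed above; verify the exact endpoint and response shape in the API documentation:
```python
import time
import requests

BASE_URL = "https://api.datacovey.example/v1"          # illustrative base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}    # illustrative auth scheme

def wait_for_load_job(job_id, poll_seconds=5, timeout_seconds=3600):
    """Poll a load job until it reaches a terminal status (Completed or Failed)."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        # Assumption: the API exposes GET /load_job/{job_id}; check the API reference.
        resp = requests.get(f"{BASE_URL}/load_job/{job_id}", headers=HEADERS)
        resp.raise_for_status()
        status = resp.json().get("status")
        if status in ("Completed", "Failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Load job {job_id} is still running after {timeout_seconds} seconds")
```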
Real-time Updates
- The job details page automatically refreshes every 5 seconds while the job is active
- You'll receive a browser notification when the job completes
- The page will update automatically when the status changes
Viewing All Load Jobs
- Navigate to Load Jobs in the Data section of the sidebar
- View all load jobs across all your tables
- Filter and search to find specific jobs
- Click on any job to view details
Viewing Jobs for a Specific Table
- Navigate to the table details page
- Scroll to the "Load Jobs" section
- Click to expand and view recent load jobs for that table
- Click on any job to view full details
Step 5: Handle Errors
If a load job fails, here's how to diagnose and fix the issue:
Common Error Scenarios
Schema Mismatch Errors
Error: "Schema mismatch" or "Field type mismatch"
Cause: Data types in your file don't match the table schema
Solution:
- Review the error details on the job details page
- Check which columns have type mismatches
- Update your data file to match the expected types:
- Convert strings to numbers if the column expects NUMERIC
- Ensure dates are in the correct format
- Check that required fields have values
- Retry the load job
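For example, a small script can coerce values to the types the table expects before you retry. The column names and target types below are illustrative (taken from the earlier JSON example):
```python
import json

# Illustrative fix: coerce columns to the expected types, then retry the load.
with open("transactions.jsonl", encoding="utf-8") as src, \
        open("transactions_fixed.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        if not line.strip():
            continue
        row = json.loads(line)
        row["user_id"] = str(row["user_id"])   # e.g. the table expects STRING
        row["amount"] = float(row["amount"])   # e.g. the table expects NUMERIC
        dst.write(json.dumps(row) + "\n")
```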
Invalid Data Format Errors
Error: "Invalid JSON" or "CSV parsing error"
Cause: File format is incorrect or corrupted
Solution:
- Verify your file is in the correct format
- For JSON files, ensure each line is valid JSON
- For CSV files, check for unclosed quotes or special characters
- Re-export or regenerate your file if needed
Concurrent Job Conflicts
Error: "Another load job is already running for this table"
Cause: Only one load job can run per table at a time
Solution:
- Wait for the current job to complete
- Or cancel the existing job if needed
- Then retry your upload
File Size Errors
Error: "File size exceeds limit"
Cause: File is larger than 100MB (for direct upload)
Solution:
- Split your file into smaller chunks
- Or use the presigned URL upload method via API for larger files
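If you choose to split the file, a sketch like the following keeps each chunk under the limit while preserving whole rows (shown for a newline-delimited file; the file name is illustrative):
```python
MAX_BYTES = 100 * 1024 * 1024  # 100MB direct-upload limit

def split_ndjson(path, max_bytes=MAX_BYTES):
    """Split a newline-delimited file into chunks under the limit, keeping lines whole."""
    part, size, outfile = 0, 0, None
    with open(path, "rb") as src:
        for line in src:
            if outfile is None or size + len(line) > max_bytes:
                if outfile:
                    outfile.close()
                part += 1
                size = 0
                outfile = open(f"{path}.part{part:03d}", "wb")
            outfile.write(line)
            size += len(line)
    if outfile:
        outfile.close()
    return part  # number of chunk files written

# Example: split_ndjson("big_dataset.jsonl") writes big_dataset.jsonl.part001, .part002, ...
```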
How to Fix and Retry
- Review Error Details:
  - Go to the load job details page
  - Scroll to the error section
  - Expand error details to see the full error message
- Fix the Issue:
  - Correct your data file based on the error message
  - Ensure data matches the table schema
  - Verify file format is correct
- Create a New Load Job:
  - Navigate back to the table details page
  - Click "Upload Data" again
  - Upload the corrected file
Best Practices
File Size Recommendations
- Small files (< 10MB): Direct upload works well
- Medium files (10-100MB): Direct upload is fine, but may take a few minutes
- Large files (> 100MB): Use presigned URL upload via API or split into multiple files
Schema Design Tips
- Define schemas explicitly: Don't rely on auto-detect for production tables
- Use appropriate data types: Choose the most specific type (e.g., INT64 vs NUMERIC)
- Set required fields carefully: Only mark fields as REQUIRED if they always have values
- Document your schema: Add descriptions to help others understand your data
Performance Considerations
- Batch multiple files: If you have many small files, consider combining them
- Use compressed formats: Avro and Parquet are more efficient for large datasets
- Schedule regular loads: Set up a process to load data on a schedule
- Monitor job history: Review past jobs to identify patterns or issues
Data Quality
- Validate before upload: Check your data file for common issues
- Test with small samples: Upload a small test file first
- Review statistics: Check rows loaded and bytes processed match expectations
- Verify data after load: Run a query to confirm data loaded correctly
Troubleshooting
Job Stuck in "Running" State
If a job appears stuck:
- Wait a bit longer: Large files can take time to process
- Query the table directly: The job may have completed even if the status hasn't updated yet, so a quick query can confirm whether the data has arrived
- Refresh the page: Sometimes the UI needs a manual refresh
- Contact support: If a job has been running for more than an hour, contact support
Failed Jobs
If a job fails:
- Read the error message: The job details page shows the specific error
- Check the error details section: Expand it to see full error information
- Review your data file: Look for the issues mentioned in the error
- Fix and retry: Correct the issue and create a new load job
Concurrent Job Conflicts
If you see a conflict error:
- Check existing jobs: Go to the Load Jobs page and filter by your table
- Wait for completion: Let the current job finish
- Or cancel if needed: If the existing job is stuck, you may need to contact support
Next Steps
After successfully loading data:
- Query your data: Use the Queries page to run SQL queries on your loaded data
- Export results: Export query results in various formats
- Set up regular loads: Consider automating data loads for regular updates
- Explore the API: Use the API for programmatic data loading
Summary
In this tutorial, you learned how to:
- ✅ Prepare data files in supported formats
- ✅ Upload files directly through the web interface
- ✅ Configure load options (write disposition, schema auto-detect)
- ✅ Monitor load job status in real-time
- ✅ Handle common errors and retry failed jobs
- ✅ Follow best practices for data loading
You're now ready to load data into your DataCovey tables efficiently!