Error 6: CANNOT_PARSE_TEXT
Tip: This error occurs when ClickHouse cannot parse text data according to the expected format.
This typically happens during data imports from CSV, TSV, JSON, or other text-based formats when the data doesn't match the expected schema or contains malformed values.
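For a quick illustration, the snippet below is a sketch using the format table function (available in recent versions) with made-up inline data: a tab-separated row fed to the CSV parser fails with a parse error similar to the messages shown later in this article.
-- Illustration only: tab-separated data parsed with the CSV format
SELECT *
FROM format(CSV, 'id UInt32, name String, value Int32', '1\tAlice\t42');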
Most common causes
- Incorrect format specification
  - Using CSV format for tab-delimited files (should use TSV)
  - Format mismatch between actual data and declared format
  - Wrong delimiter character specified
  - Missing or incorrect escape characters
- Malformed CSV/TSV data
  - Missing delimiters (commas or tabs)
  - Unescaped special characters in string fields
  - Quotes not properly closed or escaped
  - Embedded newlines without proper escaping
  - Extra or missing columns compared to table schema
- Data type mismatches
  - String data in numeric columns
  - Invalid date/datetime formats
  - Values exceeding type boundaries (e.g., too large for Int32)
  - Empty strings where numbers are expected
  - Special characters in numeric fields
- Character encoding issues
  - UTF-8 encoding errors
  - Byte order mark (BOM) at the beginning of the file
  - Invalid characters in string fields
  - Mixed character encodings in the same file
- Inconsistent data structure
  - Variable number of columns per row (see the sketch after this list)
  - Headers don't match data rows
  - Schema inference fails with complex nested data
  - Mixed data formats within the same column
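To confirm a suspected column-count problem from inside ClickHouse, one option (a sketch assuming a local comma-delimited file; adjust the path and delimiter) is to read the file as raw lines and group by the number of fields per line:
-- Read each line as plain text and count comma-separated fields per line
-- (quoted fields that contain commas will inflate the count)
SELECT
    countSubstrings(line, ',') + 1 AS column_count,
    count() AS num_rows
FROM file('data.csv', 'LineAsString', 'line String')
GROUP BY column_count
ORDER BY num_rows DESC;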
What to do when you encounter this error
1. Check the error message for specific details
The error message typically indicates:
- Which row failed
- Which column had the problem
- What was expected vs. what was found
- The actual parsed text that failed
Cannot parse input: expected ',' before: 'some_text': (at row 429980)
Row 429979: Column 8, name: blockspending, type: Int32, ERROR: text "<TAB><TAB>7027<TAB>181" is not like Int32
2. Verify the actual file format
# Check first few lines of your file
head -n 5 your_file.csv
# Check for tabs vs commas
cat your_file.csv | head -n 1 | od -c
# Check character encoding
file -i your_file.csv
3. Test with a small sample
-- Try parsing just the first few rows
SELECT *
FROM file('sample.csv', 'CSV')
LIMIT 10;
-- Let ClickHouse infer the schema
DESCRIBE file('sample.csv', 'CSV');
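If copying a couple of suspect rows is easier than sampling the file, recent versions also provide a format table function that parses inline data; in this sketch the schema and values are placeholders, and the second row reproduces the failure:
-- Paste one or two suspect rows inline and try them against a candidate schema
SELECT *
FROM format(CSV, 'id UInt32, name String, value Int32', '1,Alice,42\n2,Bob,oops');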
4. Check logs for more details
SELECT
event_time,
query,
exception
FROM system.query_log
WHERE exception_code = 6
AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC
LIMIT 5;
Quick fixes
1. Use correct format for your data
-- For tab-delimited files
INSERT INTO table FROM INFILE 'file.tsv' FORMAT TSV;
-- Or
INSERT INTO table FROM INFILE 'file.tsv' FORMAT TSVWithNames;
-- For comma-delimited files
INSERT INTO table FROM INFILE 'file.csv' FORMAT CSV;
-- Or
INSERT INTO table FROM INFILE 'file.csv' FORMAT CSVWithNames;
2. Skip malformed rows
-- Allow (and skip) up to a fixed number of bad rows
INSERT INTO table
SELECT * FROM file('data.csv', 'CSV')
SETTINGS input_format_allow_errors_num = 100;
-- Allow (and skip) up to a fraction of bad rows
INSERT INTO table
SELECT * FROM file('data.csv', 'CSV')
SETTINGS input_format_allow_errors_ratio = 0.1; -- Allow 10% errors
3. Handle NULL and empty values correctly
-- Insert NULLs into non-Nullable columns as the column's default value
SET input_format_null_as_default = 1;
-- Treat empty CSV fields as default values
SET input_format_csv_empty_as_default = 1;
-- Ignore input fields that don't exist in the table
SET input_format_skip_unknown_fields = 1;
4. Use CustomSeparated for tab-delimited data with CSV-style quoting
-- For tab-delimited data with CSV quoting
SET format_custom_escaping_rule = 'CSV';
SET format_custom_field_delimiter = '\x09'; -- Tab character
INSERT INTO table FROM INFILE 'data.tsv' FORMAT CustomSeparated;
5. Specify schema explicitly
-- Instead of relying on schema inference
SELECT * FROM file(
'data.csv',
'CSV',
'id UInt64, name String, date Date, value Float64'
);
Common specific scenarios
Scenario 1: CSV expected comma but found tab
Cannot parse input: expected ',' before: '<TAB><TAB>7027<TAB>181'
Cause: File is actually tab-delimited (TSV) but being read as CSV.
Solution:
-- Use TSV format instead
INSERT INTO table FROM INFILE 'file.tsv' FORMAT TSVWithNames;
-- Or if you must use CSV-style quoting with tabs
SET format_custom_escaping_rule = 'CSV';
SET format_custom_field_delimiter = '\x09';
INSERT INTO table FROM INFILE 'file.tsv' FORMAT CustomSeparated;
Scenario 2: Malformed string with embedded delimiters
Cannot parse input: expected '\t' before: 'I49d(I\""\t\t\t13\t1350000'
Cause: String field contains delimiter characters (tabs, commas) and special characters that aren't properly escaped or quoted.
Solution:
-- Use CSV-style escaping for tab-delimited data
SET format_custom_escaping_rule = 'CSV';
SET format_custom_field_delimiter = '\x09';
-- Allow errors in problematic rows
SET input_format_allow_errors_num = 100;
INSERT INTO table FROM INFILE 'file.tsv' FORMAT CustomSeparated;
Scenario 3: Syntax error at unexpected position
Syntax error: failed at position 1 ('85c59771') (line 1, col 1): 85c59771-ae5d-4a53-9eed...
Cause: The file's contents are reaching the SQL parser rather than a data-format parser (for example, the file was piped to the client without an INSERT ... FORMAT query), so the first data row is read as if it were SQL.
Solution:
-- Read the file through a table function (or an INSERT ... FORMAT query)
-- so the data goes to the format parser, and pick a format that matches the file
SELECT * FROM file('data.tsv', 'TSVWithNames');
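If the root cause was data being fed to the client without a query, wrapping the load in an INSERT with an explicit FORMAT clause (a sketch with a placeholder table name) keeps the rows out of the SQL parser:
-- Placeholder table name; the explicit FORMAT clause sends the data to the TSV parser
INSERT INTO my_table FROM INFILE 'data.tsv' FORMAT TSVWithNames;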
Scenario 4: Cannot parse decimal type from Parquet
Cannot parse type Decimal(76, 38), expected non-empty binary data with size equal to or less than 32, got 36
Cause: The Parquet column stores the decimal as a fixed-length binary value wider than 32 bytes, which is the storage size of ClickHouse's largest decimal type (Decimal256, maximum precision 76), so the value cannot be read directly.
Solution:
-- Read as String first, then convert
SELECT
CAST(decimal_col AS Decimal(38, 10)) AS decimal_col
FROM file('data.parquet', 'Parquet', 'decimal_col String, ...');
-- Or use Double for very large values
SELECT
toFloat64(decimal_col) AS decimal_col
FROM file('data.parquet', 'Parquet', 'decimal_col String, ...');
Scenario 5: Schema inference fails on complex data
The table structure cannot be extracted from a JSONEachRow format file
Cause: File is empty, inaccessible, or schema inference can't determine structure from sample.
Solution:
-- Increase bytes read for schema inference
SET input_format_max_bytes_to_read_for_schema_inference = 999999999;
-- Or specify schema manually
SELECT * FROM s3(
'https://bucket/file.json',
'JSONEachRow',
'id UInt64, name String, data String'
);
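A middle ground, assuming a version that supports the schema_inference_hints setting, is to keep inference for most columns but pin the ones you rely on (the column names below are placeholders):
-- Keep automatic inference but force the types of known columns
SET schema_inference_hints = 'id UInt64, name String';
SELECT * FROM s3('https://bucket/file.json', 'JSONEachRow') LIMIT 10;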
Prevention best practices
- Validate data format before importing
# Check actual delimiter
head -n 1 file.csv | od -c
# Verify consistent column count
awk -F',' 'NR==1{cols=NF} NF!=cols{print "Line " NR " has " NF " columns"}' file.csv
# Check for encoding issues
file -i file.csv
- Use appropriate format for your data
  - CSV: Comma-delimited with optional CSV-style quoting
  - TSV/TabSeparated: Tab-delimited, no quoting
  - TSVWithNames: Tab-delimited with header row
  - CustomSeparated: Custom delimiter with CSV-style quoting
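If you're unsure which formats your server accepts for input, they can be listed from the system.formats table (assuming a version that exposes the is_input flag):
-- List the formats this server can use for input
SELECT name
FROM system.formats
WHERE is_input
ORDER BY name;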
- Test with small samples first
-- Test schema inference
DESCRIBE file('sample.csv', 'CSV');
-- Test parsing first 100 rows
SELECT * FROM file('sample.csv', 'CSV') LIMIT 100;
- Specify schemas explicitly for production
-- Don't rely on inference for critical imports
SELECT * FROM file(
'data.csv',
'CSV',
'id UInt64, timestamp DateTime, value Float64, status String'
);
- Use settings to handle imperfect data
-- Common settings for dealing with real-world data
SET input_format_allow_errors_ratio = 0.01; -- Allow 1% errors
SET input_format_null_as_default = 1; -- NULL = default
SET input_format_skip_unknown_fields = 1; -- Ignore extra fields
SET input_format_csv_empty_as_default = 1; -- Empty CSV fields = default
- Monitor parsing errors
-- Set up monitoring query
SELECT
count() AS error_count,
any(exception) AS sample_error
FROM system.query_log
WHERE exception_code = 6
AND event_time > now() - INTERVAL 1 DAY;
Related settings reference
-- Error handling
SET input_format_allow_errors_num = 100; -- Skip N bad rows
SET input_format_allow_errors_ratio = 0.1; -- Skip up to 10% bad rows
-- NULL and default handling
SET input_format_null_as_default = 1; -- NULL becomes default value
SET input_format_csv_empty_as_default = 1; -- Empty CSV field = default
SET input_format_skip_unknown_fields = 1; -- Ignore extra columns
-- Schema inference
SET input_format_max_bytes_to_read_for_schema_inference = 1000000;
SET schema_inference_make_columns_nullable = 0; -- Don't infer Nullable types
-- CSV-specific
SET format_csv_delimiter = ','; -- CSV delimiter
SET format_csv_allow_single_quotes = 1; -- Allow single quotes
SET format_csv_allow_double_quotes = 1; -- Allow double quotes
-- Custom format
SET format_custom_escaping_rule = 'CSV'; -- Use CSV escaping
SET format_custom_field_delimiter = '\x09'; -- Tab delimiter
-- Date/time parsing
SET date_time_input_format = 'best_effort'; -- Flexible date parsing
Debugging tips
-- 1. Check what ClickHouse sees in the problematic row
--    (read it as raw text, since parsing it with the real format would fail again;
--     rowNumberInAllBlocks() is 0-based, so subtract 1 from the reported row number)
SELECT * FROM file('data.csv', 'LineAsString')
WHERE rowNumberInAllBlocks() = 429979;
-- 2. Examine the raw bytes of each line
SELECT hex(line) FROM file('data.csv', 'LineAsString', 'line String')
LIMIT 10;
-- 3. Test different formats
SELECT * FROM file('data.txt', 'TSV') LIMIT 5;
SELECT * FROM file('data.txt', 'CSV') LIMIT 5;
SELECT * FROM file('data.txt', 'CSVWithNames') LIMIT 5;
-- 4. Use LineAsString to see raw data
SELECT * FROM file('data.csv', 'LineAsString') LIMIT 10;