
Error 6: CANNOT_PARSE_TEXT

This error occurs when ClickHouse cannot parse text data according to the expected format. This typically happens during data imports from CSV, TSV, JSON, or other text-based formats when the data doesn't match the expected schema or contains malformed values.
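
You can reproduce the failure in isolation with the format table function, which parses an inline string as if it were a file. A minimal sketch (assumes a version recent enough to accept an explicit structure argument; the column names are made up):

-- 'abc' cannot be parsed as Int32, so this raises error 6 (CANNOT_PARSE_TEXT)
SELECT * FROM format(CSV, 'id Int32, name String', 'abc,hello');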

Most common causes

  1. Incorrect format specification (see the sketch after this list)

    • Using CSV format for tab-delimited files (should use TSV)
    • Format mismatch between actual data and declared format
    • Wrong delimiter character specified
    • Missing or incorrect escape characters
  2. Malformed CSV/TSV data

    • Missing delimiters (commas or tabs)
    • Unescaped special characters in string fields
    • Quotes not properly closed or escaped
    • Embedded newlines without proper escaping
    • Extra or missing columns compared to table schema
  3. Data type mismatches

    • String data in numeric columns
    • Invalid date/datetime formats
    • Values exceeding type boundaries (e.g., too large for Int32)
    • Empty strings where numbers are expected
    • Special characters in numeric fields
  4. Character encoding issues

    • UTF-8 encoding errors
    • Byte order marks (BOM) at file beginning
    • Invalid characters in string fields
    • Mixed character encodings in the same file
  5. Inconsistent data structure

    • Variable number of columns per row
    • Headers don't match data rows
    • Schema inference fails with complex nested data
    • Mixed data formats within same column
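
As a quick illustration of cause 1, feeding tab-delimited data to the CSV parser reproduces the classic "expected comma" failure. A sketch with inline data via the format table function (column names are made up):

-- No comma in the line, so CSV parsing fails with "expected ',' before: ..."
SELECT * FROM format(CSV, 'a String, b String', 'foo\tbar\n');

-- The same data parses cleanly as TSV
SELECT * FROM format(TSV, 'a String, b String', 'foo\tbar\n');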

What to do when you encounter this error

1. Check the error message for specific details

The error message typically indicates:

  • Which row failed
  • Which column had the problem
  • What was expected vs. what was found
  • The actual parsed text that failed
Cannot parse input: expected ',' before: 'some_text': (at row 429980)
Row 429979: Column 8, name: blockspending, type: Int32, ERROR: text "<TAB><TAB>7027<TAB>181" is not like Int32

2. Verify the actual file format

# Check first few lines of your file
head -n 5 your_file.csv

# Check for tabs vs commas
head -n 1 your_file.csv | od -c

# Check character encoding
file -i your_file.csv

3. Test with a small sample

-- Try parsing just the first few rows
SELECT * 
FROM file('sample.csv', 'CSV')
LIMIT 10;

-- Let ClickHouse infer the schema
DESCRIBE file('sample.csv', 'CSV');

4. Check logs for more details

SELECT
    event_time,
    query,
    exception
FROM system.query_log
WHERE exception_code = 6
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY event_time DESC
LIMIT 5;

Quick fixes

1. Use correct format for your data

-- For tab-delimited files
INSERT INTO table FROM INFILE 'file.tsv' FORMAT TSV;
-- Or
INSERT INTO table FROM INFILE 'file.tsv' FORMAT TSVWithNames;

-- For comma-delimited files
INSERT INTO table FROM INFILE 'file.csv' FORMAT CSV;
-- Or
INSERT INTO table FROM INFILE 'file.csv' FORMAT CSVWithNames;

2. Skip malformed rows

-- Skip specific number of bad rows
INSERT INTO table 
SELECT * FROM file('data.csv', 'CSV')
SETTINGS input_format_allow_errors_num = 100;

-- Skip percentage of bad rows
INSERT INTO table 
SELECT * FROM file('data.csv', 'CSV')
SETTINGS input_format_allow_errors_ratio = 0.1;  -- Allow 10% errors

3. Handle NULL values correctly

-- Treat empty fields as default values
SET input_format_null_as_default = 1;

-- For CSV specifically
SET input_format_csv_empty_as_default = 1;

-- Ignore extra fields that are not in the table schema
SET input_format_skip_unknown_fields = 1;
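
For example, with empty-as-default enabled, an empty numeric CSV field becomes the column's default value instead of failing. A sketch with inline data:

SET input_format_csv_empty_as_default = 1;

-- Row 1 has an empty second field; it parses as 0 rather than raising error 6
SELECT * FROM format(CSV, 'id Int32, value Int32', '1,\n2,5\n');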

4. Use custom delimiters for tab-delimited CSV

-- For tab-delimited data with CSV quoting
SET format_custom_escaping_rule = 'CSV';
SET format_custom_field_delimiter = '\x09';  -- Tab character

INSERT INTO table FROM INFILE 'data.tsv' FORMAT CustomSeparated;

5. Specify schema explicitly

-- Instead of relying on schema inference
SELECT * FROM file(
    'data.csv',
    'CSV',
    'id UInt64, name String, date Date, value Float64'
);

Common specific scenarios

Scenario 1: CSV expected comma but found tab

Cannot parse input: expected ',' before: '<TAB><TAB>7027<TAB>181'

Cause: File is actually tab-delimited (TSV) but being read as CSV.

Solution:

-- Use TSV format instead
INSERT INTO table FROM INFILE 'file.tsv' FORMAT TSVWithNames;

-- Or if you must use CSV-style quoting with tabs
SET format_custom_escaping_rule = 'CSV';
SET format_custom_field_delimiter = '\x09';
INSERT INTO table FROM INFILE 'file.tsv' FORMAT CustomSeparated;

Scenario 2: Malformed string with embedded delimiters

Cannot parse input: expected '\t' before: 'I49d(I\""\t\t\t13\t1350000'

Cause: String field contains delimiter characters (tabs, commas) and special characters that aren't properly escaped or quoted.

Solution:

-- Use CSV-style escaping for tab-delimited data
SET format_custom_escaping_rule = 'CSV';
SET format_custom_field_delimiter = '\x09';

-- Allow errors in problematic rows
SET input_format_allow_errors_num = 100;

INSERT INTO table FROM INFILE 'file.tsv' FORMAT CustomSeparated;

Scenario 3: Syntax error at unexpected position

Syntax error: failed at position 1 ('85c59771') (line 1, col 1): 85c59771-ae5d-4a53-9eed...

Cause: The wrong format was specified; the file is TSV but is being read as CSV.

Solution:

-- Check actual delimiter in file
-- If you see wide spacing, it's likely tabs not commas

-- Use TSV instead of CSV
SELECT * FROM file('data.tsv', 'TSVWithNames');

Scenario 4: Cannot parse decimal type from Parquet

Cannot parse type Decimal(76, 38), expected non-empty binary data with size equal to or less than 32, got 36

Cause: The decimal values in the Parquet file use a wider binary representation than ClickHouse accepts: Decimal(76, 38) maps to Decimal256, which is stored in 32 bytes, but the file encodes 36-byte values.

Solution:

-- Read as String first, then convert
SELECT
    CAST(decimal_col AS Decimal(38, 10)) AS decimal_col
FROM file('data.parquet', 'Parquet', 'decimal_col String, ...');

-- Or use Double for very large values
SELECT
    toFloat64(decimal_col) AS decimal_col  
FROM file('data.parquet', 'Parquet', 'decimal_col String, ...');

Scenario 5: Schema inference fails on complex data

The table structure cannot be extracted from a JSONEachRow format file

Cause: File is empty, inaccessible, or schema inference can't determine structure from sample.

Solution:

-- Increase bytes read for schema inference
SET input_format_max_bytes_to_read_for_schema_inference = 999999999;

-- Or specify schema manually
SELECT * FROM s3(
    'https://bucket/file.json',
    'JSONEachRow',
    'id UInt64, name String, data String'
);

Prevention best practices

  1. Validate data format before importing

    # Check actual delimiter
    head -n 1 file.csv | od -c
    
    # Verify consistent column count
    awk -F',' 'NR==1{cols=NF} NF!=cols{print "Line " NR " has " NF " columns"}' file.csv
    
    # Check for encoding issues
    file -i file.csv
    
  2. Use appropriate format for your data

    • CSV: Comma-delimited with optional CSV-style quoting
    • TSV/TabSeparated: Tab-delimited, backslash escaping instead of quoting
    • TSVWithNames: Tab-delimited with header row
    • CustomSeparated: Custom delimiter with CSV-style quoting
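
    The same inline data under two of these formats, for comparison (a quick sketch using the format table function):

    -- TSV: plain tab-delimited values
    SELECT * FROM format(TSV, 'a String, n Int32', 'x\t1\n');

    -- TSVWithNames: the first line is a header row
    SELECT * FROM format(TSVWithNames, 'a String, n Int32', 'a\tn\nx\t1\n');
    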
  3. Test with small samples first

    -- Test schema inference
    DESCRIBE file('sample.csv', 'CSV');
    
    -- Test parsing first 100 rows
    SELECT * FROM file('sample.csv', 'CSV') LIMIT 100;
    
  4. Specify schemas explicitly for production

    -- Don't rely on inference for critical imports
    SELECT * FROM file(
        'data.csv',
        'CSV',  
        'id UInt64, timestamp DateTime, value Float64, status String'
    );
    
  5. Use settings to handle imperfect data

    -- Common settings for dealing with real-world data
    SET input_format_allow_errors_ratio = 0.01;        -- Allow 1% errors
    SET input_format_null_as_default = 1;               -- Empty = default
    SET input_format_skip_unknown_fields = 1;           -- Ignore extra fields
    SET input_format_csv_empty_as_default = 1;          -- Empty CSV fields = default
    
  6. Monitor parsing errors

    -- Set up monitoring query
    SELECT
        count() AS error_count,
        any(exception) AS sample_error
    FROM system.query_log
    WHERE exception_code = 6
      AND event_time > now() - INTERVAL 1 DAY;
    
Relevant settings

-- Error handling
SET input_format_allow_errors_num = 100;              -- Skip N bad rows
SET input_format_allow_errors_ratio = 0.1;            -- Skip up to 10% bad rows

-- NULL and default handling  
SET input_format_null_as_default = 1;                 -- NULL becomes default value
SET input_format_csv_empty_as_default = 1;            -- Empty CSV field = default
SET input_format_skip_unknown_fields = 1;             -- Ignore extra columns

-- Schema inference
SET input_format_max_bytes_to_read_for_schema_inference = 1000000;
SET schema_inference_make_columns_nullable = 0;       -- Don't infer Nullable types

-- CSV-specific
SET format_csv_delimiter = ',';                       -- CSV delimiter
SET format_csv_allow_single_quotes = 1;               -- Allow single quotes
SET format_csv_allow_double_quotes = 1;               -- Allow double quotes

-- Custom format
SET format_custom_escaping_rule = 'CSV';              -- Use CSV escaping
SET format_custom_field_delimiter = '\x09';           -- Tab delimiter

-- Date/time parsing
SET date_time_input_format = 'best_effort';           -- Flexible date parsing
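
To see the effect of best-effort parsing, try an ISO 8601 timestamp with a zone designator, which the default basic parser typically rejects (a sketch with inline data):

SET date_time_input_format = 'best_effort';
SELECT * FROM format(CSV, 'd DateTime', '2024-01-15T10:30:00Z\n');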

Debugging tips

-- 1. Check what ClickHouse sees in the problematic row
SELECT * FROM file('data.csv', 'CSV')
WHERE rowNumberInAllBlocks() = 429980;  -- Row number from the error (note: some messages count from 0, some from 1)

-- 2. Examine the raw bytes
SELECT hex(column_name) FROM file('data.csv', 'CSV', 'column_name String')
LIMIT 10;

-- 3. Test different formats
SELECT * FROM file('data.txt', 'TSV') LIMIT 5;
SELECT * FROM file('data.txt', 'CSV') LIMIT 5;
SELECT * FROM file('data.txt', 'CSVWithNames') LIMIT 5;

-- 4. Use LineAsString to see raw data
SELECT * FROM file('data.csv', 'LineAsString') LIMIT 10;