About — Indent Comma Format (ICF)

The origin

Indent Comma Format was created by Edison Williams to export and import structured hierarchical data in a Document Management System. Real invoices, real folders, real line items — data that is genuinely hierarchical, highly repetitive, and needs to move between databases, OCR pipelines and archives without bloat.

Existing formats each made a trade-off ICF wasn't willing to make. CSV is compact but flat. JSON and XML carry the hierarchy but repeat every key on every record. YAML is readable but loosely structured. ICF aims to combine their strengths: declare the schema once, then store every record positionally, so files stay about as compact as CSV, as readable as YAML and as hierarchical as JSON.
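To make the schema-once idea concrete, here is a minimal Python sketch. It does not use real ICF syntax or any icfpy API: the sample invoices, the field names and the `encode_positional` helper are all invented for illustration. It only compares how much text is spent when keys repeat on every record versus when field names are written a single time.

```python
import json

# Hypothetical sample data; the field names are invented for this sketch.
records = [
    {"invoice_no": "INV-001", "customer": "Acme", "total": 120.50},
    {"invoice_no": "INV-002", "customer": "Globex", "total": 89.00},
    {"invoice_no": "INV-003", "customer": "Initech", "total": 42.75},
]


def encode_positional(rows):
    """Write the field names once, then one comma-separated line per record.

    This is NOT ICF syntax; it only demonstrates the schema-once idea.
    """
    fields = list(rows[0].keys())
    lines = [",".join(fields)]  # the "schema", stated a single time
    for row in rows:
        lines.append(",".join(str(row[name]) for name in fields))
    return "\n".join(lines)


as_json = json.dumps(records)               # every key repeated per record
as_positional = encode_positional(records)  # keys appear only in line 1

print(len(as_json), len(as_positional))
```

On this sample the positional text is noticeably shorter than the JSON text, and the gap should widen as records are added, because the field names are paid for once rather than per record.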

The specification was written with help from ChatGPT, and the reference icfj library was built with Claude Code.

Design goals

Human readable

Indentation for hierarchy; no brackets to balance.

Compact storage

No repeated field names — the schema holds them once.

Streamable records

Line-oriented and append-friendly for huge archives.

AI-efficient

Low token overhead for RAG and LLM contexts.

Schema-driven validation

Predictable structure that's easy to verify.

Git-friendly

Plain UTF-8 text that diffs cleanly.
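Several of these goals (indentation for hierarchy, line-oriented records, schema-driven validation) can be illustrated together. The Python sketch below is an assumption-laden toy, not the ICF grammar and not the icfpy API: the two-level `SCHEMA`, the two-space indent width, the comma-separated layout and the `validate_lines` helper are all hypothetical. It shows how a declared schema lets a reader check each record as it arrives, one line at a time.

```python
# Hypothetical two-level schema: depth 0 is an invoice, depth 1 is a line item.
# The layout (two spaces per level, comma-separated values) is an assumption
# made for this sketch and is not taken from the ICF specification.
SCHEMA = {
    0: ["invoice_no", "customer"],
    1: ["sku", "qty", "price"],
}
INDENT_WIDTH = 2


def validate_lines(lines):
    """Check records one line at a time; return a list of (line_no, message)."""
    problems = []
    previous_depth = -1  # so the first record must sit at depth 0
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        body = line.lstrip(" ")
        spaces = len(line) - len(body)
        if spaces % INDENT_WIDTH != 0:
            problems.append((line_no, "indentation is not a multiple of 2"))
            continue
        depth = spaces // INDENT_WIDTH
        if depth > previous_depth + 1:
            problems.append((line_no, "child record has no parent"))
            continue
        fields = SCHEMA.get(depth)
        if fields is None:
            problems.append((line_no, f"no schema for depth {depth}"))
            continue
        values = body.split(",")
        if len(values) != len(fields):
            problems.append(
                (line_no, f"expected {len(fields)} fields, got {len(values)}")
            )
        previous_depth = depth
    return problems


good = ["INV-001,Acme", "  A1,2,9.99", "  B7,1,4.50", "INV-002,Globex"]
bad = ["INV-001,Acme", "  A1,2"]

print(validate_lines(good))
print(validate_lines(bad))
```

Because the check only needs the current line and the previous depth, it works equally well on a list in memory or on lines read lazily from a large file.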

The ecosystem

ICF is a small, focused ecosystem: an open specification and three faithful reference libraries.

ICF v1.1: the core serialization language. Stable; v1.0 remains available.
ICX v1.2: the optional companion index for random access, integrity, and tag-based search (older versions: v1.1, v1.0).
icfj: the Java 11+ reference implementation and conformance authority.
icfpy: a pure-Python port, behaviorally matched to icfj.
icf.js: a zero-dependency TypeScript port for the browser and Node.
Online validator: instant, private, in-browser structural checks.

When to use ICF — and when not to

ICF is best for structured business data where the schema is predefined and consistent: OCR extraction, invoice and ERP interchange, document archival, AI/RAG datasets and repetitive hierarchical records. It is intentionally schema-constrained — that constraint is what buys smaller files, faster parsing and predictable structure.

For highly dynamic or irregular data, formats such as JSON, XML or YAML may still be the better fit. ICF doesn't try to replace them everywhere; its advantage lies in data that is both hierarchical and repetitive.
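One rough way to judge whether a dataset is regular enough for a schema-constrained format is to measure how many records share the same set of fields. The Python sketch below is only a heuristic suggested here, not part of ICF or its libraries; the `schema_consistency` helper and the sample records are hypothetical, and it looks at top-level keys only.

```python
from collections import Counter


def schema_consistency(records):
    """Return the share of records whose key set matches the most common one."""
    if not records:
        return 0.0
    # Group records by the (unordered) set of top-level field names they use.
    shapes = Counter(frozenset(record) for record in records)
    _, most_common_count = shapes.most_common(1)[0]
    return most_common_count / len(records)


# Hypothetical samples: one uniform, one with fields that vary per record.
regular = [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1},
    {"sku": "C3", "qty": 5},
]
irregular = [
    {"sku": "A1"},
    {"name": "x", "tags": []},
    {"qty": 1, "note": "rush"},
    {"sku": "B7"},
]

print(schema_consistency(regular))
print(schema_consistency(irregular))
```

A score near 1.0 suggests the records follow one predefined shape; a low score suggests the kind of irregular data for which JSON, XML or YAML may remain the better fit.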

License

The specification is © 2026 Edison Williams and licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt it, including commercially, provided you give appropriate credit.

The reference libraries — icfj (Java), icfpy (Python) and icf.js (JavaScript) — are licensed separately under the permissive MIT License, so you can embed them freely in commercial or closed-source software. In short: spec is CC BY 4.0, code is MIT.

Get involved. The specification and all three libraries are developed in the open at github.com/icformat. Issues, conformance tests and ports to new languages are welcome.
