The origin
Indent Comma Format (ICF) was created by Edison Williams to export and import structured hierarchical data in a Document Management System. The data was real invoices, folders and line items: genuinely hierarchical, highly repetitive, and required to move between databases, OCR pipelines and archives without bloat.
Each existing format made a trade-off ICF wasn't willing to make. CSV is compact but flat. JSON and XML carry the hierarchy but repeat every key on every record. YAML is readable but loosely structured. ICF aims to combine their strengths: declare the schema once, then store every record positionally, so that files are as compact as CSV, as readable as YAML and as hierarchical as JSON.
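To make the "declare the schema once, store records positionally" idea concrete, here is a minimal Python sketch. It assumes a made-up layout (a header line per level, then comma-separated values with indentation for children) and is not the ICF grammar; the sample invoices and the `to_positional` helper are hypothetical and exist only to compare against JSON's repeated keys.

```python
import json

# Hypothetical sample data: two invoices, each with line items.
invoices = [
    {"number": "INV-001", "customer": "Acme", "items": [
        {"sku": "A1", "qty": 2, "price": 9.5},
        {"sku": "B7", "qty": 1, "price": 20.0},
    ]},
    {"number": "INV-002", "customer": "Globex", "items": [
        {"sku": "A1", "qty": 5, "price": 9.5},
    ]},
]


def to_positional(records):
    """Emit field names once, then one comma-separated line per record.

    Illustrative layout only; this is NOT the ICF syntax.
    Values containing commas or newlines are not handled.
    """
    lines = [
        "invoice: number,customer",   # parent-level field names, written once
        "  item: sku,qty,price",      # child-level field names, written once
    ]
    for inv in records:
        lines.append(f"{inv['number']},{inv['customer']}")
        for item in inv["items"]:
            # Indentation marks the item as a child of the invoice above it.
            lines.append(f"  {item['sku']},{item['qty']},{item['price']}")
    return "\n".join(lines)


as_json = json.dumps(invoices)
as_positional = to_positional(invoices)

print(as_positional)
print(len(as_json), "characters as JSON")
print(len(as_positional), "characters in the positional layout")
```

In this sketch the header costs a fixed number of characters, while JSON pays for every key again on every record, so the gap widens as the record count grows. The real ICF syntax is defined by the specification, not by this example.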
The specification was written with help from ChatGPT, and the reference icfj library was built with Claude Code.
Design goals
- Human readable: Indentation for hierarchy; no brackets to balance.
- Compact storage: No repeated field names; the schema holds them once.
- Streamable records: Line-oriented and append-friendly for huge archives.
- AI-efficient: Low token overhead for RAG and LLM contexts.
- Schema-driven validation: Predictable structure that's easy to verify.
- Git-friendly: Plain UTF-8 text that diffs cleanly.
The ecosystem
ICF is a small, focused ecosystem: an open specification with an optional companion index, a reference library with a behaviorally matched port, and an online validator.
- ICF v1.0 — the core serialization format. Stable.
- ICX v1.0 — the optional companion index for random access and integrity.
- icfj — the Java 11+ reference implementation and conformance authority.
- icfpy — a pure-Python port, behaviorally matched to icfj.
- Online validator — instant, private, in-browser structural checks.
When to use ICF — and when not to
ICF is best for structured business data where the schema is predefined and consistent: OCR extraction, invoice and ERP interchange, document archival, AI/RAG datasets and repetitive hierarchical records. It is intentionally schema-constrained — that constraint is what buys smaller files, faster parsing and predictable structure.
For highly dynamic or irregular data, formats such as JSON, XML or YAML may still be the better fit. ICF doesn't try to replace them everywhere; its advantage is concentrated where data is hierarchical and repetitive.
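One rough way to gauge whether a dataset is "predefined and consistent" is to check that every record carries the same set of fields. The sketch below is a hypothetical helper written for illustration; it is not part of icfj or icfpy, and it only inspects top-level keys.

```python
def shares_one_schema(records):
    """Return True when every record has exactly the same set of keys."""
    if not records:
        return True
    expected = set(records[0])
    return all(set(record) == expected for record in records)


# Hypothetical examples: uniform records versus irregular ones.
uniform = [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1},
]
irregular = [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "note": "backordered"},
]

print(shares_one_schema(uniform))
print(shares_one_schema(irregular))
```

Data that passes a check like this is a plausible candidate for a schema-constrained format; data that fails it is probably better left in JSON, XML or YAML.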
License
The specification is © 2026 Edison Williams and licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt it, including commercially, provided you give appropriate credit.