About

A format born from real documents

ICF was created to export and import structured, hierarchical data in a Document Management System — and grew into an open specification for the AI era.

The origin

Indent Comma Format was created by Edison Williams to export and import structured hierarchical data in a Document Management System. Real invoices, real folders, real line items — data that is genuinely hierarchical, highly repetitive, and needs to move between databases, OCR pipelines and archives without bloat.

Each existing format made a trade-off ICF wasn't willing to accept. CSV is compact but flat. JSON and XML carry the hierarchy but repeat every key on every record. YAML is readable but loosely structured. ICF takes the best of each: declare the schema once, then store every record positionally, aiming to be as compact as CSV, as readable as YAML and as hierarchical as JSON.
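To make "declare the schema once, then store every record positionally" concrete, here is a minimal Python sketch. It does not use ICF syntax or the icfpy API; the field names, the sample records and the `encode_positional` / `decode_positional` helpers are all hypothetical, and the layout is a plain comma-separated stand-in with no quoting or escaping.

```python
import json

# Hypothetical invoice line items; the field names are illustrative only.
records = [
    {"sku": "A-100", "qty": 2, "price": 9.5},
    {"sku": "B-220", "qty": 1, "price": 120.0},
    {"sku": "C-310", "qty": 12, "price": 0.75},
]

def encode_positional(rows):
    """Emit the field names once, then one comma-separated line per record."""
    fields = list(rows[0].keys())
    lines = [",".join(fields)]
    for row in rows:
        lines.append(",".join(str(row[name]) for name in fields))
    return "\n".join(lines)

def decode_positional(text):
    """Rebuild dicts by pairing each value with the field at the same position."""
    header, *body = text.split("\n")
    fields = header.split(",")
    # Values come back as strings; a real schema would also carry types.
    return [dict(zip(fields, line.split(","))) for line in body]

positional = encode_positional(records)
keyed = json.dumps(records)

# The keyed form repeats "sku" once per record; the positional form names it once.
print(keyed.count("sku"), positional.count("sku"))
print(len(keyed), len(positional))
```

The flat layout here is essentially CSV. What the paragraph above describes goes one step further by applying the same positional idea to nested records.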

The specification was written with help from ChatGPT, and the reference icfj library was built with Claude Code.

Design goals

Human readable

Indentation for hierarchy; no brackets to balance.

Compact storage

No repeated field names — the schema holds them once.

Streamable records

Line-oriented and append-friendly for huge archives.

AI-efficient

Low token overhead for RAG and LLM contexts.

Schema-driven validation

Predictable structure that's easy to verify.

Git-friendly

Plain UTF-8 text that diffs cleanly.
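
The first and third goals, indentation for hierarchy and line-oriented records, can be illustrated with a short Python sketch. This is not the ICF grammar: the two-space indent, the comma-separated values, the sample data and the `parse_indented` helper are assumptions made for illustration only.

```python
def parse_indented(lines, indent="  "):
    """Turn indented lines into a nested tree, consuming one line at a time."""
    root = {"values": None, "children": []}
    stack = [root]  # stack[d + 1] is the most recent node seen at depth d
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Count how many leading indent units the line carries.
        depth = 0
        while line.startswith(indent, depth * len(indent)):
            depth += 1
        if depth > len(stack) - 1:
            raise ValueError(f"unexpected indent: {line!r}")
        node = {"values": line.strip().split(","), "children": []}
        del stack[depth + 1:]  # close any deeper or sibling nodes
        stack[depth]["children"].append(node)
        stack.append(node)
    return root["children"]

# Hypothetical sample: two invoices, each followed by its line items.
sample = [
    "INV-001,2026-01-15",
    "  A-100,2,9.5",
    "  B-220,1,120.0",
    "INV-002,2026-01-16",
    "  C-310,12,0.75",
]

tree = parse_indented(sample)
for invoice in tree:
    print(invoice["values"], len(invoice["children"]))
```

Because the function accepts any iterable of lines, an open file object can be passed directly and is read line by line. This sketch still builds the whole tree in memory; a streaming consumer would instead hand off each top-level record as soon as the next one begins.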

The ecosystem

ICF is a small, focused ecosystem: an open specification with an optional companion index, a reference library, a behaviorally matched port and an online validator.

  • ICF v1.0 — the core serialization format. Stable.
  • ICX v1.0 — the optional companion index for random access and integrity.
  • icfj — the Java 11+ reference implementation and conformance authority.
  • icfpy — a pure-Python port, behaviorally matched to icfj.
  • Online validator — instant, private, in-browser structural checks.

When to use ICF — and when not to

ICF is best for structured business data where the schema is predefined and consistent: OCR extraction, invoice and ERP interchange, document archival, AI/RAG datasets and repetitive hierarchical records. It is intentionally schema-constrained — that constraint is what buys smaller files, faster parsing and predictable structure.

For highly dynamic or irregular data, formats such as JSON, XML or YAML may still be the better fit. ICF doesn't try to replace them everywhere; its advantage lies where data is hierarchical and repetitive.

License

The specification is © 2026 Edison Williams and licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt it, including commercially, provided you give appropriate credit.

Get involved

The specification and both libraries are developed in the open at github.com/icformat. Issues, conformance tests and ports to new languages are welcome.