Rethinking how AI systems could improve data exchange

From Human-Readable Formats to Machine-Native Protocols

In a recent study titled “Which Table Format Do LLMs Understand Best?”, Daniel Carcamo and the team at Improving Agents demonstrated something striking: the format you use to feed structured data to a large language model (LLM) can significantly influence both accuracy and cost.

They evaluated 1,000 synthetic employee records across 11 different data formats and found almost a 20 percentage-point difference in model accuracy purely based on representation. Their work provides one of the strongest empirical signals yet that how we encode data for machines matters, especially as AI systems become more autonomous.

With that research as inspiration and validation, I started to question a deeper assumption:
Are the formats we use today still appropriate when machines begin to communicate directly with each other?

The hidden cost of human-oriented data

Most of the formats we rely on today (JSON, CSV, YAML, Markdown) were designed for humans.
They prioritise readability, not efficiency. That made sense when data moved between people and software. But as AI models and agents increasingly exchange information directly, these human-centric formats start to show their limits.

Every time an LLM reads or writes data, it pays a cost in tokens, parsing effort, and potential ambiguity. Even lightweight structures like JSON introduce friction in a world that is supposed to move faster, not slower.

The conclusion is simple: readability for humans equals overhead for machines.
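
To make that overhead concrete, here is a minimal sketch comparing the token footprint of the same employee record in pretty-printed JSON and in a compact key=value encoding. It assumes the tiktoken library and the cl100k_base tokenizer; exact counts vary by model, but the compact form is consistently cheaper:

# Minimal sketch: token cost of the same record in two representations.
# Assumes `pip install tiktoken`; counts vary by tokenizer and model.
import json
import tiktoken

record = {
    "id": 3, "name": "Eve C2", "age": 50, "city": "Singapore",
    "department": "Sales", "salary": 102915,
    "years_experience": 14, "project_count": 11,
}

as_json = json.dumps(record, indent=2)                        # human-oriented
as_compact = ";".join(f"{k}={v}" for k, v in record.items())  # machine-oriented

enc = tiktoken.get_encoding("cl100k_base")
print("JSON tokens:   ", len(enc.encode(as_json)))
print("compact tokens:", len(enc.encode(as_compact)))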

Introducing LLDX — Large Language Data eXchange

To explore this further, I’ve been working with ChatGPT-5 to prototype what a next-generation data format could look like: one created by AI, for AI. The result is a concept I call LLDX (Large Language Data eXchange): a minimal, schema-aware data representation optimised for LLM-to-service communication.

LLDX isn’t meant to replace JSON for traditional APIs. It’s designed for scenarios where AI systems, agents, and services exchange data directly — where token efficiency, precision, and interpretability matter more than human readability.

Unlike traditional formats, LLDX focuses on alignment with how LLMs process text. It’s compact, consistent, and context-aware, starting with a lightweight @context header that defines schema, meaning, and versioning. Each message can even be negotiated dynamically between systems — allowing both sides to agree on the most efficient encoding (text, gzip, or binary).
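
As a rough illustration of what that negotiation could look like, the sketch below picks the most efficient encoding both peers support. LLDX has no fixed API yet, so every name here is hypothetical:

# Hypothetical handshake: two LLDX peers agree on the cheapest shared encoding.
# The preference order and function names are illustrative, not a spec.
ENCODING_PREFERENCE = ["binary", "gzip", "text"]  # cheapest first

def negotiate_encoding(ours: list[str], theirs: list[str]) -> str:
    """Return the most efficient encoding supported by both peers."""
    for encoding in ENCODING_PREFERENCE:
        if encoding in ours and encoding in theirs:
            return encoding
    return "text"  # plain text as the universal fallback

# Example: we support everything, the peer only handles gzip and text.
print(negotiate_encoding(["binary", "gzip", "text"], ["gzip", "text"]))  # -> gzip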


Example: from Markdown to LLDX

Traditional format

id: 3
name: Eve C2
age: 50
city: Singapore
department: Sales
salary: 102915
years_experience: 14
project_count: 11

LLDX equivalent

@context:
  entity: Employee
  schema: https://schemas.example.org/hr/employee/v1
  encoding: LLDX-v2
  fields: e(id),n(name),a(age),c(city),d(department),s(salary),y(years_experience),p(project_count)
data:
e=3;n="Eve C2";a=50;c="Singapore";d="Sales";s=102915;y=14;p=11

The result is semantically identical, roughly 60% smaller at scale (the one-off @context header is amortised across many records), parseable in a single pass, and fully self-describing through its @context.
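
Because the @context header fixes the field abbreviations up front, the data line can be expanded in a single pass. Here is a hypothetical parser for the record above; the helper names are my own, and it deliberately ignores edge cases such as semicolons inside quoted strings:

# Hypothetical single-pass parser for the LLDX data line shown above.
# The mapping mirrors the @context fields entry; names are illustrative.
FIELDS = {"e": "id", "n": "name", "a": "age", "c": "city",
          "d": "department", "s": "salary",
          "y": "years_experience", "p": "project_count"}

def parse_lldx_line(line: str) -> dict:
    """Expand abbreviated key=value pairs into a full record in one pass."""
    record = {}
    for pair in line.split(";"):  # assumes no ";" inside quoted values
        key, _, raw = pair.partition("=")
        value = raw.strip('"')
        record[FIELDS[key]] = int(value) if value.isdigit() else value
    return record

print(parse_lldx_line('e=3;n="Eve C2";a=50;c="Singapore";d="Sales";s=102915;y=14;p=11'))
# {'id': 3, 'name': 'Eve C2', 'age': 50, 'city': 'Singapore', 'department': 'Sales', ...}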

Why this matters

This shift from human-readable to machine-native data is more than an optimisation. It signals a transition toward machine-to-machine semantics, where systems don’t just exchange text, but negotiate meaning. As AI agents evolve, they’ll need to dynamically agree on structure, schema, and efficiency. Formats like LLDX point toward that future: one where communication is contextual, dynamic, and purpose-aware.

It’s not about abandoning human standards — it’s about acknowledging that machines now form their own network of understanding. And they deserve their own language.

Closing thoughts

When APIs were first designed, they were made for people, and that meant they were verbose. It started with XML and SOAP (we all remember the pain), then evolved into REST and JSON, followed by GraphQL and gRPC. Each step simplified and streamlined what came before, making data exchange faster and cleaner. The next step will be different: as machines begin designing for themselves, efficiency will no longer be an optimisation, it will be the very definition of elegance.

LLDX is a step in that direction, but it represents a larger change in how we think about data, communication, and reasoning in AI systems. We’re still exploring and testing, but if we get it right, it could redefine how the next generation of intelligent systems talk to each other.

Acknowledgments

A sincere thank-you to Daniel Carcamo and the Improving Agents team for their detailed research on LLM data formats. Their work validated much of the thinking behind LLDX. You can read their original post here: https://www.improvingagents.com/blog/best-input-data-format-for-llms

Disclaimer: Portions of this article were developed in collaboration with ChatGPT-5, an AI-assisted writing and reasoning tool. All interpretations and conclusions are my own.