Opinion / Technical · 9 min read · March 2026

From Forms to Voice: The Deeper Inclusive Transition

By Alex Nwoko

I've spent years building reporting systems where 71 organizations submit 244 reports a month, tracking 3.8 million services across 1,322 locations. The data is powerful. But every number started as a human observation that had to survive a form, a cleaning process, and a monthly reporting cycle before it became actionable.

And I say that as someone who built the forms.

Every humanitarian data form is an act of pre-judgment. Someone in a capital city decides which questions matter, which response options exist, which categories are worth tracking. The beneficiary's job is to fit their reality into those boxes. The field officer's job is to translate what they see into pre-coded categories. The analyst's job is to aggregate those categories into something meaningful.

At every step, context is lost. Nuance is stripped. The original observation is compressed into something our systems can process — not something that reflects what actually happened.

What Forms Cost Us

A form asks "Was the assistance adequate?" — Yes or No. But a displaced woman in northeast Nigeria doesn't think in yes or no. She thinks: "The rice came but it was half of what we needed, my daughter is sick and there's no medicine at the clinic, and I'm afraid to go to the distribution point alone."

None of that fits a checkbox. We did the best we could with the tools we had. But we must also acknowledge how structurally inadequate those tools were for understanding the real needs of the most vulnerable. The humanitarian agenda was designed to centre affected voices. Our data infrastructure has been doing the opposite — encoding their realities into categories we find convenient to analyse.

Even qualitative methods — the approach we trust to preserve nuance — pass through layers of interpretation. An enumerator translates. A researcher codes themes. An analyst writes findings. The original intent of the person who spoke has been reshaped at least three times before it informs a decision.

I've conducted several Key Informant Interviews in my humanitarian career, and during the COVID-19 pandemic I led secondary data analysis on the DEEP platform, using a multi-step workflow designed to reduce cognitive bias. The rigour was real. But the original voices of affected populations were still mediated through documents written about them, not by them.

Voice-to-Schema: The Technical Shift

One spoken sentence — "Borehole contaminated in Ward 7, cholera cases rising, we need ORS supplies by Thursday" — contains six structured data points. Location: Ward 7. Infrastructure affected: borehole. Status: contaminated. Health impact: cholera. Need: ORS supplies. Urgency: Thursday.
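To make the target shape concrete, here is a minimal sketch of what that sentence looks like once structured. The class name `FieldObservation` and its field names are illustrative assumptions, not an established humanitarian schema, and the values are filled in by hand rather than extracted by a model.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FieldObservation:
    """Hypothetical target schema for one spoken field report."""
    location: str
    infrastructure: str
    status: str
    health_impact: str
    need: str
    urgency: str


# The example sentence, structured by hand to show the target shape.
observation = FieldObservation(
    location="Ward 7",
    infrastructure="borehole",
    status="contaminated",
    health_impact="cholera",
    need="ORS supplies",
    urgency="Thursday",
)

record = asdict(observation)
print(len(record))  # number of structured data points in the record
for field_name, value in record.items():
    print(f"{field_name}: {value}")
```

Each field corresponds to one of the six data points listed above; a real deployment would likely need controlled vocabularies and optional fields for reports that mention less.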

No form needed. Voice-to-schema AI handles the extraction, classification, and structuring automatically. The original recording remains as the auditable source of truth — something no form-based system has ever provided.
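One way to keep the recording as the auditable source is to store a cryptographic fingerprint of the audio next to the structured fields. The sketch below is an assumption about how that link could be built, not a description of any existing system; `AuditedRecord`, `link_to_recording`, and `recording_matches` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditedRecord:
    """Structured fields plus a fingerprint of the recording they came from."""
    fields: dict
    transcript: str
    audio_sha256: str


def link_to_recording(audio_bytes: bytes, transcript: str, fields: dict) -> AuditedRecord:
    # SHA-256 of the raw audio ties the record to one exact recording.
    digest = hashlib.sha256(audio_bytes).hexdigest()
    return AuditedRecord(fields=fields, transcript=transcript, audio_sha256=digest)


def recording_matches(record: AuditedRecord, audio_bytes: bytes) -> bool:
    # An auditor re-hashes the stored audio and compares fingerprints.
    return hashlib.sha256(audio_bytes).hexdigest() == record.audio_sha256


# Placeholder bytes standing in for a real audio file.
audio = b"placeholder audio bytes"
audited = link_to_recording(
    audio,
    transcript="Borehole contaminated in Ward 7",
    fields={"location": "Ward 7", "status": "contaminated"},
)
print(recording_matches(audited, audio))
print(recording_matches(audited, b"a different recording"))
```

If the audio is altered or swapped after the fact, the fingerprint no longer matches, so anyone reviewing a figure in a dashboard can trace it back to the words that were actually spoken.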

Modern voice AI doesn't just transcribe. It extracts entities, classifies urgency, detects sentiment, geo-tags references, and maps speech into analytical frameworks. It does this in real time, at scale, for under a cent per interaction.
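The stages named above can be illustrated with a deliberately simple stand-in. This sketch assumes the speech has already been transcribed and uses keyword rules where a production system would use trained models; the word lists and the `extract` function are hypothetical and far too small for real use.

```python
import re

# Illustrative vocabularies only, not a real taxonomy.
HEALTH_TERMS = ("cholera", "measles", "malaria")
URGENT_MARKERS = ("rising", "urgent", "immediately")
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def extract(transcript: str) -> dict:
    """Rule-based stand-in for entity extraction and urgency classification."""
    text = transcript.lower()
    # Entity extraction: a ward number and any known health terms.
    location = re.search(r"\bward\s+(\d+)\b", text)
    health = [term for term in HEALTH_TERMS if term in text]
    # A deadline such as "by Thursday" is treated as a signal of urgency.
    deadline = re.search(r"\bby\s+(" + "|".join(WEEKDAYS) + r")\b", text)
    urgent = deadline is not None or any(m in text for m in URGENT_MARKERS)
    return {
        "location": f"Ward {location.group(1)}" if location else None,
        "health_impact": health,
        "deadline": deadline.group(1).capitalize() if deadline else None,
        "urgency": "high" if urgent else "routine",
    }


result = extract(
    "Borehole contaminated in Ward 7, cholera cases rising, "
    "we need ORS supplies by Thursday"
)
print(result)
```

Keyword rules break quickly on real speech, which is exactly why the extraction step is handed to a model; the point of the sketch is only the flow from transcript to classified fields.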

The same information that takes a reporting officer 30 minutes to enter into a form takes 30 seconds to speak. Multiply that across 200+ organizations and thousands of field workers, and you're looking at a fundamental acceleration of the evidence generation pipeline.
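A rough worked version of that arithmetic, using the 30-minute and 30-second figures above and the 244 monthly reports mentioned earlier. Applying the full 30 minutes to every one of those reports is an assumption made purely for illustration.

```python
FORM_MINUTES = 30.0      # from the text: time to enter one report into a form
VOICE_MINUTES = 0.5      # from the text: 30 seconds to speak the same report
REPORTS_PER_MONTH = 244  # from the text; assumes each one takes the full 30 minutes

speedup = FORM_MINUTES / VOICE_MINUTES
form_hours = REPORTS_PER_MONTH * FORM_MINUTES / 60
voice_hours = REPORTS_PER_MONTH * VOICE_MINUTES / 60

print(f"Speed-up per report: {speedup:.0f}x")          # 60x
print(f"Form entry per month: {form_hours:.0f} hours")   # 122 hours
print(f"Voice entry per month: {voice_hours:.1f} hours") # 2.0 hours
```

Under those assumptions, roughly 120 hours of data entry a month shrink to about two, before counting any time spent reviewing what the model extracted.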

But the real gain isn't speed — it's fidelity. Voice captures what forms can't: emphasis, uncertainty, urgency, context. When a health worker says "cholera cases rising" with alarm in her voice, that urgency is data. A checkbox marked "health concern" strips all of that away.

The Question Isn't Whether — It's Who Goes First

Venture capital investment in voice AI surged 7x in two years. About 78% of businesses are deploying it. The voice AI market crossed $22 billion. Cost per query: under $0.01. The commercial sector has already moved.

The humanitarian sector hasn't. Not because the technology doesn't work — but because our institutional architecture is built around forms. Our M&E frameworks assume structured questionnaires. Our databases assume tabular data. Our quality assurance processes assume manual review of coded responses.

The question isn't whether voice replaces humanitarian forms. It's who redesigns their data pipeline around voice first. The first-mover advantage here isn't about technology; it's about evidence quality. The organization that builds voice-native data collection will generate richer, more timely, more inclusive evidence than any competitor still running on forms.

After a decade of building platforms that run on forms, and working within the limitations of form-based data systems, I'm now building the ones that run on voice. The form served us well. But its time is over.
