How it works.

MachineContext is a deterministic pipeline that transforms noisy company signals into verified, cacheable identity objects—and powers semantic search, similarity mapping, and change detection on top.

Architecture

Three stages. Zero guessing.

1. Resolution

Accept any input: company names, domains, typos, aliases. Normalize and attempt canonical resolution. If multiple candidates exist with similar confidence, return AMBIGUOUS with ranked options.

Without this: 47 CRM duplicates. Leads routed to dead accounts. Agents acting on wrong entities.
// Input variations
"Stripe" → stripe.com
"stripe inc" → stripe.com
"strpie" → stripe.com (typo corrected)
"STRIPE PAYMENTS COMPANY" → stripe.com
✓ 4 inputs → 1 canonical entity
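The normalization and ambiguity handling described above can be sketched in a few lines. This is a minimal illustration, not MachineContext's implementation: the alias table, the noise-word list, the thresholds, and the NOT_FOUND status are all assumptions, and fuzzy matching is done with Python's standard `difflib` purely for demonstration.

```python
import difflib
import re

# Hypothetical alias index; a real resolver would use a far larger one.
KNOWN_ENTITIES = {
    "stripe": "stripe.com",
    "square": "squareup.com",
    "adyen": "adyen.com",
}

# Suffixes and descriptors dropped during normalization (assumed list).
NOISE_WORDS = {"inc", "llc", "ltd", "corp", "company", "payments"}


def normalize(raw: str) -> str:
    """Lowercase, strip punctuation, and drop noise words."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(t for t in tokens if t not in NOISE_WORDS)


def resolve(raw: str, index=None, cutoff: float = 0.75, margin: float = 0.1) -> dict:
    """Return one canonical domain, or AMBIGUOUS with ranked candidates."""
    index = KNOWN_ENTITIES if index is None else index
    key = normalize(raw)
    scored = sorted(
        ((difflib.SequenceMatcher(None, key, name).ratio(), name) for name in index),
        reverse=True,
    )
    candidates = [(score, name) for score, name in scored if score >= cutoff]
    if not candidates:
        # Assumed status; the source only names AMBIGUOUS.
        return {"status": "NOT_FOUND", "candidates": []}
    if len(candidates) > 1 and candidates[0][0] - candidates[1][0] < margin:
        # Top candidates are too close to call: refuse to pick one.
        return {"status": "AMBIGUOUS", "candidates": [index[n] for _, n in candidates]}
    return {"status": "RESOLVED", "domain": index[candidates[0][1]]}
```

With this toy index, all four inputs shown above resolve to stripe.com, while two near-identical names within the margin come back as AMBIGUOUS instead of a guess.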
2. Extraction

Polite, controlled crawls acquire structured signals from authoritative pages. We prioritize identity signals (Schema.org, Open Graph, legal disclosures) over raw content. Redirect chains are traced. DNS records are verified.

Without this: Scraped data from random pages. No provenance. Stale information mixed with current.
// Signal sources for stripe.com
Schema.org      Organization: Stripe, Inc.
OG:site_name    Stripe
DNS TXT         DMARC, SPF verified
Redirect chain  stripe.com → www (canonical)
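As a rough sketch of what pulling identity signals from a page might look like, the snippet below reads `og:site_name` and a Schema.org Organization name from HTML using only the Python standard library. The class name, signal keys, and sample page are hypothetical. It handles only a single top-level JSON-LD object and skips crawling, DNS checks, and redirect tracing entirely.

```python
import json
from html.parser import HTMLParser


class IdentitySignalParser(HTMLParser):
    """Collects og:site_name and a Schema.org Organization name from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.signals = {}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:site_name":
            self.signals["og:site_name"] = attr.get("content") or ""
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                doc = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # malformed JSON-LD stays a gap, not a guess
            if isinstance(doc, dict) and doc.get("@type") == "Organization":
                name = doc.get("name")
                if isinstance(name, str):
                    self.signals["schema.org:Organization"] = name


def extract_signals(html: str) -> dict:
    parser = IdentitySignalParser()
    parser.feed(html)
    parser.close()
    return parser.signals


# Made-up page that mirrors the example values above.
SAMPLE_PAGE = """
<html><head>
<meta property="og:site_name" content="Stripe">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Stripe, Inc."}
</script>
</head><body></body></html>
"""

signals = extract_signals(SAMPLE_PAGE)
```

A signal that is absent or unparseable is simply missing from the result, which keeps gaps explicit for the verification stage.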
3. Verification

Confidence is computed from signal multiplicity and consistency. Conflicting signals reduce confidence. Missing signals are explicit gaps, not assumptions. Every output includes provenance and freshness timestamps.

Without this: Every resolution is equally "confident." Until it's catastrophically wrong.
// Confidence computation
0.99  All signals align
0.72  Some signals missing
0.40  Signals conflict → AMBIGUOUS

API Surface

Five endpoints. One truth layer.

Every API is powered by the same verified data pipeline. Different interfaces for different problems.

/resolve

POST

Noisy input → canonical entity. Returns the domain, confidence, and aliases, or AMBIGUOUS with ranked candidates.

Uses: Resolution → Extraction → Verification
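A client call to this endpoint might be shaped roughly as follows. Only the path and the POST method come from the description above. The base URL, the `query` body field, the bearer-token header, and the response field names (`status`, `domain`, `confidence`) are assumptions made for illustration. The request is built but never sent.

```python
import json
import urllib.request

# Placeholder host; the real base URL is not given in the source.
BASE_URL = "https://api.example.invalid"


def build_resolve_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /resolve request with an assumed JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/resolve",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def pick_domain(payload: dict, min_confidence: float = 0.95):
    """Return a domain only when the (assumed) response is unambiguous and confident."""
    if payload.get("status") == "AMBIGUOUS":
        return None
    if payload.get("confidence", 0.0) < min_confidence:
        return None
    return payload.get("domain")
```

Returning `None` for ambiguous or low-confidence responses lets the caller route those cases to review instead of acting on a guess.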

/brand/:id

GET

Full verified object for a known entity. Domain, legal name, description, aliases, confidence, freshness.

Uses: Extraction → Verification (cached)

/search

GET

Semantic discovery by intent. "AI code review tools" returns ranked companies based on self-description.

Uses: Extraction → Embeddings → Vector search

/similar

GET

Find competitors and alternatives. "Companies like Stripe" returns Adyen, Square, Checkout.com.

Uses: Brand embeddings → Cosine similarity
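Ranking by cosine similarity over brand embeddings can be illustrated with plain Python. The three-dimensional vectors below are invented for the example (real embeddings would have hundreds of dimensions), and `bakery.example` is a hypothetical unrelated company.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(target: str, embeddings: dict, k: int = 3) -> list:
    """Return the k entities whose embeddings point closest to the target's."""
    query = embeddings[target]
    scored = [
        (cosine_similarity(query, vec), name)
        for name, vec in embeddings.items()
        if name != target
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy vectors, made up for illustration only.
TOY_EMBEDDINGS = {
    "stripe.com": [0.9, 0.1, 0.0],
    "adyen.com": [0.8, 0.2, 0.1],
    "checkout.com": [0.7, 0.3, 0.0],
    "bakery.example": [0.0, 0.1, 0.9],
}

neighbors = most_similar("stripe.com", TOY_EMBEDDINGS, k=2)
```

With these toy vectors the two nearest neighbors of stripe.com are adyen.com and checkout.com, and the unrelated entity ranks last.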

/changes

Coming Soon

Detect rebrands, domain migrations, and acquisitions. Subscribe to a watchlist, get notified when things change.

Uses: Continuous Extraction → Diff detection
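Since this endpoint is not yet released, the following is only a guess at what diff detection between two extraction snapshots could look like. The snapshot fields and the example company are hypothetical.

```python
def diff_snapshots(old: dict, new: dict) -> list:
    """List every field whose value differs between two snapshots of one entity."""
    changes = []
    for field in sorted(set(old) | set(new)):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes.append({"field": field, "before": before, "after": after})
    return changes


# Hypothetical snapshots of one entity taken at two points in time.
previous = {"domain": "oldco.example", "legal_name": "OldCo, Inc."}
current = {"domain": "newco.example", "legal_name": "OldCo, Inc.", "parent": "BigCo"}

detected = diff_snapshots(previous, current)
```

Here a domain migration and a newly appearing parent company both show up as changes, while the unchanged legal name does not.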

Confidence Scoring

Confidence isn't a feeling. It's computed.

Each score is computed from signal multiplicity and consistency. More agreeing signals raise confidence. Conflicting signals lower it. Missing signals are reported as explicit uncertainty.

0.95+      High confidence      Safe to automate
0.70-0.94  Moderate confidence  Review recommended
<0.70      Low confidence       Returns AMBIGUOUS
Signal weights (illustrative)
Schema.org Organization  +0.30
OG meta tags             +0.20
Legal/SEC filings        +0.25
DNS verification         +0.15
Redirect chain clean     +0.10
Signal conflict          -0.40
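One simple way to read the illustrative weights and thresholds is as an additive score. The source does not say how the weights are actually combined, so the summation, the clamping to [0, 1], and the signal key names below are assumptions. The sketch will not reproduce the exact example scores shown earlier.

```python
# Illustrative weights copied from the list above; key names are made up.
SIGNAL_WEIGHTS = {
    "schema_org_organization": 0.30,
    "og_meta_tags": 0.20,
    "legal_sec_filings": 0.25,
    "dns_verification": 0.15,
    "redirect_chain_clean": 0.10,
}
CONFLICT_PENALTY = 0.40


def confidence(present, conflict: bool = False) -> float:
    """Sum the weights of the signals present, minus a penalty on conflict."""
    score = sum(SIGNAL_WEIGHTS[name] for name in present)  # unknown names raise KeyError
    if conflict:
        score -= CONFLICT_PENALTY
    return round(min(1.0, max(0.0, score)), 2)


def tier(score: float) -> str:
    """Map a score onto the three bands listed above."""
    if score >= 0.95:
        return "HIGH"
    if score >= 0.70:
        return "MODERATE"
    return "AMBIGUOUS"


all_signals = set(SIGNAL_WEIGHTS)
missing_filings = all_signals - {"legal_sec_filings"}
```

Under these assumptions, a missing legal filing drops an entity from the high band to the moderate band, and a conflict pushes even a fully signaled entity to AMBIGUOUS.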

Guarantees

Deterministic
Explainable
Fresh
Fail-Closed

Security

Public data only

We ingest only publicly available information. No private data. No authenticated sources. No PII.

Stateless APIs

Requests are isolated. No session state. Customer data is never shared or co-mingled.

SOC 2 aligned

Infrastructure follows SOC 2 Type II controls. Audit logs for all access. Encryption at rest and in transit.

Rate limiting

Per-key rate limits prevent abuse. Burst capacity for legitimate spikes. Graceful degradation.
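Per-key limits with burst capacity are commonly built on a token bucket. The source does not name the algorithm in use, so this is an illustration of the general idea rather than a description of MachineContext's limiter. The class names and parameters are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Refills at a steady rate and holds up to `burst` tokens."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = max(self.updated, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerKeyLimiter:
    """Keeps one independent bucket per API key."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = {}

    def allow(self, api_key: str, now: float) -> bool:
        if api_key not in self.buckets:
            self.buckets[api_key] = TokenBucket(self.rate, self.burst)
        return self.buckets[api_key].allow(now)
```

For example, with a rate of one request per second and a burst of two, a key can make two immediate requests, is refused a third, and is allowed again one second later. Other keys are unaffected.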

Build on ground truth, not inference.

MachineContext is the infrastructure layer that makes AI decisions auditable and safe.

MachineContext © 2026