Trace Provider Data Audit — Integration & API Reference
This is the single source of truth for the Trace data audit API. It serves both provider integrators (onboarding, auth, schema shape, field guidance, examples) and frontend/backend consumers (endpoints, limits, validation rules, search, stats, scoped groups). The API and Trace Schema described here are provider-agnostic: every provider integrates through the same endpoints, headers, and schema. The concrete examples below use Kled, a live provider, to make the request and response shapes easier to follow. Substitute your own provider identity (the X-Provider value, source IDs, URLs, and policy references) wherever Kled appears.
Environments
| Environment | Base URL | Auth |
|---|---|---|
| Staging | https://staging-api.storyprotocol.net | Write gated by staging API key. Reads are public. |
| Production | https://api.dataapis.io | Write gated by production API key. Reads are public. |
Getting Access
Write access is gated. Before integrating, a provider must contact the Story team to be onboarded. We whitelist your provider identity, issue staging and production API keys, and assign the X-Provider value your write requests must use. Until that is done, write requests are rejected. Read, search, and scoped-group endpoints are public audit views and do not require a key.
To request access, reach out to the Story team to start the onboarding process.
Versioning
Current V1 version constants:
| Version | Value | Purpose |
|---|---|---|
| Trace schema | trace-v1.0 | Normalized metadata shape expected in initial_metadata_json and metadata_json. |
| Event hash | event-hash-v1 | Version of Story’s internal audit event hash envelope. |
| Event hash canonicalization | json-canonical-v1 | JSON is normalized before hashing so object key order does not affect the event hash. |
| Search index schema | index-v1 | Version embedded in search index partition keys and stored on index rows. |
| Search shard schema | shard-v1 | Version embedded in sharded index partition keys and stored on index rows. |
API Flow
The /records:batch response always includes per-item statuses. 202 Accepted means at least one item was accepted for asynchronous processing and no item conflicted. 200 OK means no new asynchronous work was enqueued because every item was a duplicate. 409 Conflict means at least one item conflicted; any accepted items in that response were still enqueued. Clients should inspect item statuses and retry transient request failures with the same payload and the same X-Batch-Id.
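The status rules above can be sketched as a small client-side classifier. This is a minimal sketch, not a Story SDK: it assumes each response item is a dict whose status key holds accepted, duplicate, or conflict, and the function name summarize_batch is hypothetical.

```python
from collections import Counter
from typing import List


def summarize_batch(http_status: int, items: List[dict]) -> dict:
    """Count per-item statuses so each item is handled on its own outcome."""
    counts = Counter(item.get("status") for item in items)
    return {
        "http_status": http_status,
        "accepted": counts["accepted"],
        "duplicate": counts["duplicate"],
        "conflict": counts["conflict"],
        # Conflicting items were not enqueued; they need a payload review, not a blind retry.
        "needs_review": counts["conflict"] > 0,
        # The 200 OK case: nothing new was enqueued because every item was a duplicate.
        "all_duplicate": bool(items) and counts["duplicate"] == len(items),
    }
```

Because a 409 can still carry accepted items, the sketch keys decisions off the item counts rather than the HTTP status alone.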
Auth and Provider Scope
Every write request must include the provider's API key, an X-Provider header, and an X-Batch-Id header. X-Provider must match the provider that the API key was issued to. Read, search, and scoped-group endpoints are public audit views.
For backlog ingestion, add X-Ingestion-Source: backlog. For normal live ingestion, omit the header; an explicit live value is rejected.
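A minimal Python sketch of assembling write headers under these rules. The function name build_write_headers is hypothetical, generating X-Batch-Id with uuid4 is an illustrative assumption, and the API key header is passed in through auth_headers because its name and value come from onboarding.

```python
import uuid
from typing import Dict, Optional


def build_write_headers(provider: str, batch_id: Optional[str] = None,
                        backlog: bool = False,
                        auth_headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Assemble headers for a write request such as /records:batch."""
    headers = {
        "Content-Type": "application/json",
        "X-Provider": provider,  # must match the provider the API key belongs to
        # Generate once per batch and reuse it unchanged on every retry.
        "X-Batch-Id": batch_id or str(uuid.uuid4()),
    }
    if backlog:
        # "backlog" is the only accepted value; live ingestion omits the header.
        headers["X-Ingestion-Source"] = "backlog"
    if auth_headers:
        # Placeholder for the API key header issued at onboarding.
        headers.update(auth_headers)
    return headers
```

For example, build_write_headers("kled", backlog=True) yields a backlog header set; "kled" stands in for whatever X-Provider value the Story team assigns.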
Read APIs are keyed by global Story data_id. Provider is returned as a field and can be used as an optional filter or searchable field.
Current Provider / Trace Alignment
A provider’s draft public payload maps cleanly into the current Trace Schema v1.0 working shape. The provider should send the normalized Trace fields shown below and include the full provider public payload under provider_payload so no provider-specific detail is lost.
These fields are part of the current Trace v1.0 working shape:
- contributor.kyc_country: country-level KYC jurisdiction signal. Country only; no address or GPS-derived country.
- contributor.consent.tos_* and contributor.consent.privacy_policy_*: the exact policy versions and hashes the contributor accepted for the record.
- attestation.signed_at_utc and attestation.key_url: verification metadata for the top-level attestation block.
- file.behavior: non-PII capture/upload behavior signals.
- file_specific.base.motion: shared motion signals across media types.
- app.legal_entity: legal counterparty information alongside the app/platform name.
Stored but not exact-match indexed: app.legal_entity, file.behavior.*, file_specific.base.motion.*, file.hashes.dhash64, file.hashes.ahash64, file.hashes.keyframe_phashes, file.content_md5 / file.hashes.md5, and app.platform_name.
- Metadata update cap is 100 per data_id on both staging and production. Multiple mutable field changes can be coalesced into one metadata update with one seq when they are part of the same provider-side revision.
- Read/search/scoped-group APIs are public audit views. Do not put enterprise-only or sensitive fields into the public Trace payload. If a future provider-only or partner-only read tier is needed, it should be a separate API/product decision.
- The only searchable hash type is canonical content SHA-256. phash64, dhash64, ahash64, and keyframe_phashes should still be sent when useful, but they are stored-only for now.
- Recommended contributor.tax_status values are submitted, not_submitted, not_applicable, and unknown. A provider can map tax_form_on_file=true to submitted and false to not_submitted.
- Recommended contributor.account_verification_status values are verified, pending, failed, and unverified.
- Attestation signature is optional on staging. For production verification, the provider should send payload_hash, signature, key_id, key_url, and signed_at_utc.
- source_record_id should contain the provider’s stable public media ID (for example, kmf_...). The service trims surrounding whitespace, preserves case, rejects control characters, and accepts source IDs up to 512 bytes.
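The recommended enum values and the tax_form_on_file mapping can be expressed as a small helper. This is a sketch: map_tax_status and is_recommended are hypothetical names, and mapping a missing flag to unknown is an assumption rather than a documented rule.

```python
from typing import Optional

TAX_STATUS_VALUES = ("submitted", "not_submitted", "not_applicable", "unknown")
ACCOUNT_VERIFICATION_VALUES = ("verified", "pending", "failed", "unverified")


def map_tax_status(tax_form_on_file: Optional[bool]) -> str:
    """Translate a provider's tax_form_on_file flag into contributor.tax_status."""
    if tax_form_on_file is True:
        return "submitted"
    if tax_form_on_file is False:
        return "not_submitted"
    # Assumption: an absent flag means the provider does not know.
    return "unknown"


def is_recommended(value: str, allowed: tuple) -> bool:
    """Flag values outside the recommended set before they reach the payload."""
    return value in allowed
```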
Trace Schema v1.0
Story stores a provider-normalized trace metadata object instead of treating the provider’s raw payload as the top-level Story contract. The provider should populate the standardized Trace Schema fields directly. Story also preserves the provider’s full original public payload under provider_payload, but the normalized fields are the portable Trace contract that other providers and the frontend should use.
Use schema_version: trace-v1.0 in initial_metadata_json and metadata_json. Story canonicalizes metadata JSON before computing internal event hashes, so object key order does not affect idempotency or conflict detection.
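To illustrate why key order does not matter, here is a Python sketch that serializes with sorted keys and compact separators before hashing. It approximates key-order-independent canonicalization; it is not a reimplementation of json-canonical-v1, whose exact rules are not described here, and the sample values are illustrative.

```python
import hashlib
import json


def canonical_json(obj) -> str:
    """Sorted keys and compact separators, so key order cannot change the output."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def digest(obj) -> str:
    return hashlib.sha256(canonical_json(obj).encode("utf-8")).hexdigest()


# The same metadata with keys in a different order (illustrative values only).
first = {"schema_version": "trace-v1.0", "contributor": {"kyc_status": "verified", "kyc_country": "US"}}
second = {"contributor": {"kyc_country": "US", "kyc_status": "verified"}, "schema_version": "trace-v1.0"}

print(canonical_json(first))
# {"contributor":{"kyc_country":"US","kyc_status":"verified"},"schema_version":"trace-v1.0"}
print(digest(first) == digest(second))  # True
```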
A provider does not need to send a transaction hash in write payloads. Story owns tx_hash and returns it on read responses for registration, metadata update, and search result rows. It is returned as an empty string until Story fills it after broadcast.
Initial registrations and metadata updates must include one canonical content
hash. Accepted fields are asset.hash, content_hash, file.content_sha256,
or file.hashes.sha256; all normalize to sha256:<64-lowercase-hex>. If more
than one alias is present, they must represent the same hash.
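A client-side pre-check for the alias rule might look like this. It assumes each alias holds either sha256:<hex> or bare 64-character hex (the exact input forms the write path accepts are not listed here), and canonical_content_hash is a hypothetical helper name.

```python
import re

HEX64 = re.compile(r"[0-9a-fA-F]{64}")
ALIASES = ("asset.hash", "content_hash", "file.content_sha256", "file.hashes.sha256")


def _lookup(meta: dict, dotted: str):
    """Follow a dotted path such as "file.hashes.sha256"; None if absent."""
    node = meta
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def _normalize(value: str) -> str:
    # Assumption: an alias holds "sha256:<hex>" or bare 64-character hex.
    v = value.strip()
    if v.lower().startswith("sha256:"):
        v = v[len("sha256:"):]
    if not HEX64.fullmatch(v):
        raise ValueError(f"not a SHA-256 hex digest: {value!r}")
    return "sha256:" + v.lower()


def canonical_content_hash(meta: dict) -> str:
    """Return the one normalized content hash, requiring all present aliases to agree."""
    found = set()
    for alias in ALIASES:
        value = _lookup(meta, alias)
        if value is not None:
            found.add(_normalize(value))
    if not found:
        raise ValueError("metadata has no canonical content hash")
    if len(found) > 1:
        raise ValueError("content hash aliases disagree")
    return found.pop()
```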
Recommended top-level shape:
app.platform_name, app.legal_entity, file.behavior, file_specific.base.motion, file.hashes.phash64, dhash64, ahash64, md5, and keyframe_phashes may be preserved in the stored payload, but they are not exact-match indexed in the current implementation. Use /stats for distributions and use provider as an optional query scope for Trace frontend/audit views.
Field guidance
- source_record_id: provider-owned stable public media ID. The provider’s kmf_... value belongs here. Accepted up to 512 bytes.
- contributor.anon_id: provider-owned public anonymized contributor ID.
- contributor.kyc_status: recommended values are verified, pending, failed, unverified.
- contributor.kyc_country: ISO 3166-1 alpha-2 country code from KYC, if available. Country only; no address or GPS-derived country.
- contributor.tax_status: recommended values are submitted, not_submitted, not_applicable, unknown. For a provider’s tax_form_on_file, map true to submitted and false to not_submitted.
- contributor.account_verification_status: recommended values are verified, pending, failed, unverified.
- contributor.consent.tos_* and contributor.consent.privacy_policy_*: the accepted policy version, hash, and URI for this record. Provider active policies are set separately through PUT /webhook/v1/data-audit/provider-policy.
- attestation.signature: optional on staging. For production verification, the provider should send payload_hash, signature, key_id, key_url, and signed_at_utc.
- file.behavior: non-PII upload/capture behavior signals.
- file_specific.base.motion: shared motion signals that can apply across media kinds.
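The documented source_record_id rules (trim surrounding whitespace, preserve case, reject control characters, 512-byte limit) and the no-duplicates-per-batch rule can be mirrored before sending. Assumptions in this sketch: byte length is measured as UTF-8, control characters are Unicode category Cc, Python's str.strip() approximates the service's whitespace trimming, and duplicates are compared after trimming. The function names are hypothetical.

```python
import unicodedata

MAX_SOURCE_RECORD_ID_BYTES = 512


def normalize_source_record_id(raw: str) -> str:
    """Mirror the documented rules: trim, keep case, no control chars, <= 512 bytes."""
    value = raw.strip()  # surrounding whitespace only; case is left untouched
    if not value:
        raise ValueError("source_record_id is missing")
    if any(unicodedata.category(ch) == "Cc" for ch in value):
        raise ValueError("source_record_id contains control characters")
    if len(value.encode("utf-8")) > MAX_SOURCE_RECORD_ID_BYTES:
        raise ValueError("source_record_id exceeds 512 bytes")
    return value


def check_batch_ids(raw_ids) -> list:
    """Normalize every ID in one /records:batch request and reject duplicates."""
    seen = set()
    result = []
    for raw in raw_ids:
        value = normalize_source_record_id(raw)
        if value in seen:
            raise ValueError(f"duplicate source_record_id in batch: {value!r}")
        seen.add(value)
        result.append(value)
    return result
```

Because case is preserved, kmf_1 and KMF_1 are distinct IDs, which matches the exact, case-sensitive source_record_id search.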
Write API
Set active provider policies
Use this endpoint when the provider publishes a new current Terms of Service or Privacy Policy. Trace stores the active version, SHA-256 hash, and URI so the frontend can link to the policy document and stats can compare record-level accepted policy references to the provider’s current policies. The policy document rows are stored for audit/visibility only; they are not indexed or aggregated directly. Record payloads should still include contributor.consent.tos_* and contributor.consent.privacy_policy_*; those record-level accepted policy references are what /stats, scoped-group summaries, and policy hash search use.
Register records
Use this endpoint for initial backlog and live registration batches. The provider sends its stable source_record_id; Story generates the data_id and returns the mapping.
Add X-Ingestion-Source: backlog for backlog batches. Live batches omit the ingestion-source header.
Request body is a JSON array:
initial_metadata_root should be the provider’s deterministic non-zero hash of the canonical Trace Schema v1.0 metadata JSON. Story stores this value as submitted; hash verification against initial_metadata_json is not enforced yet.
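A sketch of computing a deterministic, non-zero root from canonical metadata JSON. The root format here (0x followed by a SHA-256 hex digest of sorted-key, compact JSON) and the function names are hypothetical choices for illustration; this reference does not specify the required root encoding or hash function.

```python
import hashlib
import json


def canonical_json_bytes(metadata: dict) -> bytes:
    # Sorted keys and compact separators keep the bytes independent of key order.
    text = json.dumps(metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_metadata_root(metadata: dict) -> str:
    """Hypothetical root: 0x + SHA-256 hex of the canonical metadata JSON."""
    root = "0x" + hashlib.sha256(canonical_json_bytes(metadata)).hexdigest()
    if int(root, 16) == 0:
        # Practically unreachable for SHA-256, but the service rejects zero roots.
        raise ValueError("metadata root must be non-zero")
    return root
```

Under these assumptions, initial_metadata_root would be compute_metadata_root(initial_metadata_json), and re-sending the same metadata always yields the same root.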
Successful response:
- 202 Accepted: one or more records were newly accepted and enqueued.
- 200 OK: the request was processed, but no records were enqueued because every item was a duplicate.
- 409 Conflict: one or more items returned conflict; any accepted items in the same response were still enqueued.
data_id is the Story ID for future metadata updates and reads. Re-sending the same X-Provider + source_record_id generates the same data_id.
If a record was already persisted with the exact same initial registration payload, the item returns status: "duplicate" and is not re-enqueued. If the same source_record_id was already persisted with different initial metadata, the item returns status: "conflict" and is not enqueued. An overlapping retry while the first request is still queued may return accepted; downstream ingestion remains idempotent. Other valid items in the same batch can still return accepted.
If a caller already has Story-assigned UUIDs, the lower-level endpoint is:
This lower-level route requires data_id on every record and is not the recommended provider path.
Submit metadata updates
Use this endpoint for later corrections or mutable metadata changes. seq must be between 1 and 100 for the same data_id.
metadata_json must include a canonical content hash and should be the full latest Trace metadata state after the change, not a partial diff or JSON patch. For example, if only KYC changes, the provider should still include the unchanged file, asset, app, consent, and provider payload fields that remain part of the current metadata state. This keeps each metadata event independently verifiable against metadata_root and lets Story rebuild the latest state without provider-specific merge rules.
metadata_root should be the provider’s deterministic non-zero hash of the canonical full updated Trace Schema v1.0 metadata JSON. prev_metadata_root must also be a non-zero root. Story stores these values as submitted; hash verification against metadata_json is not enforced yet.
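The full-state rule can be sketched as: copy the current state, apply the change, and submit the whole result with a fresh root. This sketch uses a hypothetical root scheme (0x plus SHA-256 of sorted-key JSON), treats the previous state's root as prev_metadata_root, and shapes the update as a dict with data_id, seq, occurred_at, metadata_json, metadata_root, and prev_metadata_root; that envelope and the function names are assumptions. The merge is shallow per top-level section, so a change to a nested object such as contributor.consent must supply the whole nested object.

```python
import copy
import hashlib
import json

MAX_SEQ = 100


def _root(metadata: dict) -> str:
    # Hypothetical root scheme: 0x + SHA-256 of sorted-key, compact JSON.
    text = json.dumps(metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "0x" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_full_state_update(data_id: str, seq: int, current: dict,
                            changes: dict, occurred_at: str) -> dict:
    """Return an update whose metadata_json is the complete new state, not a diff."""
    if not 1 <= seq <= MAX_SEQ:
        raise ValueError("seq must be between 1 and 100")
    updated = copy.deepcopy(current)  # never mutate the caller's current state
    for section, fields in changes.items():
        # Shallow merge per top-level object section, e.g. {"contributor": {...}}.
        updated.setdefault(section, {}).update(fields)
    return {
        "data_id": data_id,
        "seq": seq,
        "occurred_at": occurred_at,
        "metadata_json": updated,
        "metadata_root": _root(updated),
        "prev_metadata_root": _root(current),
    }
```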
Optional backlog file endpoints
The normal integration path is application/json on /records:batch. For backlog tooling, these lower-level route variants also exist, but data ID file routes require data_id in every record.
CSV requires Content-Type: text/csv.
Data ID CSV columns:
initial_metadata_json and metadata_json must be valid JSON in a quoted CSV field and include one canonical content hash.
TXT requires Content-Type: text/plain and accepts line-delimited JSON, a JSON array, or header-delimited comma/tab text using the same CSV columns.
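Because the metadata column holds JSON that contains commas and double quotes, it has to be written as one quoted CSV field with embedded quotes doubled. Python's csv module does this automatically, as in this sketch. The two-column layout (data_id, initial_metadata_json) and the function name are hypothetical; use the full Data ID CSV column set the route expects.

```python
import csv
import io
import json


def to_data_id_csv(rows: list) -> str:
    """Write rows whose metadata column holds JSON, quoted as one CSV field."""
    # Hypothetical column set for illustration only.
    fieldnames = ["data_id", "initial_metadata_json"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "data_id": row["data_id"],
            # The JSON text contains quotes and commas, so the csv module wraps it
            # in quotes and doubles every embedded quote character.
            "initial_metadata_json": json.dumps(row["metadata"], separators=(",", ":")),
        })
    return buf.getvalue()
```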
Read API
Read endpoints are public audit views. provider is optional and acts as a narrowing filter.
Read model:
- GET /data-ids/{data_id} returns the registration profile plus the latest raw metadata event. The latest metadata event is not guaranteed to be a diff; it is the exact update payload submitted for the highest sequence currently stored.
- GET /data-ids/{data_id}/metadatas returns the full append-only metadata history, including registration at seq: 0 and later metadata updates.
- Search, asset receipt lookup, and scoped-group summaries use Story’s normalized latest-state projection derived from those events. This projection is what powers fields such as MIME type, media category, KYC status, TOS/privacy versions, lifecycle status, and tx_hash.
Get trace by Story data ID
List metadata history
Each returned metadata event includes tx_hash, which is an empty string until Story fills it after broadcast.
Search by indexed field
Search result rows include tx_hash, initially as an empty string. Search responses include next_cursor. Use /search to locate records by an exact field value, then use /api/v1/data-audit/data-ids/{data_id} or /api/v1/data-audit/data-ids/{data_id}/metadatas for the canonical event payload used in audit verification. Use /stats for broad distribution counts such as MIME type, media category, KYC status, country/region, TOS, and Privacy Policy versions. Use /recent or /feed when the UI needs the newest records.
source_record_id search is exact and case-sensitive.
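Paging through /search (or /recent and /feed) follows next_cursor until none is returned. In this sketch the HTTP call is injected as fetch_page, a hypothetical caller-supplied function returning (rows, next_cursor), because the cursor query parameter name and the row field name are not shown here; max_pages is an arbitrary safety bound.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]


def iterate_all(fetch_page: Callable[[Optional[str]], Page],
                max_pages: int = 1000) -> Iterator[dict]:
    """Yield rows from every page, following next_cursor until it is empty."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        rows, cursor = fetch_page(cursor)  # the first call passes None
        yield from rows
        if not cursor:
            return
    raise RuntimeError("max_pages reached before the cursor chain ended")
```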
Other examples:
Provider totals
Contributor totals are keyed by provider + contributor_anon_id values when a contributor ID is present. Metadata updates do not increase these totals. Distribution maps and size totals follow the latest asset projection, so full-state metadata updates can move a record from one bucket to another. average_size_bytes is total_size_bytes / size_record_count, where size_record_count counts latest projections with a positive size. App/platform fields can be stored on records, but they are not stats scopes.
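The average-size rule works out as in this small Python sketch over latest-projection sizes. The input shape (a list of sizes, with None for a missing size) and the function name are assumptions; only the formula average_size_bytes = total_size_bytes / size_record_count, counting only positive sizes, comes from this reference.

```python
from typing import Iterable, Optional


def size_stats(latest_sizes: Iterable[Optional[int]]) -> dict:
    """Totals over latest asset projections; only positive sizes count."""
    positive = [s for s in latest_sizes if s is not None and s > 0]
    total = sum(positive)
    count = len(positive)
    return {
        "total_size_bytes": total,
        "size_record_count": count,
        # Returning 0 when nothing has a size is this sketch's choice.
        "average_size_bytes": total / count if count else 0,
    }


print(size_stats([100, 0, None, 300]))
# {'total_size_bytes': 400, 'size_record_count': 2, 'average_size_bytes': 200.0}
```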
Recent ingestion feed
Each feed row includes event_type, seq, data_id, source_record_id, asset_hash, occurred_at, and ingested_at. Use next_cursor to fetch older rows.
Recent registered records
Use next_cursor to fetch older rows.
Asset receipts by content hash
The lookup accepts the content hash as sha256:<64-hex>, plain <64-hex>, 0x<64-hex>, or 0x1220<64-hex>. The response returns the canonical sha256:<64-lowercase-hex> asset hash and all matched receipt rows.
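The accepted input forms can be normalized client-side with a sketch like the following. The 0x1220 prefix is the multihash header for SHA2-256 (function code 0x12, digest length 0x20 = 32 bytes). Treating input as case-insensitive is an assumption, and normalize_asset_hash is a hypothetical name.

```python
import re

HEX64 = re.compile(r"[0-9a-f]{64}")


def normalize_asset_hash(value: str) -> str:
    """Normalize the receipt lookup input forms to sha256:<64-lowercase-hex>."""
    v = value.strip().lower()
    if v.startswith("sha256:"):
        v = v[len("sha256:"):]
    elif v.startswith("0x1220") and len(v) == 6 + 64:
        # Multihash form: drop the 0x1220 header, keep the 32-byte digest.
        v = v[6:]
    elif v.startswith("0x"):
        v = v[2:]
    if not HEX64.fullmatch(v):
        raise ValueError(f"unrecognized content hash: {value!r}")
    return "sha256:" + v
```

The length check keeps a plain 0x<64-hex> value whose digest happens to start with 1220 from being misread as a multihash.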
Scoped groups
Scoped groups let the Trace frontend or a partner reviewer create a public snapshot over a review set. For provider review workflows, labs should use the provider’s source_record_id values, not Story-generated data_ids or content hashes. Trace computes the deterministic data_id, verifies every submitted record exists, and only creates the group if the full set is valid.
Create from source record IDs:
provider + source_record_id values return 400, missing records return 404, profile/source mismatches return 409, and records that were registered without an asset projection return 422.
For larger source-record CSV/TXT inputs, request a presigned upload URL first:
Supported upload formats are csv and txt.
One-column CSV uses source_record_id and requires provider in the create request:
provider:
provider in the create request:
Upload the file to upload_url with the returned Content-Type header, then create the group with:
Omit provider only when the uploaded CSV has a provider column.
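A sketch of building either CSV variant before uploading it to upload_url. The helper name and the provider,source_record_id column order in the two-column variant are assumptions, and a single provider is used for every row for simplicity.

```python
import csv
import io
from typing import Iterable, Optional


def source_record_csv(ids: Iterable[str], provider: Optional[str] = None) -> str:
    """Build a scoped-group input CSV.

    Without provider: one source_record_id column, so the create request must
    carry provider. With provider: a provider column is added, so the create
    request may omit it.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    if provider is None:
        writer.writerow(["source_record_id"])
        writer.writerows([i] for i in ids)
    else:
        writer.writerow(["provider", "source_record_id"])  # assumed column order
        writer.writerows([provider, i] for i in ids)
    return buf.getvalue()
```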
Read scoped group:
GET /scoped-groups/{group_id} returns the group status and, once complete, aggregate metrics in summary:
Before the group is complete, profile.status is pending or processing and summary is omitted.
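Clients that create a group and then need its summary can poll until profile.status leaves pending/processing. In this sketch fetch_group is a hypothetical caller-supplied function that performs GET /scoped-groups/{group_id} and returns the parsed body; the timeout and interval defaults are arbitrary, and no terminal status names beyond pending and processing are assumed, so the caller inspects the returned body.

```python
import time

PENDING_STATES = {"pending", "processing"}


def wait_for_group(fetch_group, timeout_s: float = 300.0, interval_s: float = 2.0,
                   sleep=time.sleep, clock=time.monotonic) -> dict:
    """Poll until profile.status is no longer pending or processing."""
    deadline = clock() + timeout_s
    while True:
        body = fetch_group()
        status = body.get("profile", {}).get("status")
        if status not in PENDING_STATES:
            return body  # summary is expected once the group is complete
        if clock() >= deadline:
            raise TimeoutError(f"scoped group still {status} after {timeout_s}s")
        sleep(interval_s)
```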
/items returns one row per submitted source_record_id with the resolved data_id, asset_hash, and receipt summary. It is cursor-paginated and returns next_cursor when more rows exist. CSV export starts with input_type,provider,source_record_id,data_id,asset_hash,status,....
Limits and Retry Behavior
Current limits:
| Limit | Value |
|---|---|
| Max request body size | 25 MiB |
| Max SQS message chunk | 240 KiB |
| Max SQS batch payload | 256 KiB |
| Max serialized record size | 350 KiB |
| Max metadata updates per data ID | 100 |
| Max inline scoped-group body | 5 MiB |
| Max inline scoped-group hashes | 10,000 |
| Max inline scoped-group source records | 10,000 |
| Max source_record_id length | 512 bytes |
- Retry 502, 503, 504, network timeouts, and 429 with exponential backoff and jitter.
- Do not retry validation/auth 4xx errors until the request is fixed.
- Keep data_id, request body, and X-Batch-Id stable across retries.
- Use X-Ingestion-Source: backlog only for backlog work; omit it for live work.
- The write path is idempotent for the same data_id, event key, and event hash.
- If the same data_id and event key are retried with different metadata, the record is treated as a conflict and rejected.
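The retry guidance can be sketched as a loop with capped exponential backoff and full jitter. send is a hypothetical callable that re-sends the identical body with the same X-Batch-Id and returns the HTTP status (raising TimeoutError on a network timeout); the attempt count, base delay, and cap are illustrative values, not documented limits.

```python
import random
import time

RETRYABLE = {429, 502, 503, 504}


def send_with_retry(send, max_attempts: int = 6, base_s: float = 0.5, cap_s: float = 30.0,
                    sleep=time.sleep, rng=random.random) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        try:
            status = send()
        except TimeoutError:
            status = None  # network timeout: retryable
        if status is not None and status not in RETRYABLE:
            return status  # success, or a 4xx that must be fixed rather than retried
        if attempt == max_attempts - 1:
            break
        # Full jitter: sleep a random fraction of the capped exponential delay.
        sleep(rng() * min(cap_s, base_s * 2 ** attempt))
    raise RuntimeError("retries exhausted")
```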
Validation Rules
| Rule | Behavior |
|---|---|
| Missing X-Provider on write endpoints | Request is rejected. |
| Invalid provider name | Request is rejected. |
| Provider outside the configured allowlist | Write request is rejected. |
| WEBHOOK key does not match X-Provider | Write request is rejected. |
| Missing X-Batch-Id on write endpoints | Request is rejected. |
| X-Ingestion-Source present with any value other than backlog | Request is rejected. |
| Missing source_record_id on /records:batch | Request is rejected. |
| Duplicate source_record_id inside one /records:batch request | Request is rejected. |
| Existing /records:batch item with the same initial registration payload | Item returns status: "duplicate". |
| Existing /records:batch item with different initial registration payload | Item returns status: "conflict". |
| Non-UUID data_id on lower-level data ID and metadata update routes | Request is rejected. |
| Missing occurred_at | Request is rejected. |
| Invalid occurred_at timestamp | Request is rejected. |
| Metadata seq outside 1 through 100 | Request is rejected. |
| Missing, malformed, or zero required metadata root fields | Request is rejected. |
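Several of these rules can be checked before a metadata update leaves the client. This sketch assumes occurred_at is an RFC 3339 timestamp and that a root is 0x followed by 64 hex characters; neither format is specified in this reference, and preflight_update is a hypothetical name.

```python
import re
import uuid
from datetime import datetime

ROOT = re.compile(r"0x[0-9a-fA-F]{64}")  # assumed root encoding


def preflight_update(headers: dict, record: dict) -> list:
    """Collect problems that the validation rules say would reject the request."""
    errors = []
    for name in ("X-Provider", "X-Batch-Id"):
        if not headers.get(name):
            errors.append(f"missing {name}")
    source = headers.get("X-Ingestion-Source")
    if source is not None and source != "backlog":
        errors.append("X-Ingestion-Source must be omitted or 'backlog'")
    try:
        uuid.UUID(str(record.get("data_id")))
    except ValueError:
        errors.append("data_id is not a UUID")
    occurred_at = record.get("occurred_at")
    if not isinstance(occurred_at, str):
        errors.append("missing occurred_at")
    else:
        try:
            # Assumes RFC 3339; older Pythons need "Z" spelled as "+00:00".
            datetime.fromisoformat(occurred_at.replace("Z", "+00:00"))
        except ValueError:
            errors.append("invalid occurred_at")
    seq = record.get("seq")
    if not isinstance(seq, int) or isinstance(seq, bool) or not 1 <= seq <= 100:
        errors.append("seq outside 1 through 100")
    for field in ("metadata_root", "prev_metadata_root"):
        value = record.get(field)
        if not isinstance(value, str) or not ROOT.fullmatch(value) or int(value, 16) == 0:
            errors.append(f"missing, malformed, or zero {field}")
    return errors
```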
Notes
- Delivery is at least once, so duplicate submissions may occur.
- Duplicate submissions of the same event are treated idempotently.
- Same data_id, same metadata sequence, and different event content is treated as a conflict.
- Metadata updates may arrive before the initial data ID registration.
- Audit data is durable and does not expire.
- Event hashes are computed from canonicalized metadata JSON plus the event/hash/schema version fields.