A document in Tupshar is UTF-8 text plus a rich metadata envelope. Every document is identified by a server-assigned ULID — not its filename. The filename is searchable metadata, not an address.
Document Fields
Every document carries these fields:
| Field | Description |
|---|---|
| id | Server-assigned ULID (immutable, the address of the document) |
| filename | Human-readable name (up to 512 chars) |
| contents | UTF-8 text body |
| tags | Array of strings; up to 64 tags, each up to 64 chars |
| properties | String→string map; up to 64 entries, key ≤128 chars, value ≤2048 chars |
| links | Typed edges to other documents (see below) |
| size_bytes | Byte length of contents |
| token_count | Number of tokens from the v1 analyzer |
| unique_terms | Count of distinct tokens in this document |
| sha256 | Hex SHA-256 of contents |
| analyzer_version | Always "v1" in the current release |
| version | Integer; starts at 1, incremented on every content update |
| created_at | RFC 3339 timestamp, set at creation |
| updated_at | RFC 3339 timestamp, set on each content update |
| accessed_at | Best-effort last-read timestamp |
| etag | Concurrency token (see below) |
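As a rough illustration of how the derived fields relate to the body, the sketch below models a small subset of the fields in Python. The class name `DocumentSketch` and the sample ULID are illustrative assumptions, not part of Tupshar; the real values are computed by the server.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DocumentSketch:
    """Hypothetical client-side model of a few document fields."""

    id: str        # server-assigned ULID; the document's address
    filename: str  # searchable metadata, not an address
    contents: str  # UTF-8 text body
    tags: list[str] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)
    version: int = 1  # starts at 1

    @property
    def size_bytes(self) -> int:
        # Byte length of the UTF-8 encoding, not the character count.
        return len(self.contents.encode("utf-8"))

    @property
    def sha256(self) -> str:
        # Hex SHA-256 of the UTF-8 encoded contents.
        return hashlib.sha256(self.contents.encode("utf-8")).hexdigest()


doc = DocumentSketch(
    id="01ARZ3NDEKTSV4RRFFQ69G5FAV",  # sample ULID, for illustration only
    filename="notes.txt",
    contents="héllo",
)
print(doc.size_bytes, doc.sha256)
```

Note that size_bytes counts encoded bytes, so a body containing non-ASCII characters is larger than its character count suggests.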
Tags, Properties, and Links
Tags, properties, and links are first-class fields — you can supply them at create time and update them independently via dedicated endpoints. There is no need to embed metadata inside the document text.
Tags are plain string labels, useful for grouping and filtering.
Properties are a string→string map for structured attributes (for example, {"project": "tupshar", "status": "draft"}).
Links express typed, directional relationships between documents. Each link has:
- target_id: the ULID of the target document
- rel: one of references (default), supersedes, derived_from, related
- Optional per-link properties
A document can hold up to 256 links.
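The link shape described above could be modeled on the client roughly as follows. This is a minimal sketch: the `Link` class and `check_link_count` helper are hypothetical names, and the server remains the authority on what it accepts.

```python
from dataclasses import dataclass, field

# Relationship types listed in this section.
ALLOWED_RELS = {"references", "supersedes", "derived_from", "related"}
MAX_LINKS = 256  # per-document link limit


@dataclass
class Link:
    """Hypothetical client-side model of one typed, directional link."""

    target_id: str           # ULID of the target document
    rel: str = "references"  # the documented default
    properties: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.rel not in ALLOWED_RELS:
            raise ValueError(f"unknown rel: {self.rel!r}")


def check_link_count(links: list[Link]) -> list[Link]:
    """Reject a link list that exceeds the per-document limit."""
    if len(links) > MAX_LINKS:
        raise ValueError(f"{len(links)} links exceeds the limit of {MAX_LINKS}")
    return links


# The target ULID below is a sample value, for illustration only.
link = Link(target_id="01ARZ3NDEKTSV4RRFFQ69G5FAV", rel="supersedes")
```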
Limits
| Resource | Limit |
|---|---|
| Content size | 4 MiB per document |
| Filename length | 512 chars |
| Tags | 64 per document, 64 chars each |
| Properties | 64 per document; key 128 chars, value 2048 chars |
| Links | 256 per document |
| Content type | UTF-8 text only (no binary) |
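A client may want to check these limits before sending a request. The sketch below is one possible pre-flight check; the function name `limit_violations` is hypothetical, and it assumes "chars" means Unicode code points (Python's `len` on a string), which the table does not specify.

```python
MAX_CONTENT_BYTES = 4 * 1024 * 1024  # 4 MiB
MAX_FILENAME_CHARS = 512
MAX_TAGS = 64
MAX_TAG_CHARS = 64
MAX_PROPERTIES = 64
MAX_PROPERTY_KEY_CHARS = 128
MAX_PROPERTY_VALUE_CHARS = 2048
MAX_LINKS = 256


def limit_violations(filename, contents, tags=(), properties=None, link_count=0):
    """Return a list of human-readable limit violations (empty if none)."""
    properties = properties or {}
    problems = []
    if len(contents.encode("utf-8")) > MAX_CONTENT_BYTES:
        problems.append("contents exceeds 4 MiB")
    if len(filename) > MAX_FILENAME_CHARS:
        problems.append("filename exceeds 512 chars")
    if len(tags) > MAX_TAGS:
        problems.append("more than 64 tags")
    if any(len(tag) > MAX_TAG_CHARS for tag in tags):
        problems.append("a tag exceeds 64 chars")
    if len(properties) > MAX_PROPERTIES:
        problems.append("more than 64 properties")
    if any(len(key) > MAX_PROPERTY_KEY_CHARS for key in properties):
        problems.append("a property key exceeds 128 chars")
    if any(len(value) > MAX_PROPERTY_VALUE_CHARS for value in properties.values()):
        problems.append("a property value exceeds 2048 chars")
    if link_count > MAX_LINKS:
        problems.append("more than 256 links")
    return problems
```

Returning every violation at once, rather than raising on the first, lets a caller report all problems in one pass.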
Concurrency — ETags and If-Match
Tupshar uses optimistic concurrency for updates. Every document response includes an etag of the form:
"sha256:<hex>|<version>|<updated_at_nanos>"
To update safely, pass the current ETag in an If-Match header. If the document has changed since you read it, the server returns 412 Precondition Failed, which prevents lost updates when two writers race.
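The read-then-conditionally-write flow might look like the sketch below, using only the Python standard library. The URL, the PUT method, and the helper names are assumptions made for illustration; only the ETag layout, the If-Match header, and the 412 response come from this page.

```python
import urllib.error
import urllib.request


def parse_etag(etag: str) -> dict:
    """Split an ETag of the form "sha256:<hex>|<version>|<updated_at_nanos>"."""
    digest, version, nanos = etag.strip('"').split("|")
    algo, _, hex_digest = digest.partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected digest prefix: {algo!r}")
    return {
        "sha256": hex_digest,
        "version": int(version),
        "updated_at_nanos": int(nanos),
    }


def build_conditional_update(url: str, new_contents: str, etag: str):
    """Build (but do not send) an update that only applies if the ETag still matches."""
    return urllib.request.Request(
        url,  # hypothetical endpoint; see the HTTP API page for the real one
        data=new_contents.encode("utf-8"),
        method="PUT",  # assumed method, for illustration only
        headers={"If-Match": etag},
    )


def send_update(req) -> int:
    """Send the request; return 412 if another writer got there first."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        if err.code == 412:
            return 412  # stale ETag: re-read the document, then retry
        raise
```

On a 412, the usual recovery is to fetch the document again, reapply the change to the fresh contents, and retry with the new ETag.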
See HTTP API for the full request/response shapes.
Indexing
When you store or update a document, Tupshar tokenizes and lowercases the contents and builds a BM25 index. The original text is preserved and returned verbatim on retrieval. See Search for how the index is queried.
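To make the indexing step concrete, here is a toy version of tokenize, lowercase, and BM25 scoring. It is not the v1 analyzer: the `\w+` tokenization rule, the function names, and the parameters `k1=1.2` and `b=0.75` (commonly used defaults) are all assumptions, since this page does not specify them.

```python
import math
import re
from collections import Counter


def analyze(text: str) -> list[str]:
    """Toy analyzer: split on runs of word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def bm25_score(query: str, doc_tokens: list[str], corpus: list[list[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Score one tokenized document against a query with the BM25 formula."""
    n_docs = len(corpus)
    avg_len = sum(len(tokens) for tokens in corpus) / n_docs
    freqs = Counter(doc_tokens)
    score = 0.0
    for term in set(analyze(query)):
        doc_freq = sum(1 for tokens in corpus if term in tokens)
        term_freq = freqs[term]
        if doc_freq == 0 or term_freq == 0:
            continue
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        length_norm = k1 * (1 - b + b * len(doc_tokens) / avg_len)
        score += idf * term_freq * (k1 + 1) / (term_freq + length_norm)
    return score


tokens = analyze("Hello, hello World")
token_count = len(tokens)        # analogous to the token_count field
unique_terms = len(set(tokens))  # analogous to the unique_terms field
```

Because the query passes through the same `analyze` function as the documents, a query for "HELLO" matches a document containing "hello", while the stored text itself is left unchanged.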