Upside Down Research, LLC · Est. MMXXIII A little different look at the world
Pre-release · subject to change

Documents

A document in Tupshar is UTF-8 text plus a rich metadata envelope. Every document is identified by a server-assigned ULID — not its filename. The filename is searchable metadata, not an address.

Document Fields

Every document carries these fields:

Field             Description
id                Server-assigned ULID (immutable, the address of the document)
filename          Human-readable name (up to 512 chars)
contents          UTF-8 text body
tags              Array of strings; up to 64 tags, each up to 64 chars
properties        String→string map; up to 64 entries; key ≤128 chars, value ≤2048 chars
links             Typed edges to other documents (see below)
size_bytes        Byte length of contents
token_count       Number of tokens from the v1 analyzer
unique_terms      Count of distinct tokens in this document
sha256            Hex SHA-256 of contents
analyzer_version  Always "v1" in the current release
version           Integer; starts at 1, incremented on every content update
created_at        RFC 3339 timestamp, set at creation
updated_at        RFC 3339 timestamp, set on each content update
accessed_at       Best-effort last-read timestamp
etag              Concurrency token (see below)
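The two content-derived fields, size_bytes and sha256, can be recomputed on the client to check that a body arrived intact. The sketch below assumes both are computed over the UTF-8 encoding of contents and that the digest is lowercase hex; the function name derive_content_fields is hypothetical and not part of Tupshar.

```python
import hashlib


def derive_content_fields(contents: str) -> dict:
    """Recompute size_bytes and sha256 for a document body.

    Assumption: both fields cover the UTF-8 bytes of `contents`.
    """
    raw = contents.encode("utf-8")
    return {
        "size_bytes": len(raw),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


fields = derive_content_fields("h\u00e9llo")
print(fields["size_bytes"])  # 6, because "é" takes two bytes in UTF-8
```

Comparing these values with the ones the server returns is a cheap integrity check after a download.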

Tags, properties, and links are first-class fields — you can supply them at create time and update them independently via dedicated endpoints. There is no need to embed metadata inside the document text.

Tags are plain string labels, useful for grouping and filtering.

Properties are a string→string map for structured attributes, for example {"project": "tupshar", "status": "draft"}.

Links express typed, directional relationships between documents. Each link has:

  • target_id — the ULID of the target document
  • rel — one of references (default), supersedes, derived_from, related
  • Optional per-link properties

A document can carry up to 256 links.
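A link can be modeled on the client as a small record that rejects unknown rel values before a request is sent. This is a minimal sketch, not Tupshar's own client code: the Link class, the check_link_count helper, and the uppercase-only ULID pattern are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

ALLOWED_RELS = {"references", "supersedes", "derived_from", "related"}
MAX_LINKS = 256

# Canonical ULID text form: 26 Crockford base32 characters, first one 0-7.
# Assumption: the server expects the uppercase form.
ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")


@dataclass
class Link:
    target_id: str
    rel: str = "references"  # the documented default
    properties: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not ULID_RE.fullmatch(self.target_id):
            raise ValueError(f"target_id is not a ULID: {self.target_id!r}")
        if self.rel not in ALLOWED_RELS:
            raise ValueError(f"unknown rel: {self.rel!r}")


def check_link_count(links: list) -> None:
    """Raise if a document would exceed the per-document link limit."""
    if len(links) > MAX_LINKS:
        raise ValueError(f"{len(links)} links exceeds the limit of {MAX_LINKS}")


link = Link("01ARZ3NDEKTSV4RRFFQ69G5FAV", rel="supersedes")
check_link_count([link])
```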

Limits

Resource         Limit
Content size     4 MiB per document
Filename length  512 chars
Tags             64 per document, 64 chars each
Properties       64 per document; key 128 chars, value 2048 chars
Links            256 per document
Content type     UTF-8 text only (no binary)
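These limits are easy to check on the client before uploading. The sketch below is illustrative only: check_limits is a hypothetical helper, content size is assumed to be measured on the UTF-8 encoding, and "chars" is assumed to mean Unicode code points (Python's len), which the server may count differently.

```python
MAX_CONTENT_BYTES = 4 * 1024 * 1024  # 4 MiB
MAX_FILENAME_CHARS = 512
MAX_TAGS, MAX_TAG_CHARS = 64, 64
MAX_PROPERTIES, MAX_KEY_CHARS, MAX_VALUE_CHARS = 64, 128, 2048
MAX_LINKS = 256


def check_limits(filename, contents, tags=(), properties=None, link_count=0):
    """Return a list of human-readable limit violations (empty if none)."""
    properties = properties or {}
    problems = []
    if len(contents.encode("utf-8")) > MAX_CONTENT_BYTES:
        problems.append("contents exceeds 4 MiB")
    if len(filename) > MAX_FILENAME_CHARS:
        problems.append("filename exceeds 512 chars")
    if len(tags) > MAX_TAGS:
        problems.append("more than 64 tags")
    problems += [f"tag too long: {t[:20]!r}" for t in tags if len(t) > MAX_TAG_CHARS]
    if len(properties) > MAX_PROPERTIES:
        problems.append("more than 64 properties")
    for key, value in properties.items():
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"property key too long: {key[:20]!r}")
        if len(value) > MAX_VALUE_CHARS:
            problems.append(f"property value too long for key {key[:20]!r}")
    if link_count > MAX_LINKS:
        problems.append("more than 256 links")
    return problems


print(check_limits("notes.txt", "hello", tags=["draft"]))  # []
```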

Concurrency — ETags and If-Match

Tupshar uses optimistic concurrency for updates. Every document response includes an etag of the form:

"sha256:<hex>|<version>|<updated_at_nanos>"
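ETags are normally treated as opaque tokens, but the three parts can be split out for logging or debugging. The following sketch assumes the surrounding double quotes are part of the value and that version and updated_at_nanos are decimal integers; parse_etag and ParsedETag are hypothetical names.

```python
from typing import NamedTuple


class ParsedETag(NamedTuple):
    sha256_hex: str
    version: int
    updated_at_nanos: int


def parse_etag(etag: str) -> ParsedETag:
    """Split '"sha256:<hex>|<version>|<updated_at_nanos>"' into its parts."""
    body = etag.strip()
    if len(body) >= 2 and body.startswith('"') and body.endswith('"'):
        body = body[1:-1]  # drop the surrounding quotes
    parts = body.split("|")
    if len(parts) != 3 or not parts[0].startswith("sha256:"):
        raise ValueError(f"unexpected ETag shape: {etag!r}")
    digest = parts[0][len("sha256:"):]
    return ParsedETag(digest, int(parts[1]), int(parts[2]))


parsed = parse_etag('"sha256:ab12|3|1700000000000000000"')
print(parsed.version)  # 3
```

When sending If-Match, pass the ETag back exactly as received rather than rebuilding it from the parsed parts.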

To update safely, pass the document's current ETag in an If-Match header. If the document has changed since you read it, the server returns 412 Precondition Failed instead of applying the update, which prevents lost updates when two writers race.
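The race described above can be reproduced with a toy in-memory model. Nothing here calls Tupshar: ToyStore stands in for the server, its counter stands in for updated_at_nanos, and append_line shows one common client response to a 412, which is to re-read and retry. All names are hypothetical.

```python
import hashlib


class ToyStore:
    """In-memory stand-in for a single document with an ETag check."""

    def __init__(self, contents: str) -> None:
        self.contents = contents
        self.version = 1
        self.clock = 0  # stand-in for updated_at_nanos

    def etag(self) -> str:
        digest = hashlib.sha256(self.contents.encode("utf-8")).hexdigest()
        return f'"sha256:{digest}|{self.version}|{self.clock}"'

    def get(self):
        return self.contents, self.etag()

    def put(self, contents: str, if_match: str) -> int:
        if if_match != self.etag():
            return 412  # Precondition Failed: caller holds a stale ETag
        self.contents = contents
        self.version += 1
        self.clock += 1
        return 200


def append_line(store: ToyStore, line: str, max_attempts: int = 3) -> bool:
    """Read, modify, write; on 412 re-read and try again."""
    for _ in range(max_attempts):
        contents, etag = store.get()
        if store.put(contents + line, if_match=etag) == 200:
            return True
    return False


store = ToyStore("v1\n")
_, stale = store.get()                       # two writers read the same ETag
print(store.put("v2\n", if_match=stale))     # 200: first writer wins
print(store.put("v3\n", if_match=stale))     # 412: second writer is rejected
```

The second writer's update is refused rather than silently overwriting the first, which is the lost-update protection the If-Match header provides.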

See HTTP API for the full request/response shapes.

Indexing

When you store or update a document, Tupshar tokenizes and lowercases the contents and builds a BM25 index. The original text is preserved and returned verbatim on retrieval. See Search for how the index is queried.
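To make the indexing step concrete, here is a small self-contained BM25 sketch. It is not Tupshar's v1 analyzer or index: the regex tokenizer, the parameters k1=1.2 and b=0.75, and the idf variant are common defaults assumed for illustration, and analyze and BM25Index are hypothetical names.

```python
import math
import re
from collections import Counter


def analyze(text: str) -> list:
    """Assumed analyzer: split on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


class BM25Index:
    def __init__(self, k1: float = 1.2, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.term_freqs = {}  # doc_id -> Counter of token frequencies
        self.doc_len = {}     # doc_id -> token count

    def add(self, doc_id: str, contents: str) -> None:
        tokens = analyze(contents)  # the original text itself is not modified
        self.term_freqs[doc_id] = Counter(tokens)
        self.doc_len[doc_id] = len(tokens)

    def score(self, query: str, doc_id: str) -> float:
        n_docs = len(self.doc_len)
        avg_len = sum(self.doc_len.values()) / n_docs
        freqs = self.term_freqs[doc_id]
        total = 0.0
        for term in analyze(query):
            tf = freqs[term]
            if tf == 0:
                continue  # term absent from this document
            df = sum(1 for c in self.term_freqs.values() if term in c)
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = 1 - self.b + self.b * self.doc_len[doc_id] / avg_len
            total += idf * tf * (self.k1 + 1) / (tf + self.k1 * length_norm)
        return total


index = BM25Index()
index.add("a", "Tupshar stores UTF-8 text")
index.add("b", "Search uses a BM25 index over text")
print(index.score("bm25", "a"))      # 0.0, the term does not occur in "a"
print(index.score("BM25", "b") > 0)  # True, matching is case-insensitive
```

Under the same assumed analyzer, token_count would correspond to len(analyze(contents)) and unique_terms to len(set(analyze(contents))).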