What is Tupshar?
Tupshar (Akkadian: scribe; tablet-writer) is an AI-native document store. It's a research platform exploring how AI systems can effectively work with knowledge bases.
You store documents via HTTP API or MCP tools. Tupshar indexes them with BM25 full-text search. You query with natural language or structured filters. Results come back ranked by relevance.
That's it. No ML magic, no fine-tuning, no black boxes. Just fast, honest search built for AI systems.
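BM25 involves very little machinery. The sketch below is a minimal, self-contained illustration of how BM25 ranks documents for a query. It is not Tupshar's implementation. It assumes a naive lowercase tokenizer and the textbook defaults k1 = 1.2 and b = 0.75. The function name `bm25_rank` is hypothetical.

```rust
use std::collections::HashMap;

// Textbook Okapi BM25 defaults (assumed; not confirmed as Tupshar's settings).
const K1: f64 = 1.2; // term-frequency saturation
const B: f64 = 0.75; // document-length normalization strength

/// Naive tokenizer: split on non-alphanumeric characters and lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical helper: score every document against `query` with BM25 and
/// return (document index, score) pairs, best first, omitting zero scores.
fn bm25_rank(docs: &[&str], query: &str) -> Vec<(usize, f64)> {
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    let n = tokenized.len() as f64;
    let total_len: usize = tokenized.iter().map(|d| d.len()).sum();
    let avgdl = total_len as f64 / n.max(1.0);

    // Document frequency: how many documents contain each term at least once.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let mut terms: Vec<&str> = doc.iter().map(String::as_str).collect();
        terms.sort_unstable();
        terms.dedup();
        for term in terms {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    let mut ranked: Vec<(usize, f64)> = tokenized
        .iter()
        .enumerate()
        .map(|(i, doc)| {
            let dl = doc.len() as f64;
            let score: f64 = query_terms
                .iter()
                .map(|q| {
                    let tf = doc.iter().filter(|t| *t == q).count() as f64;
                    if tf == 0.0 {
                        return 0.0;
                    }
                    let n_q = df.get(q.as_str()).copied().unwrap_or(0) as f64;
                    // IDF with +1 inside the log keeps it positive for common terms.
                    let idf = ((n - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * dl / avgdl))
                })
                .sum();
            (i, score)
        })
        .filter(|&(_, score)| score > 0.0)
        .collect();

    // Highest score first; scores are finite here, so partial_cmp never fails.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}

fn main() {
    let docs = [
        "Clay tablets were written by a scribe",
        "BM25 ranks documents by term frequency",
        "A scribe copies tablets; the scribe signs each tablet",
    ];
    for (i, score) in bm25_rank(&docs, "scribe tablets") {
        println!("doc {i}: {score:.3}");
    }
}
```

In this example the third document outranks the first because "scribe" appears twice, even though the document is longer. The second document matches neither term and is dropped. In BM25, k1 and b are the usual tuning knobs. This section does not say whether Tupshar's "configurable ranking" exposes them.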
Why Now?
Large language models have changed what retrieval means. LLMs don't just need documents — they need relevant context in the right form at the right time.
Most knowledge base systems were built for humans:
- Heavy on UI. Light on APIs.
- Designed for people to click around in. Not designed for AI agents to integrate with.
- Single-replica, single-tenant, runs-on-your-laptop patterns.
Tupshar is different. It's designed from the ground up as an API. No UI. No console. Just clean, simple HTTP endpoints and MCP tools. Store, search, retrieve. That's the contract.
Research, Not Product
This matters: Tupshar is research software. We're shipping it early to learn.
What that means:
- No SLA. No uptime guarantee. No support team.
- APIs may change. We'll communicate, but breaking changes may happen.
- Data durability guarantees are weaker than those of production systems.
- We make decisions for learning, not for scale.
What it doesn't mean:
- It's not broken. 220 tests. The API is stable for now. It works.
- It's not "beta". We're not hiding behind marketing language.
- It won't be archived. If this proves useful, it becomes a product.
Research Goals
We're exploring:
- What AI systems actually need from knowledge bases
  - Is BM25 search enough, or do we need semantic embeddings?
  - How should APIs look? What operations matter?
  - What failure modes hurt?
- How to build this at reasonable cost
  - Can we use commodity SurrealDB for this workload?
  - What does it cost to run per-tenant isolation?
  - Can we provide search at API-only cost?
- How retrieval fits into LLM workflows
  - What latency matters?
  - What does "relevant" mean when the user is an AI?
  - How do quotas affect the experience?
Your feedback shapes these answers.
Technology
Language: Rust (for performance and safety)
Storage: SurrealDB (per-tenant, multi-modal: documents + queries)
Search: BM25 full-text search with configurable ranking
API: HTTP/REST with Bearer token authentication
Integration: MCP tools for Claude and compatible clients
Observability: Structured logging, metrics, request tracing
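The sketch below shows what a Bearer-authenticated request could look like on the wire. Only the `Authorization: Bearer <token>` header format is standard. The `POST /search` path, the host, and the `{"query": ...}` JSON body are hypothetical placeholders, not Tupshar's documented API. The function only builds the request text. A real client would send it with an HTTP library and build the JSON with a serializer such as serde_json instead of the naive escaping used here.

```rust
/// Hypothetical sketch: builds the raw text of an HTTP/1.1 request that
/// authenticates with a Bearer token. The "/search" path and the JSON body
/// shape are placeholders, not Tupshar's documented API.
fn build_search_request(host: &str, token: &str, query: &str) -> String {
    // Naive JSON escaping: handles backslashes and double quotes only.
    let escaped = query.replace('\\', "\\\\").replace('"', "\\\"");
    let body = format!("{{\"query\":\"{escaped}\"}}");
    // Content-Length counts bytes, which is what String::len returns.
    format!(
        "POST /search HTTP/1.1\r\n\
         Host: {host}\r\n\
         Authorization: Bearer {token}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         \r\n\
         {body}",
        len = body.len()
    )
}

fn main() {
    // Placeholder host and token, for illustration only.
    let request = build_search_request("tupshar.example.com", "YOUR_API_TOKEN", "key rotation");
    print!("{request}");
}
```

The token travels in a header rather than in the URL, so it stays out of query strings and most access logs.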
Team
Tupshar is maintained by the Upside Down Research team. We build authorization and research systems.
Roadmap
Research Phase (now)
- Collect usage patterns
- Identify pain points
- Decide on long-term architecture
Hardening Phase (TBD)
- Multi-tenant safety improvements
- Key rotation / tenant separation
- Email verification for signup
Product Phase (TBD)
- Production SLA
- Multi-key tenants
- Admin UI
- Fine-grained quotas and rate limiting
We'll post updates as we learn. Watch this space.
Questions? Email us at paul@upside-down-research.com