UUBED Python Library
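uubed is a high-performance library for encoding embedding vectors into position-safe strings. These strings solve the “substring pollution” problem in search systems, where a fragment of one encoded vector falsely matches an unrelated position inside another.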
Key Features
- Position-Safe Encoding: QuadB64 family prevents false substring matches
- Blazing Fast: 40-105x faster than the pure-Python implementation when the native Rust extension is available
- Multiple Encoding Methods: Full precision, SimHash, Top-k, Z-order
- Search Engine Friendly: Avoids substring pollution in full-text engines such as Elasticsearch and Solr
- Easy Integration: Simple API, works with any vector database
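To make substring pollution concrete, here is a toy illustration. With a single shared alphabet (as in hex or standard Base64), the encoding of one byte can appear inside another value's encoding at a misaligned offset. If each symbol position instead draws from its own alphabet, cycling through four disjoint alphabets, that kind of match becomes impossible. The alphabets and the one-nibble-per-symbol layout below are assumptions chosen for readability. This is not uubed's QuadB64 implementation, and `toy_position_safe` is a hypothetical name.

```python
# Toy alphabets: four disjoint 16-symbol sets, one per symbol position modulo 4.
# These are illustrative assumptions, not necessarily uubed's actual alphabets.
ALPHABETS = (
    "ABCDEFGHIJKLMNOP",
    "QRSTUVWXYZabcdef",
    "ghijklmnopqrstuv",
    "wxyz0123456789-_",
)


def toy_position_safe(data: bytes) -> str:
    """Encode each byte as two nibble symbols; symbol i uses ALPHABETS[i % 4]."""
    out: list[str] = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            out.append(ALPHABETS[len(out) % 4][nibble])
    return "".join(out)


doc = bytes([0x12, 0x34])
query = bytes([0x23])  # this byte never occurs in doc

# Shared alphabet: "23" straddles the 0x12/0x34 boundary of "1234".
print(query.hex() in doc.hex())  # True: a false match
# Position-dependent alphabets: "CT" cannot occur inside "BSj0".
print(toy_position_safe(query) in toy_position_safe(doc))  # False
```

Because every symbol encodes its position modulo 4, a query encoded from position 0 can only match at offsets that are multiples of four symbols (two bytes) in this toy scheme.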
Quick Example
import numpy as np
from uubed import encode
# Create a sample embedding
embedding = np.random.rand(384).astype(np.float32)
# Encode to position-safe string
encoded = encode(embedding, method="auto")
print(f"Encoded: {encoded[:50]}...")
Project Structure
The uubed project is organized across multiple repositories:
- uubed - Main project hub
- uubed-rs - High-performance Rust implementation
- uubed-py - Python bindings and API
- uubed-docs - Comprehensive documentation
Installation
Using pip
pip install uubed
For maximum performance, install from source to get the native Rust acceleration:
git clone https://github.com/twardoch/uubed-py
cd uubed-py
pip install -e .
Using uv (recommended)
uv pip install uubed
Python API Overview
The Python library provides a simple, high-level interface to all UUBED encoding methods:
Main Functions
encode(embedding, method="auto", validate=True)
- Encode an embedding to a position-safe string
decode(encoded_string)
- Decode back to the original embedding (Eq64 only)
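Only the full-precision method can be decoded, because it keeps every byte of the input while the other methods discard information by design. The round trip below is a minimal sketch assuming float32 input packed little-endian and a toy four-alphabet nibble code. `toy_encode` and `toy_decode` are hypothetical names, not the uubed `encode`/`decode` API, and the real Eq64 string layout may differ.

```python
import struct

# Hypothetical toy alphabets, one per symbol position modulo 4 (not uubed's).
ALPHABETS = (
    "ABCDEFGHIJKLMNOP",
    "QRSTUVWXYZabcdef",
    "ghijklmnopqrstuv",
    "wxyz0123456789-_",
)
# Reverse lookup: (position mod 4, symbol) -> nibble value.
LOOKUP = {(pos, ch): val for pos, alpha in enumerate(ALPHABETS) for val, ch in enumerate(alpha)}


def toy_encode(values: list[float]) -> str:
    """Pack values as little-endian float32, then emit two symbols per byte."""
    data = struct.pack(f"<{len(values)}f", *values)
    out: list[str] = []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            out.append(ALPHABETS[len(out) % 4][nibble])
    return "".join(out)


def toy_decode(encoded: str) -> list[float]:
    """Invert toy_encode; raises ValueError on malformed input."""
    if len(encoded) % 8:  # 4 bytes per float32, 2 symbols per byte
        raise ValueError("length must be a multiple of 8 symbols")
    try:
        nibbles = [LOOKUP[(i % 4, ch)] for i, ch in enumerate(encoded)]
    except KeyError as exc:
        raise ValueError(f"symbol not valid at its position: {exc}") from None
    data = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return list(struct.unpack(f"<{len(data) // 4}f", data))


vec = [0.5, -1.25, 3.0]  # exactly representable in float32
assert toy_decode(toy_encode(vec)) == vec
```

Values that are not exactly representable in float32 (such as 0.1) come back as their nearest float32, so a round-trip check should compare against the float32-rounded input.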
Encoding Methods
eq64
- Full precision encoding with complete reversibility
shq64
- SimHash for locality-sensitive hashing
t8q64
- Top-k indices for sparse representations
zoq64
- Z-order encoding for spatial queries
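The three lossy methods build on standard techniques, sketched below in plain Python: random-hyperplane SimHash (shq64), top-k index selection (t8q64), and Morton bit interleaving (zoq64). The parameters are assumptions for illustration (64 hash bits, k=8, the first 4 dimensions at 8 bits each), the function names are hypothetical, and the final step of rendering the bits or indices as a position-safe string is omitted.

```python
import random


def simhash_bits(vec: list[float], n_bits: int = 64, seed: int = 0) -> list[int]:
    """Sign of the projection onto n_bits random Gaussian hyperplanes."""
    rng = random.Random(seed)  # fixed seed: every vector sees the same hyperplanes
    bits = []
    for _ in range(n_bits):
        plane = [rng.gauss(0.0, 1.0) for _ in vec]
        dot = sum(p * x for p, x in zip(plane, vec))
        bits.append(1 if dot >= 0 else 0)
    return bits


def top_k_indices(vec: list[float], k: int = 8) -> list[int]:
    """Indices of the k largest components, returned in ascending index order."""
    ranked = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
    return sorted(ranked[:k])


def z_order(vec: list[float], dims: int = 4, bits: int = 8) -> int:
    """Quantize the first `dims` components (clamped to [0, 1]) and interleave their bits."""
    scale = 1 << bits
    q = [max(0, min(int(x * scale), scale - 1)) for x in vec[:dims]]
    code = 0
    for b in range(bits - 1, -1, -1):  # most significant bit first
        for v in q:
            code = (code << 1) | ((v >> b) & 1)
    return code


emb = [0.12, 0.85, 0.33, 0.91, 0.05, 0.47, 0.76, 0.28, 0.64, 0.19]
near = [x + 0.01 for x in emb]
# Similar vectors should differ in only a few SimHash bits.
print(sum(a != b for a, b in zip(simhash_bits(emb), simhash_bits(near))))
print(top_k_indices(emb, k=3))  # [1, 3, 6]
print(z_order(emb))
```

SimHash keeps only the angular neighborhood of a vector, top-k keeps only which dimensions dominate, and Z-order keeps only a coarse position in a few dimensions. None of them retains enough to reconstruct the embedding, which is why decoding is limited to Eq64.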
Performance
Encoding throughput with native Rust acceleration:
- Eq64: 234 MB/s for full-precision encoding
- Shq64: 105 MB/s with similarity preservation
- T8q64: 94 MB/s for sparse encoding
- Zoq64: 168 MB/s for spatial encoding
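One way to measure figures of this kind is to time repeated encodes of a fixed payload and divide the input bytes by the elapsed time. The harness below is a minimal sketch assuming 1 MB = 10^6 input bytes. `base64.b64encode` is only a stand-in encoder, to be swapped for the function under test, and `throughput_mb_s` is a hypothetical helper.

```python
import base64
import os
import time
from typing import Callable


def throughput_mb_s(encode_fn: Callable[[bytes], object], payload: bytes, repeats: int = 200) -> float:
    """Input volume encoded per second, in MB/s with 1 MB = 10**6 bytes."""
    start = time.perf_counter()
    for _ in range(repeats):
        encode_fn(payload)
    elapsed = time.perf_counter() - start
    return len(payload) * repeats / elapsed / 1e6


# One 384-dimensional float32 embedding is 384 * 4 = 1536 bytes.
payload = os.urandom(384 * 4)
print(f"base64 stand-in: {throughput_mb_s(base64.b64encode, payload):.1f} MB/s")
```

For scale: if 234 MB/s means 234 × 10^6 input bytes per second, Eq64 would process about 234,000,000 / 1,536 ≈ 152,000 384-dimensional float32 embeddings per second.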
Next Steps
- Installation Guide - Detailed installation instructions
- Quickstart Tutorial - Get started in 5 minutes
- API Reference - Complete API documentation
- Performance Guide - Optimization tips and benchmarks