The QuadB64 Family: A Suite of Position-Safe Encodings

Overview

The QuadB64 family consists of four specialized encoding schemes, each optimized for a different use case while maintaining the core principle of position safety. This modular approach lets you pick the encoding that fits your data and workload without giving up protection against substring pollution.

Family Members at a Glance

Encoding	Purpose	Output Size	Reversible	Best For
Eq64	Full fidelity encoding	~1.33x input	✅ Yes	Complete data preservation
Shq64	Similarity hashing	16 chars (fixed)	❌ No	Deduplication, clustering
T8q64	Sparse representation	16 chars (fixed)	❌ No	Feature extraction
Zoq64	Spatial encoding	Variable	❌ No	Geospatial, multi-dimensional

Core Design Principles

All QuadB64 family members share these fundamental characteristics:

1. Position Safety

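Every encoding incorporates positional information, so the same input produces different output at different positions and accidental substring matches between unrelated encoded strings are prevented: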

# Traditional Base64 - position-agnostic
base64("ABC") at position 0 == base64("ABC") at position 100

# QuadB64 - position-aware
quad64("ABC", pos=0) != quad64("ABC", pos=100)
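The position-agnostic behaviour of traditional Base64 can be checked with Python's standard `base64` module. The sketch below is not QuadB64 code; it only illustrates the problem the family is designed to avoid: the encoding of `ABC` reappears verbatim inside a longer encoded string whenever the bytes fall on a 3-byte boundary.

```python
import base64

# Standard Base64 maps each aligned 3-byte group to the same 4 characters,
# regardless of where that group sits in the input.
needle = base64.b64encode(b"ABC").decode("ascii")
haystack = base64.b64encode(b"xyzABC").decode("ascii")

print(needle)              # QUJD
print(haystack)            # eHl6QUJD
print(needle in haystack)  # True: a substring match across positions
```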

2. Dot-Separated Chunks

Visual and algorithmic boundaries every 4 characters:

Traditional: SGVsbG8gV29ybGQh
QuadB64:     SGVs.bG8g.V29y.bGQh
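The grouping in the example above can be reproduced with a few lines of Python. This is a minimal sketch: `chunk_with_dots` is a hypothetical helper name, and it covers only the visual chunking of an already encoded string, not any position-dependent step.

```python
import base64

def chunk_with_dots(encoded: str, width: int = 4) -> str:
    """Insert a dot between consecutive groups of `width` characters."""
    return ".".join(encoded[i:i + width] for i in range(0, len(encoded), width))

plain = base64.b64encode(b"Hello World!").decode("ascii")
print(plain)                   # SGVsbG8gV29ybGQh
print(chunk_with_dots(plain))  # SGVs.bG8g.V29y.bGQh
```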

3. Consistent Alphabet

All variants use the same 64-character alphabet with position-dependent permutations:

Letters: A-Z, a-z (52 chars)
Digits: 0-9 (10 chars)
Special: . and / (2 chars)
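As a rough illustration, the alphabet listed above can be assembled from Python's `string` constants. The permutation rule is not specified in this overview, so the rotation in `phase_alphabet` is purely an assumption chosen to show the idea of a position-dependent mapping; both names below are hypothetical.

```python
import string

# The 64 symbols listed above: 52 letters, 10 digits, "." and "/".
BASE_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "./"
assert len(BASE_ALPHABET) == 64 and len(set(BASE_ALPHABET)) == 64

def phase_alphabet(phase: int) -> str:
    """Toy permutation: rotate the base alphabet by 16 symbols per phase.

    Assumption for illustration only; the real permutation is not given here.
    """
    shift = (phase % 4) * 16
    return BASE_ALPHABET[shift:] + BASE_ALPHABET[:shift]

# The same 6-bit value maps to a different symbol in each phase.
value = 0
print([phase_alphabet(p)[value] for p in range(4)])  # ['A', 'Q', 'g', 'w']
```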

4. Search Engine Friendly

Designed specifically for modern search infrastructure:

No characters that require URL encoding
Compatible with tokenizers
Preserves word boundaries with dots

Choosing the Right Encoding

Decision Tree

graph TD
    A[What's your use case?] --> B{Need exact data recovery?}
    B -->|Yes| C[Eq64]
    B -->|No| D{What type of data?}
    D -->|Embeddings/Vectors| E{Purpose?}
    D -->|Spatial/Geographic| F[Zoq64]
    E -->|Similarity Search| G[Shq64]
    E -->|Feature Selection| H[T8q64]
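The same decision tree can be written as a small function. This is a sketch only: `choose_variant` and its string labels (`"embedding"`, `"spatial"`, `"similarity"`, `"features"`) are hypothetical names introduced here, not part of any library API.

```python
def choose_variant(need_exact_recovery: bool, data_type: str = "", purpose: str = "") -> str:
    """Walk the decision tree above and return a variant name."""
    if need_exact_recovery:
        return "eq64"
    if data_type == "spatial":
        return "zoq64"
    if data_type == "embedding":
        if purpose == "similarity":
            return "shq64"
        if purpose == "features":
            return "t8q64"
    raise ValueError(f"no variant for data_type={data_type!r}, purpose={purpose!r}")

print(choose_variant(True))                             # eq64
print(choose_variant(False, "embedding", "similarity"))  # shq64
```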

Use Case Matrix

Scenario	Recommended	Why
Storing ML embeddings	Eq64	Full precision needed
Deduplication system	Shq64	Fast similarity comparison
Search engine integration	Eq64 or Shq64	Depends on precision needs
Recommendation systems	T8q64	Sparse features suffice
Mapping applications	Zoq64	Spatial locality preserved
Document fingerprinting	Shq64	Compact, similarity-aware
Binary file storage	Eq64	Lossless requirement

Performance Characteristics

Encoding Speed (MB/s)

Eq64:   ████████████████████ 230 MB/s (with native)
Shq64:  ██████████ 117 MB/s (with native)
T8q64:  █████████████ 156 MB/s (with native)
Zoq64:  ████████████████████████████████████████ 480 MB/s (with native)
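Throughput figures like these depend heavily on hardware and payload, so it is worth measuring on your own machine. The harness below is a minimal sketch using only the standard library; `throughput_mb_per_s` is a hypothetical helper, and standard Base64 stands in for the encoder so the example runs without any extra package.

```python
import base64
import os
import time
from typing import Callable

def throughput_mb_per_s(encode: Callable[[bytes], object], payload: bytes, repeats: int = 20) -> float:
    """Return the best observed encoding throughput in MB/s (1 MB = 10**6 bytes)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(payload)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse timers.
    return len(payload) / max(best, 1e-9) / 1e6

payload = os.urandom(1_000_000)
# Stand-in encoder: standard Base64, used only so the sketch runs anywhere.
print(f"{throughput_mb_per_s(base64.b64encode, payload):.0f} MB/s")
```

To benchmark a real variant, pass its encoding function in place of `base64.b64encode`.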

Space Efficiency

# Original: 768-dimensional float32 embedding (3072 bytes)
original_size = 3072

# Encoded sizes
eq64_size = 4096    # ~1.33x (same as Base64)
shq64_size = 16     # ~0.005x (192x smaller; lossy, not reversible)
t8q64_size = 16     # 0.005x (sparse representation)
zoq64_size = 32     # 0.01x (for 2D spatial, adjustable)
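The ratios above follow from simple arithmetic, worked through below. Whether the 4096 figure counts the dot separators is not stated here; the last calculation assumes one dot between each pair of adjacent 4-character chunks, as in the chunking example earlier.

```python
import math

dims, bytes_per_float = 768, 4
original_size = dims * bytes_per_float               # 3072 bytes

# Base64-style expansion: every 3 input bytes become 4 output characters.
eq64_chars = 4 * math.ceil(original_size / 3)        # 4096 characters
# Assumed: one dot between each pair of adjacent 4-character chunks.
eq64_with_dots = eq64_chars + (eq64_chars // 4 - 1)  # 5119 characters

fixed_hash_chars = 16
print(round(eq64_chars / original_size, 2))  # 1.33
print(original_size / fixed_hash_chars)      # 192.0
```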

Implementation Architecture

Shared Components

All family members share a common architecture:

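class QuadB64Encoder:
    """Base class shared by all variants (illustrative outline)."""

    def __init__(self, variant: str):
        self.variant = variant
        # One alphabet per phase (position % 4), built by _init_alphabets().
        self.alphabets = self._init_alphabets()
        self.position = 0

    def _init_alphabets(self) -> list[str]:
        """Build the four phase alphabets (definition omitted in this outline)."""
        raise NotImplementedError

    def _get_alphabet(self, position: int) -> str:
        """Return the alphabet for the given character position."""
        phase = position % 4
        return self.alphabets[phase]

    def encode(self, data: bytes) -> str:
        """Variant-specific encoding logic."""
        raise NotImplementedError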

Variant-Specific Logic

Each variant implements its own encoding strategy:

Eq64: Direct byte-to-character mapping with position rotation
Shq64: SimHash generation followed by position-safe encoding
T8q64: Top-k selection algorithm with magnitude preservation
Zoq64: Z-order curve calculation with adaptive precision
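Three of the techniques named above are standard and can be sketched independently of QuadB64. The functions below are textbook versions (random-hyperplane SimHash, top-k selection by magnitude, Morton bit interleaving) under hypothetical names; they are not the library's implementations and they omit the position-safe encoding step.

```python
import random
from typing import Sequence

def simhash_bits(vector: Sequence[float], n_bits: int = 64, seed: int = 0) -> int:
    """SimHash via random hyperplanes: one sign bit per hyperplane."""
    rng = random.Random(seed)
    bits = 0
    for _ in range(n_bits):
        plane = [rng.gauss(0.0, 1.0) for _ in vector]
        dot = sum(p * v for p, v in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def top_k_indices(vector: Sequence[float], k: int) -> list[int]:
    """Indices of the k largest-magnitude components, in ascending index order."""
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    return sorted(ranked[:k])

def z_order(x: int, y: int, bits: int = 16) -> int:
    """Morton code: interleave the low `bits` bits of x (even) and y (odd)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

print(top_k_indices([0.1, -0.9, 0.5, 0.0, 0.7], k=2))  # [1, 4]
print(z_order(3, 5))                                    # 39
```

Because SimHash keeps only the sign of each projection, scaling a vector by a positive constant leaves its hash unchanged, which is what makes it suitable for similarity comparison.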

Integration Patterns

Unified API

from uubed import encode, decode

# The method argument selects the variant
encoded = encode(data, method="eq64")   # Full encoding
hashed = encode(data, method="shq64")   # Similarity hash
sparse = encode(data, method="t8q64")   # Top-k indices
spatial = encode(data, method="zoq64")  # Z-order encoding

# Decoding (only supported for Eq64)
original = decode(encoded)  # Works
decode(hashed)  # Raises: NotReversibleError
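One way such a unified entry point could be organised is a registry that maps method names to functions, with decoders registered only for reversible variants. The sketch below is a self-contained mock-up, not the `uubed` implementation: the encoders are stand-ins (standard Base64 and a truncated SHA-256 digest, which is not a similarity hash), and unlike the API above, this `decode` takes the method explicitly.

```python
import base64
import hashlib
from typing import Callable

class NotReversibleError(ValueError):
    """Raised when decoding is requested for a one-way variant."""

# Stand-in encoders (assumptions): Base64 for the reversible slot,
# a truncated SHA-256 digest for a one-way slot.
_ENCODERS: dict[str, Callable[[bytes], str]] = {
    "eq64": lambda data: base64.b64encode(data).decode("ascii"),
    "shq64": lambda data: hashlib.sha256(data).hexdigest()[:16],
}
# Only reversible methods get a decoder.
_DECODERS: dict[str, Callable[[str], bytes]] = {
    "eq64": lambda text: base64.b64decode(text),
}

def encode(data: bytes, method: str) -> str:
    return _ENCODERS[method](data)

def decode(text: str, method: str) -> bytes:
    if method not in _DECODERS:
        raise NotReversibleError(f"{method} is one-way and cannot be decoded")
    return _DECODERS[method](text)

print(decode(encode(b"hello", "eq64"), "eq64"))  # b'hello'
```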

Variant Detection

from uubed import detect_variant

encoded_string = "SGVs.bG8g.V29y.bGQh"
variant = detect_variant(encoded_string)
print(f"This is {variant} encoding")  # "This is eq64 encoding"

Batch Operations

from uubed import BatchEncoder

# Efficient batch processing
encoder = BatchEncoder(method="shq64")
embeddings = [...]  # List of 1000 embeddings
encoded_batch = encoder.encode_all(embeddings)  # Parallel processing
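How `BatchEncoder` parallelises its work is not described here. A generic pattern for order-preserving batch encoding, using only the standard library, might look like the following sketch; `encode_all` is a hypothetical free function and Base64 is a stand-in encoder.

```python
import base64
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

def encode_all(items: Iterable[bytes], encode_one: Callable[[bytes], str], workers: int = 4) -> list[str]:
    """Encode items concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_one, items))

batch = [bytes([i % 256]) * 12 for i in range(1000)]
encoded = encode_all(batch, lambda b: base64.b64encode(b).decode("ascii"))
print(len(encoded), encoded[0])  # 1000 AAAAAAAAAAAAAAAA
```

Threads only speed things up when the encoder releases the GIL, as native extensions typically can; for a pure-Python encoder a process pool is usually the better fit.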

Advanced Features

Hybrid Encoding

Combine multiple variants for complex use cases:

from uubed import HybridEncoder

# Store full + compact representation
hybrid = HybridEncoder(primary="eq64", secondary="shq64")
result = hybrid.encode(embedding)
# Returns: {"full": "...", "compact": "...", "variant": "hybrid"}

Custom Alphabets

For specialized domains:

from uubed import CustomQuadB64

# Domain-specific alphabet (e.g., DNA sequences)
dna_encoder = CustomQuadB64(
    alphabet="ACGT" * 16,  # Must be 64 chars
    variant="eq64"
)
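A reversible 6-bit encoding needs 64 distinct symbols, not just a 64-character string. The check below is a minimal sketch with a hypothetical helper name; whether `CustomQuadB64` performs this validation is not stated here. Note that `"ACGT" * 16` has the right length but only four distinct symbols, so it passes the length check and fails the uniqueness check.

```python
def validate_alphabet(alphabet: str) -> None:
    """Check that an alphabet can support a reversible 6-bit encoding."""
    if len(alphabet) != 64:
        raise ValueError(f"expected 64 symbols, got {len(alphabet)}")
    if len(set(alphabet)) != 64:
        raise ValueError("symbols must be distinct, or decoding is ambiguous")

try:
    validate_alphabet("ACGT" * 16)
except ValueError as err:
    print(err)  # symbols must be distinct, or decoding is ambiguous
```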

Streaming Support

For large-scale processing:

from uubed import StreamEncoder

# Process large files efficiently
with StreamEncoder("eq64") as encoder:
    with open("embeddings.bin", "rb") as f:
        for chunk in iter(lambda: f.read(4096), b''):
            encoded_chunk = encoder.encode_chunk(chunk)
            process(encoded_chunk)
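Chunked encoding has one subtlety worth spelling out: 4096 is not a multiple of 3, so a Base64-style encoder must carry leftover bytes from one chunk into the next, or the concatenated output will differ from a one-shot encode. The sketch below shows that carry-over with standard Base64; it is an illustration of the general technique, not the internals of `StreamEncoder`, and a position-aware encoder would presumably also need to carry its character offset across chunks.

```python
import base64
import io
from typing import BinaryIO, Iterator

def stream_base64(source: BinaryIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield Base64 text piecewise; concatenated output equals a one-shot encode."""
    carry = b""
    for chunk in iter(lambda: source.read(chunk_size), b""):
        data = carry + chunk
        usable = len(data) - len(data) % 3   # keep only whole 3-byte groups
        carry = data[usable:]
        if usable:
            yield base64.b64encode(data[:usable]).decode("ascii")
    if carry:
        yield base64.b64encode(carry).decode("ascii")  # final group, with padding

payload = bytes(range(256)) * 40  # 10240 bytes
streamed = "".join(stream_base64(io.BytesIO(payload)))
print(streamed == base64.b64encode(payload).decode("ascii"))  # True
```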

Security Considerations

While QuadB64 is not a security tool, it offers some interesting properties:

Pattern Obfuscation: Position-dependent encoding makes pattern analysis harder
No Information Leakage: Encoded strings don’t reveal position information
Tamper Evidence: Modified characters likely break position consistency

Important: QuadB64 is NOT encryption. Use proper cryptographic tools for security needs.

Future Directions

The QuadB64 family is designed for extensibility:

Planned Variants

Mq64: Matryoshka embedding support with nested precision
Hq64: Hierarchical encoding for tree structures
Cq64: Compression-aware variant for bandwidth optimization

Research Areas

Integration with homomorphic encryption
Quantum-resistant variants
Hardware acceleration (GPU/TPU)
Distributed encoding protocols

Summary

The QuadB64 family provides a comprehensive solution to substring pollution while offering flexibility for various use cases. Whether you need full fidelity with Eq64, compact hashes with Shq64, sparse representations with T8q64, or spatial encoding with Zoq64, there’s a variant optimized for your needs.

Choose your variant based on:

Data recovery needs: Reversible (Eq64) vs. one-way (others)
Size constraints: Full size vs. fixed compact representations
Use case: Similarity, sparsity, or spatial relationships
Performance requirements: All variants support native acceleration

Next, explore each variant in detail: