Contributing Guidelines
Thank you for your interest in contributing to uubed! This guide will help you understand our development process and contribute effectively.
Getting Started
Before You Begin
- Read the Documentation: Familiarize yourself with uubed’s concepts and API
- Check Existing Issues: Look for existing issues or discussions related to your idea
- Development Setup: Follow the development setup guide
- Code of Conduct: Read and agree to follow our Code of Conduct
Finding Ways to Contribute
- Bug Reports: Help us identify and fix issues
- Feature Requests: Suggest new functionality
- Documentation: Improve guides, examples, and API documentation
- Code Contributions: Fix bugs, implement features, or optimize performance
- Testing: Add test cases or improve test coverage
- Examples: Create usage examples and tutorials
Development Process
Repository Structure
uubed is organized as a multi-repository project:
uubed-py/
: Python implementation and APIuubed-rs/
: Rust implementation for performance-critical componentsuubed-docs/
: Documentation and guides
Branch Strategy
main
: Stable release branchdevelop
: Development branch for integrationfeature/*
: Feature development branchesbugfix/*
: Bug fix brancheshotfix/*
: Critical fixes for production issues
Workflow
- Fork the Repository: Create your own fork
- Create a Branch: Use descriptive branch names
- Make Changes: Follow coding standards and best practices
- Test Thoroughly: Ensure all tests pass
- Submit a Pull Request: Provide clear description and context
Coding Standards
Python Code Style
We use these tools for Python code quality:
- Formatting:
ruff format
(Black-compatible) - Linting:
ruff check
- Type Checking:
mypy
- Testing:
pytest
Code Style Rules
# Good: Clear function names and type hints
def encode_embedding(
data: np.ndarray,
method: EncodingMethod = "auto"
) -> str:
"""Encode embedding data using specified method.
Args:
data: Input embedding array
method: Encoding method to use
Returns:
Encoded string representation
Raises:
UubedValidationError: If input validation fails
"""
validate_embedding_input(data, method)
return _encode_impl(data, method)
# Bad: Unclear names and missing documentation
def enc(d, m="auto"):
return _enc_impl(d, m)
Documentation Standards
# Use Google-style docstrings
def process_batch(
embeddings: List[np.ndarray],
batch_size: int = 1000,
**kwargs: Any
) -> List[str]:
"""Process embeddings in batches for efficiency.
This function processes large lists of embeddings by splitting them
into smaller batches to manage memory usage and improve performance.
Args:
embeddings: List of embedding arrays to process
batch_size: Maximum number of embeddings per batch
**kwargs: Additional arguments passed to encoding function
Returns:
List of encoded strings corresponding to input embeddings
Raises:
UubedValidationError: If any embedding fails validation
UubedResourceError: If batch processing exceeds memory limits
Example:
>>> embeddings = [np.random.rand(128) for _ in range(5000)]
>>> encoded = process_batch(embeddings, batch_size=500)
>>> len(encoded) == len(embeddings)
True
"""
Rust Code Style
Follow standard Rust conventions:
- Formatting:
cargo fmt
- Linting:
cargo clippy
- Documentation:
cargo doc
- Testing:
cargo test
Rust Style Guidelines
// Good: Well-documented public API
/// Encodes a vector using the specified Q64 variant.
///
/// # Arguments
///
/// * `data` - The input vector as a slice of bytes
/// * `method` - The encoding method to use
///
/// # Returns
///
/// Returns `Ok(String)` with the encoded result, or `Err(UubedError)`
/// if encoding fails.
///
/// # Examples
///
/// ```
/// use uubed_rs::{encode_q64, Q64Method};
///
/// let data = vec![1, 2, 3, 4];
/// let encoded = encode_q64(&data, Q64Method::Eq64)?;
/// ```
pub fn encode_q64(data: &[u8], method: Q64Method) -> Result<String, UubedError> {
validate_input(data)?;
match method {
Q64Method::Eq64 => encode_eq64(data),
Q64Method::Shq64 => encode_shq64(data),
// ... other methods
}
}
// Bad: Poor naming and no documentation
pub fn enc(d: &[u8], m: u8) -> Result<String, Box<dyn Error>> {
// implementation
}
Error Handling
Python Error Handling
# Use custom exception hierarchy
from uubed.exceptions import UubedValidationError, UubedEncodingError
def validate_embedding_input(data: np.ndarray, method: str) -> None:
"""Validate embedding input with detailed error messages."""
if data is None:
raise UubedValidationError(
"Embedding cannot be None",
parameter="data",
expected="numpy array or list",
received="None"
)
if data.size == 0:
raise UubedValidationError(
"Embedding cannot be empty",
parameter="data",
expected="non-empty array",
received="empty array"
)
Rust Error Handling
// Use Result types and custom error enum
#[derive(Debug, thiserror::Error)]
pub enum UubedError {
#[error("Validation failed: {message}")]
Validation { message: String },
#[error("Encoding failed: {message}")]
Encoding { message: String },
#[error("IO error: {source}")]
Io { #[from] source: std::io::Error },
}
pub fn validate_input(data: &[u8]) -> Result<(), UubedError> {
if data.is_empty() {
return Err(UubedError::Validation {
message: "Input data cannot be empty".to_string(),
});
}
Ok(())
}
Testing Guidelines
Test Structure
Python Tests
# Use descriptive test class and method names
class TestEmbeddingValidation:
"""Test embedding input validation functionality."""
def test_valid_numpy_array_passes_validation(self):
"""Test that valid numpy arrays pass validation."""
data = np.random.rand(128).astype(np.float32)
# Should not raise any exception
validate_embedding_input(data, "shq64")
def test_empty_array_raises_validation_error(self):
"""Test that empty arrays raise UubedValidationError."""
with pytest.raises(UubedValidationError, match="cannot be empty"):
validate_embedding_input(np.array([]), "shq64")
@pytest.mark.parametrize("method", ["eq64", "shq64", "t8q64"])
def test_encoding_roundtrip_preserves_data(self, method):
"""Test encode/decode roundtrip for all methods."""
original_data = np.random.randint(0, 256, 64, dtype=np.uint8)
encoded = encode(original_data, method=method)
if method in ["eq64", "mq64"]: # Lossless methods
decoded = decode(encoded, method=method)
np.testing.assert_array_equal(original_data, decoded)
Rust Tests
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_valid_input_encodes_successfully() {
let data = vec![1, 2, 3, 4, 5];
let result = encode_q64(&data, Q64Method::Eq64);
assert!(result.is_ok());
}
#[test]
fn test_empty_input_returns_error() {
let data = vec![];
let result = encode_q64(&data, Q64Method::Eq64);
assert!(matches!(result, Err(UubedError::Validation { .. })));
}
#[test]
fn test_roundtrip_preserves_data() {
let original = vec![10, 20, 30, 40];
let encoded = encode_q64(&original, Q64Method::Eq64).unwrap();
let decoded = decode_q64(&encoded, Q64Method::Eq64).unwrap();
assert_eq!(original, decoded);
}
}
Test Coverage
Aim for high test coverage:
- Python: Use
pytest-cov
to measure coverage - Rust: Use
cargo tarpaulin
for coverage analysis
# Python coverage
uvx hatch run test-cov
# Rust coverage
cargo tarpaulin --out Html
Performance Tests
# Python benchmarks
@pytest.mark.benchmark
def test_encoding_performance():
"""Benchmark encoding performance."""
data = np.random.rand(1000, 512)
start_time = time.time()
for embedding in data:
encode(embedding, method="shq64")
duration = time.time() - start_time
rate = len(data) / duration
assert rate > 100, f"Encoding rate too slow: {rate:.2f} embeddings/sec"
// Rust benchmarks
#[bench]
fn bench_encode_eq64(b: &mut Bencher) {
let data = vec![42u8; 1024];
b.iter(|| {
black_box(encode_q64(&data, Q64Method::Eq64).unwrap())
});
}
Pull Request Process
Before Submitting
- Update Dependencies: Ensure you’re using latest versions
- Run All Tests: Make sure nothing is broken
- Check Code Style: Run formatters and linters
- Update Documentation: Add or update relevant docs
- Add Tests: Include tests for new functionality
PR Description Template
## Description
Brief description of changes and motivation.
## Type of Change
- [ ] Bug fix (non-breaking change that fixes an issue)
- [ ] New feature (non-breaking change that adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
## Testing
- [ ] Added unit tests for new functionality
- [ ] All existing tests pass
- [ ] Manual testing completed
## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No new warnings introduced
## Related Issues
Closes #123
Review Process
- Automated Checks: CI must pass
- Code Review: At least one maintainer review required
- Testing: Reviewer may run additional tests
- Documentation: Check that docs are updated appropriately
- Approval: Maintainer approval required for merge
Communication
GitHub Issues
When creating issues:
- Use Templates: Follow the provided issue templates
- Be Specific: Include reproduction steps for bugs
- Provide Context: Explain the use case for features
- Include Environment: OS, Python/Rust versions, etc.
Discussions
Use GitHub Discussions for:
- Questions: General usage questions
- Ideas: Brainstorming new features
- Announcements: Project updates and releases
Code Review Feedback
When giving feedback:
- Be Constructive: Suggest improvements, don’t just point out problems
- Be Specific: Reference exact lines or provide examples
- Be Respectful: Remember there’s a person behind the code
When receiving feedback:
- Be Open: Consider all suggestions carefully
- Ask Questions: If feedback isn’t clear, ask for clarification
- Iterate: Make changes and push updates promptly
Recognition
Contributors are recognized in:
- CONTRIBUTORS.md: All contributors listed
- Release Notes: Major contributions highlighted
- Documentation: Examples and guides credited to authors
Questions?
If you have questions about contributing:
- Check this guide and the setup documentation
- Search existing issues and discussions
- Create a new discussion or issue
- Reach out to maintainers directly if needed
Thank you for contributing to uubed! 🎉