Cerebrate File Documentation
Break large files into manageable pieces, preserve context, and process them with Cerebras AI.
Overview
Cerebrate File is a command-line tool for processing large documents through the Cerebras AI API. It splits each file into chunks that fit within the model’s context window and carries context from earlier chunks forward so that continuity is preserved.
Key Features
- Smart chunking: Automatically break large documents into smaller parts
- Context overlap: Keep snippets from previous chunks to maintain continuity
- Directory support: Recursively process folders using glob patterns
- Parallel execution: Handle multiple files at once with threading
- Terminal UI: Clean progress output that updates in real time
- Retry logic: Handle rate limits and temporary errors without manual intervention
- Format flexibility: Chunking modes for plain text, markdown, code, and semantic boundaries
- Configurable behavior: Plenty of CLI options for tuning how things work
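To illustrate how smart chunking and context overlap fit together, here is a minimal sketch. It is not the tool's actual implementation: the function name `chunk_with_overlap` is hypothetical, and whitespace-separated words stand in for model tokens.

```python
def chunk_with_overlap(words, chunk_size, overlap):
    """Split a list of words into chunks of at most chunk_size words.

    Each chunk after the first starts with the last `overlap` words of the
    previous chunk, which is one simple way to carry context forward.
    Hypothetical helper for illustration; words approximate tokens here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the input
        start += step
    return chunks


text = "one two three four five six seven eight nine ten"
for chunk in chunk_with_overlap(text.split(), chunk_size=4, overlap=1):
    print(" ".join(chunk))
```

With these sample values the input is covered by three chunks, and each later chunk begins with the final word of the one before it.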
Getting Started
Installation
Install with pip or uv:
# Using pip
pip install cerebrate-file
# Using uv (faster)
uv pip install cerebrate-file
Quick Start
- Set your Cerebras API key:
export CEREBRAS_API_KEY="csk-..."
- Process a single file:
cerebrate-file document.md --output processed.md
- Process all markdown files in a directory tree:
cerebrate-file . --output ./output --recurse "**/*.md"
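To preview which files a recursive pattern such as `**/*.md` would select, the following sketch uses Python's standard `pathlib` globbing. The helper `plan_outputs` is hypothetical, and mirroring the input tree under the output directory is an assumption made for illustration, not a confirmed description of how the tool lays out its results.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir, pattern):
    """Pair each file under input_dir that matches pattern with a mirrored
    path under output_dir. Hypothetical helper; the mirroring is an assumption."""
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    pairs = []
    # "**" matches zero or more directory levels, so top-level files are included.
    for src in sorted(input_dir.glob(pattern)):
        if src.is_file():
            pairs.append((src, output_dir / src.relative_to(input_dir)))
    return pairs


if __name__ == "__main__":
    for src, dst in plan_outputs(".", "./output", "**/*.md"):
        print(f"{src} -> {dst}")
```

Note that when the output directory sits inside the input directory, as in the command above, a later run of this sketch would also match files written by an earlier one.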
Use Cases
Use Cerebrate File when you need to:
- Rewrite, summarize, or translate large documents
- Refactor code across an entire project
- Generate new versions or expansions of existing content
- Apply consistent transformations to many files at once
- Clean, format, or analyze large text datasets
Model Details
The tool uses the Qwen-3 Coder 480B model, served through the Cerebras API:
- Context window: 131,072 tokens
- Speed: ~570 tokens/second
- Specialty: Good at both code and natural language
- Rate limits:
  - 30 requests per minute
  - 1,000 requests per day
  - 10 million tokens per minute
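The limits above translate into a rough time and quota budget for a job. The sketch below works through the arithmetic under stated assumptions: one request per chunk, evenly paced requests, and no retries. The function name `request_budget` is hypothetical and not part of the tool.

```python
REQUESTS_PER_MINUTE = 30
REQUESTS_PER_DAY = 1_000
TOKENS_PER_MINUTE = 10_000_000
CONTEXT_WINDOW = 131_072


def request_budget(num_chunks):
    """Return (min_seconds, fits_daily_quota) for num_chunks requests.

    Assumes one request per chunk, evenly paced, with no retries.
    Hypothetical helper for illustration only.
    """
    min_interval = 60 / REQUESTS_PER_MINUTE  # 2 seconds between requests
    # The first request goes out immediately; each later one waits one interval.
    min_seconds = max(num_chunks - 1, 0) * min_interval
    return min_seconds, num_chunks <= REQUESTS_PER_DAY


# Even if every request used the full context window, the per-minute request
# cap would be reached before the per-minute token cap.
tokens_at_request_cap = REQUESTS_PER_MINUTE * CONTEXT_WINDOW
print(tokens_at_request_cap, tokens_at_request_cap < TOKENS_PER_MINUTE)

print(request_budget(31))
print(request_budget(1_001))
```

Under these assumptions the per-minute request cap, not the token cap, is the limit that governs pacing, and any job larger than 1,000 chunks would have to span more than one day.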
Documentation Sections
- Installation – Setup instructions
- Usage Guide – Practical examples
- CLI Reference – All command-line flags and options
- Configuration – Settings and tuning tips
- Examples – Real-world workflows
- API Reference – For Python integration
- Troubleshooting – Fixes for common issues
- Development – How to contribute
System Requirements
- Python 3.9+
- Minimum 4 GB RAM (8 GB recommended for large files)
- Internet connection
- Valid Cerebras API key
License
Licensed under Apache 2.0. See LICENSE for details.
Support
- Report bugs or request features: GitHub Issues
- Ask questions or share ideas: GitHub Discussions
- Maintainer: Adam Twardoch (@twardoch)