# Troubleshooting

Solutions to common issues and error messages.

## Table of contents

- [Common Issues](#common-issues)
- [Error Messages Reference](#error-messages-reference)
- [Debugging Techniques](#debugging-techniques)
- [Getting Help](#getting-help)
- [FAQ](#faq)
- [Next Steps](#next-steps)
## Common Issues

### API Key Issues

#### Error: CEREBRAS_API_KEY not found

**Symptom:**

```text
Error: CEREBRAS_API_KEY environment variable not found
```

**Solutions:**

- Set the environment variable:
  ```bash
  export CEREBRAS_API_KEY="csk-your-key-here"
  ```
- Create a `.env` file:
  ```bash
  echo 'CEREBRAS_API_KEY=csk-your-key-here' > .env
  ```
- Pass the key directly (not recommended):
  ```python
  process_document(input_data="file.md", api_key="csk-...")
  ```
#### Error: Invalid API Key Format

**Symptom:**

```text
Warning: API key appears to be a placeholder
```

**Solutions:**

- The API key must start with `csk-`
- Get a valid key from cerebras.ai
- Check for typos or extra spaces
### Rate Limiting

#### Error: Rate limit exceeded

**Symptom:**

```text
RateLimitError: 429 Too Many Requests
```

**Solutions:**

- Wait for the limit to reset:
  - Per-minute limits: 60 seconds
  - Daily limits: midnight UTC
- Reduce parallel workers:
  ```bash
  cerebrate-file . --recurse "**/*.md" --workers 2
  ```
- Process in batches:
  ```bash
  # Process 10 files at a time
  find . -name "*.md" | head -10 | xargs -I {} cerebrate-file {}
  ```
- Check remaining quota:
  ```bash
  cerebrate-file small.txt --verbose | grep "Remaining"
  ```
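When retrying manually from Python, an exponential backoff loop avoids hammering the API after a 429. The sketch below is generic and not part of the cerebrate-file package; which exception types count as retryable is passed in, since the package's own exception names are not assumed here:

```python
import random
import time

def call_with_backoff(call, retryable=(RuntimeError,), max_retries=5, base_delay=1.0):
    """Retry `call` on transient errors, doubling the delay each attempt
    and adding jitter so parallel workers don't retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return call()
        except retryable:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
```

Pass your API call as a zero-argument callable, e.g. `call_with_backoff(lambda: client.chat(...))`.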
### Token Limit Issues

#### Error: Context length exceeded

**Symptom:**

```text
TokenLimitError: Maximum context length is 131072 tokens
```

**Solutions:**

- Reduce chunk size:
  ```bash
  cerebrate-file large.md --chunk_size 24000
  ```
- Lower the completion ratio:
  ```bash
  cerebrate-file doc.md --max_tokens_ratio 50
  ```
- Reduce sample size:
  ```bash
  cerebrate-file doc.md --sample_size 100
  ```
- Use simpler prompts:
  - Shorter instructions mean fewer tokens
  - Remove redundant instructions
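To see whether input plus completion is likely to fit before sending anything, a rough size estimate helps. The ~4 characters per token figure below is a common heuristic for English text, not the tokenizer the API actually uses, and `fits_context` only mirrors the idea behind `--max_tokens_ratio` (completion budget as a percentage of input tokens):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose.
    A heuristic only; the real tokenizer will differ."""
    return max(1, len(text) // 4)

def fits_context(text: str, limit: int = 131072, ratio: int = 100) -> bool:
    """Check whether prompt plus expected completion stays under `limit`.
    `ratio` is the completion budget as a percentage of input tokens."""
    prompt = estimate_tokens(text)
    completion = prompt * ratio // 100
    return prompt + completion <= limit
```

If this returns `False`, reducing `--chunk_size` or `--max_tokens_ratio` as shown above is the usual fix.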
### File Processing Errors

#### Error: File not found

**Symptom:**

```text
FileNotFoundError: [Errno 2] No such file or directory
```

**Solutions:**

- Check the file path:
  ```bash
  ls -la input.md
  pwd  # Verify current directory
  ```
- Use absolute paths:
  ```bash
  cerebrate-file /full/path/to/file.md
  ```
- Check permissions:
  ```bash
  ls -la file.md
  chmod 644 file.md  # If needed
  ```
#### Error: Permission denied

**Symptom:**

```text
PermissionError: [Errno 13] Permission denied
```

**Solutions:**

- Check file permissions:
  ```bash
  chmod 644 input.md     # Read permission
  chmod 755 output_dir/  # Directory access
  ```
- Check the output directory:
  ```bash
  mkdir -p output
  chmod 755 output
  ```
- Fix ownership if the file belongs to another user:
  ```bash
  sudo chown $USER:$USER file.md
  ```
### Network Issues

#### Error: Connection timeout

**Symptom:**

```text
NetworkError: HTTPSConnectionPool timeout
```

**Solutions:**

- Check the internet connection:
  ```bash
  ping api.cerebras.ai
  curl https://api.cerebras.ai
  ```
- Configure a proxy if needed:
  ```bash
  export HTTPS_PROXY="http://proxy:8080"
  ```
- Increase the timeout (in code):
  ```python
  client = CerebrasClient(api_key, timeout=60)
  ```
- Retry with verbose mode:
  ```bash
  cerebrate-file doc.md --verbose
  ```
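From Python, a quick TCP reachability check distinguishes "network is down" from "API is slow" without any HTTP dependency. This is a standard-library sketch, not a function the package provides:

```python
import socket

def can_reach(host="api.cerebras.ai", port=443, timeout=5.0):
    """True if a TCP connection to host:port succeeds within `timeout`;
    DNS failures and timeouts return False instead of raising."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers gaierror (DNS), timeout, refused
        return False
```

A `False` here points at connectivity or proxy setup rather than the CLI itself.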
### Chunking Issues

#### Error: No chunks created

**Symptom:**

```text
ValueError: No chunks were created from the input
```

**Solutions:**

- Check the file content:
  ```bash
  wc -l input.md  # Check if file has content
  file input.md   # Check file type
  ```
- Try a different format:
  ```bash
  cerebrate-file doc.md --data_format text
  ```
- Check the encoding:
  ```bash
  file -bi input.md  # Check encoding
  iconv -f ISO-8859-1 -t UTF-8 input.md > input_utf8.md
  ```
#### Error: Chunks too large

**Symptom:**

```text
Chunk size exceeds maximum token limit
```

**Solution:**

```bash
cerebrate-file doc.md --chunk_size 16000
```
### Output Issues

#### Problem: Output is truncated

**Solutions:**

- Increase the token ratio:
  ```bash
  cerebrate-file doc.md --max_tokens_ratio 150
  ```
- Check for rate limiting:
  - Look for incomplete responses
  - Add `--verbose` to see details
- Process smaller chunks:
  ```bash
  cerebrate-file doc.md --chunk_size 24000
  ```
#### Problem: Output formatting is broken

**Solutions:**

- Use the appropriate format:
  ```bash
  cerebrate-file doc.md --data_format markdown
  ```
- Preserve frontmatter:
  ```bash
  cerebrate-file doc.md --explain
  ```
- Check prompt instructions:
  - Ensure the prompt doesn't conflict with the format
  - Test with simpler prompts first
### Recursive Processing Issues

#### Error: Invalid glob pattern

**Symptom:**

```text
ValueError: Invalid pattern: **/*.{md,txt}
```

**Solutions:**

- Quote the pattern:
  ```bash
  cerebrate-file . --recurse "**/*.{md,txt}"
  ```
- Use simpler patterns:
  ```bash
  cerebrate-file . --recurse "**/*.md"
  ```
- Test the pattern first:
  ```bash
  find . -name "*.md"  # Verify files exist
  ```
#### Problem: Not finding files

**Solutions:**

- Check the current directory:
  ```bash
  pwd
  ls -la
  ```
- Use the correct pattern:
  ```bash
  # Current directory only
  --recurse "*.md"
  # All subdirectories
  --recurse "**/*.md"
  # Specific directory
  --recurse "docs/**/*.md"
  ```
- Check file extensions:
  ```bash
  find . -type f | head -20
  ```
### Performance Issues

#### Problem: Processing is very slow

**Solutions:**

- Increase workers:
  ```bash
  cerebrate-file . --recurse "**/*.md" --workers 8
  ```
- Use larger chunks:
  ```bash
  cerebrate-file doc.md --chunk_size 48000
  ```
- Reduce sample size:
  ```bash
  cerebrate-file doc.md --sample_size 100
  ```
- Check system resources:
  ```bash
  top    # Check CPU and memory
  df -h  # Check disk space
  ```
#### Problem: High memory usage

**Solutions:**

- Process sequentially:
  ```bash
  cerebrate-file . --recurse "**/*.md" --workers 1
  ```
- Use smaller chunks:
  ```bash
  cerebrate-file large.md --chunk_size 16000
  ```
- Process in batches:
  ```bash
  for file in *.md; do
    cerebrate-file "$file"
    sleep 1  # Brief pause
  done
  ```
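The same one-at-a-time loop can be driven from Python when you want to collect exit codes for later retries. `subprocess.run` launches a single process at a time, so memory stays flat; the command tuple is a parameter here purely so the sketch is testable with any CLI:

```python
import subprocess
import time

def process_batch(files, cmd=("cerebrate-file",), pause=1.0):
    """Run `cmd` on each file sequentially, pausing between runs.
    Returns (file, exit_code) pairs so failures can be retried later."""
    results = []
    for path in files:
        proc = subprocess.run([*cmd, path], capture_output=True, text=True)
        results.append((path, proc.returncode))
        time.sleep(pause)
    return results
```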
## Error Messages Reference

### API Errors

| Error Code | Meaning | Solution |
|---|---|---|
| 400 | Bad Request | Check prompt and parameters |
| 401 | Unauthorized | Verify API key |
| 403 | Forbidden | Check API key permissions |
| 429 | Rate Limited | Wait and retry |
| 500 | Server Error | Retry later |
| 503 | Service Unavailable | API maintenance, retry later |
### Exit Codes

| Code | Meaning | Typical Cause |
|---|---|---|
| 0 | Success | Normal completion |
| 1 | General Error | Various issues |
| 2 | Invalid Arguments | Bad CLI parameters |
| 3 | API Key Not Found | Missing CEREBRAS_API_KEY |
| 4 | File Not Found | Input file doesn't exist |
| 5 | Permission Denied | File access issues |
| 6 | API Error | Cerebras API problem |
| 7 | Rate Limit | Too many requests |
| 8 | Network Error | Connection issues |
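In wrapper scripts, the exit-code table can drive retry logic. The mapping below just restates the table; treating 6-8 as retryable is a judgment call (transient failures), not something the CLI defines:

```python
# Exit codes, restated from the table above
EXIT_CODES = {
    0: "Success",
    1: "General Error",
    2: "Invalid Arguments",
    3: "API Key Not Found",
    4: "File Not Found",
    5: "Permission Denied",
    6: "API Error",
    7: "Rate Limit",
    8: "Network Error",
}

# Transient failures worth retrying; codes 2-5 need a fix first
RETRYABLE = {6, 7, 8}

def should_retry(code):
    """True if the exit code indicates a transient failure."""
    return code in RETRYABLE
```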
## Debugging Techniques

### Enable Verbose Logging

```bash
# Maximum debugging information
cerebrate-file doc.md --verbose

# Save logs to file
cerebrate-file doc.md --verbose 2> debug.log

# Separate stdout and stderr
cerebrate-file doc.md --verbose \
  1> output.txt \
  2> errors.log
```
### Test with Dry Run

```bash
# Test chunking without API calls
cerebrate-file large.md --dry_run --verbose

# Check what would be processed
cerebrate-file . --recurse "**/*.md" --dry_run
```
### Validate Environment

```bash
# Check API key (first 10 characters only)
echo $CEREBRAS_API_KEY | head -c 10

# Test API connection
curl -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  https://api.cerebras.ai/v1/models

# Check Python version
python --version

# Check package version
python -c "import cerebrate_file; print(cerebrate_file.__version__)"
```
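The key check can also run as a pre-flight step in Python. This sketch only validates the key's shape (set, and carrying the `csk-` prefix mentioned earlier); it does not call the API:

```python
import os

def check_api_key(var="CEREBRAS_API_KEY"):
    """Return 'ok', 'missing', or 'bad-prefix' as a quick pre-flight check.
    Only inspects the environment; makes no network calls."""
    key = os.environ.get(var, "").strip()
    if not key:
        return "missing"
    if not key.startswith("csk-"):
        return "bad-prefix"
    return "ok"
```

Run it before a long batch so a placeholder key fails in milliseconds instead of at the first API call.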
### Monitor Processing

```bash
# Watch progress
cerebrate-file doc.md --verbose | tee process.log

# Monitor system resources
watch -n 1 'ps aux | grep cerebrate'

# Check output files
watch -n 2 'ls -la output/'
```
## Getting Help

### Resources

- **Documentation**: Full documentation
- **GitHub Issues**: Report bugs
- **Discussions**: Ask questions

### Reporting Issues

When reporting issues, include:

- **Error message**: Complete error output
- **Command**: Exact command used
- **Environment**:
  ```bash
  cerebrate-file --version
  python --version
  echo $CEREBRAS_API_KEY | head -c 10
  ```
- **File sample**: Small reproducing example
- **Verbose output**: Run with `--verbose`
### Support Checklist

Before requesting help:

- Check this troubleshooting guide
- Update to the latest version
- Test with a small file
- Try with the `--verbose` flag
- Check that the API key is valid
- Verify file permissions
- Test the network connection
- Review the error message carefully
## FAQ

### General Questions

**Q: How much does it cost?**
A: Cerebras offers a free tier with daily limits. Check cerebras.ai for pricing.

**Q: What file types are supported?**
A: Any text file. Binary files need conversion to text first.

**Q: What's the maximum file size?**
A: No hard limit, but very large files may take a long time to process.

**Q: Can I process PDFs?**
A: Convert the PDF to text first using tools like `pdftotext`.
### Technical Questions

**Q: Why is processing slow?**
A: Large files, small chunks, or rate limiting. Try increasing chunk size and workers.

**Q: How do I process code files?**
A: Use `--data_format code` for better code-aware chunking.

**Q: Can I use multiple API keys?**
A: Not simultaneously. Process different batches with different keys.

**Q: Does it work offline?**
A: No, it requires an internet connection to the Cerebras API.
### Best Practices

**Q: What's the optimal chunk size?**
A: 32,000-48,000 tokens for most content; smaller for code.

**Q: How many workers should I use?**
A: 4-8 workers is typically optimal, depending on your system and rate limits.

**Q: Should I use streaming?**
A: Yes (the default). It provides better progress feedback.

**Q: How do I preserve formatting?**
A: Use the appropriate `--data_format` for your content type.
## Next Steps

- Review Configuration for optimization tips
- See Examples for working solutions
- Check the API Reference for programmatic use
- Explore the CLI Reference for all options