Troubleshooting

Solutions to common issues and error messages.

Table of contents

  1. Troubleshooting
    1. Table of contents
    2. Common Issues
      1. API Key Issues
        1. Error: CEREBRAS_API_KEY not found
        2. Error: Invalid API Key Format
      2. Rate Limiting
        1. Error: Rate limit exceeded
      3. Token Limit Issues
        1. Error: Context length exceeded
      4. File Processing Errors
        1. Error: File not found
        2. Error: Permission denied
      5. Network Issues
        1. Error: Connection timeout
      6. Chunking Issues
        1. Error: No chunks created
        2. Error: Chunks too large
      7. Output Issues
        1. Problem: Output is truncated
        2. Problem: Output formatting is broken
      8. Recursive Processing Issues
        1. Error: Invalid glob pattern
        2. Problem: Not finding files
      9. Performance Issues
        1. Problem: Processing is very slow
        2. Problem: High memory usage
    3. Error Messages Reference
      1. API Errors
      2. Exit Codes
    4. Debugging Techniques
      1. Enable Verbose Logging
      2. Test with Dry Run
      3. Validate Environment
      4. Monitor Processing
    5. Getting Help
      1. Resources
      2. Reporting Issues
      3. Support Checklist
    6. FAQ
      1. General Questions
      2. Technical Questions
      3. Best Practices
    7. Next Steps

Common Issues

API Key Issues

Error: CEREBRAS_API_KEY not found

Symptom:

Error: CEREBRAS_API_KEY environment variable not found

Solutions:

  1. Set the environment variable:
    export CEREBRAS_API_KEY="csk-your-key-here"
    
  2. Create a .env file:
    echo 'CEREBRAS_API_KEY=csk-your-key-here' > .env
    
  3. Pass directly (not recommended):
    process_document(input_data="file.md", api_key="csk-...")
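
If you drive the library from Python, you can load the key from the .env file rather than hardcoding it. A minimal sketch, assuming the python-dotenv package is installed:

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads CEREBRAS_API_KEY from a local .env file
api_key = os.environ["CEREBRAS_API_KEY"]  # raises KeyError if still missing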
    

Error: Invalid API Key Format

Symptom:

Warning: API key appears to be a placeholder

Solution:

  • API key must start with csk-
  • Get a valid key from cerebras.ai
  • Check for typos or extra spaces
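
A quick sanity check in Python (illustrative only, not part of the package):

import os

key = os.environ.get("CEREBRAS_API_KEY", "").strip()  # drop stray whitespace
if not key.startswith("csk-"):
    raise SystemExit("API key missing or malformed: expected a csk- prefix")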

Rate Limiting

Error: Rate limit exceeded

Symptom:

RateLimitError: 429 Too Many Requests

Solutions:

  1. Wait for the limit to reset:
    • Per-minute limits reset after 60 seconds
    • Daily limits reset at midnight UTC
  2. Reduce parallel workers:
    cerebrate-file . --recurse "**/*.md" --workers 2
    
  3. Process in batches:
    # Process only the first 10 files; rerun for the next batch
    find . -name "*.md" | head -10 | xargs -I {} cerebrate-file {}
    
  4. Check remaining quota:
    cerebrate-file small.txt --verbose | grep "Remaining"
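
If you call the API from Python, a retry loop with exponential backoff absorbs transient 429s. A minimal sketch; narrow the except clause to whatever rate-limit exception your client actually raises:

import time

def with_backoff(fn, retries=5, base_delay=2.0):
    """Call fn(), backing off exponentially on rate-limit errors."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:  # narrow to the client's RateLimitError
            if "429" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...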
    

Token Limit Issues

Error: Context length exceeded

Symptom:

TokenLimitError: Maximum context length is 131072 tokens

Solutions:

  1. Reduce chunk size:
    cerebrate-file large.md --chunk_size 24000
    
  2. Lower completion ratio:
    cerebrate-file doc.md --max_tokens_ratio 50
    
  3. Reduce sample size:
    cerebrate-file doc.md --sample_size 100
    
  4. Use simpler prompts:
    • Shorter instructions = fewer tokens
    • Remove redundant instructions
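
To check whether a document will fit before sending it, a rough rule of thumb is about four characters per token for English prose. A sketch of that heuristic (an approximation, not the model's tokenizer):

from pathlib import Path

text = Path("doc.md").read_text(encoding="utf-8")
approx_tokens = len(text) // 4  # ~4 characters per token for English text
print(f"~{approx_tokens} tokens against a 131072-token context limit")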

File Processing Errors

Error: File not found

Symptom:

FileNotFoundError: [Errno 2] No such file or directory

Solutions:

  1. Check file path:
    ls -la input.md
    pwd  # Verify current directory
    
  2. Use absolute paths:
    cerebrate-file /full/path/to/file.md
    
  3. Check permissions:
    ls -la file.md
    chmod 644 file.md  # If needed
    

Error: Permission denied

Symptom:

PermissionError: [Errno 13] Permission denied

Solutions:

  1. Check file permissions:
    chmod 644 input.md  # Read permission
    chmod 755 output_dir/  # Directory access
    
  2. Check output directory:
    mkdir -p output
    chmod 755 output
    
  3. Run with appropriate user:
    sudo chown $USER:$USER file.md
    

Network Issues

Error: Connection timeout

Symptom:

NetworkError: HTTPSConnectionPool timeout

Solutions:

  1. Check internet connection:
    ping api.cerebras.ai
    curl https://api.cerebras.ai
    
  2. Configure proxy if needed:
    export HTTPS_PROXY="http://proxy:8080"
    
  3. Increase timeout (in code):
    client = CerebrasClient(api_key, timeout=60)
    
  4. Retry with verbose mode:
    cerebrate-file doc.md --verbose
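
You can also test reachability from Python with only the standard library; urllib honors HTTPS_PROXY automatically (a sketch, using the same endpoint as the curl example):

import urllib.error
import urllib.request

try:
    urllib.request.urlopen("https://api.cerebras.ai", timeout=10)
    print("reachable")
except urllib.error.HTTPError as exc:
    print(f"reachable (HTTP {exc.code})")  # an HTTP error still means the host answered
except Exception as exc:
    print(f"connection problem: {exc}")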
    

Chunking Issues

Error: No chunks created

Symptom:

ValueError: No chunks were created from the input

Solutions:

  1. Check file content:
    wc -l input.md  # Check if file has content
    file input.md    # Check file type
    
  2. Try different format:
    cerebrate-file doc.md --data_format text
    
  3. Check encoding:
    file -bi input.md  # Check encoding
    iconv -f ISO-8859-1 -t UTF-8 input.md > input_utf8.md
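
The same checks in Python, if that is more convenient (UTF-8 is assumed as the target encoding, matching the iconv example):

from pathlib import Path

raw = Path("input.md").read_bytes()
if not raw.strip():
    raise SystemExit("input.md is empty or whitespace-only")
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    raise SystemExit(f"input.md is not valid UTF-8: {exc}")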
    

Error: Chunks too large

Symptom:

Chunk size exceeds maximum token limit

Solution:

cerebrate-file doc.md --chunk_size 16000

Output Issues

Problem: Output is truncated

Solutions:

  1. Increase token ratio:
    cerebrate-file doc.md --max_tokens_ratio 150
    
  2. Check for rate limiting:
    • Look for incomplete responses
    • Add --verbose to see details
  3. Process smaller chunks:
    cerebrate-file doc.md --chunk_size 24000
    

Problem: Output formatting is broken

Solutions:

  1. Use appropriate format:
    cerebrate-file doc.md --data_format markdown
    
  2. Preserve frontmatter:
    cerebrate-file doc.md --explain
    
  3. Check prompt instructions:
    • Ensure the prompt doesn’t conflict with the output format
    • Test with simpler prompts first

Recursive Processing Issues

Error: Invalid glob pattern

Symptom:

ValueError: Invalid pattern: **/*.{md,txt}

Solutions:

  1. Quote the pattern:
    cerebrate-file . --recurse "**/*.{md,txt}"
    
  2. Use simpler patterns:
    cerebrate-file . --recurse "**/*.md"
    
  3. Test pattern first:
    find . -name "*.md"  # Verify files exist
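
Note that Python's glob module has no brace expansion, so a pattern like **/*.{md,txt} is matched literally there; if the tool globs the Python way (an assumption), test candidate patterns like this:

import glob

matches = glob.glob("**/*.md", recursive=True)  # recursive=True enables **
print(f"{len(matches)} files match")
for path in matches[:10]:
    print(path)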
    

Problem: Not finding files

Solutions:

  1. Check current directory:
    pwd
    ls -la
    
  2. Use correct pattern:
    # Current directory only
    --recurse "*.md"
    
    # All subdirectories
    --recurse "**/*.md"
    
    # Specific directory
    --recurse "docs/**/*.md"
    
  3. Check file extensions:
    find . -type f | head -20
    

Performance Issues

Problem: Processing is very slow

Solutions:

  1. Increase workers:
    cerebrate-file . --recurse "**/*.md" --workers 8
    
  2. Use larger chunks:
    cerebrate-file doc.md --chunk_size 48000
    
  3. Reduce sample size:
    cerebrate-file doc.md --sample_size 100
    
  4. Check system resources:
    top  # Check CPU and memory
    df -h  # Check disk space
    

Problem: High memory usage

Solutions:

  1. Process sequentially:
    cerebrate-file . --recurse "**/*.md" --workers 1
    
  2. Smaller chunks:
    cerebrate-file large.md --chunk_size 16000
    
  3. Process in batches:
    for file in *.md; do
      cerebrate-file "$file"
      sleep 1  # Brief pause
    done
    

Error Messages Reference

API Errors

Error Code   Meaning               Solution
400          Bad Request           Check prompt and parameters
401          Unauthorized          Verify API key
403          Forbidden             Check API key permissions
429          Rate Limited          Wait and retry
500          Server Error          Retry later
503          Service Unavailable   API maintenance; retry later

Exit Codes

Code   Meaning             Typical Cause
0      Success             Normal completion
1      General Error       Various issues
2      Invalid Arguments   Bad CLI parameters
3      API Key Not Found   Missing CEREBRAS_API_KEY
4      File Not Found      Input file doesn’t exist
5      Permission Denied   File access issues
6      API Error           Cerebras API problem
7      Rate Limit          Too many requests
8      Network Error       Connection issues
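
These codes can be branched on in automation. A sketch using Python's subprocess, with code values taken from the table above:

import subprocess

result = subprocess.run(["cerebrate-file", "doc.md"])
if result.returncode == 7:       # Rate Limit
    print("rate limited; wait and retry")
elif result.returncode == 3:     # API Key Not Found
    print("set CEREBRAS_API_KEY first")
elif result.returncode != 0:
    print(f"failed with exit code {result.returncode}")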

Debugging Techniques

Enable Verbose Logging

# Maximum debugging information
cerebrate-file doc.md --verbose

# Save logs to file
cerebrate-file doc.md --verbose 2> debug.log

# Separate stdout and stderr
cerebrate-file doc.md --verbose \
  1> output.txt \
  2> errors.log

Test with Dry Run

# Test chunking without API calls
cerebrate-file large.md --dry_run --verbose

# Check what would be processed
cerebrate-file . --recurse "**/*.md" --dry_run

Validate Environment

# Check API key
echo $CEREBRAS_API_KEY | head -c 10

# Test API connection
curl -H "Authorization: Bearer $CEREBRAS_API_KEY" \
  https://api.cerebras.ai/v1/models

# Check Python version
python --version

# Check package version
python -c "import cerebrate_file; print(cerebrate_file.__version__)"

Monitor Processing

# Watch progress
cerebrate-file doc.md --verbose | tee process.log

# Monitor system resources
watch -n 1 'ps aux | grep cerebrate'

# Check output files
watch -n 2 'ls -la output/'

Getting Help

Resources

  1. Documentation: Full documentation
  2. GitHub Issues: Report bugs
  3. Discussions: Ask questions

Reporting Issues

When reporting issues, include:

  1. Error message: Complete error output
  2. Command: Exact command used
  3. Environment:
    cerebrate-file --version
    python --version
    echo $CEREBRAS_API_KEY | head -c 10
    
  4. File sample: Small reproducing example
  5. Verbose output: Run with --verbose

Support Checklist

Before requesting help:

  • Check this troubleshooting guide
  • Update to latest version
  • Test with a small file
  • Try with --verbose flag
  • Check API key is valid
  • Verify file permissions
  • Test network connection
  • Review error message carefully

FAQ

General Questions

Q: How much does it cost? A: Cerebras offers a free tier with daily limits. Check cerebras.ai for pricing.

Q: What file types are supported? A: Any text file. Binary files need conversion to text first.

Q: What’s the maximum file size? A: There’s no hard limit, but very large files may take a long time to process.

Q: Can I process PDFs? A: Convert PDF to text first using tools like pdftotext.

Technical Questions

Q: Why is processing slow? A: Large files, small chunks, or rate limiting. Try increasing chunk size and workers.

Q: How do I process code files? A: Use --data_format code for better code-aware chunking.

Q: Can I use multiple API keys? A: Not simultaneously. Process different batches with different keys.

Q: Does it work offline? A: No, requires internet connection to Cerebras API.

Best Practices

Q: What’s the optimal chunk size? A: 32,000-48,000 tokens for most content. Smaller for code.

Q: How many workers should I use? A: 4-8 workers is typically optimal; the best value depends on your system and rate limits.

Q: Should I use streaming? A: Yes (the default); it provides better progress feedback.

Q: How do I preserve formatting? A: Use appropriate --data_format for your content type.

Next Steps



Copyright © 2024-2025 Adam Twardoch. Distributed under the Apache 2.0 license.