Usage Guide

How to use Cerebrate File effectively

Table of contents

  1. Basic Usage
    1. Processing a Single File
    2. Specifying Output File
    3. Adding Instructions
    4. Using Instruction Files
  2. Advanced Features
    1. Recursive Processing
    2. Parallel Processing
    3. Chunking Strategies
    4. Chunk Size Control
    5. Context Preservation
  3. Working with Different File Types
    1. Markdown Documents
    2. Source Code
    3. Plain Text
    4. Mixed Content
  4. Metadata Processing
    1. Extracting Metadata
    2. Preserving Frontmatter
  5. Model Parameters
    1. Temperature Control
    2. Top-p Sampling
  6. Monitoring and Debugging
    1. Verbose Mode
    2. Dry Run
    3. Progress Display
  7. Best Practices
    1. 1. Chunk Sizes
    2. 2. Chunking Strategy
    3. 3. Rate Limits
    4. 4. Large Projects
    5. 5. Preserve Context
  8. Common Workflows
    1. Document Translation
    2. Code Documentation
    3. Content Summarization
    4. Style Transformation
    5. Batch Processing
  9. Error Handling
    1. Rate Limits
    2. Network Issues
    3. Large Files
  10. Tips and Tricks
    1. 1. Preview Changes
    2. 2. Save Prompts
    3. 3. Chain Processing
    4. 4. Use Shell Features
  11. Next Steps

Basic Usage

Processing a Single File

The simplest way to use Cerebrate File is to run it on a single document:

cerebrate-file input.md

This overwrites input.md with the processed version.

Specifying Output File

To save the result elsewhere:

cerebrate-file input.md --output output.md

Adding Instructions

Add instructions for the AI:

cerebrate-file document.md \
  --prompt "Summarize each section in 2-3 sentences"

Using Instruction Files

For longer or reusable instructions, use a file:

cerebrate-file report.md \
  --file_prompt instructions.md \
  --output summary.md

Advanced Features

Recursive Processing

Process multiple files by pattern:

# All markdown files, recursively
cerebrate-file . --output ./processed --recurse "**/*.md"

# Specific file types
cerebrate-file ./src --output ./docs --recurse "**/*.{py,js,ts}"

# Limit depth
cerebrate-file . --output ./output --recurse "*.txt"      # Current dir only
cerebrate-file . --output ./output --recurse "*/*.txt"     # One level deep

Parallel Processing

Speed up processing with multiple workers:

# Use 8 workers
cerebrate-file . --output ./output --recurse "**/*.md" --workers 8

# Auto-detect based on CPU cores
cerebrate-file . --output ./output --recurse "**/*.md" --workers 0

Chunking Strategies

Choose the right strategy for your content:

# Markdown-aware (default)
cerebrate-file doc.md --data_format markdown

# Code-aware for source files
cerebrate-file script.py --data_format code

# Semantic chunking for natural text
cerebrate-file article.txt --data_format semantic

# Plain text
cerebrate-file data.txt --data_format text

Chunk Size Control

Adjust chunk sizes:

# Smaller chunks = more detail
cerebrate-file large.md --chunk_size 16000

# Larger chunks = more context
cerebrate-file report.md --chunk_size 64000

# Control output size
cerebrate-file doc.md --max_tokens_ratio 50  # Output uses 50% of chunk size
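
With the default 32K chunk size, a ratio of 50 caps each response at roughly 16,000 output tokens per chunk.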

Context Preservation

Control overlap between chunks:

# More overlap = better continuity
cerebrate-file novel.md --sample_size 500

# Less overlap = faster processing
cerebrate-file data.csv --sample_size 50

Working with Different File Types

Markdown Documents

cerebrate-file README.md \
  --prompt "Add emojis to headers" \
  --data_format markdown

Source Code

cerebrate-file app.py \
  --prompt "Add docstrings" \
  --data_format code \
  --chunk_size 24000

Plain Text

cerebrate-file article.txt \
  --prompt "Fix grammar and clarify language" \
  --data_format text

Mixed Content

# Process multiple file types at once
cerebrate-file . --output ./processed \
  --recurse "**/*.{md,py,txt}" \
  --prompt "Improve docs and comments"

Metadata Processing

Extracting Metadata

Use --explain to extract or generate document metadata:

cerebrate-file blog_post.md --explain

Extracts:

  • Title
  • Author
  • Document ID
  • Type
  • Date

Preserving Frontmatter

Markdown frontmatter is preserved automatically:

---
title: My Document
author: John Doe
---
# Content starts here...

Model Parameters

Temperature Control

Control creativity:

# High = more creative
cerebrate-file story.md --temp 0.9

# Low = more predictable
cerebrate-file technical.md --temp 0.3

Top-p Sampling

Control vocabulary diversity:

# Wider range of words
cerebrate-file creative.md --top_p 0.95

# Stick to common words
cerebrate-file formal.md --top_p 0.7

Monitoring and Debugging

Verbose Mode

See what’s happening:

cerebrate-file large.md --verbose

Displays:

  • Chunk boundaries
  • Token usage
  • API requests/responses
  • Rate limits
  • Timing info

Dry Run

Test chunking without calling the API:

cerebrate-file huge.md --dry_run --verbose

Useful for:

  • Checking chunk sizes
  • Validating token limits
  • Testing file patterns
  • Debugging
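
For example, to preview which files a recursion pattern picks up before spending any API calls (assuming --dry_run can be combined with --recurse; paths are illustrative):

cerebrate-file . --output ./preview --recurse "**/*.md" --dry_run --verbose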

Progress Display

The terminal shows:

  • Current file
  • Progress percentage
  • Output path
  • Remaining API calls

Best Practices

1. Chunk Sizes

  • Small files (<10K tokens): Default 32K chunks work fine
  • Large files (>100K tokens): Try 48K–64K chunks
  • Code files: 24K chunks help keep functions intact
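
For example, applying these guidelines (file names illustrative):

# Large document: bigger chunks for more context
cerebrate-file big_report.md --chunk_size 48000

# Source file: smaller chunks to keep functions intact
cerebrate-file module.py --data_format code --chunk_size 24000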

2. Chunking Strategy

  • Markdown: Use markdown
  • Code: Use code
  • Articles: Use semantic
  • Structured data: Use text

3. Rate Limits

  • Watch remaining requests: πŸ“Š Remaining today: X
  • Use --workers carefully
  • Add delays between runs if you keep hitting limits (see the sketch below)
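
A minimal shell-level way to add delays, assuming files are processed one at a time (directory name and sleep value are illustrative):

mkdir -p out   # make sure the output directory exists
for file in *.md; do
  cerebrate-file "$file" --output "out/${file}"
  sleep 10     # pause between files; adjust to your rate limit
done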

4. Large Projects

Process in controlled batches:

# Shell-based batching
find . -name "*.md" -print0 | \
  xargs -0 -n 10 cerebrate-file --output ./processed

# Or with limited parallelism
cerebrate-file . --output ./output \
  --recurse "**/*.md" \
  --workers 4

5. Preserve Context

For continuous text:

cerebrate-file book.md \
  --sample_size 500 \
  --chunk_size 48000 \
  --prompt "Keep narrative voice consistent"

Common Workflows

Document Translation

cerebrate-file document.md \
  --prompt "Translate to Spanish, keep formatting" \
  --output documento.md

Code Documentation

cerebrate-file ./src \
  --recurse "**/*.py" \
  --prompt "Add Google-style docstrings" \
  --output ./documented

Content Summarization

cerebrate-file reports/ \
  --recurse "*.pdf.txt" \
  --prompt "Executive summary, 500 words max" \
  --output summaries/

Style Transformation

cerebrate-file blog.md \
  --file_prompt style_guide.md \
  --prompt "Rewrite in professional tone" \
  --output blog_professional.md

Batch Processing

# Apply the same instructions to all markdown files
mkdir -p processed   # make sure the output directory exists
for file in *.md; do
  cerebrate-file "$file" \
    --file_prompt instructions.md \
    --output "processed/${file}"
done

Error Handling

Rate Limits

Cerebrate File handles rate limits automatically:

  • Exponential backoff
  • Retries with delays
  • Clear status updates

Network Issues

For flaky connections:

# Verbose mode helps debug retries
cerebrate-file document.md --verbose
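
If a connection drops entirely, a simple retry loop in the shell can help; this sketch assumes cerebrate-file exits with a nonzero status on failure:

for attempt in 1 2 3; do
  cerebrate-file document.md --output document_out.md && break
  echo "Attempt $attempt failed, retrying..."
  sleep $((attempt * 30))   # back off a little longer after each failure
done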

Large Files

If you hit token limits:

# Reduce chunk size and output ratio
cerebrate-file huge.md \
  --chunk_size 24000 \
  --max_tokens_ratio 50

Tips and Tricks

1. Preview Changes

Dry run before processing:

cerebrate-file doc.md --dry_run --verbose

2. Save Prompts

Create reusable instruction files:

echo "Your instructions here" > prompts/summarize.md
cerebrate-file doc.md --file_prompt prompts/summarize.md

3. Chain Processing

Multi-step workflows:

# Step 1: Translate
cerebrate-file doc.md --prompt "Translate to Spanish" --output doc_es.md

# Step 2: Summarize
cerebrate-file doc_es.md --prompt "Summarize key points" --output summary_es.md

4. Use Shell Features

Leverage shell tools:

# Process files modified today
find . -name "*.md" -mtime -1 -exec cerebrate-file {} \;

# Confirm before processing
for file in *.txt; do
  read -p "Process $file? " -n 1 -r
  echo
  if [[ $REPLY =~ ^[Yy]$ ]]; then
    cerebrate-file "$file"
  fi
done

Next Steps



Copyright © 2024-2025 Adam Twardoch. Distributed under the Apache 2.0 license.