Usage Guide

How to use Cerebrate File effectively

Table of contents

  1. Basic Usage
    1. Processing a Single File
    2. Specifying Output File
    3. Adding Instructions
    4. Using Instruction Files
  2. Advanced Features
    1. Recursive Processing
    2. Parallel Processing
    3. Chunking Strategies
    4. Chunk Size Control
    5. Context Preservation
  3. Working with Different File Types
    1. Markdown Documents
    2. Source Code
    3. Plain Text
    4. Mixed Content
  4. Metadata Processing
    1. Extracting Metadata
    2. Preserving Frontmatter
  5. Model Parameters
    1. Temperature Control
    2. Top-p Sampling
  6. Monitoring and Debugging
    1. Verbose Mode
    2. Dry Run
    3. Progress Display
  7. Best Practices
    1. 1. Chunk Sizes
    2. 2. Chunking Strategy
    3. 3. Rate Limits
    4. 4. Large Projects
    5. 5. Preserve Context
  8. Common Workflows
    1. Document Translation
    2. Code Documentation
    3. Content Summarization
    4. Style Transformation
    5. Batch Processing
  9. Error Handling
    1. Rate Limits
    2. Network Issues
    3. Large Files
  10. Tips and Tricks
    1. 1. Preview Changes
    2. 2. Save Prompts
    3. 3. Chain Processing
    4. 4. Use Shell Features
  11. Next Steps

Basic Usage

Processing a Single File

The simplest way to use Cerebrate File is to run it on a single document:

cerebrate-file input.md

This overwrites input.md with the processed version.

Specifying Output File

To save the result elsewhere:

cerebrate-file input.md --output output.md

Adding Instructions

Add instructions for the AI:

cerebrate-file document.md \
  --prompt "Summarize each section in 2-3 sentences"

Using Instruction Files

For longer or reusable instructions, use a file:

cerebrate-file report.md \
  --file_prompt instructions.md \
  --output summary.md

Advanced Features

Recursive Processing

Process multiple files by pattern:

# All markdown files, recursively
cerebrate-file . --output ./processed --recurse "**/*.md"

# Specific file types
cerebrate-file ./src --output ./docs --recurse "**/*.{py,js,ts}"

# Limit depth
cerebrate-file . --output ./output --recurse "*.txt"      # Current dir only
cerebrate-file . --output ./output --recurse "*/*.txt"     # One level deep

Parallel Processing

Speed up processing with multiple workers:

# Use 8 workers
cerebrate-file . --output ./output --recurse "**/*.md" --workers 8

# Auto-detect based on CPU cores
cerebrate-file . --output ./output --recurse "**/*.md" --workers 0

Chunking Strategies

Choose the right strategy for your content:

# Markdown-aware (default)
cerebrate-file doc.md --data_format markdown

# Code-aware for source files
cerebrate-file script.py --data_format code

# Semantic chunking for natural text
cerebrate-file article.txt --data_format semantic

# Plain text
cerebrate-file data.txt --data_format text

Chunk Size Control

Adjust chunk sizes:

# Smaller chunks = more detail
cerebrate-file large.md --chunk_size 16000

# Larger chunks = more context
cerebrate-file report.md --chunk_size 64000

# Control output size
cerebrate-file doc.md --max_tokens_ratio 50  # Output uses 50% of chunk size
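
With the default 32K chunk size, a ratio of 50 caps each response at roughly 16,000 output tokens per chunk.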

Context Preservation

Control overlap between chunks:

# More overlap = better continuity
cerebrate-file novel.md --sample_size 500

# Less overlap = faster processing
cerebrate-file data.csv --sample_size 50

Working with Different File Types

Markdown Documents

cerebrate-file README.md \
  --prompt "Add emojis to headers" \
  --data_format markdown

Source Code

cerebrate-file app.py \
  --prompt "Add docstrings" \
  --data_format code \
  --chunk_size 24000

Plain Text

cerebrate-file article.txt \
  --prompt "Fix grammar and clarify language" \
  --data_format text

Mixed Content

# Process multiple file types at once
cerebrate-file . --output ./processed \
  --recurse "**/*.{md,py,txt}" \
  --prompt "Improve docs and comments"

Metadata Processing

Extracting Metadata

Use --explain to extract or generate document metadata:

cerebrate-file blog_post.md --explain

Extracts:

  • Title
  • Author
  • Document ID
  • Type
  • Date

Preserving Frontmatter

Markdown frontmatter is preserved automatically:

---
title: My Document
author: John Doe
---
# Content starts here...

Model Parameters

Temperature Control

Control creativity:

# High = more creative
cerebrate-file story.md --temp 0.9

# Low = more predictable
cerebrate-file technical.md --temp 0.3

Top-p Sampling

Control vocabulary diversity:

# Wider range of words
cerebrate-file creative.md --top_p 0.95

# Stick to common words
cerebrate-file formal.md --top_p 0.7

Monitoring and Debugging

Verbose Mode

See what’s happening:

cerebrate-file large.md --verbose

Displays:

  • Chunk boundaries
  • Token usage
  • API requests/responses
  • Rate limits
  • Timing info

Dry Run

Test chunking without calling the API:

cerebrate-file huge.md --dry_run --verbose

Useful for:

  • Checking chunk sizes
  • Validating token limits
  • Testing file patterns
  • Debugging
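
For example, to preview which files a recursion pattern picks up before spending any API calls (assuming --dry_run can be combined with --recurse; paths are illustrative):

cerebrate-file . --output ./preview --recurse "**/*.md" --dry_run --verbose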

Progress Display

The terminal shows:

  • Current file
  • Progress percentage
  • Output path
  • Remaining API calls

Best Practices

1. Chunk Sizes

  • Small files (<10K tokens): Default 32K chunks work fine
  • Large files (>100K tokens): Try 48K–64K chunks
  • Code files: 24K chunks help keep functions intact
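
For example, applying these guidelines (file names illustrative):

# Large document: bigger chunks for more context
cerebrate-file big_report.md --chunk_size 48000

# Source file: smaller chunks to keep functions intact
cerebrate-file module.py --data_format code --chunk_size 24000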

2. Chunking Strategy

  • Markdown: Use markdown
  • Code: Use code
  • Articles: Use semantic
  • Structured data: Use text

3. Rate Limits

  • Watch remaining requests: πŸ“Š Remaining today: X
  • Use --workers carefully
  • Add delays between runs if you keep hitting limits (see the sketch below)
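
A minimal shell-level way to add delays, assuming files are processed one at a time (directory name and sleep value are illustrative):

mkdir -p out   # make sure the output directory exists
for file in *.md; do
  cerebrate-file "$file" --output "out/${file}"
  sleep 10     # pause between files; adjust to your rate limit
done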

4. Large Projects

Process in controlled batches:

# Shell-based batching
find . -name "*.md" -print0 | \
  xargs -0 -n 10 cerebrate-file --output ./processed

# Or with limited parallelism
cerebrate-file . --output ./output \
  --recurse "**/*.md" \
  --workers 4

5. Preserve Context

For continuous text:

cerebrate-file book.md \
  --sample_size 500 \
  --chunk_size 48000 \
  --prompt "Keep narrative voice consistent"

Common Workflows

Document Translation

cerebrate-file document.md \
  --prompt "Translate to Spanish, keep formatting" \
  --output documento.md

Code Documentation

cerebrate-file ./src \
  --recurse "**/*.py" \
  --prompt "Add Google-style docstrings" \
  --output ./documented

Content Summarization

cerebrate-file reports/ \
  --recurse "*.pdf.txt" \
  --prompt "Executive summary, 500 words max" \
  --output summaries/

Style Transformation

cerebrate-file blog.md \
  --file_prompt style_guide.md \
  --prompt "Rewrite in professional tone" \
  --output blog_professional.md

Batch Processing

# Apply the same instructions to all markdown files
mkdir -p processed   # make sure the output directory exists
for file in *.md; do
  cerebrate-file "$file" \
    --file_prompt instructions.md \
    --output "processed/${file}"
done

Error Handling

Rate Limits

Cerebrate File handles rate limits automatically:

  • Exponential backoff
  • Retries with delays
  • Clear status updates

Network Issues

For flaky connections:

# Verbose mode helps debug retries
cerebrate-file document.md --verbose
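
If a connection drops entirely, a simple retry loop in the shell can help; this sketch assumes cerebrate-file exits with a nonzero status on failure:

for attempt in 1 2 3; do
  cerebrate-file document.md --output document_out.md && break
  echo "Attempt $attempt failed, retrying..."
  sleep $((attempt * 30))   # back off a little longer after each failure
done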

Large Files

If you hit token limits:

# Reduce chunk size and output ratio
cerebrate-file huge.md \
  --chunk_size 24000 \
  --max_tokens_ratio 50

Tips and Tricks

1. Preview Changes

Dry run before processing:

cerebrate-file doc.md --dry_run --verbose

2. Save Prompts

Create reusable instruction files:

echo "Your instructions here" > prompts/summarize.md
cerebrate-file doc.md --file_prompt prompts/summarize.md

3. Chain Processing

Multi-step workflows:

# Step 1: Translate
cerebrate-file doc.md --prompt "Translate to Spanish" --output doc_es.md

# Step 2: Summarize
cerebrate-file doc_es.md --prompt "Summarize key points" --output summary_es.md

4. Use Shell Features

Leverage shell tools:

# Process files modified today
find . -name "*.md" -mtime -1 -exec cerebrate-file {} \;

# Confirm before processing
for file in *.txt; do
  read -p "Process $file? " -n 1 -r
  echo
  if [[ $REPLY =~ ^[Yy]$ ]]; then
    cerebrate-file "$file"
  fi
done

Next Steps



Copyright © 2024-2025 Adam Twardoch. Distributed under the Apache 2.0 license.