This is a structured workflow for turning messy revenue data into defensible insights. It walks you through a seven-phase process from ingestion to output, with mandatory decision logging at every step (why this filter, why this metric, what assumptions you made). The bias checklist and data quality validator are the real value here: they force you to document survivorship bias, test weighting sensitivity, and add confidence intervals before you ship anything. Built for financial and RevOps contexts where stakeholders will actually question your methodology. Outputs range from exec slide decks to full Marimo notebooks with audit trails. If you've ever had an analysis questioned three months later and couldn't remember why you excluded certain records, this prevents that.
```bash
npx -y skills add casper-studios/casper-marketplace --skill data-analysis --agent claude-code
```

Installs into `.claude/skills` of the current project.
A comprehensive data analysis and storytelling skill optimized for financial, SaaS, and RevOps contexts. This skill provides structured workflows for turning raw data into actionable insights with full transparency on analytical decisions, bias awareness, and progressive disclosure reporting.
Every analysis follows a 7-phase process:
1. SETUP → Initialize Marimo notebook (run init_marimo_notebook.py)
2. INGEST → Load data, document sources and assumptions
3. EXPLORE → EDA with logged decisions (why this viz, why this filter)
4. MODEL → Only if needed, using an interpretable-first approach
5. INTERPRET → Apply bias checklist, hedge appropriately
6. WISHLIST → Document data gaps and proxies used
7. OUTPUT → Generate appropriate tier (slides/report/notebook)
Every analytical choice must be logged. This creates an audit trail and enables reproducibility.
| Decision Type | Example | Log Format |
|---|---|---|
| Data filtering | Removed 47 records with null revenue | FILTER: [reason] - [count] records affected |
| Metric choice | Used logo churn vs revenue churn | METRIC: [chosen] over [alternative] because [reason] |
| Visualization | Line chart for time series | VIZ: [type] because [reason] |
| Assumption | Assumed linear growth for projection | ASSUMPTION: [statement] - confidence: [H/M/L] |
| Proxy used | Used support tickets as NPS proxy | PROXY: [proxy] for [missing data] - quality: [S/M/W] |
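The METRIC row matters because the choice can change the headline number by an order of magnitude. Below is a minimal sketch, using hypothetical customer data and only the standard library, that computes logo churn and revenue churn for the same book of business. Here "revenue churn" means gross MRR lost from churned customers, ignoring contraction and expansion, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    starting_mrr: float  # MRR at the start of the period
    churned: bool        # True if the customer left during the period


def logo_churn(customers: list[Customer]) -> float:
    """Share of customers (logos) lost, each customer weighted equally."""
    return sum(c.churned for c in customers) / len(customers)


def revenue_churn(customers: list[Customer]) -> float:
    """Share of starting MRR lost to churned customers (gross, no expansion)."""
    lost = sum(c.starting_mrr for c in customers if c.churned)
    total = sum(c.starting_mrr for c in customers)
    return lost / total


# Hypothetical book: small accounts churn, large accounts stay.
book = [
    Customer("A", 10_000, False),
    Customer("B", 8_000, False),
    Customer("C", 500, True),
    Customer("D", 500, True),
    Customer("E", 1_000, False),
]

print(f"Logo churn:    {logo_churn(book):.0%}")     # 2 of 5 logos -> 40%
print(f"Revenue churn: {revenue_churn(book):.0%}")  # 1,000 of 20,000 MRR -> 5%
```

The same data yields 40% or 5% depending on the weighting. That gap is exactly what the `METRIC: [chosen] over [alternative] because [reason]` entry is meant to make explicit.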
```python
# === DECISION LOG ===
# FILTER: Excluded trial accounts - 1,247 records removed
# METRIC: NRR over GRR because expansion is a significant factor
# ASSUMPTION: Q4 seasonality similar to prior year - confidence: M
# PROXY: Support ticket sentiment for NPS - quality: W
```
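Keeping entries consistent is easier when they come from code rather than being typed by hand. The sketch below is a hypothetical `DecisionLog` helper (not part of the skill's scripts) that emits each entry in the formats from the table above, checks the H/M/L and S/M/W codes, and renders the comment block.

```python
class DecisionLog:
    """Hypothetical helper (not part of the skill) that renders entries
    in the decision-log formats from the table above."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def filter(self, reason: str, count: int) -> None:
        # FILTER: [reason] - [count] records affected
        self.entries.append(f"FILTER: {reason} - {count:,} records affected")

    def metric(self, chosen: str, alternative: str, reason: str) -> None:
        # METRIC: [chosen] over [alternative] because [reason]
        self.entries.append(f"METRIC: {chosen} over {alternative} because {reason}")

    def viz(self, chart_type: str, reason: str) -> None:
        # VIZ: [type] because [reason]
        self.entries.append(f"VIZ: {chart_type} because {reason}")

    def assumption(self, statement: str, confidence: str) -> None:
        # ASSUMPTION: [statement] - confidence: [H/M/L]
        if confidence not in {"H", "M", "L"}:
            raise ValueError("confidence must be one of H, M, L")
        self.entries.append(f"ASSUMPTION: {statement} - confidence: {confidence}")

    def proxy(self, proxy: str, missing: str, quality: str) -> None:
        # PROXY: [proxy] for [missing data] - quality: [S/M/W]
        if quality not in {"S", "M", "W"}:
            raise ValueError("quality must be one of S, M, W")
        self.entries.append(f"PROXY: {proxy} for {missing} - quality: {quality}")

    def render(self) -> str:
        """Render all entries as a comment block for the top of a notebook cell."""
        return "\n".join(["# === DECISION LOG ==="] + [f"# {e}" for e in self.entries])


log = DecisionLog()
log.filter("Excluded trial accounts", 1247)
log.metric("NRR", "GRR", "expansion is a significant factor")
log.assumption("Q4 seasonality similar to prior year", "M")
log.proxy("Support ticket sentiment", "NPS", "W")
print(log.render())
```

Rejecting an invalid code such as `confidence="High"` at logging time keeps the audit trail machine-checkable later.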
Run the initialization script to create a new Marimo notebook with pre-built scaffolding:
```bash
python scripts/init_marimo_notebook.py <notebook_name>
```
This creates a .py file with:
When loading data:
```python
# === DATA SOURCE ===
# Source: sales_data_2024.csv
# Loaded: 2024-01-15
# Records: 15,847 rows x 23 columns
# Note: Data through 2024-01-10, 5-day lag from source system
```
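The header can be produced at load time so that the row and column counts always match the data actually loaded. The sketch below uses the standard-library `csv` module to keep it dependency-free; a pandas version would follow the same pattern. The function name `load_with_provenance` and the sample file are hypothetical.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def load_with_provenance(
    path: Path, note: str = "", loaded_on: Optional[date] = None
) -> tuple[list[dict[str, str]], str]:
    """Load a CSV and build a DATA SOURCE comment block describing it."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        n_cols = len(reader.fieldnames or [])
    loaded_on = loaded_on or date.today()
    header = [
        "# === DATA SOURCE ===",
        f"# Source: {path.name}",
        f"# Loaded: {loaded_on.isoformat()}",
        f"# Records: {len(rows):,} rows x {n_cols} columns",
    ]
    if note:
        header.append(f"# Note: {note}")
    return rows, "\n".join(header)


# Usage with a throwaway sample file (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sales_sample.csv"
    sample.write_text("account,revenue\nacme,1200\nglobex,\n", encoding="utf-8")
    rows, header = load_with_provenance(
        sample, note="Sample extract for illustration", loaded_on=date(2024, 1, 15)
    )
    print(header)
```

The `Note:` line (data freshness, system lag) still has to come from a human. The code can only count what it sees.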
Follow this EDA checklist:
Log every visualization choice and filtering decision.
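One way to make sure a filter is never applied without a log entry is to apply and log it in a single step. Here is a minimal sketch, assuming rows are dicts (for example, from `csv.DictReader`). The helper name `logged_filter` is hypothetical.

```python
from typing import Callable

Row = dict[str, str]


def logged_filter(
    rows: list[Row], keep: Callable[[Row], bool], reason: str, decision_log: list[str]
) -> list[Row]:
    """Keep rows where `keep(row)` is True and log how many were dropped."""
    kept = [row for row in rows if keep(row)]
    removed = len(rows) - len(kept)
    decision_log.append(f"FILTER: {reason} - {removed:,} records affected")
    return kept


rows = [
    {"account": "acme", "revenue": "1200"},
    {"account": "globex", "revenue": ""},
    {"account": "initech", "revenue": "800"},
]
decision_log: list[str] = []
clean = logged_filter(
    rows,
    keep=lambda row: row["revenue"].strip() != "",
    reason="Removed records with null revenue",
    decision_log=decision_log,
)
print(decision_log[0])  # FILTER: Removed records with null revenue - 1 records affected
```

Because the count is computed rather than typed, the log entry cannot drift from what the filter actually removed.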
Prioritize interpretability:
Always provide:
Before finalizing insights, run the bias checklist. See references/biases.md for the full checklist.
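Survivorship bias, one of the biases the checklist asks you to document, is easy to show concretely. The sketch below uses a hypothetical five-customer cohort. It compares average revenue growth measured only on customers who are still active against growth across the whole cohort, counting churned customers as -100%.

```python
from statistics import mean

# Hypothetical cohort: (customer, MRR at signup, MRR today); 0 means churned.
cohort = [
    ("A", 1_000, 1_500),
    ("B", 1_000, 1_300),
    ("C", 1_000, 0),
    ("D", 1_000, 0),
    ("E", 1_000, 1_200),
]


def growth(start: float, end: float) -> float:
    """Fractional change from start to end (-1.0 for a churned customer)."""
    return end / start - 1


# Looking only at customers still present today silently drops the churned ones.
survivors_only = mean(growth(s, e) for _, s, e in cohort if e > 0)
full_cohort = mean(growth(s, e) for _, s, e in cohort)

print(f"Avg growth, survivors only: {survivors_only:+.0%}")  # +33%
print(f"Avg growth, full cohort:    {full_cohort:+.0%}")     # -20%
```

The sign of the finding flips. That is why the checklist asks you to state which population an average was computed over.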
Quick check:
Hedge appropriately:
⚠️ GATE: Before proceeding to output, you MUST run the data quality validation checklist.
This is not optional. Run through references/data-quality-validator.md before finalizing:
Critical Patterns Checklist:
Statistical Checks:
Logic Checks:
Methodology Note on Time Horizons: When assessing skill vs luck (e.g., sales rep performance, investment returns):
Do not proceed to Phase 6/7 until this checklist is complete.
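For the statistical checks, attaching an interval to small-sample metrics is often the simplest safeguard. Below is a minimal percentile-bootstrap sketch using only the standard library. It assumes the observations are independent, and it is one of several ways to build an interval, not necessarily the method the validator reference prescribes. The deal data is hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, stat=mean, n_resamples=5_000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute `stat`,
    and take the alpha/2 and 1 - alpha/2 quantiles of the estimates."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo = estimates[round(alpha / 2 * n_resamples)]
    hi = estimates[round((1 - alpha / 2) * n_resamples) - 1]
    return stat(values), lo, hi


# Hypothetical: one rep's last 30 deals, 1 = won, 0 = lost (40% win rate).
deals = [1] * 12 + [0] * 18
point, lo, hi = bootstrap_ci(deals)
print(f"Win rate {point:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

With only 30 deals the interval is wide. A single quarter's win rate is therefore weak evidence for ranking reps against each other.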
Document gaps and proxies. See references/data-wishlisting.md for patterns.
Format:
```markdown
## Data Wishlist
| Missing Data | Proxy Used | Quality | Impact on Analysis |
|--------------|------------|---------|-------------------|
| Customer NPS | Support sentiment | Weak | Core finding, needs validation |
| True LTV | 12-month value | Moderate | Acceptable for segmentation |
```
Choose output tier based on audience and purpose:
| Tier | When to Use | Tool |
|---|---|---|
| Slides | Executive summary, board deck | generate_pptx_summary.py |
| Report | Detailed findings, stakeholder review | Markdown/PDF |
| Notebook | Full analysis, data team handoff | Marimo .py file |
For messy data that needs cleaning before analysis:
```bash
python scripts/profile_data.py <csv_file> --output data_quality_report.md
```
This generates:
Reference references/data-cleaning.md for:
Reference references/datetime-handling.md for:
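The exact contents of the profile_data.py report are not listed here. As a rough illustration of the kind of per-column profiling that usually comes before cleaning, here is a standard-library sketch. It reports null counts, distinct values, the most common value, and exact duplicate rows. The function names and sample data are hypothetical, and the sketch is not the script's implementation.

```python
import csv
import io
from collections import Counter


def profile_columns(rows: list[dict]) -> dict[str, dict]:
    """Per-column null count/share, distinct count, and most common value."""
    profile: dict[str, dict] = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat missing fields (None) and blank strings as nulls.
        present = [v for v in values if v is not None and v.strip() != ""]
        counts = Counter(present)
        profile[col] = {
            "nulls": len(values) - len(present),
            "null_share": (len(values) - len(present)) / len(values),
            "distinct": len(counts),
            "top_value": counts.most_common(1)[0][0] if counts else None,
        }
    return profile


def count_duplicate_rows(rows: list[dict]) -> int:
    """Number of rows that exactly repeat an earlier row."""
    counts = Counter(tuple(row.items()) for row in rows)
    return sum(c - 1 for c in counts.values())


raw = "account,region,revenue\nacme,EMEA,1200\nglobex,EMEA,\ninitech,,800\nacme,EMEA,1200\n"
rows = list(csv.DictReader(io.StringIO(raw)))
for col, stats in profile_columns(rows).items():
    print(col, stats)
print("duplicate rows:", count_duplicate_rows(rows))
```

Each issue found here (blank revenue, missing region, a duplicated row) is a candidate for a FILTER or ASSUMPTION entry in the decision log, not something to fix silently.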
For interactive monitoring dashboards:
```bash
python scripts/init_dashboard.py <dashboard_name>
```
This creates a Marimo dashboard with:
Reference references/dashboard-patterns.md for:
Before presenting or accepting analytical claims:
Reference references/data-quality-validator.md for comprehensive checklists:
Statistical Sins:
Chart Crimes:
Logic Fallacies:
Sanity Checks:
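A common sanity check, shown here as an illustration rather than as an item taken from the validator reference, is that segment figures add up to the headline total and contain no impossible negatives. Below is a minimal sketch with hypothetical ARR figures and an assumed 0.5% tolerance for rounding.

```python
def check_parts_sum_to_total(
    parts: dict[str, float], reported_total: float, rel_tolerance: float = 0.005
) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    negatives = [name for name, value in parts.items() if value < 0]
    if negatives:
        problems.append(f"Negative values in: {', '.join(negatives)}")
    parts_sum = sum(parts.values())
    difference = abs(parts_sum - reported_total)
    # Allow small rounding differences relative to the size of the total.
    if difference > rel_tolerance * max(abs(reported_total), 1.0):
        problems.append(
            f"Segments sum to {parts_sum:,.0f} but the reported total is "
            f"{reported_total:,.0f} (difference {difference:,.0f})"
        )
    return problems


# Hypothetical ARR by segment vs. the headline number on a slide.
arr_by_segment = {"SMB": 1_200_000, "Mid-Market": 2_300_000, "Enterprise": 4_100_000}
print(check_parts_sum_to_total(arr_by_segment, reported_total=8_000_000))
# ['Segments sum to 7,600,000 but the reported total is 8,000,000 (difference 400,000)']
```

A mismatch like this usually means the segment breakdown and the total were pulled with different filters or at different dates, which should be resolved before the numbers reach a deck.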
For exporting analysis results to Excel with proper formulas and formatting:
Reference references/xlsx-patterns.md for:
After creating Excel files with formulas, always recalculate:
```bash
python scripts/recalc.py output.xlsx
```
This ensures:
For extracting data from PDFs or creating PDF reports:
Reference references/pdf-patterns.md for:
Reference references/pdf-patterns.md for:
Load these as needed during analysis:
| Reference | When to Use |
|---|---|
| references/metrics.md | Calculating SaaS/RevOps metrics |
| references/biases.md | Interpretation phase, before finalizing insights |
| references/report-templates.md | Structuring output (pyramid vs consulting style) |
| references/visualization-guide.md | Choosing chart types, avoiding anti-patterns |
| references/data-wishlisting.md | Documenting gaps, rating proxy quality |
| references/data-cleaning.md | Data quality checks, cleaning patterns |
| references/datetime-handling.md | Timezone, parsing, fiscal calendars |
| references/dashboard-patterns.md | Marimo layouts, KPIs, interactivity |
| references/data-quality-validator.md | Data quality validation, detecting issues |
| references/xlsx-patterns.md | Excel output, financial model standards, formulas |
| references/pdf-patterns.md | PDF extraction, report creation, manipulation |
| Script | Purpose | Usage |
|---|---|---|
| scripts/init_marimo_notebook.py | Initialize analysis workspace | `python scripts/init_marimo_notebook.py <name>` |
| scripts/generate_pptx_summary.py | Create slide deck from findings | `python scripts/generate_pptx_summary.py <config.json>` |
| scripts/profile_data.py | Generate data quality report | `python scripts/profile_data.py <csv_file>` |
| scripts/init_dashboard.py | Scaffold interactive dashboard | `python scripts/init_dashboard.py <name>` |
| scripts/recalc.py | Recalculate Excel formulas | `python scripts/recalc.py <xlsx_file>` |
| Tool | Purpose | Why |
|---|---|---|
| Marimo | Notebook environment | Pure Python files, reactive, git-friendly |
| pandas | Data manipulation | Reliable LLM code generation, mature ecosystem |
| Matplotlib/Seaborn | Visualization | Publication-quality, static, well-supported |
| python-pptx | Slide generation | Programmatic PowerPoint creation |
| openpyxl | Excel files | Formulas, formatting, financial models |
| pypdf/pdfplumber | PDF handling | Extract text and tables; merge, split, and manipulate PDFs |
| reportlab | PDF creation | Professional PDF reports |
Revenue analysis:
"Analyze our ARR trends by segment and identify drivers of growth/churn"
Pipeline analytics:
"Build a win rate analysis by deal size and sales rep"
Cohort analysis:
"Create a retention cohort analysis for customers acquired in 2023"
Forecasting:
"Project next quarter revenue based on current pipeline"
Board deck:
"Create an executive summary deck of our key SaaS metrics"
Data cleaning:
"Clean this messy CSV and profile the data quality"
Dashboard:
"Build a dashboard to monitor our key SaaS metrics"
Data validation:
"Validate these findings before I present them"
Excel output:
"Export this analysis to Excel with proper formulas and formatting"
PDF extraction:
"Extract the tables from this quarterly report PDF"
Financial model:
"Create a revenue projection model in Excel with scenario inputs"