Csv Excel Merger

822 installs168 stars

Summary

Handles the annoying work of merging CSV and Excel files when columns don't match up perfectly. Does fuzzy matching on column names (so "Email", "e-mail", and "email_address" get unified), deduplicates based on your primary key, and flags conflicts when the same record appears with different values across files. Spits out a detailed report showing what got merged, what got deduplicated, and what needs manual review. The conflict resolution options are solid: keep first, keep last, keep longest value, or flag for review. Honestly most useful when you're combining contact lists or data exports from different systems and don't want to manually reconcile column schemas. Saves the tedious pandas boilerplate you'd otherwise write yourself.

Install to Claude Code

npx -y skills add onewave-ai/claude-skills --skill csv-excel-merger --agent claude-code

Installs into .claude/skills of the current project.

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

Files

SKILL.mdView on GitHub

CSV/Excel Merger

Merge multiple CSV or Excel files with automatic column matching, deduplication, and conflict resolution.

Workflow — the step-by-step merge process
Verification — confirm the merge before handing it back
Special cases — encoding, compound keys, large files
Guidelines — quality and transparency standards
Example triggers
references/merge_strategies.md — column matching, conflict resolution, and dedup options
references/output_template.md — the merge-report format

Workflow

Inspect the inputs. Determine file count, format (CSV / Excel / TSV), and whether the files are attached or read from disk. Read each header; identify column names, data types, and encoding (UTF-8, Latin-1). Note the candidate primary key.
Plan the merge. Match columns across files to one unified schema, choose a conflict-resolution rule, and pick a deduplication strategy. See references/merge_strategies.md for the matching heuristics and the full set of options.

Execute the merge with pandas:

import pandas as pd

df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")

# Normalize, then map column names onto the unified schema
for df in (df1, df2):
    df.columns = df.columns.str.lower().str.strip()
df2 = df2.rename(columns={"firstname": "first_name", "e_mail": "email"})

merged = pd.concat([df1, df2], ignore_index=True)
merged = merged.drop_duplicates(subset=["email"], keep="last")
merged.to_csv("merged_output.csv", index=False)

Verify the result before reporting — see Verification.
Report using the layout in references/output_template.md, then offer export options: CSV (UTF-8), Excel (.xlsx), JSON, SQL INSERT statements, or Parquet for large datasets.

Verification

Never hand back a merge without checking it. After merging, assert the row math holds and the key is actually unique:

total_in = len(df1) + len(df2)
assert len(merged) > 0, "merge produced an empty frame"
assert len(merged) <= total_in, "more rows than inputs — check the concat/join"
assert merged["email"].is_unique, "duplicate keys remain after dedup"

print(f"in: {total_in} rows | out: {len(merged)} rows | removed: {total_in - len(merged)}")
print(f"null keys: {merged['email'].isna().sum()} | columns: {list(merged.columns)}")

Report rows in vs. out, duplicates removed, and per-column completeness so the user can sanity-check the numbers against their own expectations.

Special cases

Compound keys — when no single column is unique, key on a tuple: subset=["email", "company"].
Mixed data types — standardize dates, phone numbers, and country codes; strip whitespace and normalize casing before deduping, or near-duplicates slip through.
Missing columns — fill absent columns with empty values and flag them in the report; never silently drop data.
Large files (>100MB) — read in chunks (pd.read_csv(path, chunksize=...)), report progress, and estimate memory before loading everything at once.

Guidelines

Column matching — prefer exact, then case-insensitive, then fuzzy. Always emit the original → unified mapping so every match is auditable, and allow manual override.
Data quality — trim whitespace, standardize formats, flag invalid values, preserve types.
Transparency — track the source file for every surviving row, log each merge decision, and report all conflicts with their resolutions.
Performance — chunk large files, process in batches, and show progress on long-running merges.

Example triggers

"Merge these three CSV files"
"Combine multiple Excel sheets into one file"
"Deduplicate and merge customer data"
"Join spreadsheets with different column names"
"Consolidate contact lists from different sources"

Featured

CodeRabbit

AI writes the code. CodeRabbit catches the slop.

Try For Free →

Keep your Mac awake

Keep your Mac awake while Claude Code and 40+ AI agents run. Sleeps when they're idle.

One time payment $9 →

Context.dev

Integrate web data into your AI product. One API to scrape website & brand data.

Get API Key Now →

Make your agent a DeFi expert

Agent, run crypto. Access onchain data & trade routes via 1inch.

Install now →

Make money from your Skills

On Capafy, your Skill runs online 24/7 as an agent product, and you get paid every time someone uses it.

Start earning →

AppSignal

Monitor with ease. Code with confidence.

Start Free Trial →

Categories

Office & Documents

First SeenApr 16, 2026

View on GitHub

Contents

Workflow — the step-by-step merge process

Verification — confirm the merge before handing it back

Special cases — encoding, compound keys, large files

Guidelines — quality and transparency standards

Example triggers

references/merge_strategies.md — column matching, conflict resolution, and dedup options

references/output_template.md — the merge-report format

Workflow

Inspect the inputs. Determine file count, format (CSV / Excel / TSV), and whether the files are attached or read from disk. Read each header; identify column names, data types, and encoding (UTF-8, Latin-1). Note the candidate primary key.

Plan the merge. Match columns across files to one unified schema, choose a conflict-resolution rule, and pick a deduplication strategy. See references/merge_strategies.md for the matching heuristics and the full set of options.

Execute the merge with pandas:

import pandas as pd

df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")

# Normalize, then map column names onto the unified schema
for df in (df1, df2):
    df.columns = df.columns.str.lower().str.strip()
df2 = df2.rename(columns={"firstname": "first_name", "e_mail": "email"})

merged = pd.concat([df1, df2], ignore_index=True)
merged = merged.drop_duplicates(subset=["email"], keep="last")
merged.to_csv("merged_output.csv", index=False)

Verify the result before reporting — see Verification.

Report using the layout in references/output_template.md, then offer export options: CSV (UTF-8), Excel (.xlsx), JSON, SQL INSERT statements, or Parquet for large datasets.

Verification

Never hand back a merge without checking it. After merging, assert the row math holds and the key is actually unique:

total_in = len(df1) + len(df2) assert len(merged) > 0, "merge produced an empty frame" assert len(merged) <= total_in, "more rows than inputs — check the concat/join" assert merged["email"].is_unique, "duplicate keys remain after dedup" print(f"in: {total_in} rows | out: {len(merged)} rows | removed: {total_in - len(merged)}") print(f"null keys: {merged['email'].isna().sum()} | columns: {list(merged.columns)}")

Report rows in vs. out, duplicates removed, and per-column completeness so the user can sanity-check the numbers against their own expectations.

Special cases

Compound keys — when no single column is unique, key on a tuple: subset=["email", "company"].

Mixed data types — standardize dates, phone numbers, and country codes; strip whitespace and normalize casing before deduping, or near-duplicates slip through.

Missing columns — fill absent columns with empty values and flag them in the report; never silently drop data.

Large files (>100MB) — read in chunks (pd.read_csv(path, chunksize=...)), report progress, and estimate memory before loading everything at once.

Guidelines

Column matching — prefer exact, then case-insensitive, then fuzzy. Always emit the original → unified mapping so every match is auditable, and allow manual override.

Data quality — trim whitespace, standardize formats, flag invalid values, preserve types.

Transparency — track the source file for every surviving row, log each merge decision, and report all conflicts with their resolutions.

Performance — chunk large files, process in batches, and show progress on long-running merges.

Csv Excel Merger

Install to Claude Code

CSV/Excel Merger

Contents

Workflow

Verification

Special cases

Guidelines

Example triggers

Csv Excel Merger

Install to Claude Code

CSV/Excel Merger

Contents

Workflow

Verification

Special cases

Guidelines

Example triggers

Recommended

Recommended