Handles the annoying work of merging CSV and Excel files when columns don't match up perfectly. Does fuzzy matching on column names (so "Email", "e-mail", and "email_address" get unified), deduplicates based on your primary key, and flags conflicts when the same record appears with different values across files. Spits out a detailed report showing what got merged, what got deduplicated, and what needs manual review. The conflict resolution options are solid: keep first, keep last, keep longest value, or flag for review. Honestly most useful when you're combining contact lists or data exports from different systems and don't want to manually reconcile column schemas. Saves the tedious pandas boilerplate you'd otherwise write yourself.
npx -y skills add onewave-ai/claude-skills --skill csv-excel-merger --agent claude-code
Installs into .claude/skills of the current project.
Merge multiple CSV or Excel files with automatic column matching, deduplication, and conflict resolution.
references/merge_strategies.md: column matching, conflict resolution, and dedup options
references/output_template.md: the merge-report format
Inspect the inputs. Determine file count, format (CSV / Excel / TSV), and whether the files are attached or read from disk. Read each header; identify column names, data types, and encoding (UTF-8, Latin-1). Note the candidate primary key.
Plan the merge. Match columns across files to one unified schema, choose a conflict-resolution rule, and pick a deduplication strategy. See references/merge_strategies.md for the matching heuristics and the full set of options.
Execute the merge with pandas:
import pandas as pd
df1 = pd.read_csv("file1.csv")
df2 = pd.read_csv("file2.csv")
# Normalize, then map column names onto the unified schema
for df in (df1, df2):
    df.columns = df.columns.str.lower().str.strip()
df2 = df2.rename(columns={"firstname": "first_name", "e_mail": "email"})
merged = pd.concat([df1, df2], ignore_index=True)
merged = merged.drop_duplicates(subset=["email"], keep="last")
merged.to_csv("merged_output.csv", index=False)
Verify the result before reporting — see Verification.
Report using the layout in references/output_template.md, then offer export options: CSV (UTF-8), Excel (.xlsx), JSON, SQL INSERT statements, or Parquet for large datasets.
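The column-matching part of the planning step can be sketched with the standard library alone. This is a minimal illustration, not the skill's own heuristic (that lives in references/merge_strategies.md): the CANONICAL list, the normalize and match_columns helpers, and the 0.8 cutoff are all hypothetical choices made for the example.

```python
import difflib
import re

# Hypothetical unified schema; substitute your own target names.
CANONICAL = ["email", "first_name", "last_name", "company"]

def normalize(name):
    """Lowercase a column name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def match_columns(columns, canonical=CANONICAL, cutoff=0.8):
    """Map each raw column to a canonical name, or None if nothing is close enough."""
    lookup = {normalize(c): c for c in canonical}
    mapping = {}
    for col in columns:
        hits = difflib.get_close_matches(normalize(col), list(lookup), n=1, cutoff=cutoff)
        mapping[col] = lookup[hits[0]] if hits else None
    return mapping

# Headers as they might appear across two different exports.
mapping = match_columns(["E-Mail", "FirstName", "e_mail", "Phone"])
print(mapping)
```

Columns that map to None are good candidates for the manual-review list. Similarity matching only catches spelling and punctuation variants; a longer name such as "email_address" scores below this cutoff against "email", so variants like that likely need an explicit alias table next to the fuzzy step.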
Never hand back a merge without checking it. After merging, assert the row math holds and the key is actually unique:
total_in = len(df1) + len(df2)
assert len(merged) > 0, "merge produced an empty frame"
assert len(merged) <= total_in, "more rows than inputs — check the concat/join"
assert merged["email"].is_unique, "duplicate keys remain after dedup"
print(f"in: {total_in} rows | out: {len(merged)} rows | removed: {total_in - len(merged)}")
print(f"null keys: {merged['email'].isna().sum()} | columns: {list(merged.columns)}")
Report rows in vs. out, duplicates removed, and per-column completeness so the user can sanity-check the numbers against their own expectations.
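One way to produce the per-column completeness figures, and to list keys whose rows disagreed before deduplication, is sketched here. The toy frames and their values are made up for illustration, and the names combined, completeness, and conflicts are assumptions rather than part of the skill.

```python
import pandas as pd

# Toy stand-in for the concatenated inputs; the values are invented.
combined = pd.DataFrame({
    "email": ["a@x.com", "a@x.com", "b@x.com"],
    "phone": ["111", "222", None],
    "name": ["Ann", "Ann", "Bob"],
})
merged = combined.drop_duplicates(subset=["email"], keep="last")

# Share of non-null values per column, as a percentage.
completeness = (merged.notna().mean() * 100).round(1)

# Keys whose rows disagree on at least one non-key column before dedup.
# nunique() skips missing values, so a blank never counts as a conflict.
distinct = combined.groupby("email").nunique()
conflicts = distinct[(distinct > 1).any(axis=1)].index.tolist()

print(completeness.to_dict())
print("keys with conflicting values:", conflicts)
```

Computing the conflict list on the pre-dedup frame matters: once drop_duplicates has run, the losing values are gone and there is nothing left to flag for review.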
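Composite keys: when no single column identifies a record, deduplicate on several, e.g. subset=["email", "company"].
Large files: read in chunks (pd.read_csv(path, chunksize=...)), report progress, and estimate memory before loading everything at once.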
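A chunked read with running deduplication might look like the sketch below. The merge_in_chunks helper, the in-memory sample, and the tiny chunksize are hypothetical and chosen only so the example runs as written; a real file would take a path and a much larger chunk size.

```python
import io
import pandas as pd

def merge_in_chunks(source, key="email", chunksize=2):
    """Read a CSV in chunks, keeping only the last row seen for each key."""
    kept = None
    rows_read = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        rows_read += len(chunk)
        kept = chunk if kept is None else pd.concat([kept, chunk], ignore_index=True)
        kept = kept.drop_duplicates(subset=[key], keep="last")
        # Progress line: rows consumed so far vs. unique keys currently held.
        print(f"read {rows_read} rows, holding {len(kept)} unique keys")
    return kept, rows_read

# In-memory stand-in for a large file on disk.
sample = io.StringIO(
    "email,name\n"
    "a@x.com,Ann\n"
    "b@x.com,Bob\n"
    "a@x.com,Ann B\n"
    "c@x.com,Cy\n"
)
result, rows_read = merge_in_chunks(sample)
```

This holds only the deduplicated rows in memory rather than the whole file, which helps when duplicates are common. For a rough memory estimate, one option is to read a single chunk and scale its memory_usage(deep=True).sum() by the expected number of chunks.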