Technical Deep Dive · Advanced · January 1, 2025

Transaction Matching Algorithms Explained

A technical examination of the algorithms that power automated bank reconciliation. From exact matching to machine learning-based approaches, understand how modern reconciliation software achieves 95%+ automatic match rates.

Transaction matching is the core algorithmic challenge in bank reconciliation. The goal is to automatically pair bank statement transactions with general ledger entries, reducing manual work while maintaining accuracy. This guide explores the technical approaches used in modern reconciliation software.

1. Introduction to Transaction Matching

Transaction matching compares two datasets: bank statement transactions (source of truth for cash) and general ledger entries (accounting records). The challenge is that these datasets rarely align perfectly due to timing differences, naming variations, and batch processing.

Why Perfect Matching is Hard

Timing differences: Check clears 3 days after recording
Description variations: "AMAZON" vs "AMZN*Marketplace"
Batch transactions: 5 checks grouped in one bank entry
Bank fees: Not recorded until statement arrives
Rounding: Currency conversion differences
Reversals: Voided checks, refunds

Effective matching algorithms must handle these variations while avoiding false positives (incorrect matches that introduce errors).

2. Exact Matching Algorithms

Exact matching is typically the first pass in a reconciliation system. It identifies transactions where all key fields match precisely.

Match Criteria

Exact Match Algorithm:

FOR each bank_transaction:
  FOR each gl_entry WHERE status = 'unmatched':
    IF bank_transaction.amount == gl_entry.amount
       AND bank_transaction.date == gl_entry.date
       AND normalize(bank_transaction.reference) == normalize(gl_entry.reference):

      mark_matched(bank_transaction, gl_entry)
      BREAK

Normalization function:
  - Remove spaces, special characters
  - Convert to uppercase
  - Strip leading zeros from check numbers

Performance Characteristics

  • Accuracy: close to 100% (all key fields agree, so errors are limited to cases such as duplicate transactions with identical amount, date, and reference)
  • Coverage: 40-60% of transactions typically (depends on data quality)
  • Complexity: O(n × m) naïve, O(n + m) with hash indexing
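The hash-indexed variant mentioned above can be sketched in Python as follows. This is a minimal illustration, not a production implementation: the dictionary field names (`id`, `amount`, `date`, `reference`) and the sample records are assumptions made for the example.

```python
import re
from collections import defaultdict, deque
from datetime import date
from decimal import Decimal


def normalize(reference: str) -> str:
    """Uppercase, drop non-alphanumerics, strip leading zeros from digit runs."""
    cleaned = re.sub(r"[^A-Z0-9]", "", reference.upper())
    return re.sub(r"(?<!\d)0+(?=\d)", "", cleaned)


def exact_match(bank_txns, gl_entries):
    """Pair records whose (amount, date, normalized reference) keys are equal.

    Building the index is O(m) and each lookup is O(1) on average,
    so the whole pass is O(n + m) instead of O(n × m).
    """
    index = defaultdict(deque)
    for gl in gl_entries:
        key = (gl["amount"], gl["date"], normalize(gl["reference"]))
        index[key].append(gl)

    matches = []
    for txn in bank_txns:
        key = (txn["amount"], txn["date"], normalize(txn["reference"]))
        bucket = index.get(key)
        if bucket:
            # Consume the entry so it cannot be matched twice.
            matches.append((txn["id"], bucket.popleft()["id"]))
    return matches


# Hypothetical sample data
bank = [{"id": "B1", "amount": Decimal("125.00"),
         "date": date(2025, 1, 6), "reference": "chk 001042"}]
ledger = [
    {"id": "G1", "amount": Decimal("125.00"),
     "date": date(2025, 1, 6), "reference": "CHK-1042"},
    {"id": "G2", "amount": Decimal("125.00"),
     "date": date(2025, 1, 7), "reference": "CHK-1042"},
]
print(exact_match(bank, ledger))  # [('B1', 'G1')]
```

Amounts are held as `Decimal` rather than `float` so that equality comparisons on currency values are reliable.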

3. Fuzzy Matching Techniques

Fuzzy matching handles transactions that should match but have minor differences. This is where algorithmic sophistication becomes important.

3.1 Date Tolerance Matching

Allow dates to differ within a configurable window (typically ±3 days):

Date Tolerance Match:

tolerance_days = 3  // Configurable
day_difference = abs(bank_date - gl_date)  // In whole days

IF day_difference <= tolerance_days
   AND bank_amount == gl_amount:

   confidence = 1.0 - (day_difference / 10)  // Decay with distance
   propose_match(bank, gl, confidence)
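As a rough Python sketch of the same rule, assuming dates arrive as `datetime.date` and amounts as `Decimal` (the function name and signature are illustrative only):

```python
from datetime import date
from decimal import Decimal


def date_tolerance_confidence(bank_date, gl_date, bank_amount, gl_amount,
                              tolerance_days=3):
    """Return a confidence score, or None when the pair is not a candidate."""
    day_difference = abs((bank_date - gl_date).days)
    if bank_amount != gl_amount or day_difference > tolerance_days:
        return None
    # Linear decay: each day of separation costs 0.1 confidence.
    return 1.0 - day_difference / 10


same_day = date_tolerance_confidence(
    date(2025, 1, 6), date(2025, 1, 6), Decimal("250.00"), Decimal("250.00"))
three_days = date_tolerance_confidence(
    date(2025, 1, 9), date(2025, 1, 6), Decimal("250.00"), Decimal("250.00"))
too_far = date_tolerance_confidence(
    date(2025, 1, 12), date(2025, 1, 6), Decimal("250.00"), Decimal("250.00"))
print(same_day, three_days, too_far)
```

With a 3-day tolerance, the score stays between 0.7 and 1.0, so date-tolerant matches never outrank a same-day match on the same amount.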

3.2 String Similarity (Levenshtein Distance)

For description matching, Levenshtein distance measures the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another:

String Similarity:

bank_desc = "AMAZON MARKETPLACE"
gl_desc = "AMZN*Marketplace LLC"

// Compare case-insensitively (uppercase both strings first)
levenshtein_distance = 7
max_length = 20
similarity = 1 - (7 / 20) = 0.65  // 65% similar

IF similarity >= threshold (0.5) AND amounts_match:
   propose_match(confidence = similarity * 0.9)
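For readers who want to reproduce the calculation, here is a small dependency-free sketch of Levenshtein distance using the standard dynamic-programming recurrence. The `similarity` helper and its choice to uppercase both inputs are assumptions for illustration; a production system would more likely use an optimized library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1], compared case-insensitively."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(round(similarity("AMAZON MARKETPLACE", "AMZN*Marketplace LLC"), 2))
```

Only two rows of the table are kept at a time, so memory use is proportional to the shorter dimension rather than the full grid.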

3.3 Phonetic Matching (Soundex/Metaphone)

For vendor names that sound similar but are spelled differently, usually applied after punctuation and accents have been stripped:

  • "Smith & Co" → "Smithco" (phonetically equivalent)
  • "Café Coffee" → "Cafe Coffee" (accent handling)
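The two examples above can be reproduced with a short sketch that folds accents using the standard library and then applies American Soundex. The `fold` and `soundex` helpers are written here for illustration; real systems often prefer Metaphone variants from a dedicated library.

```python
import unicodedata

# Soundex consonant groups; vowels, H, W, and Y carry no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def fold(name: str) -> str:
    """Strip accents, punctuation, and spaces; return uppercase ASCII letters."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed
                   if ch.isascii() and ch.isalpha()).upper()


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = fold(name)
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return (encoded + "000")[:4]


print(soundex("Smith & Co"), soundex("Smithco"))  # S532 S532
print(fold("Café Coffee") == fold("Cafe Coffee"))  # True
```

Because Soundex keeps only four characters, it is best used to shortlist candidates that another check (amount, date, or string similarity) then confirms.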

4. One-to-Many Matching

Banks often group multiple transactions into a single line item. The algorithm must identify which GL entries combine to match the bank total.

Subset Sum Problem

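Finding GL entries that sum to a bank amount is a variant of the subset sum problem, which is NP-complete in general. Exhaustive search is impractical for large candidate sets, so reconciliation systems rely on heuristics and constraints that keep the search small.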

One-to-Many Matching:

bank_amount = $5,000.00
gl_candidates = [$1,200, $1,500, $800, $1,500, $2,000]

# Greedy approach: sort and sum until match
sorted_candidates = sort(gl_candidates, descending)
selected = []
running_sum = 0

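FOR entry in sorted_candidates:
  IF running_sum + entry <= bank_amount:
    selected.append(entry)
    running_sum += entry
  IF running_sum == bank_amount:
    MATCH_FOUND
    BREAK

# In this case: $2,000 + $1,500 + $1,500 = $5,000 ✓
# Greedy selection can miss valid groups: with the same candidates,
# $2,700 = $1,200 + $1,500 is never found because $2,000 is taken first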

Constraints to prevent false positives:

  • Maximum subset size (e.g., max 10 entries per group)
  • Date window constraint (all entries within ±7 days)
  • Category consistency (entries should be similar type)
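Because a greedy pass can miss valid combinations, one alternative is an exhaustive search that is kept tractable by the maximum subset size constraint. The sketch below assumes amounts are stored as integer cents and that candidates have already been filtered by date window and category; `find_group` is a hypothetical helper, and its worst-case cost still grows quickly with the number of candidates.

```python
from itertools import combinations


def find_group(bank_cents, candidates, max_size=10):
    """Return ids of the smallest group summing to bank_cents, else None.

    candidates: list of (entry_id, amount_cents) tuples.
    """
    for size in range(1, min(max_size, len(candidates)) + 1):
        for group in combinations(candidates, size):
            if sum(amount for _, amount in group) == bank_cents:
                return [entry_id for entry_id, _ in group]
    return None


# Same amounts as the example above, expressed in cents
candidates = [("G1", 120_000), ("G2", 150_000), ("G3", 80_000),
              ("G4", 150_000), ("G5", 200_000)]

print(find_group(500_000, candidates))  # ['G2', 'G4', 'G5']
print(find_group(270_000, candidates))  # ['G1', 'G2']
```

The second call finds $1,200 + $1,500 = $2,700, a group that largest-first greedy selection would skip after taking $2,000. Trying smaller groups first also favors the simplest explanation for a bank line.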

5. Machine Learning Approaches

Modern reconciliation tools like Zera AI use machine learning to improve match accuracy beyond rule-based systems.

Feature Engineering

ML models use features extracted from transaction pairs:

Amount difference (absolute and relative)
Date difference (days)
Description similarity score
Vendor name match confidence
Transaction type compatibility
Historical match patterns
Time-of-day/day-of-week patterns
Recurring transaction detection
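A few of these features can be computed directly from a candidate pair. The sketch below covers only the amount, date, and description features; the `Txn` dataclass and feature names are assumptions, and `difflib.SequenceMatcher` is used as a convenient standard-library stand-in for whichever similarity score a real system would use.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from difflib import SequenceMatcher


@dataclass
class Txn:
    amount: Decimal
    posted: date
    description: str


def extract_features(bank: Txn, gl: Txn) -> dict:
    """Build a flat feature dictionary for one (bank, GL) candidate pair."""
    amount_diff = abs(bank.amount - gl.amount)
    largest = max(abs(bank.amount), abs(gl.amount))
    return {
        "amount_diff_abs": float(amount_diff),
        "amount_diff_rel": float(amount_diff / largest) if largest else 0.0,
        "date_diff_days": abs((bank.posted - gl.posted).days),
        "description_similarity": SequenceMatcher(
            None, bank.description.upper(), gl.description.upper()).ratio(),
    }


bank_txn = Txn(Decimal("99.00"), date(2025, 1, 8), "AMZN*Marketplace")
gl_entry = Txn(Decimal("100.00"), date(2025, 1, 6), "Amazon Marketplace")
features = extract_features(bank_txn, gl_entry)
print(features)
```

Features such as historical match patterns or recurring transaction detection need access to past reconciliations and are left out of this sketch.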

Model Architecture

ML Matching Pipeline:

1. Feature extraction
   features = extract_features(bank_txn, gl_entry)

2. Model prediction (e.g., gradient boosting)
   match_probability = model.predict(features)

3. Threshold application
   IF match_probability >= 0.95:
     auto_match(bank_txn, gl_entry)
   ELIF match_probability >= 0.70:
     suggest_match(bank_txn, gl_entry)  // Human review
   ELSE:
     no_match()

The model learns from historical reconciliation data, so accuracy can improve as more confirmed matches become available for retraining.

6. Implementation Considerations

Matching Order Matters

Run exact matching first, then fuzzy, then one-to-many. This ensures high-confidence matches are made before lower-confidence algorithms run.
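One way to enforce this ordering is a small driver that hands each pass only the records earlier passes left unmatched. This is a sketch under assumptions: `run_passes` and the matcher signature (two lists of ids in, proposed pairs out) are invented for illustration, and the toy matchers return hard-coded proposals.

```python
def run_passes(bank_ids, gl_ids, passes):
    """Apply matchers in order; later passes cannot override earlier matches."""
    unmatched_bank, unmatched_gl = set(bank_ids), set(gl_ids)
    results = []
    for name, matcher in passes:
        proposals = matcher(sorted(unmatched_bank), sorted(unmatched_gl))
        for bank_id, gl_id in proposals:
            # Skip proposals that touch a record matched earlier.
            if bank_id in unmatched_bank and gl_id in unmatched_gl:
                unmatched_bank.discard(bank_id)
                unmatched_gl.discard(gl_id)
                results.append((name, bank_id, gl_id))
    return results, unmatched_bank, unmatched_gl


# Toy matchers standing in for the real algorithms
def exact(bank, gl):
    return [("B1", "G1")]


def fuzzy(bank, gl):
    return [("B1", "G2"), ("B2", "G2")]


results, left_bank, left_gl = run_passes(
    ["B1", "B2", "B3"], ["G1", "G2"], [("exact", exact), ("fuzzy", fuzzy)])
print(results)    # [('exact', 'B1', 'G1'), ('fuzzy', 'B2', 'G2')]
print(left_bank)  # {'B3'}
```

The fuzzy proposal for B1 is ignored because the exact pass already claimed that transaction.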

Confidence Scoring

Every match should have a confidence score. Auto-match at 95%+, review queue at 70-95%, no match below 70%.
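The thresholds above translate into a short routing function. The function name and return labels are illustrative assumptions; the default cut-offs are the ones given in the text.

```python
def route(confidence, auto_threshold=0.95, review_threshold=0.70):
    """Map a confidence score in [0, 1] to an action label."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= auto_threshold:
        return "auto_match"
    if confidence >= review_threshold:
        return "review"
    return "no_match"


print(route(0.97))  # auto_match
print(route(0.80))  # review
print(route(0.40))  # no_match
```

Keeping the thresholds as parameters makes it easy to tighten auto-matching for high-risk accounts without changing the matching algorithms themselves.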

Conflict Resolution

When multiple potential matches exist, use a scoring system to select the best match or flag for review.
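A simple version of such a scoring rule picks the top candidate only when it clearly beats the runner-up. The `min_margin` parameter and its default of 0.05 are assumptions chosen for the example, not recommended values.

```python
def resolve(candidates, min_margin=0.05):
    """Choose among competing matches for one bank transaction.

    candidates: list of (gl_entry_id, score) tuples.
    Returns (gl_entry_id, "selected"), (None, "review"),
    or (None, "no_candidates").
    """
    if not candidates:
        return None, "no_candidates"
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # A near-tie means the scores cannot separate the options safely.
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return None, "review"
    return ranked[0][0], "selected"


print(resolve([("G1", 0.97), ("G2", 0.80)]))  # ('G1', 'selected')
print(resolve([("G1", 0.90), ("G2", 0.88)]))  # (None, 'review')
```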

Audit Trail

Record why each match was made (algorithm used, confidence score) for compliance and debugging.
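An audit entry can be as small as an immutable record serialized to JSON. The field names below are assumptions for illustration; a real system would likely also capture the user or service that made the decision.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MatchAuditRecord:
    bank_txn_id: str
    gl_entry_ids: tuple
    algorithm: str
    confidence: float
    decided_at: str  # ISO 8601 timestamp in UTC


def audit_record(bank_txn_id, gl_entry_ids, algorithm, confidence):
    """Create an immutable record describing why a match was made."""
    return MatchAuditRecord(
        bank_txn_id=bank_txn_id,
        gl_entry_ids=tuple(gl_entry_ids),
        algorithm=algorithm,
        confidence=confidence,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )


record = audit_record("B1", ["G2", "G4", "G5"], "one_to_many", 0.91)
print(json.dumps(asdict(record)))
```

Freezing the dataclass prevents an entry from being modified after it is created, which suits an append-only log.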

7. Performance Benchmarks

Expected performance metrics for well-implemented matching algorithms:

Matching Type     Accuracy   Coverage   Use Case
Combined System   95%+       95%+       Full reconciliation

The remaining transactions (roughly 5%) that cannot be auto-matched typically require human judgment: unusual transactions, data quality issues, or legitimate discrepancies.
