Transaction matching is the core algorithmic challenge in bank reconciliation. The goal is to automatically pair bank statement transactions with general ledger entries, reducing manual work while maintaining accuracy. This guide explores the technical approaches used in modern reconciliation software.
1 Introduction to Transaction Matching
Transaction matching compares two datasets: bank statement transactions (source of truth for cash) and general ledger entries (accounting records). The challenge is that these datasets rarely align perfectly due to timing differences, naming variations, and batch processing.
Why Perfect Matching is Hard
Effective matching algorithms must handle these variations (timing differences, naming variations, and batch grouping) while avoiding false positives: incorrect matches that introduce errors.
2 Exact Matching Algorithms
Exact matching is the first pass in any reconciliation system. It identifies transactions where all key fields match precisely.
Match Criteria
Exact Match Algorithm:

FOR each bank_transaction:
    FOR each gl_entry WHERE status = 'unmatched':
        IF bank_transaction.amount == gl_entry.amount
           AND bank_transaction.date == gl_entry.date
           AND normalize(bank_transaction.reference) == normalize(gl_entry.reference):
            mark_matched(bank_transaction, gl_entry)
            BREAK
Normalization function:
- Remove spaces, special characters
- Convert to uppercase
- Strip leading zeros from check numbers

Performance Characteristics
- Accuracy: 100% (by definition—exact matches cannot be wrong)
- Coverage: 40-60% of transactions typically (depends on data quality)
- Complexity: O(n × m) naïve, O(n + m) with hash indexing
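The pseudocode above can be sketched in Python with a hash index to reach O(n + m); the field names (`amount`, `date`, `reference`) are assumptions, and real schemas will differ:

```python
from collections import defaultdict

def normalize(ref: str) -> str:
    """Remove spaces and special characters, uppercase, strip leading zeros."""
    cleaned = "".join(ch for ch in ref if ch.isalnum()).upper()
    if cleaned.isdigit():  # pure check numbers: strip leading zeros
        cleaned = cleaned.lstrip("0") or "0"
    return cleaned

def exact_match(bank_txns, gl_entries):
    """Hash-indexed exact matching: O(n + m) rather than the naive O(n * m)."""
    index = defaultdict(list)  # (amount, date, normalized ref) -> GL entries
    for gl in gl_entries:
        index[(gl["amount"], gl["date"], normalize(gl["reference"]))].append(gl)
    matches = []
    for bank in bank_txns:
        key = (bank["amount"], bank["date"], normalize(bank["reference"]))
        if index[key]:
            matches.append((bank, index[key].pop(0)))  # consume one GL entry
    return matches
```

Amounts are best stored as integer cents in practice so that equality comparison is exact.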
3 Fuzzy Matching Techniques
Fuzzy matching handles transactions that should match but have minor differences. This is where algorithmic sophistication becomes important.
3.1 Date Tolerance Matching
Allow dates to differ within a configurable window (typically ±3 days):
Date Tolerance Match:

tolerance_days = 3  // Configurable
IF abs(bank_date - gl_date) <= tolerance_days AND bank_amount == gl_amount:
    confidence = 1.0 - (day_difference / 10)  // Decay with distance
    propose_match(bank, gl, confidence)
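The same check in Python; since `propose_match` is system-specific, this sketch simply returns the confidence (or None when no match should be proposed):

```python
from datetime import date

def date_tolerance_confidence(bank_date, gl_date, bank_amount, gl_amount,
                              tolerance_days=3):
    """Return a match confidence, or None when outside the tolerance window."""
    day_diff = abs((bank_date - gl_date).days)
    if day_diff <= tolerance_days and bank_amount == gl_amount:
        return 1.0 - day_diff / 10  # confidence decays with date distance
    return None
```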
3.2 String Similarity (Levenshtein Distance)
For description matching, Levenshtein distance measures the minimum edits needed to transform one string into another:
String Similarity:

bank_desc = "AMAZON MARKETPLACE"
gl_desc = "AMZN*Marketplace LLC"
levenshtein_distance = 8
max_length = 20
similarity = 1 - (8 / 20) = 0.60  // 60% similar
IF similarity >= threshold (0.5) AND amounts_match:
    propose_match(confidence = similarity * 0.9)
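A standard dynamic-programming implementation of Levenshtein distance, plus the similarity ratio used above (the 0.5 threshold and 0.9 weight are the example values from this section, applied by the caller):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]

def description_similarity(a: str, b: str) -> float:
    """1.0 means identical (ignoring case); 0.0 means entirely different."""
    return 1 - levenshtein(a.upper(), b.upper()) / max(len(a), len(b), 1)
```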
3.3 Phonetic Matching (Soundex/Metaphone)
For vendor names that sound similar but are spelled differently:
- "Smith & Co" → "Smithco" (phonetically equivalent)
- "Café Coffee" → "Cafe Coffee" (accent handling)
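A minimal American Soundex implementation illustrates the idea (Metaphone handles more cases but is considerably longer; accent folding such as "Café" → "Cafe" would be a separate preprocessing step):

```python
def soundex(name: str) -> str:
    """Classic 4-character American Soundex code, e.g. 'Smith' -> 'S530'."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # skip repeats of the previous code
            result += code
        if ch not in "HW":          # H and W do not separate duplicate codes
            prev = code
    return (result + "000")[:4]
```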
4 One-to-Many Matching
Banks often group multiple transactions into a single line item. The algorithm must identify which GL entries combine to match the bank total.
Subset Sum Problem
Finding GL entries that sum to a bank amount is a variant of the subset sum problem, which is NP-hard in general, so reconciliation systems use heuristics rather than exhaustive search.
One-to-Many Matching:

bank_amount = $5,000.00
gl_candidates = [$1,200, $1,500, $800, $1,500, $2,000]

# Greedy approach: sort and sum until match
sorted_candidates = sort(gl_candidates, descending)
selected = []
running_sum = 0
FOR entry in sorted_candidates:
    IF running_sum + entry <= bank_amount:
        selected.append(entry)
        running_sum += entry
    IF running_sum == bank_amount:
        MATCH_FOUND

# In this case: $2,000 + $1,500 + $1,500 = $5,000 ✓

Constraints to prevent false positives:
- Maximum subset size (e.g., max 10 entries per group)
- Date window constraint (all entries within ±7 days)
- Category consistency (entries should be similar type)
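The greedy pass above, in Python. Amounts are integer cents to keep sums exact; of the constraints listed, only `max_entries` is shown, with the date-window and category checks left to the caller:

```python
def greedy_one_to_many(bank_amount, gl_candidates, max_entries=10):
    """Largest-first greedy search for GL amounts summing to the bank amount.

    Amounts are integer cents. Returns the selected amounts, or None when
    the greedy pass fails (failure does not prove no combination exists).
    """
    selected, running = [], 0
    for entry in sorted(gl_candidates, reverse=True):
        if len(selected) == max_entries:
            break
        if running + entry <= bank_amount:
            selected.append(entry)
            running += entry
            if running == bank_amount:
                return selected
    return None
```

Note that greedy can miss valid combinations (e.g. target 100 with candidates [60, 50, 50]: it commits to 60 and never finds 50 + 50), which is why unresolved groups fall through to review or to an exact dynamic-programming fallback.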
5 Machine Learning Approaches
Modern reconciliation tools like Zera AI use machine learning to improve match accuracy beyond rule-based systems.
Feature Engineering
ML models use features extracted from transaction pairs, such as the amount difference, the gap between dates, and the string similarity of descriptions.
Model Architecture
ML Matching Pipeline:

1. Feature extraction
   features = extract_features(bank_txn, gl_entry)
2. Model prediction (e.g., gradient boosting)
   match_probability = model.predict(features)
3. Threshold application
   IF match_probability >= 0.95:
       auto_match(bank_txn, gl_entry)
   ELIF match_probability >= 0.70:
       suggest_match(bank_txn, gl_entry)  // Human review
   ELSE:
       no_match()

The model learns from historical reconciliation data, improving accuracy as more transactions are processed.
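Feature extraction for such a model might look like the following sketch; the feature names are assumptions, and `difflib.SequenceMatcher` stands in for whichever string-similarity measure the model was trained on:

```python
from datetime import date
from difflib import SequenceMatcher

def extract_features(bank_txn, gl_entry):
    """Numeric features describing one candidate (bank, GL) pair."""
    return {
        "amount_diff": abs(bank_txn["amount"] - gl_entry["amount"]),
        "day_gap": abs((bank_txn["date"] - gl_entry["date"]).days),
        "desc_similarity": SequenceMatcher(
            None, bank_txn["desc"].upper(), gl_entry["desc"].upper()).ratio(),
    }
```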
6 Implementation Considerations
Matching Order Matters
Run exact matching first, then fuzzy, then one-to-many. This ensures high-confidence matches are made before lower-confidence algorithms run.
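This ordering can be expressed as a list of passes run in sequence, where each pass only sees what earlier, higher-confidence passes left unmatched (the pass functions themselves are placeholders to be supplied by the system):

```python
def reconcile(bank_txns, gl_entries, passes):
    """Run matching passes in confidence order, e.g. [exact, fuzzy, one_to_many].

    Each pass is a function (bank_list, gl_list) -> list of (bank, gl) pairs.
    """
    matched, bank_left, gl_left = [], list(bank_txns), list(gl_entries)
    for run_pass in passes:
        for bank, gl in run_pass(bank_left, gl_left):
            matched.append((bank, gl))
            bank_left.remove(bank)   # matched items leave the pool
            gl_left.remove(gl)
    return matched, bank_left, gl_left
```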
Confidence Scoring
Every match should have a confidence score. Auto-match at 95%+, review queue at 70-95%, no match below 70%.
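Those tiers translate directly into a small routing function (threshold values from the text; the action names are illustrative):

```python
def route(confidence, auto_threshold=0.95, review_threshold=0.70):
    """Map a match confidence score to an action tier."""
    if confidence >= auto_threshold:
        return "auto_match"     # high confidence: match without review
    if confidence >= review_threshold:
        return "review_queue"   # plausible: suggest to a human
    return "no_match"
```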
Conflict Resolution
When multiple potential matches exist, use a scoring system to select the best match or flag for review.
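One simple policy, sketched below: take the top-scoring candidate, but flag the pair for review whenever the runner-up is within a small margin (the margin value is an assumption):

```python
def pick_best(candidates, score, margin=0.05):
    """Return the clear winner, or None when the top two scores are too close."""
    ranked = sorted(candidates, key=score, reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and score(ranked[0]) - score(ranked[1]) < margin:
        return None  # ambiguous: route to the review queue instead
    return ranked[0]
```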
Audit Trail
Record why each match was made (algorithm used, confidence score) for compliance and debugging.
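A match record only needs a few fields to serve both purposes; this dataclass is one possible shape (the field names are assumptions):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MatchRecord:
    """Audit entry explaining why a match was made."""
    bank_txn_id: str
    gl_entry_id: str
    algorithm: str      # e.g. "exact", "fuzzy_date", "one_to_many", "ml"
    confidence: float
    matched_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
```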
7 Performance Benchmarks
Expected performance metrics for well-implemented matching algorithms:
| Matching Type | Accuracy | Coverage | Use Case |
|---|---|---|---|
| Exact Match | ~100% | 40-60% | First-pass matching |
| Combined System | 95%+ | 95%+ | Full reconciliation |
The remaining 5% that cannot be auto-matched typically require human judgment (unusual transactions, data quality issues, or legitimate discrepancies).
