Methodology

This page provides a complete technical description of the scoring approach used by NMTC Application Builder. It is intended for practitioners who need to understand how scores are computed, what assumptions are embedded, and where the analysis has meaningful limitations.

Executive summary

NMTC Application Builder scores applications by comparing their pipeline metrics against patterns extracted from CDFI Fund NMTC award announcements (CY2020–CY2024). It does not compute a win probability — it computes an alignment score reflecting how closely a pipeline resembles historical winners on five measurable dimensions.

The five dimensions and their weights are intended to approximate the relative importance of scoring criteria in recent CDFI Fund NOFA documents:

Dimension	Weight	Primary metric
Distress concentration	35%	% of QEI in deep/severe distressed tracts
Impact intensity	25%	FTE jobs created per $1MM QEI
Geographic diversity	20%	States served + HHI + rural %
Sector diversity	15%	Sectors represented + concentration ceiling
Pipeline quality	5%	Eligibility rate + project count + award size fit

Why we cannot compute true win probability

The CDFI Fund does not publish application-level microdata for non-winning applicants. What is publicly available:

Round-level aggregates: total applications received, awards made, total dollars allocated
Winner-level data: which CDEs received allocations and for what amounts (from award announcements)
Program-level impact statistics: aggregate jobs, units, and outcomes across the funded portfolio (from annual reports)

What is not available:

Application-level data for the ~65–70% of applicants who were not selected in any given round
NOFA scores for any application
Project-level pipeline data from any application (winner or non-winner)

Without the distribution of non-winner metrics, the standard conditional probability formula (P(win | metrics)) cannot be computed. We do not know whether applicants with 72% distress concentration are funded 40% of the time or 5% of the time — we only know that historical winners averaged 81% distress concentration.

This means any "win probability" generated from available data would be a classification score based on feature similarity to winners only, not a calibrated probability derived from the full applicant distribution. We report this explicitly as an alignment score to avoid misleading practitioners into treating it as a prediction.
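To make the gap concrete, the conditional probability can be expanded with Bayes' rule. The symbol x, standing for an application's metrics, is introduced here purely for illustration:

```latex
P(\text{win} \mid x) =
  \frac{P(x \mid \text{win})\, P(\text{win})}
       {P(x \mid \text{win})\, P(\text{win}) + P(x \mid \text{not win})\, P(\text{not win})}
```

The winner statistics give an estimate of P(x | win), and round-level aggregates give the base rate P(win). The term P(x | not win) has no public source, so the denominator cannot be evaluated.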

How each dimensional score is computed

Distress concentration score (weight: 35%)

Input: pct_deep_or_severe — the fraction of total portfolio QEI deployed in census tracts classified as deep distress or severe distress by the CDFI Fund's NMTC Mapping Tool.

Historical winner statistics (CY2020–CY2024):

Statistic	Value
Mean	81%
Standard deviation	11%
P25	72%
P50 (median)	82%
P75	91%
P90	95%
Floor (minimum observed)	50%
Mean eligibility rate	96%

Scoring formula:

If pct_deep_or_severe < 50% (the historical floor):

score = (pct / 0.50) × 30

If pct_deep_or_severe ≥ 50%:

z = (pct - 0.81) / 0.11
score = Φ(z) × 100

where Φ is the standard normal CDF, computed as Φ(z) = 0.5 × erfc(−z / √2) using math.erfc. The score is bounded to [0, 100].

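This formulation means a pipeline at the winner mean (81%) scores 50 (the 50th percentile of the fitted normal distribution), and a pipeline at the winner P75 (91%) scores about 82 (z ≈ 0.91). The P75 value does not map to exactly 75 because the empirical winner percentiles do not follow the fitted normal curve exactly.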
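The two-branch formula above can be sketched in a few lines of Python. This is a minimal illustration, not the library's actual code: the function and constant names are hypothetical, and the input is assumed to be a fraction in [0, 1].

```python
import math

# Constants taken from the winner statistics table above.
WINNER_MEAN = 0.81
WINNER_STD = 0.11
HISTORICAL_FLOOR = 0.50


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def distress_score(pct_deep_or_severe: float) -> float:
    """Hypothetical helper: distress score in [0, 100] for a fraction in [0, 1]."""
    if pct_deep_or_severe < HISTORICAL_FLOOR:
        # Below the floor: linear ramp that tops out at 30.
        score = (pct_deep_or_severe / HISTORICAL_FLOOR) * 30.0
    else:
        # At or above the floor: percentile under the fitted normal curve.
        z = (pct_deep_or_severe - WINNER_MEAN) / WINNER_STD
        score = normal_cdf(z) * 100.0
    return max(0.0, min(100.0, score))
```

Taking the formulas literally, the two branches do not join at the floor: a value just under 50% scores close to 30, while exactly 50% gives z ≈ −2.82 and a score below 1. Pipelines near the floor should be interpreted with that jump in mind.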

Impact intensity score (weight: 25%)

Input: jobs_per_million_qei — total FTE jobs created (from project projections) divided by total QEI in millions.

Historical winner statistics:

Statistic	Value
Mean	12.0 jobs/$MM
P25	6.0 jobs/$MM
P50 (median)	10.0 jobs/$MM
P75	18.0 jobs/$MM
Top decile	28.0 jobs/$MM

Scoring formula:

if jpm <= 0: score = 0
if jpm >= 28 (top decile): score = 100
else: z = (jpm - 12.0) / 6.0; score = Φ(z) × 100

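The divisor 6.0 serves as the spread (standard deviation) of the fitted normal curve; unlike the other constants, it does not appear in the statistics table above. The score is bounded to [0, 100].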
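A minimal Python sketch of the impact formula, under the assumption that the input is already expressed in jobs per $1MM of QEI. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def impact_score(jobs_per_million_qei: float) -> float:
    """Hypothetical helper: impact intensity score in [0, 100]."""
    if jobs_per_million_qei <= 0:
        return 0.0
    if jobs_per_million_qei >= 28.0:  # top decile of winners
        return 100.0
    # Mean 12.0 and scale 6.0 are the constants from the formula above.
    z = (jobs_per_million_qei - 12.0) / 6.0
    return max(0.0, min(100.0, normal_cdf(z) * 100.0))
```

With these constants, a pipeline at the winner median (10.0) scores about 37 and one at P75 (18.0) scores about 84, so the result is better read as a position on the fitted curve than as a literal winner percentile.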

Geographic diversity score (weight: 20%)

Input: states_count (number of distinct states), hhi (Herfindahl-Hirschman Index of QEI concentration by state), rural_pct (fraction of QEI in rural tracts).

Historical winner statistics:

Statistic	Value
Mean states	7.2
Std states	3.8
P25 states	4
P50 states	7
P75 states	10
Mean HHI	620
Mean rural %	18%

Scoring formula:

states_score = Φ((states - 7.2) / 3.8) × 100

hhi_score = max(0, (2000 - hhi) / 2000 × 100)

rural_bonus = min(10, (rural_pct / 0.18) × 5)

score = min(100, states_score × 0.5 + hhi_score × 0.4 + rural_bonus)

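The HHI score rewards lower concentration: it is 0 for an HHI of 2000 or more and rises linearly toward 100 as the HHI approaches 0. An HHI of exactly 0 is a theoretical limit; assuming the conventional 0–10,000 scale, QEI split evenly across N states gives an HHI of 10,000/N. The rural bonus contributes up to 10 points: a pipeline at the winner average rural allocation (18%) earns 5 points, and the cap is reached at 36%.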
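The three components can be combined as in the sketch below. The function names are hypothetical, and the HHI helper assumes the conventional 0–10,000 scale (sum of squared percentage shares), which the source does not state explicitly.

```python
import math
from typing import Mapping


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hhi_from_qei(qei_by_state: Mapping[str, float]) -> float:
    """Assumed definition: sum of squared percentage shares of QEI by state."""
    total = sum(qei_by_state.values())
    if total <= 0:
        raise ValueError("total QEI must be positive")
    return sum((amount / total * 100.0) ** 2 for amount in qei_by_state.values())


def geographic_score(states_count: int, hhi: float, rural_pct: float) -> float:
    """Hypothetical helper: geographic diversity score, capped at 100."""
    states_score = normal_cdf((states_count - 7.2) / 3.8) * 100.0
    hhi_score = max(0.0, (2000.0 - hhi) / 2000.0 * 100.0)
    rural_bonus = min(10.0, (rural_pct / 0.18) * 5.0)
    return min(100.0, states_score * 0.5 + hhi_score * 0.4 + rural_bonus)
```

As a reference point, a pipeline sitting exactly at the three winner means (7.2 states, HHI 620, 18% rural) works out to 25 + 27.6 + 5 = 57.6.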

Sector diversity score (weight: 15%)

Input: sectors_represented (count of distinct sectors), max_single_sector_pct (fraction of QEI in the most concentrated sector).

Historical winner statistics:

Statistic	Value
Mean sectors represented	4.8
Max single sector concentration	35%
Healthcare share	22%
Affordable housing share	18%
Small business share	17%
Education share	14%
Community facility share	12%

Scoring formula:

sector_score = min(100, (sectors / 4.8) × 80)
conc_penalty = max(0, (max_single_pct - 0.35) × 200)
score = max(0, min(100, sector_score - conc_penalty))

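A pipeline at the winner mean of 4.8 sectors with no sector exceeding 35% scores 80. Because sector counts are whole numbers, 5 sectors give a base score of about 83, and 6 or more reach the cap of 100. Each percentage point above the 35% ceiling subtracts 2 points from the sector score.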
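The sector formula translates directly into Python. The function name below is hypothetical, and max_single_sector_pct is assumed to be a fraction in [0, 1].

```python
def sector_score(sectors_represented: int, max_single_sector_pct: float) -> float:
    """Hypothetical helper: sector diversity score in [0, 100]."""
    # Base score: 80 points at the winner mean of 4.8 sectors, capped at 100.
    base = min(100.0, (sectors_represented / 4.8) * 80.0)
    # Penalty: 2 points per percentage point above the 35% ceiling.
    penalty = max(0.0, (max_single_sector_pct - 0.35) * 200.0)
    return max(0.0, min(100.0, base - penalty))
```

For example, six sectors with the largest at 45% of QEI start from the capped base of 100 and lose 20 points to the concentration penalty.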

Pipeline quality score (weight: 5%)

Input: eligibility_pct (fraction of projects in eligible tracts), total_projects, requested_allocation.

Scoring formula:

eligibility_score = min(100, (pct / 0.96) × 90)

count_score = min(100, (projects / 14.5) × 80)

size_score:
  <$25M     → 50
  $25M–$35M → 70
  $35M–$65M → 90
  >$65M     → 75

score = eligibility_score × 0.4 + count_score × 0.3 + size_score × 0.3
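A sketch of the pipeline quality calculation follows. The function names are hypothetical, requested_allocation is assumed to be in dollars, and the handling of the shared $35M boundary (assigned here to the $35M–$65M tier) is an assumption, since the tiers above overlap at that value.

```python
def size_score(requested_allocation: float) -> float:
    """Hypothetical helper: award size fit; boundary handling is an assumption."""
    millions = requested_allocation / 1_000_000
    if millions < 25:
        return 50.0
    if millions < 35:
        return 70.0
    if millions <= 65:  # assumption: exactly $35M falls in this tier
        return 90.0
    return 75.0


def pipeline_quality_score(
    eligibility_pct: float, total_projects: int, requested_allocation: float
) -> float:
    """Hypothetical helper: weighted blend of the three sub-scores."""
    eligibility_score = min(100.0, (eligibility_pct / 0.96) * 90.0)
    count_score = min(100.0, (total_projects / 14.5) * 80.0)
    return (
        eligibility_score * 0.4
        + count_score * 0.3
        + size_score(requested_allocation) * 0.3
    )
```

Because eligibility_pct cannot exceed 1.0, the eligibility sub-score tops out at 93.75, so the highest pipeline quality score reachable under these constants is 94.5.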

Composite score

composite = (distress × 0.35) + (impact × 0.25) + (geographic × 0.20) + (sector × 0.15) + (pipeline × 0.05)

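The composite is bounded to [0, 100] and rounded to one decimal place. Because the weights sum to 1.0 and each dimensional score lies in [0, 100], the weighted sum already falls within that range.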
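The weighted sum can be sketched as follows. The function name and the dictionary keys are hypothetical; the weights are those listed in the executive summary.

```python
from typing import Mapping

# Weights from the executive summary table; they sum to 1.0.
WEIGHTS = {
    "distress": 0.35,
    "impact": 0.25,
    "geographic": 0.20,
    "sector": 0.15,
    "pipeline": 0.05,
}


def composite_score(
    scores: Mapping[str, float], weights: Mapping[str, float] = WEIGHTS
) -> float:
    """Hypothetical helper: weighted composite, clamped and rounded to 1 decimal."""
    total = sum(weights[name] * scores[name] for name in weights)
    return round(max(0.0, min(100.0, total)), 1)
```

As a worked case, dimensional scores of 50, 50, 57.6, 80, and 87 give 17.5 + 12.5 + 11.52 + 12.0 + 4.35 = 57.87, which rounds to 57.9.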

How the readiness score differs from the alignment score

The readiness score (ReadinessScore) measures internal application quality and completeness:

Is the pipeline NMTC-eligible?
Does the pipeline meet minimum distress thresholds?
Have all required fields been populated?
Do internal numbers (QEI, QLICI, project cost) satisfy program rules?

The alignment score (WinProbabilityScore) measures external competitiveness:

Does the pipeline look like historical winners?
Does the distress concentration fall within the range observed among winners?
Does geographic diversity match winner patterns?

An application can have a high readiness score (everything is correctly filled out and eligible) but a low alignment score (distress concentration is only 55%, well below winner patterns). Both scores are necessary for a complete picture.

The optimization objective function

The PipelineOptimizer maximizes the composite alignment score subject to constraints. The objective function is the same formula used in the alignment score, applied to the selected subset rather than the full pipeline.

objective(selected) = composite_alignment_score(selected, requested_allocation, weights)

where weights = {distress: 0.35, impact: 0.25, geographic: 0.20, sector: 0.15, pipeline: 0.05}.

The optimizer makes no convexity assumptions. It is a heuristic (greedy selection plus local search) and is not guaranteed to find the globally optimal subset. For pipelines of 15–25 projects with typical constraint sets, the heuristic typically converges to a near-optimal solution in fewer than 100 swap iterations. The max_iterations parameter (default 500) provides headroom for larger or more constrained problems.
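The general shape of a greedy-plus-swap heuristic is sketched below. This is not the PipelineOptimizer implementation: the function name, the objective and is_feasible callables, and the first-improvement swap rule are all assumptions chosen for illustration. Only the max_iterations default of 500 comes from the text above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def optimize_subset(
    items: Sequence[T],
    objective: Callable[[List[T]], float],
    is_feasible: Callable[[List[T]], bool],
    max_iterations: int = 500,
) -> List[T]:
    """Hypothetical sketch: greedy construction followed by one-for-one swaps."""
    selected: List[T] = []
    remaining: List[T] = list(items)

    # Greedy phase: keep adding the feasible item with the largest gain.
    while True:
        current = objective(selected)
        best_gain, best_item = 0.0, None
        for item in remaining:
            candidate = selected + [item]
            if is_feasible(candidate):
                gain = objective(candidate) - current
                if gain > best_gain:
                    best_gain, best_item = gain, item
        if best_item is None:
            break
        selected.append(best_item)
        remaining.remove(best_item)

    # Local search: apply the first swap that strictly improves the objective.
    for _ in range(max_iterations):
        current = objective(selected)
        swap = None
        for i in range(len(selected)):
            for j in range(len(remaining)):
                candidate = selected[:i] + [remaining[j]] + selected[i + 1:]
                if is_feasible(candidate) and objective(candidate) > current:
                    swap = (i, j)
                    break
            if swap is not None:
                break
        if swap is None:
            break  # local optimum: no single swap helps
        i, j = swap
        selected[i], remaining[j] = remaining[j], selected[i]

    return selected
```

A toy case shows why optimality is not guaranteed. With a budget of 10 and three items whose (cost, value) pairs are (6, 7), (5, 5), and (5, 5), this sketch returns only the first item (value 7), even though the other two together reach a value of 10. No single addition or one-for-one swap improves on the greedy choice.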

Validation methodology

Three validation checks run automatically in analyze():

Eligibility check: Each project's is_nmtc_eligible field is verified. Projects with is_nmtc_eligible = False generate errors; projects with is_nmtc_eligible = None (not yet enriched) generate warnings.

Completeness check: All required PipelineProject fields (12 fields) and required CDEProfile fields are checked for presence and valid type. Optional fields with None values generate warnings, not errors.

Consistency check: Cross-field validation:

qei_request <= total_project_cost (QEI cannot exceed total project cost)
qlici_amount <= qei_request (QLICI is a component of QEI)
expected_jobs_created >= 0
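The eligibility and consistency rules can be illustrated with a small sketch. The Project dataclass is a hypothetical stand-in for PipelineProject that carries only the fields named above, and treating consistency failures as errors rather than warnings is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Project:
    """Hypothetical stand-in for PipelineProject with only the fields used here."""
    name: str
    qei_request: float
    total_project_cost: float
    qlici_amount: float
    expected_jobs_created: float
    is_nmtc_eligible: Optional[bool] = None


def validate_project(project: Project) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the eligibility and consistency checks."""
    errors: List[str] = []
    warnings: List[str] = []

    # Eligibility: False is an error, None (not yet enriched) is a warning.
    if project.is_nmtc_eligible is False:
        errors.append(f"{project.name}: not NMTC-eligible")
    elif project.is_nmtc_eligible is None:
        warnings.append(f"{project.name}: eligibility not yet enriched")

    # Consistency: cross-field rules (assumed here to produce errors).
    if project.qei_request > project.total_project_cost:
        errors.append(f"{project.name}: qei_request exceeds total_project_cost")
    if project.qlici_amount > project.qei_request:
        errors.append(f"{project.name}: qlici_amount exceeds qei_request")
    if project.expected_jobs_created < 0:
        errors.append(f"{project.name}: expected_jobs_created is negative")

    return errors, warnings
```

The completeness check is omitted because the full list of required fields is not given on this page.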

Validation limitations

The scoring framework has several important limitations that practitioners should understand:

No non-winner data. Winner patterns are inferred from public award announcements and annual reports. The non-winner distribution is unknown. The scoring formulas are calibrated against winner data only.
Aggregate inference. Winner statistics (mean, standard deviation, percentiles) are inferred from program-level aggregates in annual reports, not computed from a microdata sample of individual applications. The inference introduces uncertainty, particularly for standard deviations and percentiles.
NOFA changes. The CDFI Fund revises NOFA criteria and scoring weights between rounds. The current library reflects the CY2024 NOFA structure. Scoring weights and dimensional definitions may not perfectly match future rounds.
Impact projections are self-reported. expected_jobs_created is entered by the CDE and not verified by the platform. Inflated job projections will improve the impact score without reflecting actual community outcomes.
Geographic centroids. The pipeline map uses state centroids, not exact project addresses. The geographic analysis itself uses census tract data (from nmtc-mapper), but the visualization is approximate.

Acknowledgments

The scoring methodology was developed by reference to the following primary sources:

CDFI Fund. New Markets Tax Credit Program: Notice of Funds Availability (NOFA), various years (2020–2024). U.S. Department of the Treasury.
CDFI Fund. NMTC Program Award Book, various years. U.S. Department of the Treasury.
CDFI Fund. Community Development Financial Institutions Fund Annual Report, FY2018–FY2023. U.S. Department of the Treasury.
U.S. Congress. Consolidated Appropriations Act, 2021, Pub. L. No. 116-260 (authorizing $5 billion in annual NMTC authority through CY2025, including CY2022).

The dimensional weighting structure is informed by, but does not replicate, the CDFI Fund's proprietary NOFA scoring rubric, which is not publicly disclosed at the question level.