Data Sources
This page documents the primary data sources that underpin the intelligence layer and the 17-library open source stack the platform is built on.
CDFI Fund NMTC Award Announcements (CY2020–CY2024)
Source: cdfifund.gov/programs-training/programs/new-markets-tax-credit
What is publicly available:
- Round-level aggregate statistics: total applications received, awards made, total dollars allocated, average award size
- CDE-level award announcements: which entities received allocations and for how much
What we use from these sources:
- Round-level acceptance rates to populate NMTC_AWARD_ROUNDS and compute acceptance_rate_baseline
- Aggregate patterns to infer winner distributions for distress concentration, geographic diversity, sector mix, and impact intensity
What is NOT available:
- Application-level data for non-winners (not published by the CDFI Fund)
- Individual project-level details from winner applications
- NOFA scores for any application (winner or non-winner)
This is the critical limitation that prevents computation of true win probability — see Methodology and Limitations for the full discussion.
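Because only round-level aggregates are published, the acceptance rate for a round reduces to awards divided by applications. The sketch below shows one way a baseline could be derived from those ratios. The `AwardRound` class, the function name, and the use of an unweighted mean are assumptions made for illustration, not the library's actual implementation, and the counts are placeholders rather than CDFI Fund figures.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AwardRound:
    """Round-level aggregates as published in an award announcement (hypothetical type)."""
    label: str
    applications: int  # total applications received
    awards: int        # total awards made

    @property
    def acceptance_rate(self) -> float:
        # Share of applications that received an allocation in this round.
        return self.awards / self.applications


def acceptance_rate_baseline(rounds):
    """Unweighted mean of per-round acceptance rates (an assumed definition)."""
    return mean(r.acceptance_rate for r in rounds)


# Placeholder counts for illustration only; NOT actual CDFI Fund figures.
example_rounds = [
    AwardRound("round A", applications=200, awards=100),
    AwardRound("round B", applications=200, awards=50),
]

baseline = acceptance_rate_baseline(example_rounds)
print(f"baseline acceptance rate: {baseline:.1%}")
```

A pooled rate (total awards over total applications) is an equally plausible definition; the two differ whenever round sizes differ, so check which one the library documents before comparing numbers.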
CDFI Fund Annual Reports (FY2018–FY2023)
Source: cdfifund.gov/research — NMTC Program Annual Reports
What we use:
- Impact statistics tables: jobs created and retained per dollar of QEI invested (used to populate WINNER_IMPACT_BENCHMARKS)
- NMTC Investments by Business Type tables: sector allocation percentages across funded projects (used to populate WINNER_SECTOR_PATTERNS)
- Geographic deployment tables: state-level deployment breakdowns (used to infer WINNER_GEOGRAPHIC_PATTERNS)
Coverage: FY2018–FY2023 annual reports. FY2024 data will be incorporated when published.
Note on aggregation: Annual reports present program-level aggregates, not individual application data. Winner patterns in historical_awards.py are inferred from these aggregates, not computed from a microdata sample. The inference methodology is described in Methodology.
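To make the aggregate-based inference concrete, the sketch below derives two benchmark-style quantities from program-level totals: jobs per million dollars of QEI, and sector shares normalized from dollar totals. Both function names are hypothetical, the inputs are placeholders rather than figures from any annual report, and the actual inference in historical_awards.py may work differently.

```python
def jobs_per_million_qei(jobs: float, qei_dollars: float) -> float:
    """Jobs created or retained per $1M of QEI, from program-level totals."""
    if qei_dollars <= 0:
        raise ValueError("qei_dollars must be positive")
    return jobs / (qei_dollars / 1_000_000)


def sector_shares(dollars_by_sector: dict) -> dict:
    """Normalize sector dollar totals into fractional shares that sum to 1."""
    total = sum(dollars_by_sector.values())
    if total <= 0:
        raise ValueError("sector totals must sum to a positive amount")
    return {sector: amount / total for sector, amount in dollars_by_sector.items()}


# Placeholder inputs for illustration only; NOT values from a CDFI Fund report.
example_intensity = jobs_per_million_qei(jobs=500, qei_dollars=10_000_000)
example_shares = sector_shares({"sector_1": 60.0, "sector_2": 30.0, "sector_3": 10.0})

print(f"jobs per $1M QEI: {example_intensity:.1f}")
for sector, share in example_shares.items():
    print(f"{sector}: {share:.0%}")
```

Because the inputs are aggregates, the outputs describe the funded portfolio as a whole; they say nothing about the spread across individual projects.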
NMTC Program NOFA
Source: CDFI Fund Notice of Funds Availability (NOFA), published annually
What we use:
- Scoring criteria weights for the five application categories (Business Strategy, Community Outcomes, Management Capacity, Capitalization, Prior Awards)
- Distress concentration requirements and bonus criteria (Native American areas, high-migration rural counties, Opportunity Zones)
- Program rules: 39% credit rate, 7-year compliance period, minimum QEI thresholds
Note: The NOFA is revised each year. Scoring weights and specific criteria can change between rounds. The current library reflects the CY2024 NOFA structure. Always verify against the applicable NOFA for your specific application round.
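As a worked example of the 39% credit rate over the 7-year compliance period: under Internal Revenue Code Section 45D the credit is claimed at 5% of the QEI in each of the first three years and 6% in each of the remaining four. The sketch below computes that schedule. The function name and the QEI amount are illustrative, and this is not the nmtc-calc API.

```python
# Statutory annual credit rates in percent: 5% for years 1-3, 6% for years 4-7.
ANNUAL_CREDIT_RATES_PCT = (5, 5, 5, 6, 6, 6, 6)


def credit_schedule(qei_dollars: float) -> list:
    """Return the credit claimable in each of the seven compliance years."""
    return [qei_dollars * pct / 100 for pct in ANNUAL_CREDIT_RATES_PCT]


# Illustrative QEI amount, not taken from any real transaction.
qei = 10_000_000
schedule = credit_schedule(qei)
total_credit = sum(schedule)

for year, credit in enumerate(schedule, start=1):
    print(f"year {year}: ${credit:,.0f}")
print(f"total: ${total_credit:,.0f} ({total_credit / qei:.0%} of QEI)")
```

The seven rates sum to 39, which is where the headline credit rate comes from.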
Integration libraries (open source, third party)
The library wraps five community-developed Python libraries:
| Library | What it does | Used for |
|---|---|---|
| nmtc-mapper | Census tract lookup and NMTC eligibility determination | Enriching PipelineProject.is_nmtc_eligible and distress_level |
| nmtc-calc | NMTC deal economics computation (credits, investor equity, CDE fees) | deal_economics_summary in ApplicationAnalysis |
| hmda-analyzer | HMDA (Home Mortgage Disclosure Act) data integration | Community lending context for Section B |
| cdfidata | CDFI Fund certified entity data | CDE track record and peer data |
| impact-ledger | Impact metrics aggregation and benchmarking | Jobs and community outcome standardization |
The 17-library open source stack
NMTC Application Builder is built on these open source libraries:
Core data science:
- pandas >= 1.3 — DataFrames for pipeline and analysis results
- numpy >= 1.21 — Numerical operations in scoring and statistics
CDFI/NMTC domain:
- nmtc-mapper >= 0.3.0 — Census tract eligibility
- nmtc-calc >= 0.1.0 — NMTC deal economics
- hmda-analyzer >= 0.1.0 — HMDA lending data
- cdfidata >= 0.1.7 — CDFI Fund data
- impact-ledger >= 0.2.0 — Impact measurement
- cra-scraper >= 0.1.0 — Community Reinvestment Act data
Configuration:
- pyyaml >= 6.0 — YAML configuration loading for CDE profiles
Output (optional):
- python-docx >= 1.1.0 — Word document generation ([word] extra)
- openpyxl >= 3.0.9 — Excel workbook generation ([excel] extra)
- reportlab >= 4.0.0 — PDF generation ([pdf] extra)
Visualization (optional):
- matplotlib >= 3.7 — All five visualization functions ([viz] extra)
Development:
- pytest >= 7.0 — Test suite
- pytest-cov >= 4.0 — Coverage reporting
- jupyter >= 1.0 — Example notebooks
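Since the output and visualization libraries are optional, it can help to check which of them are importable before calling a feature that depends on them. The sketch below uses the standard library's `importlib.util.find_spec`. The mapping from extra name to import name follows the list above (python-docx is imported as `docx`); the helper itself is hypothetical and not part of the package.

```python
from importlib.util import find_spec

# Extra name -> top-level module that the corresponding distribution provides.
OPTIONAL_EXTRAS = {
    "word": "docx",        # python-docx
    "excel": "openpyxl",
    "pdf": "reportlab",
    "viz": "matplotlib",
}


def available_extras(extras=OPTIONAL_EXTRAS):
    """Map each extra to True if its module can be found, without importing it."""
    return {extra: find_spec(module) is not None for extra, module in extras.items()}


for extra, installed in available_extras().items():
    status = "installed" if installed else "missing"
    print(f"[{extra}] {status}")
```

`find_spec` only locates the module; it does not verify that the installed version meets the minimum listed above.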
Data recency and update policy
The embedded historical data in nmtcapp/data/historical_awards.py covers CY2020–CY2024 award rounds. Data is updated when:
- The CDFI Fund publishes new award announcement data (typically once per year)
- Annual report data is released with new impact statistics
Users working on CY2025 and later applications should verify that the winner patterns used for scoring reflect the most recent available data. Check the historical_awards.py module header comment for the current data coverage date.
To see the current round data programmatically:
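A minimal sketch, assuming NMTC_AWARD_ROUNDS is importable from nmtcapp.data.historical_awards and is a mapping keyed by round label such as "CY2020". Both the import path and the structure are assumptions inferred from the names used on this page, so the real module may differ. The fallback dict is a stand-in with empty values so the snippet also runs where the package is not installed.

```python
try:
    # Assumed import path and name; verify against the installed package.
    from nmtcapp.data.historical_awards import NMTC_AWARD_ROUNDS
except ImportError:
    # Stand-in keyed by round label, mirroring the documented CY2020-CY2024 coverage.
    NMTC_AWARD_ROUNDS = {f"CY{year}": {} for year in range(2020, 2025)}


def round_coverage(rounds):
    """Return the earliest and latest round labels present in the mapping."""
    labels = sorted(rounds)
    return labels[0], labels[-1]


first, last = round_coverage(NMTC_AWARD_ROUNDS)
print(f"Embedded award rounds cover {first} to {last} ({len(NMTC_AWARD_ROUNDS)} rounds)")
```

If the latest label predates the round you are applying in, treat the winner patterns as potentially stale.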