Pipeline Analysis¶
The analyze() method is the central operation of NMTC Application Builder. It orchestrates five sequential steps, caches the result, and returns an ApplicationAnalysis object that feeds every downstream operation: scoring, recommendations, optimization, and output generation.
What happens when you call analyze()¶
Internally, five steps run in sequence:
Step 1: Eligibility enrichment (nmtc-mapper)¶
Every PipelineProject in the pipeline is passed through enrich_pipeline_eligibility(), which calls the nmtc-mapper library to look up the census tract for each project address and determine NMTC eligibility and distress level. After enrichment, each project has its census_tract, is_nmtc_eligible, distress_level, is_native_area, is_high_migration_rural, and is_opportunity_zone fields populated.
The four distress level codes:
| Code | Meaning |
|---|---|
| deep | Deep Distress — poverty rate >30% or unemployment >1.5× national average |
| severe | Severe Distress — LIC with additional qualifying distress factors |
| lic | Low Income Community — median family income ≤80% of AMI or poverty rate ≥20% |
| ineligible | Not NMTC Eligible — does not meet LIC threshold |
Projects whose eligibility fields are already populated (such as those returned by Pipeline.sample()) skip the nmtc-mapper lookup.
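The distress codes in the table above can be read as an ordered set of rules. The following is a minimal sketch of that ordering, not the nmtc-mapper implementation: `TractStats`, `classify_distress`, and the `has_extra_distress_factors` flag are hypothetical names, and the flag stands in for the "additional qualifying distress factors" that this page does not enumerate.

```python
from dataclasses import dataclass


@dataclass
class TractStats:
    """Hypothetical container for tract indicators (not an nmtc-mapper type)."""
    poverty_rate: float          # fraction, e.g. 0.32 for 32%
    unemployment_ratio: float    # tract unemployment / national average
    income_ratio: float          # tract median family income / area median income
    has_extra_distress_factors: bool = False  # ASSUMPTION: placeholder for "severe" criteria


def classify_distress(t: TractStats) -> str:
    """Return one of the four distress codes, checking the deepest level first."""
    is_lic = t.income_ratio <= 0.80 or t.poverty_rate >= 0.20
    if t.poverty_rate > 0.30 or t.unemployment_ratio > 1.5:
        return "deep"
    if is_lic and t.has_extra_distress_factors:
        return "severe"
    if is_lic:
        return "lic"
    return "ineligible"


print(classify_distress(TractStats(0.35, 1.0, 0.90)))
print(classify_distress(TractStats(0.10, 1.0, 1.10)))
```

Checking "deep" before "severe" and "lic" means a tract that satisfies several rules is reported at its deepest level, which matches the order in which the codes are listed.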
Step 2: Deal economics (nmtc-calc)¶
compute_pipeline_economics() calculates the NMTC deal economics for the full pipeline: total NMTCs generated (39% of QEI over 7 years), investor equity raised at the current market credit price (~$0.83/credit dollar), estimated CDE fees (2.5% of QEI), and net subsidy to the QALICB. These figures populate deal_economics_summary in the result.
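As a worked example of the arithmetic described above, the sketch below applies the stated rates to a single $8.5M QEI. It is an illustration only: `deal_economics` is a hypothetical helper, not the nmtc-calc API, and the net subsidy formula (investor equity minus CDE fees) is an assumption, since this page does not define it.

```python
from decimal import Decimal

CREDIT_RATE = Decimal("0.39")    # total NMTCs as a share of QEI over 7 years
CREDIT_PRICE = Decimal("0.83")   # approximate market price per credit dollar
CDE_FEE_RATE = Decimal("0.025")  # estimated CDE fees as a share of QEI


def deal_economics(qei: Decimal) -> dict:
    """Hypothetical helper: headline economics for one QEI amount."""
    total_credits = qei * CREDIT_RATE
    investor_equity = total_credits * CREDIT_PRICE
    cde_fees = qei * CDE_FEE_RATE
    # ASSUMPTION: net subsidy is taken as investor equity less CDE fees.
    net_subsidy = investor_equity - cde_fees
    return {
        "total_credits": total_credits,
        "investor_equity": investor_equity,
        "cde_fees": cde_fees,
        "net_subsidy": net_subsidy,
    }


summary = deal_economics(Decimal("8500000"))
for name, value in summary.items():
    print(f"{name}: ${value:,.2f}")
```

`Decimal` is used here so that the dollar amounts are exact; a real deal model would likely account for more fee layers and timing than this sketch does.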
Step 3: Intelligence analyses¶
Four analyses run in parallel on the enriched pipeline:
- Distress concentration — what percentage of QEI is deployed into deep, severe, LIC, and ineligible tracts; native area and high-migration rural percentages; comparison against winner distributions
- Geographic diversity — states count, MSA count, urban/rural split, Herfindahl-Hirschman Index (HHI) for geographic concentration
- Sector mix — sectors represented, dominant sector, high-priority sector percentage (healthcare + affordable housing + education), sector diversity score
- Impact aggregation — total jobs created and retained, units built, square footage, and the critical jobs-per-million-QEI metric benchmarked against historical winners
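Two of the metrics named above are simple enough to sketch directly. The functions below are illustrative, not the library's code: `hhi_by_state` and `jobs_per_million_qei` are hypothetical names, projects are reduced to `(state, qei, jobs)` tuples, and the HHI is assumed to be on a 0 to 1 scale computed from QEI shares per state.

```python
from collections import defaultdict


def hhi_by_state(projects):
    """Sum of squared QEI shares per state (ASSUMPTION: 0-1 scale, QEI-weighted)."""
    qei_by_state = defaultdict(float)
    for state, qei, _jobs in projects:
        qei_by_state[state] += qei
    total = sum(qei_by_state.values())
    if total <= 0:
        raise ValueError("total QEI must be positive")
    return sum((qei / total) ** 2 for qei in qei_by_state.values())


def jobs_per_million_qei(projects):
    """Total jobs divided by total QEI expressed in millions of dollars."""
    total_jobs = sum(jobs for _state, _qei, jobs in projects)
    total_qei = sum(qei for _state, qei, _jobs in projects)
    if total_qei <= 0:
        raise ValueError("total QEI must be positive")
    return total_jobs / (total_qei / 1_000_000)


projects = [("IL", 8_500_000, 52), ("TX", 7_000_000, 38)]
print(round(hhi_by_state(projects), 3))
print(round(jobs_per_million_qei(projects), 2))
```

An HHI of 1.0 means all QEI sits in one state, and the value falls toward 0 as QEI spreads evenly across more states.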
Step 4: Validation¶
Three validation checks run automatically:
- Eligibility check — confirms all projects have valid census tracts and eligibility determinations; flags ineligible projects
- Completeness check — verifies all required fields are populated on every project and the CDEProfile
- Consistency check — checks that qei_request <= total_project_cost, qlici_amount <= qei_request, and other internal consistency rules
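The two consistency rules named above can be expressed as a small check. This is a sketch under the assumption that a project is available as a plain dict keyed by field name; `consistency_issues` is a hypothetical helper, and the library's "other internal consistency rules" are not covered.

```python
def consistency_issues(project: dict) -> list[str]:
    """Return a message for each consistency rule the project violates."""
    issues = []
    if project["qei_request"] > project["total_project_cost"]:
        issues.append("qei_request exceeds total_project_cost")
    if project["qlici_amount"] > project["qei_request"]:
        issues.append("qlici_amount exceeds qei_request")
    return issues


ok = {"total_project_cost": 12_500_000, "qei_request": 8_500_000, "qlici_amount": 8_500_000}
bad = {"total_project_cost": 5_000_000, "qei_request": 6_000_000, "qlici_amount": 7_000_000}
print(consistency_issues(ok))
print(consistency_issues(bad))
```

Returning a list of messages rather than a single boolean lets a caller report every violation at once.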
Step 5: Readiness score¶
compute_readiness_score() computes a weighted 0–100 score from six components with a letter grade (A–F):
| Component | Weight |
|---|---|
| Eligibility quality | 25% |
| Distress concentration | 25% |
| Impact metrics | 20% |
| Geographic diversity | 15% |
| Validation pass rate | 10% |
| Completeness | 5% |
Grade thresholds: A ≥ 85, B ≥ 70, C ≥ 55, D ≥ 40, F below 40.
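The weighting and grade thresholds above combine as follows. This is a minimal sketch, not compute_readiness_score() itself: the `readiness` function and the component key names are hypothetical, and each component is assumed to be scored on a 0 to 100 scale before weighting.

```python
# Weights in percent, taken from the component table.
WEIGHTS = {
    "eligibility_quality": 25,
    "distress_concentration": 25,
    "impact_metrics": 20,
    "geographic_diversity": 15,
    "validation_pass_rate": 10,
    "completeness": 5,
}

# Minimum score for each grade, highest first; anything below 40 is an F.
GRADE_FLOORS = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]


def readiness(components: dict) -> tuple:
    """Weighted 0-100 score and letter grade (ASSUMPTION: components are 0-100)."""
    score = sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100
    grade = next((g for floor, g in GRADE_FLOORS if score >= floor), "F")
    return score, grade


example = {
    "eligibility_quality": 90,
    "distress_concentration": 80,
    "impact_metrics": 70,
    "geographic_diversity": 60,
    "validation_pass_rate": 100,
    "completeness": 100,
}
print(readiness(example))
```

Because eligibility quality and distress concentration together carry half of the weight, a pipeline with perfect validation and completeness but weak tracts still cannot reach a high grade.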
PipelineProject fields¶
Required fields (must be set at construction)¶
| Field | Type | Description |
|---|---|---|
| project_id | str | Unique identifier (e.g. "PRJ-001") |
| project_name | str | Human-readable project name |
| qalicb_name | str | Legal name of the QALICB entity |
| address | str | Street address |
| city | str | City |
| state | str | Two-letter state abbreviation |
| sector | str | One of the valid sectors (see below) |
| project_type | str | real_estate, operating_business, or mixed_use |
| total_project_cost | float | Total project cost in dollars (must be > 0) |
| qei_request | float | Qualified Equity Investment request in dollars (must be > 0) |
| qlici_amount | float | QLICI amount in dollars (must be > 0) |
| expected_jobs_created | int | FTE jobs expected to be created (must be ≥ 0) |
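The numeric constraints listed for the required fields could be enforced at construction time along these lines. This is a sketch only: `ProjectSketch` is a hypothetical, trimmed-down stand-in for PipelineProject, and whether the real class validates in `__post_init__` or raises `ValueError` is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ProjectSketch:
    """Hypothetical subset of the required fields, with their numeric checks."""
    project_id: str
    total_project_cost: float
    qei_request: float
    qlici_amount: float
    expected_jobs_created: int

    def __post_init__(self):
        # Dollar amounts must be strictly positive.
        for name in ("total_project_cost", "qei_request", "qlici_amount"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be > 0")
        # Job counts may be zero but not negative.
        if self.expected_jobs_created < 0:
            raise ValueError("expected_jobs_created must be >= 0")


project = ProjectSketch("PRJ-001", 12_500_000, 8_500_000, 8_500_000, 52)
print(project.project_id)
```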
Optional fields¶
| Field | Type | Default | Description |
|---|---|---|---|
| expected_jobs_retained | int | 0 | FTE jobs expected to be retained |
| expected_units_built | int \| None | None | Affordable housing units (if applicable) |
| expected_sq_ft | float \| None | None | Gross square footage |
| closing_target_date | str \| None | None | Target closing date (ISO format: "2025-09-30") |
| construction_start | str \| None | None | Construction start date |
| operations_start | str \| None | None | Operations start date |
Enrichment fields (populated by analyze())¶
These begin as None and are set by the eligibility enrichment step. Do not set them manually unless using pre-enriched data (as Pipeline.sample() does).
| Field | Type | Description |
|---|---|---|
| census_tract | str \| None | 11-digit FIPS census tract ID |
| is_nmtc_eligible | bool \| None | True if tract qualifies as LIC or deeper |
| distress_level | str \| None | deep, severe, lic, or ineligible |
| is_native_area | bool \| None | True if BIA-designated Native American area |
| is_high_migration_rural | bool \| None | True if USDA high-migration rural county |
| is_opportunity_zone | bool \| None | True if Opportunity Zone designation applies |
Valid sector values¶
```python
VALID_SECTORS = [
    "healthcare",
    "affordable_housing",
    "education",
    "small_business",
    "mixed_use",
    "community_facility",
    "clean_energy",
    "other",
]
```
CDFI Fund priority sectors are healthcare, affordable_housing, and education. Projects in these sectors score highest on the sector diversity dimension.
Loading from CSV¶
The Pipeline.from_csv() class method reads a CSV file whose columns match PipelineProject field names. The 12 required fields above are required columns. Optional columns are read when present and fall back to their field defaults when absent (None, or 0 for expected_jobs_retained).
A template CSV is available at templates/pipeline_template.csv in the repository. A sample strong pipeline is at templates/pipeline_sample_strong.csv.
Example CSV structure (abbreviated):
```csv
project_id,project_name,qalicb_name,address,city,state,sector,project_type,total_project_cost,qei_request,qlici_amount,expected_jobs_created
PRJ-001,Southside Health Center,Southside HC QALICB LLC,3400 S Michigan Ave,Chicago,IL,healthcare,real_estate,12500000,8500000,8500000,52
PRJ-002,East Houston Charter Academy,East Houston Academy QALICB LLC,5200 Lawndale St,Houston,TX,education,real_estate,9800000,7000000,7000000,38
```
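Before handing a file to Pipeline.from_csv(), it can be useful to confirm that all 12 required columns are present. The sketch below does this with the standard library only. `read_pipeline_rows` is a hypothetical pre-check, not part of the package, and it makes no claim about how from_csv() itself parses or reports errors.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "project_id", "project_name", "qalicb_name", "address", "city", "state",
    "sector", "project_type", "total_project_cost", "qei_request",
    "qlici_amount", "expected_jobs_created",
]


def read_pipeline_rows(handle):
    """Hypothetical pre-check: verify required columns and convert numeric fields."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    rows = []
    for row in reader:
        for name in ("total_project_cost", "qei_request", "qlici_amount"):
            row[name] = float(row[name])
        row["expected_jobs_created"] = int(row["expected_jobs_created"])
        rows.append(row)
    return rows


SAMPLE = (
    "project_id,project_name,qalicb_name,address,city,state,sector,project_type,"
    "total_project_cost,qei_request,qlici_amount,expected_jobs_created\n"
    "PRJ-001,Southside Health Center,Southside HC QALICB LLC,3400 S Michigan Ave,"
    "Chicago,IL,healthcare,real_estate,12500000,8500000,8500000,52\n"
)
rows = read_pipeline_rows(io.StringIO(SAMPLE))
print(rows[0]["project_id"], rows[0]["qei_request"])
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` wrapper.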
Building programmatically¶
```python
from nmtcapp.core.pipeline import Pipeline, PipelineProject

project = PipelineProject(
    project_id="PRJ-001",
    project_name="Southside Health Center",
    qalicb_name="Southside HC QALICB, LLC",
    address="3400 S Michigan Ave",
    city="Chicago",
    state="IL",
    sector="healthcare",
    project_type="real_estate",
    total_project_cost=12_500_000,
    qei_request=8_500_000,
    qlici_amount=8_500_000,
    expected_jobs_created=52,
    expected_jobs_retained=18,
    expected_sq_ft=24_000,
    closing_target_date="2025-09-30",
)

pipeline = Pipeline(projects=[project])

# or:
pipeline = Pipeline()
pipeline.add(project)
```
Reading the analysis output¶
```python
import json

# `app` is an Application object with a pipeline already attached
analysis = app.analyze()

# High-level summary to terminal
analysis.summary()

# Access individual analyses
print(analysis.distress_analysis["pct_deep_or_severe"])   # e.g. 0.82
print(analysis.geographic_analysis["states_count"])       # e.g. 10
print(analysis.sector_analysis["sectors_represented"])    # e.g. 6
print(analysis.impact_summary["jobs_per_million_qei"])    # e.g. 14.2

# Readiness score
rs = analysis.readiness_score
print(f"Grade: {rs.grade}, Score: {rs.overall_score}")    # e.g. Grade: B, Score: 74.5

# Serialize to dict (JSON-safe)
print(json.dumps(analysis.to_dict(), indent=2))
```
Distress analysis keys¶
```python
d = analysis.distress_analysis
d["pct_deep_or_severe"]      # float — fraction of QEI in deep + severe tracts
d["pct_lic"]                 # float — fraction in standard LIC tracts
d["pct_non_lic"]             # float — fraction in non-LIC (ineligible) tracts
d["pct_native_area"]         # float — fraction in Native American areas
d["meets_target_threshold"]  # bool — True if pct_deep_or_severe >= 0.75
d["vs_historical_winners"]   # str — e.g. "above_median"
```
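To make the relationship between these keys concrete, the sketch below derives the QEI-weighted fractions and the 0.75 threshold flag from a list of `(distress_level, qei_request)` pairs. `distress_fractions` is a hypothetical function written for illustration. The library's own calculation may differ, and the native area and historical comparison keys are left out because this page does not say how they are computed.

```python
def distress_fractions(projects):
    """QEI-weighted share of each distress bucket (illustrative only)."""
    total = sum(qei for _level, qei in projects)
    if total <= 0:
        raise ValueError("total QEI must be positive")

    def share(*levels):
        return sum(qei for level, qei in projects if level in levels) / total

    deep_or_severe = share("deep", "severe")
    return {
        "pct_deep_or_severe": deep_or_severe,
        "pct_lic": share("lic"),
        "pct_non_lic": share("ineligible"),
        "meets_target_threshold": deep_or_severe >= 0.75,
    }


sample = [("deep", 6_000_000), ("severe", 2_000_000), ("lic", 2_000_000)]
print(distress_fractions(sample))
```

Weighting by QEI rather than by project count means one large project in an ineligible tract can pull the pipeline below the threshold even when most projects are in deeply distressed tracts.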
Geographic analysis keys¶
```python
g = analysis.geographic_analysis
g["states_count"]                    # int — number of distinct states
g["msa_count"]                       # int — number of MSAs represented
g["urban_pct"]                       # float — fraction of QEI in urban tracts
g["rural_pct"]                       # float — fraction of QEI in rural tracts
g["hhi"]                             # float — Herfindahl-Hirschman Index (lower = more diverse)
g["geographic_concentration_label"]  # str — e.g. "low", "medium", "high"
g["state_breakdown"]                 # dict — per-state QEI and project counts
```
Readiness score interpretation¶
| Grade | Score Range | Interpretation |
|---|---|---|
| A | 85–100 | Submission-ready; focus on final review |
| B | 70–84 | Competitive; targeted improvements will strengthen the application |
| C | 55–69 | Below typical winner patterns; significant work needed before submission |
| D | 40–54 | Multiple critical gaps; substantial restructuring required |
| F | 0–39 | Application not viable in current form |
Note that the readiness score is distinct from the win alignment score — it measures internal completeness and quality, while the alignment score measures competitiveness against winner patterns.
Caching behavior¶
analyze() caches its result after the first call. Calling analyze() again on the same Application object returns the cached result immediately. The cache is invalidated if you call add_pipeline() or add_project() after the initial analysis.
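The caching contract described above (compute once, reuse, invalidate on mutation) follows a common pattern, sketched here. `CachedAnalyzer` is a hypothetical stand-in for the Application class, and the `runs` counter exists only to make the behavior observable. None of this reflects the package's internals.

```python
class CachedAnalyzer:
    """Hypothetical stand-in showing compute-once caching with invalidation."""

    def __init__(self):
        self._projects = []
        self._cached = None
        self.runs = 0  # counts how many times the expensive analysis actually ran

    def add_project(self, project):
        self._projects.append(project)
        self._cached = None  # any mutation invalidates the cached result

    def analyze(self):
        if self._cached is None:
            self.runs += 1
            self._cached = {"project_count": len(self._projects)}
        return self._cached


app_sketch = CachedAnalyzer()
app_sketch.add_project("PRJ-001")
first = app_sketch.analyze()
second = app_sketch.analyze()      # served from the cache
app_sketch.add_project("PRJ-002")  # invalidates the cache
third = app_sketch.analyze()       # recomputed
print(app_sketch.runs, third["project_count"])
```

One practical consequence of this pattern is that a result object held from before a mutation is stale, so it is safer to call analyze() again after adding projects than to reuse an earlier return value.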