Pipeline Analysis

The analyze() method is the central operation of NMTC Application Builder. It orchestrates five sequential steps, caches the result, and returns an ApplicationAnalysis object that feeds every downstream operation: scoring, recommendations, optimization, and output generation.

What happens when you call analyze()

analysis = app.analyze()

Internally, five steps run in sequence:

Step 1: Eligibility enrichment (nmtc-mapper)

Every PipelineProject in the pipeline is passed through enrich_pipeline_eligibility(), which calls the nmtc-mapper library to look up the census tract for each project address and determine NMTC eligibility and distress level. After enrichment, each project has its census_tract, is_nmtc_eligible, distress_level, is_native_area, is_high_migration_rural, and is_opportunity_zone fields populated.

The four distress level codes:

Code	Meaning
`deep`	Deep Distress: poverty rate >40%, median family income ≤40% of area median, or unemployment ≥2.5× national average
`severe`	Severe Distress: poverty rate >30%, median family income ≤60% of area median, or unemployment ≥1.5× national average
`lic`	Low-Income Community: poverty rate ≥20% or median family income ≤80% of area median
`ineligible`	Not NMTC Eligible: does not meet the LIC threshold

Projects with pre-populated eligibility fields (as returned by Pipeline.sample()) skip the API call.
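The skip rule can be read as a simple per-project check. The sketch below makes an assumption about how "pre-populated" is decided: a project counts as already enriched only when all six enrichment fields are non-None. EnrichmentFields and needs_lookup are hypothetical names. They are not part of NMTC Application Builder, and the sketch makes no nmtc-mapper call.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EnrichmentFields:
    """Hypothetical stand-in holding only the enrichment fields of PipelineProject."""
    census_tract: Optional[str] = None
    is_nmtc_eligible: Optional[bool] = None
    distress_level: Optional[str] = None
    is_native_area: Optional[bool] = None
    is_high_migration_rural: Optional[bool] = None
    is_opportunity_zone: Optional[bool] = None


def needs_lookup(project: EnrichmentFields) -> bool:
    """Assumed rule: call the mapper if any enrichment field is still None."""
    # False is a valid determination, so only None counts as "missing".
    return any(getattr(project, f.name) is None for f in fields(project))


pre_enriched = EnrichmentFields(
    census_tract="17031840300",  # illustrative value, not a real lookup result
    is_nmtc_eligible=True,
    distress_level="severe",
    is_native_area=False,
    is_high_migration_rural=False,
    is_opportunity_zone=True,
)
print(needs_lookup(EnrichmentFields()))  # True
print(needs_lookup(pre_enriched))        # False
```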

Step 2: Deal economics (nmtc-calc)

compute_pipeline_economics() calculates the NMTC deal economics for the full pipeline. It reports total NMTCs generated (39% of QEI, claimed over 7 years), investor equity raised at the market credit price (about $0.83 per dollar of credit), estimated CDE fees (2.5% of QEI), and net subsidy to the QALICB. These figures populate deal_economics_summary in the result.
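As a worked example, the sketch below applies the figures above to a single $8.5M QEI. The constants mirror the percentages and price quoted in this section. The 5%/5%/5%/6%/6%/6%/6% schedule is the statutory seven-year NMTC allowance, which sums to 39%. Treating net subsidy as equity minus CDE fees is a simplifying assumption, and deal_economics is a hypothetical helper, not compute_pipeline_economics() itself.

```python
# Assumed parameters, taken from the description above; the library may use others.
CREDIT_SCHEDULE = [0.05, 0.05, 0.05, 0.06, 0.06, 0.06, 0.06]  # 39% over 7 years
CREDIT_PRICE = 0.83   # investor equity per dollar of credit (assumed)
CDE_FEE_RATE = 0.025  # CDE fees as a fraction of QEI (assumed)


def deal_economics(qei: float) -> dict:
    """Hypothetical sketch of deal economics for one QEI amount."""
    credits = qei * sum(CREDIT_SCHEDULE)  # total NMTCs generated
    equity = credits * CREDIT_PRICE       # investor equity raised
    cde_fees = qei * CDE_FEE_RATE         # estimated CDE fees
    # Simplifying assumption: net subsidy = equity minus CDE fees only,
    # ignoring transaction costs and reserves.
    net_subsidy = equity - cde_fees
    return {
        "credits": credits,
        "investor_equity": equity,
        "cde_fees": cde_fees,
        "net_subsidy": net_subsidy,
    }


for name, value in deal_economics(8_500_000).items():
    print(f"{name}: ${value:,.0f}")
# credits: $3,315,000
# investor_equity: $2,751,450
# cde_fees: $212,500
# net_subsidy: $2,538,950
```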

Step 3: Intelligence analyses

Four analyses run in parallel on the enriched pipeline:

Distress concentration — what percentage of QEI is deployed into deep, severe, LIC, and ineligible tracts; native area and high-migration rural percentages; comparison against winner distributions
Geographic diversity — states count, MSA count, urban/rural split, Herfindahl-Hirschman Index (HHI) for geographic concentration
Sector mix — sectors represented, dominant sector, high-priority sector percentage (healthcare + affordable housing + education), sector diversity score
Impact aggregation — total jobs created and retained, units built, square footage, and the critical jobs-per-million-QEI metric benchmarked against historical winners
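Several of these metrics reduce to simple QEI-weighted arithmetic. The sketch below shows one plausible way to compute three of them:

- QEI shares, used for both distress concentration and the state breakdown.
- HHI, the sum of squared shares on a 0 to 1 scale (some sources multiply by 10,000).
- Jobs per $1M of QEI.

Project, qei_shares, hhi, and jobs_per_million_qei are hypothetical names. Two further points are assumptions: that the library weights by QEI rather than by project count, and that jobs per million counts created jobs only.

```python
from collections import defaultdict
from typing import NamedTuple


class Project(NamedTuple):
    """Hypothetical stand-in for an enriched PipelineProject."""
    state: str
    distress_level: str
    qei: float
    jobs_created: int


def qei_shares(projects, attr):
    """Fraction of total QEI for each distinct value of `attr`."""
    totals = defaultdict(float)
    for p in projects:
        totals[getattr(p, attr)] += p.qei
    grand_total = sum(totals.values())
    return {key: amount / grand_total for key, amount in totals.items()}


def hhi(shares):
    """Herfindahl-Hirschman Index on a 0-1 scale (lower = more diverse)."""
    return sum(s * s for s in shares.values())


def jobs_per_million_qei(projects):
    """Assumption: counts jobs created only, per $1M of QEI."""
    jobs = sum(p.jobs_created for p in projects)
    qei = sum(p.qei for p in projects)
    return jobs / (qei / 1_000_000)


pipeline = [
    Project("IL", "deep", 8_500_000, 52),
    Project("TX", "severe", 7_000_000, 38),
    Project("IL", "lic", 4_500_000, 20),
]
by_state = qei_shares(pipeline, "state")
by_distress = qei_shares(pipeline, "distress_level")
print(by_state)                                               # {'IL': 0.65, 'TX': 0.35}
print(round(hhi(by_state), 4))                                # 0.545
print(round(by_distress["deep"] + by_distress["severe"], 4))  # 0.775
print(jobs_per_million_qei(pipeline))                         # 5.5
```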

Step 4: Validation

Three validation checks run automatically:

Eligibility check — confirms all projects have valid census tracts and eligibility determinations; flags ineligible projects
Completeness check — verifies all required fields are populated on every project and the CDEProfile
Consistency check — checks that qei_request <= total_project_cost, qlici_amount <= qei_request, and other internal consistency rules
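The consistency rules named above can be sketched as plain comparisons. ProjectAmounts, consistency_errors, and pass_rate are hypothetical names. The sketch covers only the two documented rules. Defining the pass rate as the share of projects with no errors is an assumption about how the validation component might be derived.

```python
from dataclasses import dataclass


@dataclass
class ProjectAmounts:
    """Hypothetical subset of PipelineProject fields."""
    project_id: str
    total_project_cost: float
    qei_request: float
    qlici_amount: float


def consistency_errors(p: ProjectAmounts) -> list:
    """Check the two documented rules; the library applies others as well."""
    errors = []
    if p.qei_request > p.total_project_cost:
        errors.append(f"{p.project_id}: qei_request exceeds total_project_cost")
    if p.qlici_amount > p.qei_request:
        errors.append(f"{p.project_id}: qlici_amount exceeds qei_request")
    return errors


def pass_rate(projects) -> float:
    """Hypothetical: share of projects with no consistency errors."""
    passed = sum(1 for p in projects if not consistency_errors(p))
    return passed / len(projects)


ok = ProjectAmounts("PRJ-001", 12_500_000, 8_500_000, 8_500_000)
bad = ProjectAmounts("PRJ-009", 5_000_000, 6_000_000, 6_500_000)  # hypothetical
print(consistency_errors(bad))  # two messages, one per broken rule
print(pass_rate([ok, bad]))     # 0.5
```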

Step 5: Readiness score

compute_readiness_score() computes a weighted score from 0 to 100 across six components and assigns a letter grade (A–F):

Component	Weight
Eligibility quality	25%
Distress concentration	25%
Impact metrics	20%
Geographic diversity	15%
Validation pass rate	10%
Completeness	5%

Grade thresholds: A ≥ 85, B ≥ 70, C ≥ 55, D ≥ 40, F below 40.
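The weights and grade cutoffs combine as a weighted average followed by a threshold lookup. The sketch below assumes each component is already scored on a 0 to 100 scale. The dictionary keys and the readiness function are hypothetical, not the signature of compute_readiness_score(). Integer percentage weights keep boundary cases exact: a score of exactly 85 lands on A instead of slipping just below the cutoff through floating-point rounding.

```python
# Weights from the table above, in percent.
WEIGHTS = {
    "eligibility_quality": 25,
    "distress_concentration": 25,
    "impact_metrics": 20,
    "geographic_diversity": 15,
    "validation_pass_rate": 10,
    "completeness": 5,
}
GRADE_CUTOFFS = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]  # below 40 is F


def readiness(component_scores: dict) -> tuple:
    """Assumes each component is already scored on a 0-100 scale."""
    overall = sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS) / 100
    for cutoff, grade in GRADE_CUTOFFS:
        if overall >= cutoff:
            return overall, grade
    return overall, "F"


scores = {
    "eligibility_quality": 90,
    "distress_concentration": 80,
    "impact_metrics": 60,
    "geographic_diversity": 70,
    "validation_pass_rate": 100,
    "completeness": 100,
}
print(readiness(scores))  # (80.0, 'B')
```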

PipelineProject fields

Required fields (must be set at construction)

Field	Type	Description
`project_id`	`str`	Unique identifier (e.g. "PRJ-001")
`project_name`	`str`	Human-readable project name
`qalicb_name`	`str`	Legal name of the QALICB entity
`address`	`str`	Street address
`city`	`str`	City
`state`	`str`	Two-letter state abbreviation
`sector`	`str`	One of the valid sectors (see below)
`project_type`	`str`	`real_estate`, `operating_business`, or `mixed_use`
`total_project_cost`	`float`	Total project cost in dollars (must be > 0)
`qei_request`	`float`	Qualified Equity Investment request in dollars (must be > 0)
`qlici_amount`	`float`	QLICI amount in dollars (must be > 0)
`expected_jobs_created`	`int`	FTE jobs expected to be created (must be ≥ 0)
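The per-field constraints in this table can be expressed as a small validator. This section does not say whether PipelineProject enforces them at construction or defers them to the validation step. field_errors is therefore a hypothetical helper that operates on a plain dict. Its state check only confirms two uppercase letters and does not consult a list of real states.

```python
PROJECT_TYPES = {"real_estate", "operating_business", "mixed_use"}
POSITIVE_AMOUNTS = ("total_project_cost", "qei_request", "qlici_amount")


def field_errors(record: dict) -> list:
    """Hypothetical check of the per-field constraints listed above."""
    errors = []
    if record["project_type"] not in PROJECT_TYPES:
        errors.append(f"unknown project_type: {record['project_type']!r}")
    state = record["state"]
    if not (len(state) == 2 and state.isalpha() and state.isupper()):
        errors.append(f"state must be a two-letter abbreviation: {state!r}")
    for name in POSITIVE_AMOUNTS:
        if record[name] <= 0:
            errors.append(f"{name} must be > 0")
    if record["expected_jobs_created"] < 0:
        errors.append("expected_jobs_created must be >= 0")
    return errors


record = {
    "project_type": "real_estate",
    "state": "IL",
    "total_project_cost": 12_500_000,
    "qei_request": 8_500_000,
    "qlici_amount": 8_500_000,
    "expected_jobs_created": 52,
}
print(field_errors(record))  # []
print(field_errors({**record, "state": "Illinois", "qei_request": 0}))
```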

Optional fields

Field	Type	Default	Description
`expected_jobs_retained`	`int`	0	FTE jobs expected to be retained
`expected_units_built`	`int | None`	None	Affordable housing units (if applicable)
`expected_sq_ft`	`float | None`	None	Gross square footage
`closing_target_date`	`str | None`	None	Target closing date (ISO format: "2025-09-30")
`construction_start`	`str | None`	None	Construction start date
`operations_start`	`str | None`	None	Operations start date

Enrichment fields (populated by analyze())

These begin as None and are set by the eligibility enrichment step. Do not set them manually unless using pre-enriched data (as Pipeline.sample() does).

Field	Type	Description
`census_tract`	`str | None`	11-digit FIPS census tract ID
`is_nmtc_eligible`	`bool | None`	True if the tract qualifies as an LIC (including severe and deep distress tracts)
`distress_level`	`str | None`	`deep`, `severe`, `lic`, or `ineligible`
`is_native_area`	`bool | None`	True if BIA-designated Native American area
`is_high_migration_rural`	`bool | None`	True if USDA high-migration rural county
`is_opportunity_zone`	`bool | None`	True if Opportunity Zone designation applies
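The 11-digit tract ID follows the Census Bureau GEOID layout: 2 digits of state FIPS code, 3 of county code, and 6 of tract code. The sketch below assumes census_tract is stored as a bare digit string with no separators. split_tract_geoid is a hypothetical helper, and the tract ID is illustrative.

```python
def split_tract_geoid(geoid: str) -> dict:
    """Split an 11-digit census tract GEOID into its FIPS parts."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {
        "state_fips": geoid[:2],    # 2-digit state code
        "county_fips": geoid[2:5],  # 3-digit county code within the state
        "tract": geoid[5:],         # 6-digit tract code (implied 2 decimal places)
    }


parts = split_tract_geoid("17031840300")  # illustrative Cook County, IL tract ID
print(parts)  # {'state_fips': '17', 'county_fips': '031', 'tract': '840300'}
print(f"Tract {int(parts['tract']) / 100:.2f}")  # Tract 8403.00
```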

Valid sector values

VALID_SECTORS = [
    "healthcare",
    "affordable_housing",
    "education",
    "small_business",
    "mixed_use",
    "community_facility",
    "clean_energy",
    "other",
]

CDFI Fund priority sectors are healthcare, affordable_housing, and education. Projects in these sectors score highest on the sector diversity dimension.
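The sketch below computes the high-priority sector percentage reported by the sector mix analysis. It assumes the percentage is weighted by QEI rather than by project count. high_priority_pct is a hypothetical helper, and the sector values match VALID_SECTORS above.

```python
VALID_SECTORS = {
    "healthcare", "affordable_housing", "education", "small_business",
    "mixed_use", "community_facility", "clean_energy", "other",
}
HIGH_PRIORITY_SECTORS = {"healthcare", "affordable_housing", "education"}


def high_priority_pct(projects):
    """QEI-weighted share of high-priority sectors (weighting is an assumption)."""
    for sector, _ in projects:
        if sector not in VALID_SECTORS:
            raise ValueError(f"invalid sector: {sector!r}")
    total = sum(qei for _, qei in projects)
    priority = sum(qei for sector, qei in projects if sector in HIGH_PRIORITY_SECTORS)
    return priority / total


projects = [
    ("healthcare", 8_500_000),
    ("education", 7_000_000),
    ("small_business", 4_500_000),
]
print(high_priority_pct(projects))  # 0.775
```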

Loading from CSV

The Pipeline.from_csv() class method reads a CSV file whose columns match PipelineProject field names. The 12 required fields above must be present as columns. Optional columns are read when present; when absent, they take the defaults listed in the Optional fields table.

pipeline = Pipeline.from_csv("my_pipeline.csv")

A template CSV is available at templates/pipeline_template.csv in the repository. A sample strong pipeline is at templates/pipeline_sample_strong.csv.

Example CSV structure (abbreviated):

project_id,project_name,qalicb_name,address,city,state,sector,project_type,total_project_cost,qei_request,qlici_amount,expected_jobs_created
PRJ-001,Southside Health Center,Southside HC QALICB LLC,3400 S Michigan Ave,Chicago,IL,healthcare,real_estate,12500000,8500000,8500000,52
PRJ-002,East Houston Charter Academy,East Houston Academy QALICB LLC,5200 Lawndale St,Houston,TX,education,real_estate,9800000,7000000,7000000,38
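To pre-check a file before calling Pipeline.from_csv(), you can use a reader like the sketch below. It reads the two example rows with Python's standard csv.DictReader, rejects input missing any of the 12 required columns, and converts the numeric columns. read_pipeline_rows is hypothetical and is not necessarily how from_csv() is implemented. Only the default of 0 for expected_jobs_retained comes from the Optional fields table.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "project_id", "project_name", "qalicb_name", "address", "city", "state",
    "sector", "project_type", "total_project_cost", "qei_request",
    "qlici_amount", "expected_jobs_created",
]
FLOAT_COLUMNS = ("total_project_cost", "qei_request", "qlici_amount")


def read_pipeline_rows(text: str) -> list:
    """Hypothetical reader: check headers, then coerce numeric columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for row in reader:
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        row["expected_jobs_created"] = int(row["expected_jobs_created"])
        # Optional column: fall back to the documented default of 0.
        row["expected_jobs_retained"] = int(row.get("expected_jobs_retained") or 0)
        rows.append(row)
    return rows


SAMPLE = """\
project_id,project_name,qalicb_name,address,city,state,sector,project_type,total_project_cost,qei_request,qlici_amount,expected_jobs_created
PRJ-001,Southside Health Center,Southside HC QALICB LLC,3400 S Michigan Ave,Chicago,IL,healthcare,real_estate,12500000,8500000,8500000,52
PRJ-002,East Houston Charter Academy,East Houston Academy QALICB LLC,5200 Lawndale St,Houston,TX,education,real_estate,9800000,7000000,7000000,38
"""
rows = read_pipeline_rows(SAMPLE)
print(len(rows), rows[0]["qei_request"])  # 2 8500000.0
```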

Building programmatically

from nmtcapp.core.pipeline import Pipeline, PipelineProject

project = PipelineProject(
    project_id="PRJ-001",
    project_name="Southside Health Center",
    qalicb_name="Southside HC QALICB, LLC",
    address="3400 S Michigan Ave",
    city="Chicago",
    state="IL",
    sector="healthcare",
    project_type="real_estate",
    total_project_cost=12_500_000,
    qei_request=8_500_000,
    qlici_amount=8_500_000,
    expected_jobs_created=52,
    expected_jobs_retained=18,
    expected_sq_ft=24_000,
    closing_target_date="2025-09-30",
)

pipeline = Pipeline(projects=[project])
# or:
pipeline = Pipeline()
pipeline.add(project)

Reading the analysis output

analysis = app.analyze()

# Print a high-level summary to the terminal
analysis.summary()

# Access individual analyses
print(analysis.distress_analysis["pct_deep_or_severe"])   # e.g. 0.82
print(analysis.geographic_analysis["states_count"])        # e.g. 10
print(analysis.sector_analysis["sectors_represented"])     # e.g. 6
print(analysis.impact_summary["jobs_per_million_qei"])    # e.g. 14.2

# Readiness score
rs = analysis.readiness_score
print(f"Grade: {rs.grade}, Score: {rs.overall_score}")    # e.g. Grade: B, Score: 74.5

# Serialize to dict (JSON-safe)
import json
print(json.dumps(analysis.to_dict(), indent=2))

Distress analysis keys

d = analysis.distress_analysis
d["pct_deep_or_severe"]       # float — fraction of QEI in deep + severe tracts
d["pct_lic"]                  # float — fraction in standard LIC tracts
d["pct_non_lic"]              # float — fraction in non-LIC (ineligible) tracts
d["pct_native_area"]          # float — fraction in Native American areas
d["meets_target_threshold"]   # bool — True if pct_deep_or_severe >= 0.75
d["vs_historical_winners"]    # str — e.g. "above_median"

Geographic analysis keys

g = analysis.geographic_analysis
g["states_count"]                  # int — number of distinct states
g["msa_count"]                     # int — number of MSAs represented
g["urban_pct"]                     # float — fraction of QEI in urban tracts
g["rural_pct"]                     # float — fraction of QEI in rural tracts
g["hhi"]                           # float — Herfindahl-Hirschman Index (lower = more diverse)
g["geographic_concentration_label"] # str — e.g. "low", "medium", "high"
g["state_breakdown"]               # dict — per-state QEI and project counts

Readiness score interpretation

Grade	Score Range	Interpretation
A	85–100	Submission-ready; focus on final review
B	70–84	Competitive; targeted improvements will strengthen the application
C	55–69	Below typical winner patterns; significant work needed before submission
D	40–54	Multiple critical gaps; substantial restructuring required
F	0–39	Application not viable in current form

Note that the readiness score is distinct from the win alignment score — it measures internal completeness and quality, while the alignment score measures competitiveness against winner patterns.

Caching behavior

analyze() caches its result after the first call. Calling analyze() again on the same Application object returns the cached result immediately. The cache is invalidated if you call add_pipeline() or add_project() after the initial analysis.

analysis1 = app.analyze()    # runs full analysis
analysis2 = app.analyze()    # returns cached result
assert analysis1 is analysis2  # True — same object
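The caching rule amounts to memoization with explicit invalidation. The sketch below is a hypothetical stand-in, not the Application class. It stores the last result, returns it on repeat calls, and clears it whenever add_project() or add_pipeline() changes the inputs. Whether the real add_pipeline() appends to or replaces the existing projects is an assumption; here it appends.

```python
class CachedAnalyzer:
    """Hypothetical stand-in for Application's analyze() caching."""

    def __init__(self):
        self._projects = []
        self._cached = None
        self.runs = 0  # how many times the full analysis actually ran

    def add_project(self, project):
        self._projects.append(project)
        self._cached = None  # new input invalidates the cached result

    def add_pipeline(self, projects):
        self._projects.extend(projects)  # assumption: appends, does not replace
        self._cached = None

    def analyze(self):
        if self._cached is None:
            self.runs += 1
            # Placeholder for the five-step pipeline.
            self._cached = {"project_count": len(self._projects)}
        return self._cached


analyzer = CachedAnalyzer()
analyzer.add_project("PRJ-001")
first = analyzer.analyze()
assert analyzer.analyze() is first  # cache hit: same object
analyzer.add_project("PRJ-002")
second = analyzer.analyze()
assert second is not first          # recomputed after invalidation
print(analyzer.runs, second["project_count"])  # 2 2
```

functools.cached_property would give the same first-call caching, but invalidating it means deleting the cached attribute from the instance. A None sentinel keeps the reset visible in each method that mutates the inputs.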