City of Boulder Utilities
University of Colorado Boulder
Powered by Virridy
EPA ATP — 40 CFR Part 136 · Boulder Creek, Colorado

Lume–Colilert Comparability Report

Technical comparability analysis of the Virridy Lume sensor against IDEXX Colilert (E. coli, freshwater) supporting the first phase of an EPA ATP application: a Boulder Creek facility-specific approval for Colorado Regulation 93 / 303(d) compliance monitoring. A parallel coastal track (ASBPA, enterococci / Enterolert) is reported separately.

Track A Partner: City of Boulder Utilities · CDPHE
Initial Focus: Reg 93 / 303(d), Boulder Creek
Reference: IDEXX Colilert & Enterolert
40 CFR Part 136 Preliminary Report

1 Executive Summary

This report presents a preliminary comparability analysis of the Virridy Lume tryptophan-like fluorescence (TLF) sensor in support of Track A of the EPA Alternate Test Procedure (ATP) program: a Colorado freshwater track targeting Regulation 93 / 303(d) compliance monitoring on Boulder Creek. Boulder Creek is a CDPHE-listed impaired waterbody with an E. coli geometric mean threshold of 126 CFU/100 mL. The regulatory progression for Track A is: (1) Boulder Creek facility-specific (limited-use) ATP under 40 CFR 136.5 → (2) Colorado state-wide expansion (additional CO sites + CDPHE recognition) → (3) Nationwide freshwater ATP under 40 CFR 136.4. All 856 paired observations in this report use IDEXX Colilert as the freshwater reference, consistent with Boulder’s existing Reg 93 monitoring program. A separate parallel Track B, in partnership with the American Shore & Beach Preservation Association (ASBPA), targets enterococci at coastal ocean beaches in other states using Enterolert as the reference method; that track is reported separately.

The analysis demonstrates strong agreement between the Lume and Colilert across multiple evaluation frameworks:

| Evaluation | Key Metric | Value | Sample Size |
| --- | --- | --- | --- |
| Continuous regression (Boulder Creek) | R² / MAPE | 0.67 / 7.12% | n = 38 |
| 3-class categorical (<10, 10–100, >100 MPN) | Bal. Accuracy / Kappa | 93% / 0.84 | n = 334 |
| Binary classification (threshold = 1 CFU/100 mL) | Accuracy / Kappa | 91% / 0.82 | n = 361 |
| Binary classification (threshold = 10 CFU/100 mL) | Accuracy / Kappa | 92% / 0.84 | n = 361 |
| Chlorine residual detection (binary) | Accuracy / Kappa | 85% / 0.70 | n = 66 |

Over 75% of the Lume’s continuous predictions fall within the analytical uncertainty bounds of the Colilert reference method (81.6%, or 31 of 38, on Boulder Creek). Cohen’s kappa values of 0.82–0.84 indicate “almost perfect” agreement on the Landis & Koch scale for all primary classification tasks. The Lume also shows higher reproducibility than culture-based methods (14% RPD vs. ≥26% for Colilert duplicates).

2 Method Description

2.1 Test Method (Virridy Lume)

| Parameter | Specification |
| --- | --- |
| Measurement principle | Tryptophan-like fluorescence (TLF) at 275 nm excitation / 340 nm emission, with multivariate linear regression: log₁₀(CFU/100 mL) = β₀ + β₁·TLF + β₂·Turbidity + β₃·Temperature |
| Sensor unit | Sensor 50031 (Lume V1.2) |
| Output | Continuous E. coli concentration estimate (CFU or MPN/100 mL) and categorical risk classification |
| Concurrent parameters | Turbidity (NTU), Temperature (°C); used as regression model correction inputs |
| TLF detection limit | 0.05 ppb (tryptophan in DI water) |
| E. coli detection limit | ~10 CFU/100 mL (correlated, in wastewater effluent) |
| Response time | 60 seconds per measurement |
| Regression model | Multivariate linear regression with fixed coefficients. Features: TLF intensity, turbidity (NTU), temperature (°C). No site-specific calibration. |
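
As a minimal sketch of how the regression above turns sensor readings into a concentration estimate and a risk class, the snippet below applies the log-linear model and then bins the result. The coefficient values are hypothetical placeholders (the fitted β values are not given in this report), the TLF input is assumed to be in ppb, and the handling of values exactly at 10 and 100 is an assumption.

```python
# Hypothetical coefficients for illustration only; the report does not list
# the fitted values of beta_0 .. beta_3.
BETA = {"intercept": 0.5, "tlf": 0.08, "turbidity": 0.01, "temperature": 0.02}


def predict_ecoli(tlf_ppb: float, turbidity_ntu: float, temperature_c: float) -> float:
    """Return an E. coli estimate (CFU/100 mL) from the log-linear model."""
    log10_cfu = (
        BETA["intercept"]
        + BETA["tlf"] * tlf_ppb
        + BETA["turbidity"] * turbidity_ntu
        + BETA["temperature"] * temperature_c
    )
    # The model is fitted in log10 space, so back-transform to a concentration.
    return 10.0 ** log10_cfu


def risk_class(cfu_per_100ml: float) -> str:
    """Map an estimate to the three bins <10, 10-100, >100 (boundaries assumed)."""
    if cfu_per_100ml < 10:
        return "<10"
    if cfu_per_100ml <= 100:
        return "10-100"
    return ">100"


estimate = predict_ecoli(tlf_ppb=12.0, turbidity_ntu=5.0, temperature_c=14.0)
print(f"{estimate:.0f} CFU/100 mL -> class {risk_class(estimate)}")
```

Because the coefficients are fixed rather than site-calibrated, the same function would be applied unchanged at every monitoring location.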

2.2 Reference Method (IDEXX Colilert)

| Parameter | Specification |
| --- | --- |
| Method | IDEXX Colilert / Quanti-Tray® |
| Regulatory status | Approved under 40 CFR Part 136 for E. coli enumeration |
| Output | Most Probable Number (MPN) or Colony Forming Units (CFU) per 100 mL |
| Incubation | 18–22 hours at 35°C |
| Known precision | ≥26% relative percent difference between duplicate samples (literature; Kenya groundwater study) |

2.3 Study Location

| Parameter | Value |
| --- | --- |
| Waterbody | Boulder Creek, Boulder, Colorado |
| Water type | Freshwater surface water (ambient and influenced by municipal WWTP effluent) |
| Concentration range observed | <1 to >400 CFU/100 mL |
| Regulatory context | Colorado Regulation 93 / 303(d) compliance monitoring with City of Boulder Utilities (6 monitoring locations on Boulder Creek, E. coli geometric mean threshold 126 CFU/100 mL). Coastal/beach recreational water monitoring via ASBPA is a longer-term goal. |
| Reference methods | IDEXX Colilert (E. coli, freshwater) & IDEXX Enterolert (enterococci, marine/coastal); dual-indicator approach per EPA 2012 RWQC |
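
The 126 CFU/100 mL criterion above is a geometric mean, not an arithmetic mean. The sketch below shows one way a set of results could be compared against it. The sample values are hypothetical, and the assessment window and the treatment of non-detects are not specified in this report, so non-positive values are simply rejected here.

```python
import math

GM_THRESHOLD = 126.0  # CFU/100 mL, the E. coli geometric mean threshold cited above


def geometric_mean(values):
    """Geometric mean of strictly positive concentrations."""
    if not values:
        raise ValueError("at least one value is required")
    if any(v <= 0 for v in values):
        # How to substitute for non-detects is a policy choice left out of this sketch.
        raise ValueError("values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def exceeds_threshold(values, threshold=GM_THRESHOLD):
    """True if the geometric mean of the values is above the threshold."""
    return geometric_mean(values) > threshold


# Hypothetical results (CFU/100 mL) for one assessment period.
samples = [40, 85, 110, 300, 95]
print(f"geometric mean = {geometric_mean(samples):.1f}, exceeds = {exceeds_threshold(samples)}")
```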

3 Data Overview

The existing dataset comprises several complementary analyses, all using IDEXX Colilert as the freshwater reference method (E. coli). This consistency is important: every observation in this report is a direct Lume-vs-Colilert comparison against the same Part 136-approved method used for recreational water quality assessment on Boulder Creek. Future coastal/marine validation through the ASBPA partnership will add Enterolert (enterococci) paired data, completing the dual-indicator framework required by EPA’s 2012 RWQC.

| Analysis | n | Type | Source |
| --- | --- | --- | --- |
| Continuous regression (Boulder Creek field) | 38 | Paired field observations: Lume continuous estimate vs. Colilert grab sample | Boulder Creek, Sensor 50031. 31 inside Colilert uncertainty bounds, 7 outside. |
| Three-class categorical classification | 334 | Categorical bins: <10, 10–100, >100 MPN/100 mL | Laboratory validation with Colilert across controlled concentration ranges. |
| Binary classification (drinking water) | 361 | Binary at 1 and 10 CFU/100 mL thresholds | Chlorinated and unchlorinated drinking water supplies, all paired with Colilert. |
| Continuous (chlorinated vs. unchlorinated) | 57 | Paired scatter: 38 pre-chlorinated + 19 post-chlorinated | Drinking water, pre- and post-chlorination points. |
| Chlorine residual binary detection | 66 | Binary: chlorine present (>0 ppm) vs. absent | Supplementary analysis; not a primary ATP analyte but demonstrates multi-parameter capability. |
| Total paired observations | 856 | | Across all analyses. Individual samples may appear in multiple evaluation frameworks. |

4 Continuous Regression Analysis

Direct comparison of Lume E. coli concentration estimates against Colilert laboratory results for 38 paired observations on Boulder Creek. The Colilert analytical uncertainty (±30%) is shown as horizontal error bars on each point.

Figure 4.1. Predicted (Lume) vs. observed (Colilert) E. coli concentrations on logarithmic axes. Blue markers indicate predictions within the ±30% analytical uncertainty of the Colilert reference method (n=31, 81.6%). Red markers indicate predictions outside this range (n=7, 18.4%). Dashed line = 1:1 perfect agreement. Sensor 50031, Boulder Creek test dataset.

4.1 Summary Statistics

| Statistic | Value | Interpretation |
| --- | --- | --- |
| Coefficient of determination (R²) | 0.67 | 67% of variance in Colilert results explained by the Lume estimate. Strong for a field-deployed real-time sensor compared against a culture method that requires overnight incubation. |
| Mean absolute percentage error (MAPE, log-scale) | 7.12% | Average prediction error in log-transformed concentration space. Low relative to the inherent variability of Colilert. |
| Predictions within Colilert uncertainty (±30%) | 81.6% | 31 of 38 predictions fall within the reference method’s own analytical uncertainty bounds. |
| Sample size | n = 38 | Paired field observations, test dataset (not used for model training). |
| Concentration range (observed) | 20–400 CFU/100 mL | Spans over one order of magnitude. Full ATP study will target <1 to >1,000 CFU/100 mL across 6 sites. |

4.2 Paired Data

Complete paired dataset (observed Colilert vs. predicted Lume), sorted by observed concentration:

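| # | Observed (Colilert, CFU/100 mL) | Predicted (Lume, CFU/100 mL) | Within ±30%? |
| --- | --- | --- | --- |
| 1 | 20 | 30 | No |
| 2 | 25 | 43 | No |
| 3 | 50 | 45 | Yes |
| 4 | 55 | 28 | No |
| 5 | 55 | 40 | Yes |
| 6 | 60 | 48 | Yes |
| 7 | 60 | 50 | Yes |
| 8 | 65 | 55 | Yes |
| 9 | 70 | 55 | Yes |
| 10 | 70 | 65 | Yes |
| 11 | 75 | 65 | Yes |
| 12 | 80 | 55 | No |
| 13 | 80 | 70 | Yes |
| 14 | 85 | 80 | Yes |
| 15 | 90 | 75 | Yes |
| 16 | 90 | 80 | Yes |
| 17 | 90 | 130 | No |
| 18 | 95 | 115 | Yes |
| 19 | 95 | 120 | Yes |
| 20 | 95 | 135 | No |
| 21 | 100 | 110 | Yes |
| 22 | 100 | 115 | Yes |
| 23 | 100 | 130 | Yes |
| 24 | 100 | 130 | Yes |
| 25 | 105 | 120 | Yes |
| 26 | 110 | 115 | Yes |
| 27 | 150 | 120 | Yes |
| 28 | 200 | 175 | Yes |
| 29 | 210 | 195 | Yes |
| 30 | 250 | 230 | Yes |
| 31 | 250 | 75 | No (see note) |
| 32 | 280 | 340 | Yes |
| 33 | 300 | 350 | Yes |
| 34 | 300 | 370 | Yes |
| 35 | 320 | 290 | Yes |
| 36 | 350 | 350 | Yes |
| 37 | 380 | 360 | Yes |
| 38 | 400 | 305 | Yes |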

Note: Sample #31 (obs=250, pred=75) represents a significant outlier. In the ATP study, such cases will be investigated for potential sampling errors, sensor fouling, or genuine environmental transients.
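
The within-bounds count can be rechecked directly from the paired values listed above. This sketch assumes that "within ±30%" means the prediction lies within 30% of the observed Colilert value, with the boundary counted as inside; under that reading it reproduces the 31 of 38 (81.6%) reported in Section 4.1.

```python
# (observed Colilert, predicted Lume) pairs from Section 4.2, in CFU/100 mL.
PAIRS = [
    (20, 30), (25, 43), (50, 45), (55, 28), (55, 40), (60, 48), (60, 50),
    (65, 55), (70, 55), (70, 65), (75, 65), (80, 55), (80, 70), (85, 80),
    (90, 75), (90, 80), (90, 130), (95, 115), (95, 120), (95, 135),
    (100, 110), (100, 115), (100, 130), (100, 130), (105, 120), (110, 115),
    (150, 120), (200, 175), (210, 195), (250, 230), (250, 75), (280, 340),
    (300, 350), (300, 370), (320, 290), (350, 350), (380, 360), (400, 305),
]


def within_30_percent(observed: int, predicted: int) -> bool:
    # Integer form of |predicted - observed| / observed <= 0.30, which avoids
    # floating-point ambiguity exactly at the boundary (e.g. 100 vs. 130).
    return 10 * abs(predicted - observed) <= 3 * observed


inside = sum(within_30_percent(o, p) for o, p in PAIRS)
outside = len(PAIRS) - inside
print(f"{inside} of {len(PAIRS)} inside ({100 * inside / len(PAIRS):.1f}%), {outside} outside")
```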

5 Three-Class Categorical Classification

Classification of E. coli concentrations into three management-relevant categories: <10, 10–100, and >100 MPN/100 mL, all paired with Colilert reference measurements. These bins align with EPA recreational water quality criteria thresholds for freshwater beaches and rivers.

Figure 5.1. Confusion matrix for three-class categorical classification. The model correctly classifies 308 of 334 samples (92% overall accuracy). Of the 26 misclassifications, 15 are conservative over-predictions (14 samples observed as 10–100 predicted as >100, and 1 observed as <10 predicted as 10–100), which is the safer direction for public health protection; the remaining 11 are under-predictions.
Figure 5.2. Row-normalized confusion matrix (sensitivity/recall per class). The <10 and >100 classes show near-perfect recall. The 10–100 class has 80.6% sensitivity, with most misses classified conservatively as >100.

5.1 Overall Performance

| Metric | Value | Interpretation |
| --- | --- | --- |
| Overall accuracy | 92.2% | 308 of 334 correctly classified. |
| Balanced accuracy | 93.4% | Average per-class recall, (99.5% + 80.6% + 100%) / 3; unaffected by class imbalance. |
| Cohen’s kappa | 0.84 | “Almost perfect” agreement (Landis & Koch, 1977). |
| Total sample size | n = 334 | Laboratory validation samples, all paired with Colilert. |

5.2 Per-Class Performance

| Class (MPN/100 mL) | Observed (n) | Sensitivity (Recall) | Precision (PPV) | Misclassification Direction |
| --- | --- | --- | --- | --- |
| <10 | 202 | 99.5% | 94.8% | 11 predicted <10 were actually 10–100 (false negatives). 1 observed <10 predicted as 10–100. |
| 10–100 | 129 | 80.6% | 99.0% | 11 misclassified as <10, 14 as >100. The >100 misclassifications are conservative (over-reports risk). |
| >100 | 3 | 100% | 17.6% | All 3 observed >100 correctly detected. Low precision reflects conservative over-prediction from the 10–100 class. Zero false negatives at this critical threshold. |

5.3 Raw Confusion Matrix

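| | Observed <10 | Observed 10–100 | Observed >100 | Row Total (Predicted) |
| --- | --- | --- | --- | --- |
| Predicted <10 | 201 | 11 | 0 | 212 |
| Predicted 10–100 | 1 | 104 | 0 | 105 |
| Predicted >100 | 0 | 14 | 3 | 17 |
| Column Total (Observed) | 202 | 129 | 3 | 334 |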

Key finding: The model exhibits a conservative bias: it is more likely to overestimate contamination risk (15 samples predicted in a higher category) than to underestimate it (11 samples predicted in a lower category). This is desirable for public health protection. There are zero false negatives at the >100 threshold. At the 10 MPN/100 mL boundary, 11 samples observed as 10–100 were predicted as <10, and 1 sample observed as <10 was predicted as 10–100.
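
For readers who want to retrace the Section 5 metrics, the following sketch derives them from the raw confusion matrix in Section 5.3. It uses the usual definitions: recall is computed per observed class, precision per predicted class, balanced accuracy is the unweighted mean of per-class recall, and Cohen's kappa is computed from the row and column marginals. The function and variable names are illustrative, not taken from the study's analysis code.

```python
# Rows = predicted class, columns = observed class, order: <10, 10-100, >100.
CONFUSION = [
    [201, 11, 0],
    [1, 104, 0],
    [0, 14, 3],
]


def summarize(matrix):
    """Return accuracy, per-class recall/precision, balanced accuracy, and kappa."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    predicted_totals = [sum(row) for row in matrix]
    observed_totals = [sum(row[j] for row in matrix) for j in range(k)]
    accuracy = sum(matrix[i][i] for i in range(k)) / total
    recall = [matrix[i][i] / observed_totals[i] for i in range(k)]
    precision = [matrix[i][i] / predicted_totals[i] for i in range(k)]
    # Agreement expected by chance, from the row and column marginals.
    chance = sum(p * o for p, o in zip(predicted_totals, observed_totals)) / total**2
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": sum(recall) / k,
        "kappa": (accuracy - chance) / (1 - chance),
    }


stats = summarize(CONFUSION)
for name, value in stats.items():
    print(name, value)
```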

6 Binary Classification — Drinking Water Thresholds

Binary classification of E. coli at two regulatory thresholds relevant to drinking water safety: 1 CFU/100 mL and 10 CFU/100 mL. All samples paired with Colilert reference measurements across both chlorinated and unchlorinated water supplies.

Figure 6.1. Confusion matrix at threshold = 1 CFU/100 mL. 91% overall accuracy, with errors skewed toward the conservative direction (23 false positives vs. 10 false negatives).
Figure 6.2. Confusion matrix at threshold = 10 CFU/100 mL. 92% overall accuracy. Missed detections (false negatives) make up only 2.2% of all samples (8/361).

6.1 Performance at 1 CFU/100 mL Threshold

| Metric | Value | Derivation |
| --- | --- | --- |
| Overall accuracy | 90.9% | (149 + 179) / 361 |
| Balanced accuracy | 90.7% | Mean of sensitivity and specificity |
| Cohen’s kappa | 0.82 | “Almost perfect” agreement |
| Sensitivity (true positive rate) | 94.7% | 179 / (179 + 10); correctly detects ≥1 CFU |
| Specificity (true negative rate) | 86.6% | 149 / (149 + 23); correctly classifies <1 CFU |
| Positive predictive value (PPV) | 88.6% | 179 / (179 + 23) |
| Negative predictive value (NPV) | 93.7% | 149 / (149 + 10) |
| False negatives (share of all samples) | 2.8% | 10 / 361; missed contamination above threshold |
| False positives (share of all samples) | 6.4% | 23 / 361; false alarms (conservative direction) |

6.2 Performance at 10 CFU/100 mL Threshold

| Metric | Value | Derivation |
| --- | --- | --- |
| Overall accuracy | 92.0% | (169 + 163) / 361 |
| Balanced accuracy | 92.1% | Mean of sensitivity and specificity |
| Cohen’s kappa | 0.84 | “Almost perfect” agreement |
| Sensitivity (true positive rate) | 95.3% | 163 / (163 + 8); correctly detects ≥10 CFU |
| Specificity (true negative rate) | 88.9% | 169 / (169 + 21); correctly classifies <10 CFU |
| Positive predictive value (PPV) | 88.6% | 163 / (163 + 21) |
| Negative predictive value (NPV) | 95.5% | 169 / (169 + 8) |
| False negatives (share of all samples) | 2.2% | 8 / 361; missed contamination above threshold |
| False positives (share of all samples) | 5.8% | 21 / 361; false alarms (conservative direction) |

6.3 Raw Confusion Matrices

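Threshold = 1 CFU/100 mL

| | Observed <1 | Observed ≥1 | Total |
| --- | --- | --- | --- |
| Predicted <1 | 149 | 10 | 159 |
| Predicted ≥1 | 23 | 179 | 202 |
| Total | 172 | 189 | 361 |

Threshold = 10 CFU/100 mL

| | Observed <10 | Observed ≥10 | Total |
| --- | --- | --- | --- |
| Predicted <10 | 169 | 8 | 177 |
| Predicted ≥10 | 21 | 163 | 184 |
| Total | 190 | 171 | 361 |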
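
The binary metrics in Sections 6.1 and 6.2 follow from the four cells of each matrix above. The sketch below is an illustrative helper under the standard definitions (the names are not from the study's own code); the false-negative and false-positive shares are taken over all samples, matching the derivations shown in the tables.

```python
def binary_summary(tn: int, fn: int, fp: int, tp: int) -> dict:
    """Standard 2x2 agreement metrics from true/false negative/positive counts."""
    total = tn + fn + fp + tp
    accuracy = (tn + tp) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Chance agreement: predicted-negative x observed-negative marginals plus
    # predicted-positive x observed-positive marginals.
    chance = ((tn + fn) * (tn + fp) + (tp + fp) * (tp + fn)) / total**2
    return {
        "accuracy": accuracy,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - chance) / (1 - chance),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "fn_share": fn / total,
        "fp_share": fp / total,
    }


at_1_cfu = binary_summary(tn=149, fn=10, fp=23, tp=179)
at_10_cfu = binary_summary(tn=169, fn=8, fp=21, tp=163)
for label, result in (("1 CFU", at_1_cfu), ("10 CFU", at_10_cfu)):
    print(label, {key: round(value, 3) for key, value in result.items()})
```

The same helper can be applied to the chlorine residual matrix in Section 8 by passing its four cell counts.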

7 Chlorination Effects on Sensor Performance

Analysis of Lume performance across pre-chlorinated (untreated) and post-chlorinated (treated) drinking water samples. This is relevant to the ATP because it demonstrates sensor behavior across a treatment boundary that fundamentally changes the relationship between TLF and viable E. coli.

Figure 7.1. Predicted vs. observed E. coli for pre-chlorinated (n=38) and post-chlorinated (n=19) samples on logarithmic axes. Pre-chlorinated samples show strong agreement with the 1:1 line across 1–200 CFU/100 mL. Post-chlorinated samples cluster near the detection limit (observed ~0.1 CFU) with greater scatter in predictions, reflecting residual TLF from inactivated cells.

7.1 Pre-Chlorinated Performance

| Metric | Value |
| --- | --- |
| Sample size | n = 38 |
| Observed concentration range | 3–200 CFU/100 mL |
| Predicted concentration range | 0.15–800 CFU/100 mL |
| Qualitative agreement | Strong positive correlation. Points cluster around 1:1 line. One significant outlier (obs=200, pred=800). |

7.2 Post-Chlorinated Performance

| Metric | Value |
| --- | --- |
| Sample size | n = 19 |
| Observed concentration | ~0.1 CFU/100 mL (all below detection) |
| Predicted concentration range | 0.05–6.0 CFU/100 mL |
| Interpretation | Chlorination inactivates E. coli but does not immediately eliminate TLF signal from cellular material. The Lume slightly over-predicts in post-chlorinated water, which is the conservative (protective) direction. Most predictions remain below 1 CFU/100 mL. |

ATP implication: For recreational water monitoring applications (the primary ATP target), chlorinated effluent near swim beaches and river access points is a relevant matrix. The data show that the Lume performs well for pre-treatment assessment. Post-chlorination overestimation is expected and conservative. The method documentation should specify expected behavior in chlorinated matrices.

8 Chlorine Residual Detection

Supplementary analysis: the Lume can also detect the presence of chlorine residual as a binary classification. While not a primary ATP target, this demonstrates the sensor’s multi-parameter capability and its potential for treatment process monitoring.

Figure 8.1. Confusion matrix for binary chlorine residual detection (0 ppm vs. >0 ppm). 85% overall accuracy with balanced performance across both classes.

8.1 Performance Summary

| Metric | Value | Derivation |
| --- | --- | --- |
| Overall accuracy | 84.8% | (29 + 27) / 66 |
| Balanced accuracy | 84.8% | Mean of sensitivity and specificity |
| Cohen’s kappa | 0.70 | “Substantial” agreement (Landis & Koch) |
| Sensitivity (detects chlorine present) | 84.4% | 27 / (27 + 5) |
| Specificity (detects chlorine absent) | 85.3% | 29 / (29 + 5) |
| Sample size | n = 66 | 29 true negatives + 27 true positives + 5 false positives + 5 false negatives = 66 |

9 Method Precision

Comparison of measurement precision between the Lume (TLF) and culture-based methods (Colilert). Precision is a critical element of the ATP evaluation—an alternate method must demonstrate comparable or superior precision to the reference method.

| Precision Metric | Lume (TLF) | Culture-Based | Source |
| --- | --- | --- | --- |
| Duplicate relative percent difference (RPD) | 14% | ≥26% | Kenya groundwater study (Sorensen et al., 2018). Average RPD of duplicate measurements. |

The Lume demonstrates nearly 2× better precision than culture-based duplicate measurements. This has important implications for the ATP comparability analysis:

  • Some apparent disagreement between the Lume and Colilert reflects Colilert’s own imprecision, not sensor error.
  • The Colilert ±30% analytical uncertainty bounds used in Figure 4.1 are derived from this known reference method variability.
  • The ATP statistical analysis should include a formal estimate of reference method variability to contextualize apparent discrepancies.
  • The planned ATP study (WS 2) includes duplicate Colilert grabs every 10th sample to generate a site-specific reference method precision estimate for Boulder Creek.
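
To make the precision metric concrete, the sketch below computes relative percent difference for duplicate pairs, as the planned duplicate Colilert grabs would require. It assumes the conventional definition (absolute difference divided by the pair mean, times 100), which this report does not spell out, and the duplicate values are hypothetical.

```python
def relative_percent_difference(a: float, b: float) -> float:
    """RPD = |a - b| / mean(a, b) * 100; returns 0 when both values are 0."""
    mean = (a + b) / 2.0
    if mean == 0:
        return 0.0
    return abs(a - b) / mean * 100.0


def mean_rpd(duplicates) -> float:
    """Average RPD over a list of (first, second) duplicate results."""
    values = [relative_percent_difference(a, b) for a, b in duplicates]
    return sum(values) / len(values)


# Hypothetical duplicate Colilert grabs (MPN/100 mL), for illustration only.
duplicate_grabs = [(100, 130), (45, 60), (210, 180), (12, 15)]
print(f"mean RPD = {mean_rpd(duplicate_grabs):.1f}%")
```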

Three-Way Method Comparison: Lume vs. Colilert vs. Membrane Filtration

A paired comparison of 153 samples analyzed by both EPA-approved methods (Colilert and membrane filtration) reveals that the two accepted reference methods disagree with each other more than the Lume disagrees with Colilert:

| Comparison | R² | Agreement |
| --- | --- | --- |
| Colilert vs. MF (both EPA-approved) | 0.584 | 37.9% within 2×; 72.5% categorical at 126 CFU |
| Lume vs. Colilert (bench) | 0.77–0.84 | >75% within Colilert uncertainty |
| Lume vs. Colilert (Boulder Creek) | 0.67 | >75% within Colilert uncertainty; 7% MAPE |

MF yielded a median 2.2× higher count than Colilert, with replicate RPDs of 43.5% (Colilert) and 57.9% (MF). The Lume achieves better quantitative agreement with Colilert than membrane filtration does — while providing continuous, real-time data at a fraction of the cost.

Figure: Three-way method comparison (MF vs. Colilert, Lume vs. Colilert, Lume vs. MF) across regression (top), Bland-Altman agreement (middle), and categorical classification (bottom). The Lume achieves better agreement with Colilert (R² = 0.861, κ = 0.63) than MF does (R² = 0.584, κ = 0.33). The Lume model was trained on Colilert; the Lume vs. MF column (right) demonstrates generalization to an independent EPA-approved reference method.

Source: Knopp et al. (2026), “Advancing continuous in-situ quantification of microbial contamination in environmental waters using tryptophan-like fluorescence,” under revision at Water Research.

Formal MDL study: A 40 CFR Part 136 Appendix B method detection limit study has not yet been performed under the standardized EPA protocol. This is a required deliverable for the ATP application and is planned as Workstream 4 (see Statistical & Data Analysis on the ATP Overview page).

10 Conclusions & Regulatory Readiness

10.1 What the Existing Data Demonstrates

| Finding | Evidence |
| --- | --- |
| Strong quantitative agreement with Colilert on Boulder Creek | R² = 0.67, MAPE = 7.12%, 81.6% of predictions within Colilert uncertainty. (Section 4) |
| Excellent categorical classification at management-relevant thresholds | 92% accuracy, 93.4% balanced accuracy, kappa = 0.84 across three bins. (Section 5) |
| Reliable binary detection at low regulatory thresholds | 91–92% accuracy at 1 and 10 CFU/100 mL. Kappa 0.82–0.84. (Section 6) |
| Conservative error direction (protects public health) | False positives outnumber false negatives in the classification analyses. Model over-predicts risk rather than under-predicts. (Sections 5, 6, 7) |
| Superior precision vs. reference method | 14% RPD vs. ≥26% for culture-based duplicates. (Section 9) |
| Characterized behavior across chlorinated/unchlorinated matrices | Strong pre-chlorination performance. Conservative post-chlorination behavior documented. (Section 7) |

10.2 Gaps to Address in the ATP Study

| Gap | How the Planned Study Addresses It |
| --- | --- |
| Limited sample size for continuous regression (n=38) | 6 sites × 52+ weeks, targeting 400–600 paired observations over 12–18 months. 10–15× more data. |
| Narrow concentration range (20–400 CFU/100 mL) | 6 diverse recreational monitoring sites span upstream reference (<1 CFU) through WWTP-influenced (>1,000 CFU during events). Coastal/beach sites via ASBPA partnership will expand range further. |
| No formal MDL study (40 CFR Part 136 Appendix B) | Planned as Workstream 4, Task 4.1. Laboratory and field MDL determination. |
| No Appendix H formal comparability analysis | Planned as Workstream 4, Task 4.3. Will include equivalence testing (TOST), Bland-Altman, regression analysis. |
| Seasonal coverage incomplete | 12–18 month deployment captures all seasons including spring runoff and winter low-flow. |
| Single-operator data (Virridy) | Boulder staff will be trained to operate sensors. A limited-use application requires only single-operator data; a nationwide application will require multi-lab expansion. |
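
As a rough illustration of the Bland-Altman component planned under Task 4.3, the sketch below computes the mean bias and 95% limits of agreement on log₁₀-transformed paired results. The log transform, the 1.96 multiplier, and the example values are assumptions made for illustration; the formal Appendix H analysis may specify these differently.

```python
import math
import statistics


def bland_altman_log10(observed, predicted):
    """Return (bias, lower limit, upper limit) of log10(predicted) - log10(observed)."""
    diffs = [math.log10(p) - math.log10(o) for o, p in zip(observed, predicted)]
    bias = statistics.mean(diffs)
    spread = statistics.stdev(diffs)  # sample standard deviation; needs >= 2 pairs
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical paired results (CFU/100 mL), for illustration only.
colilert = [20, 55, 100, 250, 400]
lume = [30, 40, 115, 230, 305]
bias, lower, upper = bland_altman_log10(colilert, lume)
print(f"bias = {bias:+.3f} log10 units, limits of agreement = [{lower:+.3f}, {upper:+.3f}]")
```

A bias near zero with narrow limits would indicate that the two methods agree on average and that individual differences are small relative to the reference method's own variability.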

10.3 Assessment

The existing Boulder Creek / Colilert dataset provides a strong preliminary case that the Lume sensor produces results comparable to the approved Colilert reference method for freshwater microbial monitoring. The data is sufficient to support the EPA pre-submission consultation and to demonstrate technical feasibility to CDPHE and EPA Region 8 for Track A Phase 1: a Boulder Creek facility-specific ATP under 40 CFR 136.5 for Reg 93 / 303(d) compliance monitoring. The planned 12–18 month validation study at Boulder’s 6 monitoring sites will generate the regulatory-grade dataset for that limited-use submission. Track A Phase 2 will expand to additional Colorado sites and pursue CDPHE state-wide recognition, building toward Track A Phase 3: a nationwide freshwater ATP under 40 CFR 136.4. In parallel, Track B (ASBPA coastal) will provide Enterolert (enterococci) data from ocean beach sites in other states, eventually feeding a combined or separate nationwide coastal ATP submission.