Lume–Colilert Comparability Report — Boulder Creek ATP Program

1 Executive Summary

This report presents a preliminary comparability analysis of the Virridy Lume tryptophan-like fluorescence (TLF) sensor in support of Track A of the EPA Alternate Test Procedure (ATP) program: a Colorado freshwater track targeting Regulation 93 / 303(d) compliance monitoring on Boulder Creek. Boulder Creek is a CDPHE-listed impaired waterbody with an E. coli geometric mean threshold of 126 CFU/100 mL. The regulatory progression for Track A is: (1) Boulder Creek facility-specific (limited-use) ATP under 40 CFR 136.5 → (2) Colorado state-wide expansion (additional CO sites + CDPHE recognition) → (3) Nationwide freshwater ATP under 40 CFR 136.4. All 856 paired observations in this report use IDEXX Colilert as the freshwater reference, consistent with Boulder’s existing Reg 93 monitoring program. A separate parallel Track B, in partnership with the American Shore & Beach Preservation Association (ASBPA), targets enterococci at coastal ocean beaches in other states using Enterolert as the reference method; that track is reported separately.

The analysis demonstrates strong agreement between the Lume and Colilert across multiple evaluation frameworks:

Evaluation	Key Metric	Value	Sample Size
Continuous regression (Boulder Creek)	R² / MAPE	0.67 / 7.12%	n = 38
3-class categorical (<10, 10–100, >100 MPN)	Bal. Accuracy / Kappa	93% / 0.84	n = 334
Binary classification (threshold = 1 CFU/100 mL)	Accuracy / Kappa	91% / 0.82	n = 361
Binary classification (threshold = 10 CFU/100 mL)	Accuracy / Kappa	92% / 0.84	n = 361
Chlorine residual detection (binary)	Accuracy / Kappa	85% / 0.70	n = 66

Over 75% of the Lume’s continuous predictions fall within the analytical uncertainty bounds of the Colilert reference method (81.6%, or 31 of 38, on Boulder Creek). Cohen’s kappa values of 0.82–0.84 indicate “almost perfect” agreement on the Landis & Koch scale for all primary classification tasks. In duplicate measurements, the Lume is more repeatable than culture-based methods (14% RPD vs. ≥26% for Colilert duplicates).

2 Method Description

2.1 Test Method (Virridy Lume)

Parameter	Specification
Measurement principle	Tryptophan-like fluorescence (TLF) at 275 nm excitation / 340 nm emission, with multivariate linear regression: log₁₀(CFU/100 mL) = β₀ + β₁·TLF + β₂·Turbidity + β₃·Temperature
Sensor unit	Sensor 50031 (Lume V1.2)
Output	Continuous E. coli concentration estimate (CFU or MPN/100 mL) and categorical risk classification
Concurrent parameters	Turbidity (NTU), Temperature (°C) — used as regression model correction inputs
TLF detection limit	0.05 ppb (tryptophan in DI water)
E. coli detection limit	~10 CFU/100 mL (correlated, in wastewater effluent)
Response time	60 seconds per measurement
Regression model	Multivariate linear regression with fixed coefficients. Features: TLF intensity, turbidity (NTU), temperature (°C). No site-specific calibration.
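
The regression form in the table above can be sketched as follows. This is a minimal illustration only: the coefficient values are placeholders chosen so the example runs, not the Lume's actual calibration, and the names `LumeCoefficients` and `predict_ecoli`, as well as the TLF unit (ppb), are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LumeCoefficients:
    """Fixed regression coefficients (placeholder values, NOT the real calibration)."""
    b0: float
    b1_tlf: float
    b2_turbidity: float
    b3_temperature: float


def predict_ecoli(tlf_ppb, turbidity_ntu, temperature_c, coef):
    """Return estimated E. coli (CFU/100 mL) from the log10-linear model."""
    log10_cfu = (
        coef.b0
        + coef.b1_tlf * tlf_ppb
        + coef.b2_turbidity * turbidity_ntu
        + coef.b3_temperature * temperature_c
    )
    # The model is fitted in log10 space, so back-transform to a concentration.
    return 10.0 ** log10_cfu


# Hypothetical coefficients chosen only to make the example run.
EXAMPLE = LumeCoefficients(b0=1.0, b1_tlf=0.5, b2_turbidity=0.0, b3_temperature=0.0)

estimate = predict_ecoli(tlf_ppb=2.0, turbidity_ntu=5.0, temperature_c=12.0, coef=EXAMPLE)
print(f"Estimated E. coli: {estimate:.0f} CFU/100 mL")
```

Because the coefficients are fixed, the same function applies at every site without recalibration, which is the property the table describes.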

2.2 Reference Method (IDEXX Colilert)

Parameter	Specification
Method	IDEXX Colilert / Quanti-Tray®
Regulatory status	Approved under 40 CFR Part 136 for E. coli enumeration
Output	Most Probable Number (MPN) or Colony Forming Units (CFU) per 100 mL
Incubation	24 hours at 35 °C (the 18–22 hour window applies to the Colilert-18 formulation)
Known precision	≥26% relative percent difference between duplicate samples (literature; Kenya groundwater study)

2.3 Study Location

Parameter	Value
Waterbody	Boulder Creek, Boulder, Colorado
Water type	Freshwater surface water (ambient and influenced by municipal WWTP effluent)
Concentration range observed	<1 to >400 CFU/100 mL
Regulatory context	Colorado Regulation 93 / 303(d) compliance monitoring — City of Boulder Utilities (6 monitoring locations on Boulder Creek, E. coli geometric mean threshold 126 CFU/100 mL). Coastal/beach recreational water monitoring via ASBPA is a longer-term goal.
Reference methods	IDEXX Colilert (E. coli, freshwater) & IDEXX Enterolert (enterococci, marine/coastal) — dual-indicator approach per EPA 2012 RWQC
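
A geometric-mean comparison against the 126 CFU/100 mL threshold can be sketched as below. The sample values are illustrative, not study data, and the averaging window and the handling of non-detects are not specified in this report, so both are left to the caller; `exceeds_geomean` is a hypothetical helper name.

```python
from statistics import geometric_mean

GEOMEAN_THRESHOLD = 126.0  # CFU/100 mL, the threshold cited for Boulder Creek


def exceeds_geomean(results_cfu, threshold=GEOMEAN_THRESHOLD):
    """Return (geometric mean, True if it exceeds the threshold).

    All results must be positive; any substitution rule for non-detects
    has to be applied before calling this function.
    """
    gm = geometric_mean(results_cfu)
    return gm, gm > threshold


# Illustrative values only, not measurements from this study.
gm, exceeded = exceeds_geomean([50, 100, 200, 400])
print(f"Geometric mean = {gm:.1f} CFU/100 mL, exceeds threshold: {exceeded}")
```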

3 Data Overview

The existing dataset comprises several complementary analyses, all using IDEXX Colilert as the freshwater reference method (E. coli). This consistency is important: every observation in this report is a direct Lume-vs-Colilert comparison against the same Part 136-approved method used for recreational water quality assessment on Boulder Creek. Future coastal/marine validation through the ASBPA partnership will add Enterolert (enterococci) paired data, completing the dual-indicator framework required by EPA’s 2012 RWQC.

Analysis	n	Type	Source
Continuous regression (Boulder Creek field)	38	Paired field observations: Lume continuous estimate vs. Colilert grab sample	Boulder Creek, Sensor 50031. 31 inside Colilert uncertainty bounds, 7 outside.
Three-class categorical classification	334	Categorical bins: <10, 10–100, >100 MPN/100 mL	Laboratory validation with Colilert across controlled concentration ranges.
Binary classification (drinking water)	361	Binary at 1 and 10 CFU/100 mL thresholds	Chlorinated and unchlorinated drinking water supplies, all paired with Colilert.
Continuous (chlorinated vs. unchlorinated)	57	Paired scatter: 38 pre-chlorinated + 19 post-chlorinated	Drinking water, pre- and post-chlorination points.
Chlorine residual binary detection	66	Binary: chlorine present (>0 ppm) vs. absent	Supplementary analysis — not a primary ATP analyte but demonstrates multi-parameter capability.
Total paired observations	856	Across all analyses. Individual samples may appear in multiple evaluation frameworks.

4 Continuous Regression Analysis

Direct comparison of Lume E. coli concentration estimates against Colilert laboratory results for 38 paired observations on Boulder Creek. The Colilert analytical uncertainty (±30%) is shown as horizontal error bars on each point.

Figure 4.1. Predicted (Lume) vs. observed (Colilert) E. coli concentrations on logarithmic axes. Blue markers indicate predictions within the ±30% analytical uncertainty of the Colilert reference method (n=31, 81.6%). Red markers indicate predictions outside this range (n=7, 18.4%). Dashed line = 1:1 perfect agreement. Sensor 50031, Boulder Creek test dataset.

4.1 Summary Statistics

Statistic	Value	Interpretation
Coefficient of determination (R²)	0.67	67% of variance in Colilert results explained by the Lume estimate. Strong for a field-deployed real-time sensor vs. a 24-hour culture method.
Mean absolute percentage error (MAPE, log-scale)	7.12%	Average prediction error in log-transformed concentration space. Remarkably low given inherent Colilert variability.
Predictions within Colilert uncertainty (±30%)	81.6%	31 of 38 predictions fall within the reference method’s own analytical uncertainty bounds.
Sample size	n = 38	Paired field observations, test dataset (not used for model training).
Concentration range (observed)	20–400 CFU/100 mL	Spans over one order of magnitude. Full ATP study will target <1 to >1,000 CFU/100 mL across 6 sites.
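
For readers who want to compute comparable statistics on their own paired data, the sketch below shows one plausible reading of the two headline metrics. The exact formulas used for this report are not stated, so both definitions are assumptions: R² as 1 − SS_res/SS_tot, and log-scale MAPE as the mean absolute error of log10 concentrations relative to the log10 observed value. The input values are toy numbers, not study data.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def log_mape(observed_cfu, predicted_cfu):
    """Mean absolute percentage error computed on log10 concentrations.

    Undefined when an observed value is exactly 1 CFU/100 mL (log10 = 0).
    """
    errors = [
        abs(math.log10(p) - math.log10(o)) / abs(math.log10(o))
        for o, p in zip(observed_cfu, predicted_cfu)
    ]
    return 100.0 * sum(errors) / len(errors)


# Toy values for illustration only (not study data).
obs = [10.0, 100.0, 1000.0]
pred = [10.0, 1000.0, 1000.0]

mape_pct = log_mape(obs, pred)
r2_log = r_squared([math.log10(v) for v in obs], [math.log10(v) for v in pred])
print(f"log-scale MAPE = {mape_pct:.1f}%, R² (log space) = {r2_log:.2f}")
```

Percentage errors in log space are much smaller than the corresponding errors in concentration units, so a log-scale MAPE should not be read as a percent error in CFU/100 mL.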

4.2 Paired Data

Complete paired dataset (observed Colilert vs. predicted Lume), sorted by observed concentration:

#	Observed (Colilert, CFU/100 mL)	Predicted (Lume, CFU/100 mL)	Within ±30%?
1	20	30	No
2	25	43	No
3	50	45	Yes
4	55	28	No
5	55	40	Yes
6	60	48	Yes
7	60	50	Yes
8	65	55	Yes
9	70	55	Yes
10	70	65	Yes
11	75	65	Yes
12	80	55	No
13	80	70	Yes
14	85	80	Yes
15	90	75	Yes
16	90	80	Yes
17	90	130	No
18	95	115	Yes
19	95	120	Yes
20	95	135	No
21	100	110	Yes
22	100	115	Yes
23	100	130	Yes
24	100	130	Yes
25	105	120	Yes
26	110	115	Yes
27	150	120	Yes
28	200	175	Yes
29	210	195	Yes
30	250	230	Yes
31	250	75	No (but see note)
32	280	340	Yes
33	300	350	Yes
34	300	370	Yes
35	320	290	Yes
36	350	350	Yes
37	380	360	Yes
38	400	305	Yes

Note: Sample #31 (obs=250, pred=75) represents a significant outlier. In the ATP study, such cases will be investigated for potential sampling errors, sensor fouling, or genuine environmental transients.
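
The within-uncertainty count can be recomputed directly from the paired values listed above. The sketch assumes the band is a symmetric ±30% of the observed Colilert value with inclusive bounds; `within_tolerance` is an illustrative helper name.

```python
# (observed Colilert, predicted Lume) pairs transcribed from the table above.
PAIRS = [
    (20, 30), (25, 43), (50, 45), (55, 28), (55, 40), (60, 48), (60, 50),
    (65, 55), (70, 55), (70, 65), (75, 65), (80, 55), (80, 70), (85, 80),
    (90, 75), (90, 80), (90, 130), (95, 115), (95, 120), (95, 135),
    (100, 110), (100, 115), (100, 130), (100, 130), (105, 120), (110, 115),
    (150, 120), (200, 175), (210, 195), (250, 230), (250, 75), (280, 340),
    (300, 350), (300, 370), (320, 290), (350, 350), (380, 360), (400, 305),
]


def within_tolerance(observed, predicted, percent=30):
    """True if predicted lies within +/- percent of observed (bounds inclusive)."""
    # Integer arithmetic avoids floating-point edge cases exactly at the bound.
    return 100 * abs(predicted - observed) <= percent * observed


inside = sum(within_tolerance(o, p) for o, p in PAIRS)
fraction = inside / len(PAIRS)
print(f"{inside} of {len(PAIRS)} within ±30% ({fraction:.1%})")
```

Run on the 38 pairs, this gives 31 inside and 7 outside the band, matching the 81.6% reported in Section 4.1.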

5 Three-Class Categorical Classification

Classification of E. coli concentrations into three management-relevant categories: <10, 10–100, and >100 MPN/100 mL, all paired with Colilert reference measurements. These bins align with EPA recreational water quality criteria thresholds for freshwater beaches and rivers.

Figure 5.1. Confusion matrix for three-class categorical classification. The model correctly classifies 308 of 334 samples (92% overall accuracy). Misclassifications are predominantly conservative: 14 samples observed as 10–100 are predicted as >100, which is the safer direction for public health protection.

Figure 5.2. Row-normalized confusion matrix (sensitivity/recall per class). The <10 and >100 classes show near-perfect recall. The 10–100 class has 80.6% sensitivity, with most misses classified conservatively as >100.

5.1 Overall Performance

Metric	Value	Interpretation
Overall accuracy	92.2%	308 of 334 correctly classified.
Balanced accuracy	93.4%	Mean of the per-class recalls (99.5%, 80.6%, 100%); unaffected by class imbalance.
Cohen’s kappa	0.84	“Almost perfect” agreement (Landis & Koch, 1977).
Total sample size	n = 334	Laboratory validation samples, all paired with Colilert.

5.2 Per-Class Performance

Class (MPN/100 mL)	Observed (n)	Sensitivity (Recall)	Precision (PPV)	Misclassification Direction
<10	202	99.5%	94.8%	11 predicted <10 were actually 10–100 (false negatives). 1 observed <10 predicted as 10–100.
10–100	129	80.6%	99.0%	11 misclassified as <10, 14 as >100. The >100 misclassifications are conservative (over-reports risk).
>100	3	100%	17.6%	All 3 observed >100 correctly detected. Low precision reflects conservative over-prediction from the 10–100 class. Zero false negatives at this critical threshold.

5.3 Raw Confusion Matrix

	Observed <10	Observed 10–100	Observed >100	Row Total (Predicted)
Predicted <10	201	11	0	212
Predicted 10–100	1	104	0	105
Predicted >100	0	14	3	17
Column Total (Observed)	202	129	3	334

Key finding: The model exhibits a conservative bias. Of the 26 misclassifications, 15 predict a higher category than observed (14 from 10–100 to >100, 1 from <10 to 10–100) and 11 predict a lower one (all 10–100 samples predicted as <10). Erring high is the safer direction for public health protection. There are zero false negatives at the >100 threshold; the only under-predictions are the 11 samples at the 10 MPN/100 mL boundary.
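
The metrics in Sections 5.1 and 5.2 can be recomputed from the raw confusion matrix. The sketch below uses the standard definitions (recall per observed class, precision per predicted class, Cohen's kappa from the marginal totals); the function name is illustrative.

```python
# Rows = predicted class, columns = observed class, in the order <10, 10-100, >100.
MATRIX = [
    [201, 11, 0],
    [1, 104, 0],
    [0, 14, 3],
]


def classification_metrics(matrix):
    """Return accuracy, per-class recall and precision, balanced accuracy, and kappa."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    predicted_totals = [sum(row) for row in matrix]
    observed_totals = [sum(row[j] for row in matrix) for j in range(k)]
    correct = [matrix[i][i] for i in range(k)]

    accuracy = sum(correct) / total
    recall = [correct[i] / observed_totals[i] for i in range(k)]
    precision = [correct[i] / predicted_totals[i] for i in range(k)]
    # Chance agreement expected from the marginal totals alone.
    expected = sum(p * o for p, o in zip(predicted_totals, observed_totals)) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "balanced_accuracy": sum(recall) / k,
        "kappa": kappa,
    }


metrics = classification_metrics(MATRIX)
for name, value in metrics.items():
    print(name, value)
```

With only 3 observed samples in the >100 class, its recall and precision are sensitive to single samples, so the per-class values are best read alongside the raw counts.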

6 Binary Classification — Drinking Water Thresholds

Binary classification of E. coli at two regulatory thresholds relevant to drinking water safety: 1 CFU/100 mL and 10 CFU/100 mL. All samples paired with Colilert reference measurements across both chlorinated and unchlorinated water supplies.

Figure 6.1. Confusion matrix at threshold = 1 CFU/100 mL. 91% overall accuracy with minimal class bias (10 false negatives vs. 23 false positives).

Figure 6.2. Confusion matrix at threshold = 10 CFU/100 mL. 92% overall accuracy. False negatives (missed detections) make up only 2.2% of all samples (8/361).

6.1 Performance at 1 CFU/100 mL Threshold

Metric	Value	Derivation
Overall accuracy	90.9%	(149 + 179) / 361
Balanced accuracy	90.7%	Mean of sensitivity and specificity
Cohen’s kappa	0.82	“Almost perfect” agreement
Sensitivity (true positive rate)	94.7%	179 / (179 + 10); correctly detects ≥1 CFU
Specificity (true negative rate)	86.6%	149 / (149 + 23) — correctly classifies <1 CFU
Positive predictive value (PPV)	88.6%	179 / (179 + 23)
Negative predictive value (NPV)	93.7%	149 / (149 + 10)
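False negatives (share of all samples)	2.8%	10 / 361; missed contamination above threshold (5.3% of the 189 observed positives)
False positives (share of all samples)	6.4%	23 / 361; false alarms in the conservative direction (13.4% of the 172 observed negatives)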

6.2 Performance at 10 CFU/100 mL Threshold

Metric	Value	Derivation
Overall accuracy	92.0%	(169 + 163) / 361
Balanced accuracy	92.1%	Mean of sensitivity and specificity
Cohen’s kappa	0.84	“Almost perfect” agreement
Sensitivity (true positive rate)	95.3%	163 / (163 + 8) — correctly detects ≥10 CFU
Specificity (true negative rate)	88.9%	169 / (169 + 21); correctly classifies <10 CFU
Positive predictive value (PPV)	88.6%	163 / (163 + 21)
Negative predictive value (NPV)	95.5%	169 / (169 + 8)
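False negatives (share of all samples)	2.2%	8 / 361; missed contamination above threshold (4.7% of the 171 observed positives)
False positives (share of all samples)	5.8%	21 / 361; false alarms in the conservative direction (11.1% of the 190 observed negatives)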

6.3 Raw Confusion Matrices

Threshold = 1 CFU/100 mL
	Observed <1	Observed ≥1	Total
Predicted <1	149	10	159
Predicted ≥1	23	179	202
Total	172	189	361

Threshold = 10 CFU/100 mL
	Observed <10	Observed ≥10	Total
Predicted <10	169	8	177
Predicted ≥10	21	163	184
Total	190	171	361
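
The derivations in Sections 6.1 and 6.2 follow from the four cells of each matrix. The sketch below recomputes them with the standard definitions; `BinaryCounts` and `binary_metrics` are illustrative names, and "positive" means at or above the threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    tn: int  # observed below threshold, predicted below
    fn: int  # observed at or above threshold, predicted below
    fp: int  # observed below threshold, predicted at or above
    tp: int  # observed at or above threshold, predicted at or above


def binary_metrics(c):
    """Return the standard 2x2 agreement metrics as fractions."""
    n = c.tn + c.fn + c.fp + c.tp
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / n
    # Chance agreement from the predicted and observed marginal totals.
    expected = ((c.tn + c.fn) * (c.tn + c.fp) + (c.tp + c.fp) * (c.tp + c.fn)) / n**2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "kappa": (accuracy - expected) / (1 - expected),
    }


AT_1_CFU = BinaryCounts(tn=149, fn=10, fp=23, tp=179)
AT_10_CFU = BinaryCounts(tn=169, fn=8, fp=21, tp=163)

for label, counts in (("1 CFU", AT_1_CFU), ("10 CFU", AT_10_CFU)):
    summary = {k: round(v, 3) for k, v in binary_metrics(counts).items()}
    print(label, summary)
```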

7 Chlorination Effects on Sensor Performance

Analysis of Lume performance across pre-chlorinated (untreated) and post-chlorinated (treated) drinking water samples. This is relevant to the ATP because it demonstrates sensor behavior across a treatment boundary that fundamentally changes the relationship between TLF and viable E. coli.

Figure 7.1. Predicted vs. observed E. coli for pre-chlorinated (n=38) and post-chlorinated (n=19) samples on logarithmic axes. Pre-chlorinated samples show strong agreement with the 1:1 line across 1–200 CFU/100 mL. Post-chlorinated samples cluster near the detection limit (observed ~0.1 CFU) with greater scatter in predictions, reflecting residual TLF from inactivated cells.

7.1 Pre-Chlorinated Performance

Metric	Value
Sample size	n = 38
Observed concentration range	3–200 CFU/100 mL
Predicted concentration range	0.15–800 CFU/100 mL
Qualitative agreement	Strong positive correlation. Points cluster around 1:1 line. One significant outlier (obs=200, pred=800).

7.2 Post-Chlorinated Performance

Metric	Value
Sample size	n = 19
Observed concentration	~0.1 CFU/100 mL (all below detection)
Predicted concentration range	0.05–6.0 CFU/100 mL
Interpretation	Chlorination inactivates E. coli but does not immediately eliminate TLF signal from cellular material. The Lume slightly over-predicts in post-chlorinated water, which is the conservative (protective) direction. Most predictions remain below 1 CFU/100 mL.

ATP implication: For recreational water monitoring applications (the primary ATP target), chlorinated effluent near swim beaches and river access points is a relevant matrix. The data shows the Lume performs well for pre-treatment assessment. Post-chlorination overestimation is expected and conservative. The method documentation should specify expected behavior in chlorinated matrices.

8 Chlorine Residual Detection

Supplementary analysis: the Lume can also detect the presence of chlorine residual as a binary classification. While not a primary ATP target, this demonstrates the sensor’s multi-parameter intelligence and its potential for treatment process monitoring.

Figure 8.1. Confusion matrix for binary chlorine residual detection (0 ppm vs. >0 ppm). 85% overall accuracy with balanced performance across both classes.

8.1 Performance Summary

Metric	Value	Derivation
Overall accuracy	84.8%	(29 + 27) / 66
Balanced accuracy	84.8%	Mean of sensitivity and specificity
Cohen’s kappa	0.70	“Substantial” agreement (Landis & Koch)
Sensitivity (detects chlorine present)	84.4%	27 / (27 + 5)
Specificity (detects chlorine absent)	85.3%	29 / (29 + 5)
Sample size	n = 66	29 TN + 27 TP + 5 FP + 5 FN = 66 (34 chlorine-absent, 32 chlorine-present)

9 Method Precision

Comparison of measurement precision between the Lume (TLF) and culture-based methods (Colilert). Precision is a critical element of the ATP evaluation—an alternate method must demonstrate comparable or superior precision to the reference method.

Precision Metric	Lume (TLF)	Culture-Based	Source
Duplicate relative percent difference (RPD)	14%	≥26%	Kenya groundwater study (Sorensen et al., 2018). Average RPD of duplicate measurements.

The Lume demonstrates nearly 2× better precision than culture-based duplicate measurements. This has important implications for the ATP comparability analysis:

Some apparent disagreement between the Lume and Colilert reflects Colilert’s own imprecision, not sensor error.
The Colilert ±30% analytical uncertainty bounds used in Figure 4.1 are derived from this known reference method variability.
The ATP statistical analysis should include a formal estimate of reference method variability to contextualize apparent discrepancies.
The planned ATP study (WS 2) includes duplicate Colilert grabs every 10th sample to generate a site-specific reference method precision estimate for Boulder Creek.
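
The duplicate-based precision estimate described above can be sketched as follows, using the conventional definition of relative percent difference (absolute difference divided by the pair mean). The duplicate values are illustrative, not study data, and the function names are assumptions.

```python
def relative_percent_difference(a, b):
    """RPD of a duplicate pair: |a - b| divided by the pair mean, in percent."""
    mean = (a + b) / 2.0
    if mean == 0:
        return 0.0  # both duplicates are zero, so there is no difference
    return 100.0 * abs(a - b) / mean


def mean_rpd(duplicate_pairs):
    """Average RPD across a set of duplicate pairs."""
    values = [relative_percent_difference(a, b) for a, b in duplicate_pairs]
    return sum(values) / len(values)


# Illustrative duplicate Colilert results (MPN/100 mL), not study data.
duplicates = [(100, 130), (80, 60), (200, 200)]
print(f"Mean RPD = {mean_rpd(duplicates):.1f}%")
```

For scale, a duplicate pair reading 100 and 130 MPN/100 mL has an RPD of about 26%, which is the level of disagreement the literature value implies for Colilert duplicates.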

9.1 Three-Way Method Comparison: Lume vs. Colilert vs. Membrane Filtration

A paired comparison of 153 samples analyzed by both EPA-approved methods (Colilert and membrane filtration) reveals that the two accepted reference methods disagree with each other more than the Lume disagrees with Colilert:

Comparison	R²	Agreement
Colilert vs. MF (both EPA-approved)	0.584	37.9% within 2×; 72.5% categorical at 126 CFU
Lume vs. Colilert (bench)	0.77–0.84	>75% within Colilert uncertainty
Lume vs. Colilert (Boulder Creek)	0.67	>75% within Colilert uncertainty; 7% MAPE

MF yielded a median 2.2× higher count than Colilert, with replicate RPDs of 43.5% (Colilert) and 57.9% (MF). The Lume achieves better quantitative agreement with Colilert than membrane filtration does — while providing continuous, real-time data at a fraction of the cost.
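
The two agreement measures in the table above (share of pairs within a factor of 2, and categorical agreement at 126 CFU/100 mL) can be sketched as below. The pairs are illustrative, not the 153 study samples, and treating "exceeds" as strictly greater than the threshold is an assumption.

```python
def within_factor(a, b, factor=2.0):
    """True if the two results differ by no more than the given factor."""
    return max(a, b) <= factor * min(a, b)


def same_side(a, b, threshold=126.0):
    """True if both results fall on the same side of the threshold."""
    return (a > threshold) == (b > threshold)


def agreement_rates(pairs):
    """Return (fraction within 2x, fraction in categorical agreement)."""
    n = len(pairs)
    factor_rate = sum(within_factor(a, b) for a, b in pairs) / n
    category_rate = sum(same_side(a, b) for a, b in pairs) / n
    return factor_rate, category_rate


# Illustrative (method A, method B) results in CFU/100 mL, not study data.
pairs = [(50, 120), (100, 150), (130, 300), (200, 90)]
factor_rate, category_rate = agreement_rates(pairs)
print(f"Within 2x: {factor_rate:.0%}, same category at 126: {category_rate:.0%}")
```

The two measures can diverge: a pair such as 130 vs. 300 agrees categorically but not within a factor of 2, which is why the table reports both.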

Figure 9.1. Three-way method comparison (MF vs. Colilert, Lume vs. Colilert, Lume vs. MF) across regression (top), Bland-Altman agreement (middle), and categorical classification (bottom). The Lume achieves better agreement with Colilert (R² = 0.861, κ = 0.63) than MF does (R² = 0.584, κ = 0.33). The Lume model was trained on Colilert; the Lume vs. MF column (right) demonstrates generalization to an independent EPA-approved reference method.

Source: Knopp et al. (2026), “Advancing continuous in-situ quantification of microbial contamination in environmental waters using tryptophan-like fluorescence,” under revision at Water Research.

Formal MDL study: A 40 CFR Part 136 Appendix B method detection limit study has not yet been performed under the standardized EPA protocol. This is a required deliverable for the ATP application and is planned as Workstream 4 (see Statistical & Data Analysis on the ATP Overview page).

10 Conclusions & Regulatory Readiness

10.1 What the Existing Data Demonstrates

	Finding	Evidence
✓	Strong quantitative agreement with Colilert on Boulder Creek	R² = 0.67, MAPE = 7.12%, 81.6% of predictions within Colilert uncertainty. (Section 4)
✓	Excellent categorical classification at management-relevant thresholds	92% accuracy, 93% balanced accuracy, kappa = 0.84 across three bins. (Section 5)
✓	Reliable binary detection at low regulatory thresholds	91–92% accuracy at 1 and 10 CFU/100 mL. Kappa 0.82–0.84. (Section 6)
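✓	Conservative error direction (protects public health)	Over-predictions outnumber under-predictions in each E. coli classification analysis (15 vs. 11 in Section 5; 23 vs. 10 and 21 vs. 8 in Section 6), and post-chlorination estimates err high (Section 7). The model over-predicts risk rather than under-predicts it. (Sections 5, 6, 7)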
✓	Superior precision vs. reference method	14% RPD vs. ≥26% for culture-based duplicates. (Section 9)
✓	Characterized behavior across chlorinated/unchlorinated matrices	Strong pre-chlorination performance. Conservative post-chlorination behavior documented. (Section 7)

10.2 Gaps to Address in the ATP Study

	Gap	How the Planned Study Addresses It
●	Limited sample size for continuous regression (n=38)	6 sites × 52+ weeks = 400–600 paired observations over 12–18 months. 10–15× more data.
●	Narrow concentration range (20–400 CFU/100 mL)	6 diverse recreational monitoring sites span upstream reference (<1 CFU) through WWTP-influenced (>1,000 CFU during events). Coastal/beach sites via ASBPA partnership will expand range further.
●	No formal MDL study (40 CFR Part 136 Appendix B)	Planned as Workstream 4, Task 4.1. Laboratory and field MDL determination.
●	No Appendix H formal comparability analysis	Planned as Workstream 4, Task 4.3. Will include equivalence testing (TOST), Bland-Altman, regression analysis.
●	Seasonal coverage incomplete	12–18 month deployment captures all seasons including spring runoff and winter low-flow.
●	Single-operator data (Virridy)	Boulder staff will be trained to operate the sensors. A limited-use application requires only single-laboratory validation data; a nationwide application will require multi-laboratory expansion.
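
Of the analyses planned for Task 4.3, the Bland-Altman step can be sketched with the standard library alone. This is a minimal illustration, assuming differences are taken in log10 space and 95% limits of agreement use 1.96 standard deviations. The six pairs are the first six rows of the Section 4.2 table, used only as input. Equivalence testing (TOST) is not shown because it needs a t-distribution and an equivalence margin that this report does not specify.

```python
import math
from statistics import mean, stdev


def bland_altman_log10(reference_cfu, test_cfu):
    """Bias and 95% limits of agreement for log10(test) - log10(reference)."""
    diffs = [math.log10(t) - math.log10(r) for r, t in zip(reference_cfu, test_cfu)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# First six (Colilert, Lume) pairs from the Section 4.2 table.
colilert = [20, 25, 50, 55, 55, 60]
lume = [30, 43, 45, 28, 40, 48]

bias, lower, upper = bland_altman_log10(colilert, lume)
# Back-transforming gives the agreement as fold-changes relative to Colilert.
print(f"Bias: x{10 ** bias:.2f}; limits of agreement: x{10 ** lower:.2f} to x{10 ** upper:.2f}")
```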

10.3 Assessment

The existing Boulder Creek / Colilert dataset provides a strong preliminary case that the Lume sensor produces results comparable to the approved Colilert reference method for freshwater microbial monitoring. The data is sufficient to support the EPA pre-submission consultation and to demonstrate technical feasibility to CDPHE and EPA Region 8 for Track A Phase 1: a Boulder Creek facility-specific ATP under 40 CFR 136.5 for Reg 93 / 303(d) compliance monitoring. The planned 12–18 month validation study at Boulder’s 6 monitoring sites will generate the regulatory-grade dataset for that limited-use submission. Track A Phase 2 will expand to additional Colorado sites and pursue CDPHE state-wide recognition, building toward Track A Phase 3: a nationwide freshwater ATP under 40 CFR 136.4. In parallel, Track B (ASBPA coastal) will provide Enterolert (enterococci) data from ocean beach sites in other states, eventually feeding a combined or separate nationwide coastal ATP submission.