workers/sgtm-usage-cron/worker
A Cloudflare Worker that performs daily usage aggregation for billing.
Role in the platform
- Source: Queries Cloudflare Analytics Engine (AE) for D-1 logs.
- Sink: Upserts aggregated counts to Supabase (sgtm_usage_daily_cf).
- Billing Model: We charge based on "Total Hits" (Requests) vs "Recovered Hits" (Adblock/ITP recovery).
Pipeline Flow (runCron)
- Time Resolution: Calculates the "Yesterday" (UTC) time window.
- AE Query: Fetches RAW rows from Analytics Engine for that window.
- Aggregation: Aggregates data in-memory (JS Map) by Container ID.
  - Counts: total_hits, ga4_hits, gtm_hits, recovered_hits, etc.
- Persistence: Pushes results to Supabase via idempotent UPSERT.
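The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the Worker's actual code: the `RawRow` shape, its field names, and both helper functions are hypothetical, and the sketch counts every raw row as one hit (it ignores any Analytics Engine sampling).

```typescript
// Hypothetical shape of one raw Analytics Engine row (field names are assumptions).
interface RawRow {
  containerId: string;
  kind: "ga4" | "gtm" | "other";
  recovered: boolean;
}

interface DailyCounts {
  total_hits: number;
  ga4_hits: number;
  gtm_hits: number;
  recovered_hits: number;
}

// Returns the [start, end) bounds of the UTC day before `now`, plus its date label.
function yesterdayUtcWindow(now: Date): { start: Date; end: Date; date: string } {
  // Midnight (UTC) of the current day is the exclusive end of yesterday's window.
  const end = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  return { start, end, date: start.toISOString().slice(0, 10) };
}

// Aggregates raw rows in memory, keyed by container ID.
function aggregateByContainer(rows: RawRow[]): Map<string, DailyCounts> {
  const byContainer = new Map<string, DailyCounts>();
  for (const row of rows) {
    let counts = byContainer.get(row.containerId);
    if (!counts) {
      counts = { total_hits: 0, ga4_hits: 0, gtm_hits: 0, recovered_hits: 0 };
      byContainer.set(row.containerId, counts);
    }
    counts.total_hits += 1;
    if (row.kind === "ga4") counts.ga4_hits += 1;
    if (row.kind === "gtm") counts.gtm_hits += 1;
    if (row.recovered) counts.recovered_hits += 1;
  }
  return byContainer;
}
```

Each map entry, combined with the window's `date`, corresponds to one row to upsert in the persistence step.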
Memory & Performance Limits
- In-Memory Aggregation: The aggregation step runs entirely in Worker memory.
- Risk: If daily log volume exceeds ~500k-1M rows, this might hit the Worker memory limit (128MB).
- Future Optimization: Push more of the aggregation into an AE SQL GROUP BY, if it is supported efficiently.
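As an illustration of the GROUP BY idea, the sketch below builds a query that lets Analytics Engine return one row per container instead of raw rows. The dataset name, the mapping of `blob1` to the container ID, and the use of `double1` as a 0/1 "recovered" flag are all assumptions made for this example; the real column layout depends on how the data points are written.

```typescript
// Formats a Date as 'YYYY-MM-DD hh:mm:ss' (UTC) for use inside toDateTime().
function toSqlDateTime(d: Date): string {
  return d.toISOString().slice(0, 19).replace("T", " ");
}

// Builds a per-container aggregation query for the window [start, end).
// `dataset` must be a trusted constant: it is interpolated directly into the SQL.
// Assumed (hypothetical) layout: blob1 = container ID, double1 = 1 if recovered, else 0.
function buildDailyUsageSql(dataset: string, start: Date, end: Date): string {
  return [
    "SELECT",
    "  blob1 AS sgtm_container_id,",
    // _sample_interval weights each stored row by the number of events it represents.
    "  SUM(_sample_interval) AS total_hits,",
    "  SUM(_sample_interval * double1) AS recovered_hits",
    `FROM ${dataset}`,
    `WHERE timestamp >= toDateTime('${toSqlDateTime(start)}')`,
    `  AND timestamp < toDateTime('${toSqlDateTime(end)}')`,
    "GROUP BY sgtm_container_id",
  ].join("\n");
}
```

With a query like this, the Worker would hold only one result row per container in memory, so the row count would scale with the number of containers rather than with daily log volume.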
Idempotency & Reliability
- Strategy: Uses prefer=resolution=merge-duplicates (Upsert) in Supabase.
- Conflict Target: The unique key is (sgtm_container_id, date).
- Effect: Re-running the cron is safe. It simply overwrites the day's stats with the latest values, fixing any previous under-counts.
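The upsert described above could look roughly like this when sent through Supabase's REST (PostgREST) interface. This is a sketch under assumptions: the `UsageRow` fields beyond the unique key, the `buildUpsertRequest` helper, and the way credentials are passed in are hypothetical, and the real Worker may use a client library instead.

```typescript
// Hypothetical row shape; only sgtm_container_id and date are confirmed as the unique key.
interface UsageRow {
  sgtm_container_id: string;
  date: string; // YYYY-MM-DD (UTC)
  total_hits: number;
  recovered_hits: number;
}

interface UpsertRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the arguments for fetch(url, init) that upsert a batch of rows.
function buildUpsertRequest(supabaseUrl: string, serviceKey: string, rows: UsageRow[]): UpsertRequest {
  return {
    // on_conflict names the unique key that decides whether a row is inserted or merged.
    url: `${supabaseUrl}/rest/v1/sgtm_usage_daily_cf?on_conflict=sgtm_container_id,date`,
    init: {
      method: "POST",
      headers: {
        apikey: serviceKey,
        Authorization: `Bearer ${serviceKey}`,
        "Content-Type": "application/json",
        // Sent as the HTTP Prefer header: rows matching the key are overwritten, not duplicated.
        Prefer: "resolution=merge-duplicates",
      },
      body: JSON.stringify(rows),
    },
  };
}
```

Sending the same batch twice leaves a single row per (sgtm_container_id, date) holding the most recent values, which is what makes a re-run safe.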
⚠️ Latency & Consistency
- Async Ingestion: Analytics Engine is asynchronous. Logs from 23:59:59 might not be queryable immediately at 00:00:00.
- Mitigation: This cron should run multiple times or with a safe delay (e.g., at 01:00 UTC) to capture settled data.
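One way to make the "safe delay" explicit in code is a small guard that checks how long ago the aggregation window closed. The helper name and the one-hour threshold below are illustrative assumptions, not values taken from the Worker.

```typescript
// True once `now` is at least `settleMs` past the end of the aggregation window.
function isWindowSettled(now: Date, windowEnd: Date, settleMs: number): boolean {
  return now.getTime() - windowEnd.getTime() >= settleMs;
}

// Example threshold: treat data as settled one hour after midnight UTC (assumed value).
const ONE_HOUR_MS = 60 * 60 * 1000;
```

A run that fires before the window has settled can still write its partial counts, since a later run overwrites them through the idempotent upsert.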
Testing URL: https://sgtm-usage-cron.matej-4a9.workers.dev/__run_cron_now