workers/sgtm-usage-cron/worker

Cloudflare Worker that performs daily usage aggregation for billing.

Role in the platform

  • Source: Queries Cloudflare Analytics Engine (AE) for the previous day's (D-1) logs.
  • Sink: Upserts aggregated counts to Supabase (sgtm_usage_daily_cf).
  • Billing Model: Billing is based on two counts: "Total Hits" (all requests) and "Recovered Hits" (requests recovered from ad blockers or ITP).

Pipeline Flow (runCron)

  1. Time Resolution: Computes the time window for "yesterday" in UTC.
  2. AE Query: Fetches RAW rows from Analytics Engine for that window.
  3. Aggregation: Aggregates data in-memory (JS Map) by Container ID.
    • Counts: total_hits, ga4_hits, gtm_hits, recovered_hits, etc.
  4. Persistence: Pushes results to Supabase via idempotent UPSERT.
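
The first and third steps above can be sketched as follows. This is a minimal illustration, not the worker's actual code: the row shape (UsageRow), the field names containerId, kind and recovered, and both function names are assumptions, since this page does not list the AE columns. It also assumes each row represents exactly one hit; if AE sampling applies, each row would need to be weighted by its sample interval instead.

```typescript
// Hypothetical row shape; the real AE column names are not given in this document.
interface UsageRow {
  containerId: string;
  kind: "ga4" | "gtm" | "other";
  recovered: boolean;
}

interface DailyCounts {
  total_hits: number;
  ga4_hits: number;
  gtm_hits: number;
  recovered_hits: number;
}

// Returns the half-open window [start, end) covering the UTC day before `now`.
function yesterdayUtcWindow(now: Date): { start: Date; end: Date } {
  // Midnight (UTC) at the start of the current day is the end of yesterday.
  const end = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate())
  );
  // UTC has no daylight saving shifts, so a day is always 24 hours.
  const start = new Date(end.getTime() - 24 * 60 * 60 * 1000);
  return { start, end };
}

// Aggregates raw rows in memory, keyed by container ID.
// Assumes one row equals one hit (no sampling weight applied).
function aggregateByContainer(rows: UsageRow[]): Map<string, DailyCounts> {
  const out = new Map<string, DailyCounts>();
  for (const row of rows) {
    let counts = out.get(row.containerId);
    if (!counts) {
      counts = { total_hits: 0, ga4_hits: 0, gtm_hits: 0, recovered_hits: 0 };
      out.set(row.containerId, counts);
    }
    counts.total_hits += 1;
    if (row.kind === "ga4") counts.ga4_hits += 1;
    if (row.kind === "gtm") counts.gtm_hits += 1;
    if (row.recovered) counts.recovered_hits += 1;
  }
  return out;
}
```

Using a half-open window keeps adjacent days from double-counting a hit that lands exactly on midnight.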

Memory & Performance Limits

  • In-Memory Aggregation: The aggregation step runs entirely in Worker memory.
  • Risk: If daily log volume exceeds roughly 500k-1M rows, the aggregation might hit the Worker memory limit (128 MB).
  • Future Optimization: Push as much of the aggregation as possible into an AE SQL GROUP BY, provided AE supports it efficiently.
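
As a rough sketch of what pushing the aggregation into AE SQL could look like, the helper below builds a grouped query string for a given window. Everything specific here is an assumption: the dataset name is passed in by the caller, the column mapping (index1 for the container ID, blob1 for the hit type, blob2 for a recovered flag) is hypothetical, and the exact SQL functions should be checked against the AE SQL reference before use. Summing _sample_interval rather than counting rows accounts for AE sampling.

```typescript
// Hypothetical column mapping (not stated in this document):
//   index1 = container ID, blob1 = hit type, blob2 = recovered flag.

// Formats a Date as 'YYYY-MM-DD HH:MM:SS' in UTC.
function formatAeTimestamp(d: Date): string {
  return d.toISOString().slice(0, 19).replace("T", " ");
}

// Builds a GROUP BY query over the half-open window [start, end).
function buildGroupedQuery(dataset: string, start: Date, end: Date): string {
  // The dataset name is interpolated into SQL, so restrict it to a safe identifier.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(dataset)) {
    throw new Error(`unsafe dataset name: ${dataset}`);
  }
  return [
    "SELECT index1 AS container_id, blob1 AS hit_type, blob2 AS recovered,",
    "       SUM(_sample_interval) AS hits",
    `FROM ${dataset}`,
    `WHERE timestamp >= toDateTime('${formatAeTimestamp(start)}')`,
    `  AND timestamp < toDateTime('${formatAeTimestamp(end)}')`,
    "GROUP BY container_id, hit_type, recovered",
  ].join("\n");
}
```

With a query of this shape the Worker would receive one row per (container, hit type, recovered) combination instead of one row per hit, so memory use would scale with the number of containers rather than with daily traffic.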

Idempotency & Reliability

  • Strategy: Sends the Prefer: resolution=merge-duplicates header to Supabase (PostgREST), which turns the insert into an upsert.
  • Conflict Target: The unique key is (sgtm_container_id, date).
  • Effect: Re-running the cron is safe. It simply overwrites the day's stats with the latest values, fixing any previous under-counts.
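
The upsert described above might be issued through the Supabase REST (PostgREST) interface roughly as follows. This is a sketch under assumptions: buildUpsertRequest, UsageDailyRow and UpsertRequest are illustrative names, the row type shows only a subset of the count columns, and the worker may well use a client library instead of a raw request. The table name and conflict columns are taken from this page.

```typescript
// Illustrative row type; only a subset of the count columns is shown.
interface UsageDailyRow {
  sgtm_container_id: string;
  date: string; // YYYY-MM-DD (UTC)
  total_hits: number;
  recovered_hits: number;
}

interface UpsertRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request for an idempotent upsert; pass the result to fetch(url, init).
function buildUpsertRequest(
  supabaseUrl: string,
  serviceKey: string,
  rows: UsageDailyRow[]
): UpsertRequest {
  const base = supabaseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    // on_conflict names the unique key used to detect an existing row.
    url: `${base}/rest/v1/sgtm_usage_daily_cf?on_conflict=sgtm_container_id,date`,
    init: {
      method: "POST",
      headers: {
        apikey: serviceKey,
        Authorization: `Bearer ${serviceKey}`,
        "Content-Type": "application/json",
        // Tells PostgREST to merge into the existing row instead of failing.
        Prefer: "resolution=merge-duplicates",
      },
      body: JSON.stringify(rows),
    },
  };
}
```

Because the request body carries the full set of counts for the day, sending it twice leaves the table in the same state as sending it once.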

⚠️ Latency & Consistency

  • Async Ingestion: Analytics Engine is asynchronous. Logs from 23:59:59 might not be queryable immediately at 00:00:00.
  • Mitigation: This cron should run multiple times or with a safe delay (e.g., at 01:00 UTC) to capture settled data.
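
One way to wire up the delayed or repeated runs is a scheduled handler that simply calls the pipeline each time a cron trigger fires. The sketch below is illustrative only: makeScheduledHandler is a hypothetical helper, the runCron signature (taking the current time) is assumed, and the two small interfaces stand in for the Workers runtime types. The cron expressions in the comment are examples, not the project's actual configuration.

```typescript
// Minimal stand-ins for the Workers runtime types used by a scheduled handler.
interface ScheduledEventLike {
  cron: string; // the cron expression that fired
  scheduledTime: number; // epoch milliseconds
}
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Example trigger configuration (hypothetical), in wrangler.toml:
//   [triggers]
//   crons = ["0 1 * * *", "0 6 * * *"]
// The second run repeats the same day; the idempotent upsert makes that safe.

// Wraps a pipeline function (assumed signature) in a scheduled handler.
function makeScheduledHandler(runCron: (now: Date) => Promise<void>) {
  return function scheduled(
    event: ScheduledEventLike,
    _env: unknown,
    ctx: ExecutionContextLike
  ): void {
    // Use the trigger's scheduled time so the window is derived from
    // when the run was due rather than when it happened to start.
    ctx.waitUntil(runCron(new Date(event.scheduledTime)));
  };
}

// In a real Worker module this would be exposed as:
//   export default { scheduled: makeScheduledHandler(runCron) };
```

Running at 01:00 UTC gives late-arriving events an hour to settle, and any later run overwrites the same rows with more complete counts.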

Testing URL: https://sgtm-usage-cron.matej-4a9.workers.dev/__run_cron_now


Released under proprietary license.