feat(jobs): Add data retention jobs #4128

Draft
TheodoreSpeaks wants to merge 5 commits into staging from feat/auto-redaction

Conversation

Collaborator

@TheodoreSpeaks TheodoreSpeaks commented Apr 13, 2026

Summary

Adds three data retention jobs:

  1. Clean up soft deleted resources (7 days free, 30 days paid, customizable enterprise)
  2. Log retention cleanup (7 days free, infinite paid, customizable enterprise)
  3. Task cleanup (7 days free, infinite paid, customizable enterprise)
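
The tiers above could be captured in a small lookup. This is only a sketch: `Plan`, `RetentionJob`, `PLAN_DEFAULTS`, and `effectiveRetentionHours` are hypothetical names, not identifiers from this PR, and the enterprise fallback to the paid defaults when no workspace override is set is an assumption.

```typescript
// Hypothetical names throughout; the PR's real identifiers are not shown in this thread.
type Plan = "free" | "paid" | "enterprise";
type RetentionJob = "softDeletes" | "logs" | "tasks";
// null means "keep forever" (the "infinite" tier in the summary).
type RetentionHours = number | null;

const HOURS_PER_DAY = 24;

// Defaults taken from the three-item list in the PR summary.
const PLAN_DEFAULTS: Record<"free" | "paid", Record<RetentionJob, RetentionHours>> = {
  free: {
    softDeletes: 7 * HOURS_PER_DAY,
    logs: 7 * HOURS_PER_DAY,
    tasks: 7 * HOURS_PER_DAY,
  },
  paid: {
    softDeletes: 30 * HOURS_PER_DAY,
    logs: null,
    tasks: null,
  },
};

function effectiveRetentionHours(
  plan: Plan,
  job: RetentionJob,
  workspaceOverride: RetentionHours = null,
): RetentionHours {
  if (plan === "enterprise") {
    // Assumption: an enterprise workspace without an override uses the paid default.
    return workspaceOverride !== null ? workspaceOverride : PLAN_DEFAULTS.paid[job];
  }
  return PLAN_DEFAULTS[plan][job];
}
```

For example, `effectiveRetentionHours("free", "logs")` yields 168 hours, while `effectiveRetentionHours("paid", "logs")` yields `null`, meaning nothing is ever eligible for cleanup.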

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

  • Tested locally. Validated that data is deleted from the sim and copilot databases, and that the S3 buckets clean up their data as well.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

vercel bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
docs | Ready | Preview, Comment | Apr 18, 2026 2:54am

@waleedlatif1
Collaborator

@TheodoreSpeaks let's consolidate the migrations into a single one: just delete the existing ones and generate it once over all the changes in schema.ts

@TheodoreSpeaks TheodoreSpeaks changed the title from "Feat/auto redaction (wip)" to "feat(jobs): Add data retention jobs" Apr 18, 2026
@TheodoreSpeaks
Collaborator Author

@BugBot review

cursor bot commented Apr 18, 2026

PR Summary

High Risk
Adds automated background jobs that permanently delete logs, soft-deleted resources, tasks, and associated storage/copilot backend data, increasing risk of unintended data loss if retention defaults or workspace targeting are wrong. Also introduces new workspace-level retention configuration that affects cleanup behavior across plans.

Overview
Introduces a tiered data-retention system by adding three new cleanup job types (logs, soft-deletes, and tasks) that are dispatched via cron-triggered API routes and executed as background tasks, with behavior split by free/paid defaults and enterprise per-workspace settings.

Adds an enterprise-only workspace settings surface (API + UI tab) to view effective retention (defaults vs configured) and to update per-workspace retention hours with auditing.

Centralizes retention defaults and job dispatch in cleanup-dispatcher (supports Trigger.dev batch triggering, queue fallback, and inline execution on DB-backed queues), adds chat/copilot cleanup helpers for deleting associated backend data and storage files, and extends DB schema/indexing to support retention columns and faster soft-delete queries (while removing the legacy FREE_PLAN_LOG_RETENTION_DAYS env config).

Reviewed by Cursor Bugbot for commit 18b3f24. Bugbot is set up for automated code reviews on this repo. Configure here.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

logger.error(`[${tableName}] Batch delete failed:`, { error })
hasMore = false
}
}

DELETE without LIMIT makes batching logic ineffective

High Severity

The cleanupTable function in both cleanup-soft-deletes.ts and cleanup-tasks.ts uses the BATCH_SIZE and MAX_BATCHES_PER_TABLE constants to suggest batched deletion, but the db.delete() call has no .limit() clause. PostgreSQL DELETE doesn't support LIMIT directly, so the query deletes all matching rows in a single unbounded operation. The check hasMore = deleted.length === BATCH_SIZE is effectively dead logic since the delete returns all rows, not a capped batch. Additionally, .returning({ id: sql`id` }) loads all deleted row IDs into memory at once, risking OOM for large tables. Compare with cleanup-logs.ts, which correctly batches via SELECT with .limit(BATCH_SIZE) followed by DELETE-by-IDs.
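
The SELECT-then-DELETE-by-IDs pattern described above might look roughly like the following. This is a sketch rather than the PR's code: `RowStore`, `cleanupInBatches`, and `makeMemoryStore` are hypothetical, and the store is a synchronous in-memory stand-in so the example runs on its own. Against a real database the two store calls would be awaited queries (a SELECT of expired IDs with a LIMIT, then a DELETE restricted to those IDs).

```typescript
// Hypothetical stand-in for the database layer; not an API from the PR.
interface RowStore {
  // SELECT id ... WHERE <expired> LIMIT limit
  selectExpiredIds(limit: number): string[];
  // DELETE ... WHERE id IN (ids); returns the number of rows removed
  deleteByIds(ids: string[]): number;
}

interface CleanupSummary {
  deleted: number;
  batches: number;
}

function cleanupInBatches(store: RowStore, batchSize: number, maxBatches: number): CleanupSummary {
  let deleted = 0;
  let batches = 0;
  let hasMore = true;

  while (hasMore && batches < maxBatches) {
    // The cap is applied on the SELECT, so each round touches at most batchSize rows.
    const ids = store.selectExpiredIds(batchSize);
    if (ids.length === 0) break;

    deleted += store.deleteByIds(ids);
    batches += 1;

    // A full batch suggests more rows may remain; a short batch means we are done.
    hasMore = ids.length === batchSize;
  }

  return { deleted, batches };
}

// In-memory store used only to exercise the loop.
function makeMemoryStore(initialIds: string[]): RowStore {
  const rows = new Set(initialIds);
  return {
    selectExpiredIds(limit) {
      return Array.from(rows).slice(0, limit);
    },
    deleteByIds(ids) {
      let removed = 0;
      for (const id of ids) {
        if (rows.delete(id)) removed += 1;
      }
      return removed;
    },
  };
}
```

With five expired rows and a batch size of two, the loop runs three rounds (2, 2, 1 rows) and stops on the short batch. With the same data and `maxBatches` set to two, it stops after four rows and leaves the rest for the next run.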

"version": "7",
"when": 1775952219230,
"tag": "0191_parched_living_mummy",
"breakpoints": true

Migration journal references non-existent SQL migration files

High Severity

The migration journal adds two entries — 0190_lean_terror and 0191_parched_living_mummy — but the corresponding .sql files don't exist in the migrations/ directory. Additionally, there are now two entries with idx: 190 (the pre-existing 0190_shocking_karma and the new 0190_lean_terror), creating a duplicate index. Running migrations will fail because the migration runner can't find the referenced files.
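
Both problems named above (duplicate idx values and journal tags with no matching .sql file) are mechanical enough to check before merging. The sketch below is a hypothetical helper, not part of the PR or of drizzle-kit. It assumes each journal entry carries an `idx` and a `tag`, and that the migration file for a tag is named `<tag>.sql`.

```typescript
// Hypothetical shape: only the fields this check needs.
interface JournalEntry {
  idx: number;
  tag: string;
}

// Returns a list of human-readable problems; an empty list means the journal looks consistent.
function validateJournal(entries: JournalEntry[], sqlFiles: string[]): string[] {
  const problems: string[] = [];
  const files = new Set(sqlFiles);
  const tagByIdx = new Map<number, string>();

  for (const entry of entries) {
    const earlierTag = tagByIdx.get(entry.idx);
    if (earlierTag !== undefined) {
      problems.push(`duplicate idx ${entry.idx}: ${earlierTag} and ${entry.tag}`);
    } else {
      tagByIdx.set(entry.idx, entry.tag);
    }

    // Assumption: the migration file for a tag is "<tag>.sql".
    if (!files.has(`${entry.tag}.sql`)) {
      problems.push(`missing file ${entry.tag}.sql`);
    }
  }

  return problems;
}

// Entries modelled on the review comment; idx 191 for the last tag is assumed.
const reportedProblems = validateJournal(
  [
    { idx: 190, tag: "0190_shocking_karma" },
    { idx: 190, tag: "0190_lean_terror" },
    { idx: 191, tag: "0191_parched_living_mummy" },
  ],
  ["0190_shocking_karma.sql"],
);
```

For the entries described in the comment, `reportedProblems` contains one duplicate-idx message and two missing-file messages. Regenerating a single consolidated migration from the schema would clear all three.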

"type": "integer",
"primaryKey": false,
"notNull": false
},

Snapshot defines task_redaction_hours column absent from schema

Medium Severity

The migration snapshot includes a task_redaction_hours column on the workspace table, but schema.ts does not define a corresponding taskRedactionHours field. The schema only adds logRetentionHours, softDeleteRetentionHours, and taskCleanupHours. This mismatch means the migration would create a DB column that's inaccessible from application code, and the next drizzle-kit generate would produce a migration to drop it.
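
A drift check of this kind can be sketched as a set difference between snapshot column names and schema field names. The helpers below are hypothetical. The snake_case column names other than task_redaction_hours are assumed from the camelCase fields named in the comment, not taken from the actual snapshot.

```typescript
// Convert a camelCase schema field name to the snake_case column name it is assumed to map to.
function toSnakeCase(field: string): string {
  return field.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

// Columns present in the snapshot that no schema field maps to.
function unmappedSnapshotColumns(snapshotColumns: string[], schemaFields: string[]): string[] {
  const known = new Set(schemaFields.map(toSnakeCase));
  return snapshotColumns.filter((column) => !known.has(column));
}

const orphanedColumns = unmappedSnapshotColumns(
  // Assumed snapshot columns; only task_redaction_hours is confirmed by the comment.
  ["log_retention_hours", "soft_delete_retention_hours", "task_cleanup_hours", "task_redaction_hours"],
  // Fields the comment says schema.ts adds.
  ["logRetentionHours", "softDeleteRetentionHours", "taskCleanupHours"],
);
```

Here `orphanedColumns` holds only `task_redaction_hours`, the column that the next generated migration would drop.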

runChildResults.reduce((s, r) => s + r.deleted, 0) +
runsResult.deleted +
chatsResult.deleted +
inboxResult.deleted

Feedback deletions excluded from total deleted count

Low Severity

The totalDeleted sum in runCleanupTasks includes runChildResults, runsResult, chatsResult, and inboxResult, but omits feedbackResult.deleted. The feedback deletion block (around lines 218–253) successfully deletes copilotFeedback rows, but those counts aren't reflected in the logged total.
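
One way to make this kind of omission harder is to sum over a single list of results instead of adding named terms by hand. The sketch below is illustrative only: `CleanupResult` and `sumDeleted` are hypothetical, and the counts are made-up sample values, not figures from the PR.

```typescript
// Hypothetical result shape: each cleanup step reports how many rows it removed.
interface CleanupResult {
  deleted: number;
}

function sumDeleted(results: CleanupResult[]): number {
  return results.reduce((sum, result) => sum + result.deleted, 0);
}

// Sample values for illustration only.
const runChildResults: CleanupResult[] = [{ deleted: 3 }, { deleted: 2 }];
const runsResult: CleanupResult = { deleted: 4 };
const chatsResult: CleanupResult = { deleted: 1 };
const inboxResult: CleanupResult = { deleted: 0 };
const feedbackResult: CleanupResult = { deleted: 5 };

// Every step, including feedback, goes through the same list.
const totalDeleted = sumDeleted([
  ...runChildResults,
  runsResult,
  chatsResult,
  inboxResult,
  feedbackResult,
]);
```

With these sample counts the total is 15, whereas the sum without `feedbackResult` would report 10.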
