STAGE-1 · NOT PROD

Isolated testing environment for Socio Virtual

Cloned from production on 2026-05-07 via an AMI snapshot. It is used to test features end-to-end (code + DB + nginx + agents) under conditions identical to prod before promoting them. Zero risk to real customers.

Current topology

How traffic is split between production and stage. Each has its own EC2 instance, its own local Postgres, its own subdomains, and zero shared state.

flowchart TD
    Internet([Internet]) --> CF[Cloudflare DNS<br/>zone sociovirtual.ai]

    CF -->|share / control / backoffice<br/>commandcenter / live / meta<br/>ext / webhook| ProdEC2[sv-production<br/>i-09b50a47b30aa4893<br/>3.138.80.125<br/>t3.xlarge Ubuntu 24.04]
    CF -->|share-stage1 / control-stage1<br/>backoffice-stage1 / commandcenter-stage1<br/>live-stage1 / meta-stage1<br/>ext-stage1 / webhook-stage1| StageEC2[sv-production-stage<br/>i-0d53ae90be25b1bcd<br/>3.12.134.57<br/>t3.large Ubuntu 24.04]

    ProdEC2 --> ProdState[26 active agents<br/>72 real WA bindings<br/>Meta WA Cloud active<br/>Stripe live + sandbox<br/>Notion / Tokko / Odoo writes<br/>Main SV Anthropic key]
    StageEC2 --> StageState[12 core agents<br/>0 WA bindings<br/>Meta disabled<br/>Stripe sandbox<br/>9 STAGE-BLANK keys<br/>Dedicated stage Anthropic key]

    ProdEC2 -.->|"AMI snapshot<br/>(2026-05-07 18:17, --no-reboot)"| StageEC2

    Legacy[i-0775f92ac229dbfca<br/>previous sv-production-stage<br/>Amazon Linux 2023<br/>backup in 16GB tar]:::dead
    Legacy -.->|terminated 18:38| StageEC2

    classDef dead fill:#220,stroke:#666,color:#888,stroke-dasharray:5

The EIP 3.12.134.57 was reassigned from the terminated legacy instance to the new one. Same SSH alias, same IP, different OS (Ubuntu instead of AL2023).

How the migration was done

Actual timeline, with no downtime on prod. The AMI was created with --no-reboot, and every subsequent step ran on the new stage instance.

gantt
    title Migration prod → stage1 · UTC
    dateFormat YYYY-MM-DD HH:mm
    axisFormat %H:%M
    section Backup
    Back up legacy 16GB             :done, b1, 2026-05-07 17:06, 25m
    pg_dump openclaw_mt 21MB        :done, b2, 2026-05-07 17:15, 5m
    stage branch on GitHub          :done, b3, 2026-05-07 17:15, 2m
    section AMI + Launch
    Create AMI ami-0689e1aa          :done, c1, 2026-05-07 18:17, 20m
    Terminate legacy 3.12.134.57    :done, c2, 2026-05-07 18:38, 1m
    Launch new t3.large             :done, c3, 2026-05-07 18:38, 2m
    Reassign EIP                     :done, c4, 2026-05-07 18:40, 1m
    section Sanitize
    user-data firstboot              :done, d1, 2026-05-07 18:41, 2m
    SQL sanitize Postgres            :done, d2, 2026-05-07 18:43, 6m
    nginx + LE certs cleanup        :done, d3, 2026-05-07 18:48, 2m
    section Configuration
    Restore 35 critical creds        :done, e1, 2026-05-07 21:55, 5m
    Start 12 core services           :done, e2, 2026-05-07 22:00, 5m
    section DNS + TLS
    Cloudflare token with perms      :done, f1, 2026-05-08 15:35, 5m
    8 records *-stage1.sociovirtual.ai :done, f2, 2026-05-08 15:38, 1m
    Certbot SAN for 8 domains        :done, f3, 2026-05-08 15:42, 4m
    Landing page on share-stage1     :active, f4, 2026-05-08 15:45, 15m

Git workflow stage ↔ prod

A long-lived stage branch lives in the sociovirtualai/openclaw-multitenant repo. Each server is checked out on its own branch. Features are tested on stage, then promoted to prod via a squash PR.

gitGraph
    commit id: "182b816 main"
    commit id: "promote: PL rule"
    branch stage
    checkout stage
    commit id: "stage branch created"
    branch "feature/nuevo-skill"
    checkout "feature/nuevo-skill"
    commit id: "new skill"
    commit id: "tests"
    checkout stage
    merge "feature/nuevo-skill"
    commit id: "validate on stage"
    commit id: "fix issue"
    checkout main
    merge stage tag: "promote"
    commit id: "deploy prod"

Typical commands for the flow:

cd ~/ocmt
git checkout stage
git pull origin stage
git checkout -b feature/x
# work on the stage server, validate
git push origin feature/x
gh pr create --base stage --head feature/x

# after validating on stage:
git checkout stage
git merge feature/x
git push origin stage
gh pr create --base main --head stage --title "promote: feature/x"

# after the merge:
ssh sv-production
cd ~/ocmt && git pull origin main && node mt/db/migrate.ts
systemctl --user restart <servicio>

Data isolation

The sanitize step neutralized every path through which stage could touch real production. The Postgres DB is local to stage (a copy, not shared with prod).

flowchart LR
    subgraph PROD["Production · sv-production"]
        PGProd[("Postgres prod<br/>localhost:5432<br/>15 accesses<br/>7 ext_channels active<br/>4 provider_keys<br/>12 tenant_balance<br/>528 hha_tasks")]
        WAProd["~/.wacli<br/>Alfred Baileys<br/>~/.wacli-personal<br/>Pedro Baileys"]
        MetaProd["META_APP_SECRET<br/>active token"]
        NotionProd["NOTION_API_KEY<br/>CLOUDFLARE_API_TOKEN<br/>TOKKO/ODOO/GHL"]
    end

    subgraph STAGE["Stage-1 · sv-production-stage"]
        PGStage[("Postgres stage<br/>localhost:5432<br/>0 accesses<br/>0 ext_active<br/>0 provider_keys<br/>0 balance<br/>0 hha_tasks")]
        Archive[("accesses_archive_stage<br/>15 bindings preserved<br/>read-only")]
        WAStage["~/.wacli moved<br/>to backup<br/>cabina-wa-bridge<br/>MASKED"]
        MetaStage["META_APP_SECRET<br/>= STAGE-BLANK"]
        NotionStage["NOTION_API_KEY<br/>CLOUDFLARE_API_TOKEN<br/>TOKKO / ODOO / GHL<br/>= STAGE-BLANK"]
    end

    PROD -.->|"AMI snapshot<br/>zero downtime"| STAGE
    STAGE -.->|"SQL sanitize<br/>UPDATE ext_channels SET is_active=false<br/>TRUNCATE accesses, provider_keys, hha_tasks<br/>DELETE FROM tenant_balance"| Archive

    classDef prod fill:#1a1f2e,stroke:#1a73e8,color:#e8e8e8
    classDef stage fill:#1f1a0d,stroke:#f5a623,color:#e8e8e8
    classDef arch fill:#0d1f0d,stroke:#2ecc71,color:#e8e8e8
    class PGProd,WAProd,MetaProd,NotionProd prod
    class PGStage,WAStage,MetaStage,NotionStage stage
    class Archive arch

What CANNOT happen on stage

Messages to real customers: 0 active WA bindings, external_channels.is_active=false for all.
Prod Baileys sessions being affected: stores moved to backup before the first boot.
Stripe charges: balances emptied, hha-auto-topup timer masked, sandbox key.
Writes to Notion / CF / Tokko / Odoo / GHL: keys set to STAGE-BLANK.
Sharing Anthropic quota with prod: stage uses a dedicated API key.
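
The STAGE-BLANK guarantee above can be audited mechanically. The following is a minimal sketch, assuming the credentials live in a dotenv-style file with one KEY=value per line and that a blanked key holds the literal value STAGE-BLANK; the function name `check_blank_keys` and the file path in the usage line are hypothetical and not taken from the stage setup.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given keys are NOT blanked in a dotenv-style file.
# Assumption: blanked keys hold the literal value STAGE-BLANK (optionally quoted).

check_blank_keys() {
    local env_file="$1"; shift
    local key value status=0
    for key in "$@"; do
        # Use the last KEY=... assignment in the file and strip optional quotes.
        value="$(grep -E "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2- | tr -d "\"'")"
        if [ "$value" = "STAGE-BLANK" ]; then
            echo "blank      $key"
        else
            # Also reached when the key is missing, so a human has to look at it.
            echo "NOT-BLANK  $key"
            status=1
        fi
    done
    return "$status"
}

# Usage (hypothetical path):
#   check_blank_keys ~/ocmt/.env META_APP_SECRET NOTION_API_KEY CLOUDFLARE_API_TOKEN
```

The function returns non-zero if any listed key is not blanked, so it can gate a deploy script or a service start.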

What IS live on stage

Local Postgres with a copy of the prod DB (12 tenants with is_stage=true).
Full Substrate stack (server, watcher, incremental, feedback-worker, mcp-http).
OpenClaw gateway ready to run test agents.
Cabina (control-api + control-ui), backoffice, command-center, live, informante-admin.
Stripe sandbox for testing billing flows without touching real money.

Service status

Of the 30+ systemd services that ship in the prod AMI, only 18 are active on stage. Those that touch live external systems are masked via symlinks to /dev/null.

flowchart TD
    AMI[AMI booted<br/>30+ inherited unit files] --> Decide{Service type}

    Decide -->|core infra| Active[18 ACTIVE on stage]
    Decide -->|touches external<br/>customer systems| Masked[17 MASKED on stage]
    Decide -->|charges money or<br/>kills WA sessions| Masked

    Active --> ActiveList["substrate-server<br/>substrate-feedback-worker<br/>substrate-mcp-http<br/>substrate-incremental<br/>substrate-watcher<br/>openclaw-mt<br/>openclaw-gateway<br/>openclaw-centinela<br/>openclaw-media-relay<br/>ocmt-control-api<br/>control-ui<br/>ocmt-backoffice<br/>ocmt-live<br/>webhook-server<br/>informante-admin-endpoint<br/>constelacion-server<br/>contact-attributes-endpoint<br/>visitas-endpoint"]

    Masked --> MaskedList["cabina-wa-bridge<br/>cabina-wa-operante-bridge<br/>cabina-linkedin-bridge<br/>wacli-personal-sync<br/>external-channels<br/>tokko-sync.timer<br/>hha-auto-topup.timer<br/>hha-detector<br/>informante-metrics-poll.timer<br/>informante-channel-refresh.timer<br/>informante-hashtag-pack.timer<br/>sv-claude-config-autocommit.timer<br/>reporte-semanal-tenants.timer<br/>soc81-reminder.timer<br/>whatsapp-mcp-bridge<br/>openclaw-rule-review.timer<br/>session-migration"]

    classDef active fill:#0d1f0d,stroke:#2ecc71,color:#e8e8e8
    classDef masked fill:#1f0d0d,stroke:#e74c3c,color:#e8e8e8
    class Active,ActiveList active
    class Masked,MaskedList masked
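
Since masking works by pointing a unit file at /dev/null, the masked list can be audited by inspecting symlinks. The following is a minimal sketch: the helper names `is_masked` and `report_masked` are hypothetical, and the unit directory is passed in as an argument because this page does not state where the unit files live (~/.config/systemd/user is the usual location for `systemctl --user` units, but that is an assumption here).

```shell
#!/usr/bin/env bash
# Sketch: check whether unit files in a directory are masked,
# i.e. are symlinks whose target is exactly /dev/null.

is_masked() {
    local unit_dir="$1" unit="$2"
    [ -L "$unit_dir/$unit" ] && [ "$(readlink "$unit_dir/$unit")" = "/dev/null" ]
}

report_masked() {
    local unit_dir="$1"; shift
    local unit status=0
    for unit in "$@"; do
        if is_masked "$unit_dir" "$unit"; then
            echo "masked      $unit"
        else
            echo "NOT-MASKED  $unit"
            status=1
        fi
    done
    return "$status"
}

# Usage (directory is an assumption; pass full unit file names):
#   report_masked ~/.config/systemd/user hha-auto-topup.timer cabina-wa-bridge.service
```

On the server itself, `systemctl --user is-enabled <unit>` prints `masked` for such units and can be used as a cross-check.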

HTTPS endpoints

Stage URL	Prod equivalent	Serves	Status
share-stage1	share	Statics + landings + papers	200
control-stage1	control	Cabina (control-api + control-ui + WS)	401 Basic Auth
backoffice-stage1	backoffice	OCMT admin panel	200
commandcenter-stage1	commandcenter	Command Center	401 Basic Auth
live-stage1	live	OCMT Live (Next.js)	200
meta-stage1	meta	Meta webhooks (disabled)	404 expected
ext-stage1	ext	External channels (intentionally masked)	502 intentional
webhook-stage1	webhook	Generic webhook server	404 expected
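
The expected statuses in the table lend themselves to a quick smoke test. The following is a minimal sketch, assuming each status refers to an unauthenticated GET on the root path of `<subdomain>.sociovirtual.ai` (the table does not say which path was probed); `fetch_status` and `check_endpoints` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: compare each stage endpoint's HTTP status with the table above.

# The only network call; redefine this function to dry-run the check offline.
fetch_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
        "https://$1.sociovirtual.ai/" </dev/null
}

check_endpoints() {
    local failures=0 sub expected actual
    while read -r sub expected; do
        [ -n "$sub" ] || continue
        actual="$(fetch_status "$sub")"
        if [ "$actual" = "$expected" ]; then
            echo "ok    $sub $actual"
        else
            echo "FAIL  $sub expected=$expected got=$actual"
            failures=$((failures + 1))
        fi
    done <<'EOF'
share-stage1 200
control-stage1 401
backoffice-stage1 200
commandcenter-stage1 401
live-stage1 200
meta-stage1 404
ext-stage1 502
webhook-stage1 404
EOF
    [ "$failures" -eq 0 ]
}
```

A 502 on ext-stage1 counts as a pass here on purpose: if that endpoint starts answering 200, a masked service has been brought back to life.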

Stage KPIs

18 active systemd services

17 masked services (defense in depth)

8 -stage1 subdomains with TLS

35 / 9 creds restored / set to STAGE-BLANK

12 tenants in the DB flagged is_stage

Zero downtime on prod during the migration

How to log in to stage

ssh -i ~/.ssh/SOCIOVIRTUAL_KEY.pem ubuntu@3.12.134.57
# or via the alias:
ssh sv-production-stage

Mandatory verification before any action:

cat /etc/ocmt-stage-marker   # must print "stage"
hostname                     # ocmt-stage
echo "$OCMT_ENV"             # stage
psql "$DATABASE_URL" -c "SELECT inet_server_addr()"  # 127.0.0.1

If any of the four does not match, abort immediately: the command could be pointing at production.
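
The four checks can be wrapped into a single guard that scripts call before doing anything else. The following is a minimal sketch: the function name `assert_stage` and the override variables `MARKER_FILE`, `HOST_OVERRIDE` and `DB_ADDR_OVERRIDE` are hypothetical hooks added so the guard can be dry-run off the server; the expected values are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch: return non-zero unless all four stage checks pass.
# The *_OVERRIDE / MARKER_FILE variables are illustrative test hooks only.

assert_stage() {
    local marker_file="${MARKER_FILE:-/etc/ocmt-stage-marker}"
    local host="${HOST_OVERRIDE:-$(hostname)}"
    local db_addr="${DB_ADDR_OVERRIDE:-}"

    # 1. Marker file must contain "stage".
    [ "$(cat "$marker_file" 2>/dev/null)" = "stage" ] \
        || { echo "ABORT: stage marker missing or wrong" >&2; return 1; }
    # 2. Hostname must be ocmt-stage.
    [ "$host" = "ocmt-stage" ] \
        || { echo "ABORT: hostname is '$host'" >&2; return 1; }
    # 3. OCMT_ENV must be stage.
    [ "${OCMT_ENV:-}" = "stage" ] \
        || { echo "ABORT: OCMT_ENV is '${OCMT_ENV:-}'" >&2; return 1; }
    # 4. Postgres must be reached on the loopback address.
    if [ -z "$db_addr" ]; then
        db_addr="$(psql "${DATABASE_URL:-}" -Atc "SELECT inet_server_addr()" 2>/dev/null)"
    fi
    [ "$db_addr" = "127.0.0.1" ] \
        || { echo "ABORT: database address is '$db_addr'" >&2; return 1; }

    echo "OK: stage confirmed"
}

# Usage: assert_stage || exit 1
```

Failing closed is the point of the design: a missing marker file, an unset variable or an unreachable database all count as "not stage".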

Full plan

The original plan, with its rationale, the risks considered, and a prompt for review by external AIs, lives on the stage filesystem:

~/.claude/plans/quiero-empezar-a-migrar-delegated-treasure.md

The stage's ~/CLAUDE.md has a block at the very top (489 lines) with all the isolation rules that any Claude agent entering this server must follow.