06 - MedTime Infrastructure

Identifier: TECH-INF-001
Version: 1.0.1
Date: 2025-12-08
Last Revision: DV2-P3 - Mobile dashboard refresh rate specification
Author: DevOpsDrone (MTS-DRN-OPS-001) / SpecQueen Technical Division
Architecture Refs: TECH-ARC-001, TECH-CS-001
Security Refs: TECH-SEC-SERVER-001, TECH-SEC-CLIENT-001
Testing Refs: TECH-TST-001
Status: Draft

1. Introduction
1.1. Purpose
1.2. Scope
1.3. Infrastructure Principles
2. Cloud Architecture
2.1. Technology Stack
2.2. Supabase Configuration
2.3. Firebase Services
2.4. Twilio Integration
2.5. Network Topology
3. Environments
3.1. Development
3.2. Staging
3.3. Production
3.4. Feature Branches
3.5. Environment Promotion
4. CI/CD Pipelines
4.1. Mobile CI/CD (GitHub Actions + Fastlane)
4.2. Backend CI/CD
4.3. Database Migrations
4.4. Release Management
5. Monitoring and Alerting
5.1. Zero-Knowledge Monitoring
5.2. Permitted Metrics
5.3. Prohibited Metrics
5.4. Infrastructure Alerts
5.5. Dashboards
6. Disaster Recovery
6.1. RTO and RPO
6.2. Backup Strategy
6.3. Failover Procedures
6.4. Data Recovery
7. Security Infrastructure
7.1. Secrets Management
7.2. Certificate Management
7.3. Network Security
7.4. Compliance Scanning
8. Cost Management
8.1. Cost Breakdown
8.2. Resource Optimization
8.3. Cost Alerts
8.4. Scaling Strategy
9. Infrastructure as Code
10. Runbooks
11. References

1. Introduction

1.1. Purpose

This document defines the complete infrastructure for MedTime, covering:

Cloud services (Supabase, Firebase, Twilio)
CI/CD pipelines for mobile and backend
Zero-Knowledge monitoring and alerting
Disaster recovery and backups
Environments and deployment strategy

1.2. Scope

Component	Included	Provider
Database	Yes	Supabase (PostgreSQL)
Authentication	Yes	Firebase Auth
Push Notifications	Yes	Firebase FCM
SMS Notifications	Yes	Twilio
Storage (blobs)	Yes	Supabase Storage
Mobile CI/CD	Yes	GitHub Actions + Fastlane
Backend CI/CD	Yes	GitHub Actions
Monitoring	Yes	Supabase Dashboard + Custom
Error Tracking	Yes	Firebase Crashlytics + Sentry

1.3. Infrastructure Principles

CORE PRINCIPLES:
+------------------------------------------------------------------+
|  1. Zero-Knowledge Monitoring                                    |
|     - Monitoring NEVER sees PHI in plaintext                     |
|     - Operational metadata only                                  |
|                                                                  |
|  2. Serverless-First                                             |
|     - Minimize the infrastructure to maintain                    |
|     - Managed services over custom servers                       |
|                                                                  |
|  3. Mobile-First                                                 |
|     - CI/CD optimized for iOS/Android                            |
|     - Minimal backend (5% of processing)                         |
|                                                                  |
|  4. Compliance-Ready                                             |
|     - HIPAA, LGPD, FDA by design                                 |
|     - Audit logs, backups, encryption by default                 |
+------------------------------------------------------------------+

2. Cloud Architecture

2.1. Technology Stack

graph TB
    subgraph MOBILE["Mobile Apps (95%)"]
        direction LR
        IOS["iOS App<br/>Swift/SwiftUI<br/>Realm"]
        ANDROID["Android App<br/>Kotlin/Compose<br/>Room"]
    end

    subgraph BACKEND["Backend Services (5%)"]
        direction TB

        subgraph SUPABASE["Supabase"]
            PG["PostgreSQL<br/>(Blobs E2E)"]
            AUTH_SB["Auth Helper"]
            STORAGE["Storage<br/>(Files)"]
            EDGE["Edge Functions<br/>(Deno/TypeScript)"]
        end

        subgraph FIREBASE["Firebase"]
            AUTH_FB["Authentication"]
            FCM["Cloud Messaging"]
            CRASH["Crashlytics"]
        end

        subgraph EXTERNAL["External Services"]
            TWILIO["Twilio SMS"]
            SENTRY["Sentry<br/>(Error Tracking)"]
        end
    end

    IOS -->|HTTPS/REST| EDGE
    ANDROID -->|HTTPS/REST| EDGE

    IOS -->|Auth| AUTH_FB
    ANDROID -->|Auth| AUTH_FB

    EDGE --> PG
    EDGE --> STORAGE
    EDGE --> AUTH_SB
    EDGE --> TWILIO

    AUTH_FB --> FCM

    IOS -.->|Crashes| CRASH
    ANDROID -.->|Crashes| CRASH

    EDGE -.->|Errors| SENTRY

    style MOBILE fill:#99ff99
    style BACKEND fill:#99ccff

2.2. Supabase Configuration

Tier: Pro (minimum required for HIPAA compliance)

Supabase Configuration:

  project:
    name: medtime-prod
    region: us-east-1  # Or sa-east-1 for Brazil
    organization: MedTime Inc.

  database:
    plan: Pro
    version: PostgreSQL 15
    instance_size: Small (initial) -> Medium (at scale)
    storage: 8GB (initial) -> auto-scale

    extensions:
      - uuid-ossp  # UUIDs
      - pgcrypto   # Crypto functions
      - pg_stat_statements  # Query monitoring

    connection_pooling:
      mode: transaction
      pool_size: 15
      timeout: 10s

    backups:
      enabled: true
      schedule: daily
      retention: 30_days
      point_in_time_recovery: 7_days

  auth:
    # Supabase Auth complements Firebase
    # It is used only for RLS context
    enabled: true
    jwt_expiry: 3600

  storage:
    # For large blobs (prescription images)
    enabled: true
    file_size_limit: 5MB
    public_buckets: []  # All private

  edge_functions:
    # Deno/TypeScript for APIs
    runtime: deno
    timeout: 60s
    memory: 256MB

  security:
    ssl_enforcement: required
    rls_enabled: true  # Mandatory
    row_level_security: enforced

PostgreSQL configuration:

-- Security settings
ALTER SYSTEM SET ssl = 'on';
ALTER SYSTEM SET ssl_min_protocol_version = 'TLSv1.3';

-- Performance tuning (adjust according to load)
ALTER SYSTEM SET shared_buffers = '256MB';
ALTER SYSTEM SET effective_cache_size = '1GB';
ALTER SYSTEM SET maintenance_work_mem = '64MB';
ALTER SYSTEM SET work_mem = '16MB';

-- Logging (no PHI)
ALTER SYSTEM SET log_statement = 'ddl';  -- DDL only
ALTER SYSTEM SET log_min_duration_statement = 1000;  -- Queries > 1s (logs the statement text, so keep PHI out of SQL literals)
ALTER SYSTEM SET log_connections = 'on';
ALTER SYSTEM SET log_disconnections = 'on';

-- IMPORTANT: do NOT log query contents (they may contain PHI)
ALTER SYSTEM SET log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h ';

2.3. Firebase Services

Tier: Blaze (pay-as-you-go)

Firebase Configuration:

  authentication:
    providers:
      - email_password
      - google
      - apple  # Required for iOS

    password_policy:
      min_length: 12
      require_uppercase: true
      require_lowercase: true
      require_number: true

    mfa:
      enforcement:
        free: optional
        pro: required
        perfect: required
      providers:
        - totp
        - sms

    session_management:
      id_token_expiry: 3600  # 1 hour
      refresh_token_expiry: 2592000  # 30 days
      concurrent_sessions:
        free: 2
        pro: 5
        perfect: 10

  cloud_messaging:
    # FCM for push notifications (backup for local alerts)
    platforms:
      - ios
      - android

    message_types:
      - alert_backup  # Backup for local alerts
      - emergency_notification
      - caregiver_notification

    rate_limiting:
      per_device: 100/hour
      per_topic: 1000/hour

    # IMPORTANT: payloads do NOT contain PHI
    payload_sanitization: enforced

  crashlytics:
    enabled: true
    symbolication: automatic

    # Do NOT send PHI in crash reports
    custom_keys_allowed: false
    user_id_collection: hashed_only

    retention: 90_days

  analytics:
    # Minimal analytics (Zero-Knowledge)
    enabled: true
    events:
      # Usage events only, no content
      - app_open
      - screen_view
      - user_engagement

    # Do NOT send sensitive data
    automatically_collected: false
    user_properties: [tier, role]  # Metadata only

Firebase Security Rules:

// firestore.rules (if Firestore is used for caching)
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {

    // Encrypted blobs - owner only
    // Firestore document paths alternate collection/document, so the blobs
    // live in a subcollection (the name "blobs" is illustrative)
    match /encrypted_blobs/{userId}/blobs/{blobId} {
      allow read, write: if request.auth != null
                         && request.auth.uid == userId;
    }

    // Public catalogs - authenticated users
    // (the subcollection name "items" is illustrative)
    match /public_catalogs/{catalog}/items/{item} {
      allow read: if request.auth != null;
      allow write: if false;  // Admins only, via backend
    }

    // Default: deny
    match /{document=**} {
      allow read, write: if false;
    }
  }
}

// storage.rules
rules_version = '2';
service firebase.storage {
  match /b/{bucket}/o {

    // Prescription images (anonymized)
    match /user_uploads/{userId}/{fileName} {
      // request.resource is only set on uploads, so the size check
      // must not be part of the read/delete condition
      allow read, delete: if request.auth != null
                          && request.auth.uid == userId;
      allow create, update: if request.auth != null
                            && request.auth.uid == userId
                            && request.resource.size < 5 * 1024 * 1024;  // 5MB
    }

    // Default: deny
    match /{allPaths=**} {
      allow read, write: if false;
    }
  }
}

2.4. Twilio Integration

Twilio Configuration:

  service: SMS (Voice in v2)

  usage:
    - MFA verification codes
    - Emergency alerts (Pro/Perfect)
    - Caregiver notifications

  phone_numbers:
    quantity: 1 (initial) -> scales by region
    type: local_number
    capabilities: [SMS]

  rate_limiting:
    per_user: 10/day
    per_number: 100/day
    cooldown: 60s between messages

  message_templates:
    mfa_code: |
      MedTime: Tu codigo de verificacion es {code}.
      Valido por 5 minutos. No compartas este codigo.

    emergency_alert: |
      MedTime: ALERTA - {patient_name} ha activado alerta de emergencia.
      Ubicacion: {location_link}

    caregiver_reminder: |
      MedTime: {patient_name} no ha confirmado toma de medicamento.
      Programado: {scheduled_time}

  # IMPORTANT: do NOT include medication names in SMS
  phi_handling: sanitized

  monitoring:
    delivery_status: tracked
    error_codes: logged
    retry_policy: 3_attempts
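
The per-user limits above (10 messages per day, 60 s cooldown) could be enforced with a check like the one below. This is a sketch only: it treats "day" as a rolling 24-hour window, keeps state in memory, and does not model the per-number limit. The class name `SmsRateLimiter` is hypothetical, and a real deployment would need shared storage across function instances.

```typescript
// Sketch of the per-user SMS limits: 10 messages/day and a 60 s cooldown.
// "Day" is interpreted as a rolling 24-hour window (an assumption).
const MAX_PER_USER_PER_DAY = 10;
const COOLDOWN_MS = 60_000;
const DAY_MS = 24 * 60 * 60 * 1000;

class SmsRateLimiter {
  // Send timestamps (ms since epoch) per user, oldest first.
  private readonly sent = new Map<string, number[]>();

  // Returns true and records the send if a message is allowed at `nowMs`.
  // Rejected attempts are not recorded.
  tryAcquire(userId: string, nowMs: number): boolean {
    // Keep only sends that still fall inside the 24-hour window.
    const recent = (this.sent.get(userId) ?? []).filter(
      (t) => nowMs - t < DAY_MS,
    );
    const last = recent[recent.length - 1];
    const inCooldown = last !== undefined && nowMs - last < COOLDOWN_MS;

    if (inCooldown || recent.length >= MAX_PER_USER_PER_DAY) {
      this.sent.set(userId, recent);
      return false;
    }
    recent.push(nowMs);
    this.sent.set(userId, recent);
    return true;
  }
}
```

Passing the current time as a parameter keeps the limiter deterministic, which makes the cooldown and daily cap easy to test.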

2.5. Network Topology

NETWORK ARCHITECTURE:
+------------------------------------------------------------------+

INTERNET
    |
    v
┌────────────────────────────────────────────────────────────────┐
│  CDN / WAF (Cloudflare)                                        │
│  - DDoS protection                                              │
│  - Rate limiting                                                │
│  - TLS termination                                              │
└────────────────────────────────────────────────────────────────┘
    |
    v
┌────────────────────────────────────────────────────────────────┐
│  Load Balancer (Supabase Managed)                              │
│  - Health checks                                                │
│  - SSL/TLS 1.3                                                  │
└────────────────────────────────────────────────────────────────┘
    |
    +--- Edge Functions (Supabase) ---+--- Firebase Auth
    |                                  |
    v                                  v
┌─────────────────────┐       ┌──────────────────┐
│  PostgreSQL         │       │  Firebase        │
│  (Private subnet)   │       │  (Google Cloud)  │
│  - RLS enabled      │       │  - FCM           │
│  - Backups          │       │  - Crashlytics   │
└─────────────────────┘       └──────────────────┘
    |
    v
┌─────────────────────┐
│  Supabase Storage   │
│  (Encrypted blobs)  │
└─────────────────────┘

SECURITY LAYERS:
- Layer 7: WAF (SQL injection, XSS, DDoS)
- Layer 6: TLS 1.3 (encryption in transit)
- Layer 5: JWT Auth (Firebase tokens)
- Layer 4: RLS (Row Level Security)
- Layer 3: Encryption at rest (AES-256)
- Layer 2: E2E encryption (client-side)
- Layer 1: Keychain/Keystore (device)

3. Environments

3.1. Development

Development Environment:

  purpose: Local development and feature branches

  supabase:
    project: medtime-dev
    database:
      instance: Micro (free tier)
      data: Test fixtures
    url: https://dev.medtime.supabase.co

  firebase:
    project: medtime-dev
    config: firebase-dev-config.json

  access:
    developers: all
    testers: read-only

  data:
    # NEVER production data
    source: fixtures + factories
    phi: synthetic_only

  deployment:
    trigger: manual
    approval: none

3.2. Staging

Staging Environment:

  purpose: QA and pre-production testing

  supabase:
    project: medtime-staging
    database:
      instance: Small
      data: Anonymized replicas of production
    url: https://staging.medtime.supabase.co

  firebase:
    project: medtime-staging
    config: firebase-staging-config.json

  access:
    developers: read-write
    testers: read-write
    stakeholders: read-only

  data:
    # Anonymized production data
    source: prod_anonymized + synthetic
    refresh: weekly

  deployment:
    trigger: push to develop branch
    approval: automatic

  testing:
    - E2E tests
    - Load tests
    - Security scans

3.3. Production

Production Environment:

  purpose: Real users

  supabase:
    project: medtime-prod
    database:
      instance: Medium (initial) -> auto-scale
      data: Real user data (encrypted)
    url: https://api.medtime.app
    custom_domain: api.medtime.app

  firebase:
    project: medtime-prod
    config: firebase-prod-config.json

  access:
    developers: read-only (with audit)
    on_call: read-write (emergency only)
    admins: full (with approval)

  data:
    source: user_generated
    phi: encrypted_e2e
    backups:
      frequency: hourly
      retention: 30_days

  deployment:
    trigger: git tag (vX.Y.Z)
    approval: manual (tech lead + security)
    strategy: blue_green
    rollback: automatic on errors

  monitoring:
    uptime_target: 99.9%
    alerts: pagerduty
    zero_knowledge: enforced
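
To make the `uptime_target: 99.9%` figure concrete, the arithmetic below converts an availability target into an allowed-downtime budget. It is a small worked example; the function name `allowedDowntimeMinutes` and the 30-day period are illustrative assumptions, not part of the specification.

```typescript
// Converts an availability target into the allowed downtime for a period.
function allowedDowntimeMinutes(
  uptimeTargetPercent: number,
  periodDays: number,
): number {
  if (uptimeTargetPercent < 0 || uptimeTargetPercent > 100) {
    throw new RangeError("uptimeTargetPercent must be between 0 and 100");
  }
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - uptimeTargetPercent / 100);
}

// 99.9% over 30 days: 43,200 minutes * 0.001, roughly 43.2 minutes.
const monthlyBudgetMinutes = allowedDowntimeMinutes(99.9, 30);
console.log(monthlyBudgetMinutes.toFixed(1));
```

A budget of about 43 minutes per 30 days is the figure that alert thresholds and rollback timings would need to fit within.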

3.4. Feature Branches

Feature Branch Environments:

  # Ephemeral environments for large PRs

  creation:
    trigger: PR with label "needs-preview"
    naming: feat-PR-{number}

  resources:
    supabase: Micro instance
    firebase: Shared dev project
    lifetime: until the PR is merged or closed

  cleanup:
    trigger: PR merged or closed
    retention: 7_days after close
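
The naming and retention rules above can be expressed as two small helpers. This is an illustrative sketch; the function names `previewEnvName` and `cleanupDueAt` are hypothetical, while the `feat-PR-{number}` pattern and the 7-day retention come from the configuration.

```typescript
// Retention after a PR is merged or closed, per the configuration above.
const PREVIEW_RETENTION_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Builds the environment name for a pull request, e.g. feat-PR-123.
function previewEnvName(prNumber: number): string {
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new RangeError("prNumber must be a positive integer");
  }
  return `feat-PR-${prNumber}`;
}

// Returns the moment at which the preview environment may be deleted.
function cleanupDueAt(closedAt: Date): Date {
  return new Date(closedAt.getTime() + PREVIEW_RETENTION_DAYS * MS_PER_DAY);
}
```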

3.5. Environment Promotion

PROMOTION FLOW:
+------------------------------------------------------------------+

feature/* ─┐
           ├──> develop ──> staging ──> main ──> production
feature/* ─┘       │           │          │
                   │           │          │
              Auto Deploy  Auto Deploy  Manual Deploy
                           + E2E Tests   + Approval

GATES PER ENVIRONMENT:

develop -> staging:
  - Unit tests pass
  - Lint pass
  - Security scan (SAST)

staging -> main:
  - Integration tests pass
  - E2E tests pass
  - Load tests pass
  - Security scan (DAST)
  - Manual QA approval

main -> production:
  - All previous gates
  - Tech lead approval
  - Security team approval (for releases with security changes)
  - Changelog generated
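
The gate lists above can be read as data plus one rule: production requires every earlier gate, and the security approval applies only to releases with security changes. The sketch below encodes that reading. The gate identifiers and the function `missingGates` are hypothetical names chosen for illustration.

```typescript
// Gate names mirror the lists above; identifiers and data shape are assumptions.
type Stage = "develop->staging" | "staging->main" | "main->production";

const GATES: Record<Stage, string[]> = {
  "develop->staging": ["unit_tests", "lint", "sast_scan"],
  "staging->main": [
    "integration_tests",
    "e2e_tests",
    "load_tests",
    "dast_scan",
    "manual_qa_approval",
  ],
  "main->production": ["tech_lead_approval", "changelog_generated"],
};

const STAGE_ORDER: Stage[] = [
  "develop->staging",
  "staging->main",
  "main->production",
];

// Returns the gates that still block a promotion.
// Production requires all previous gates as well as its own.
function missingGates(
  stage: Stage,
  passed: Set<string>,
  hasSecurityChanges: boolean,
): string[] {
  const required: string[] = [];
  if (stage === "main->production") {
    for (const s of STAGE_ORDER) {
      required.push(...GATES[s]);
    }
    if (hasSecurityChanges) {
      required.push("security_team_approval");
    }
  } else {
    required.push(...GATES[stage]);
  }
  return required.filter((gate) => !passed.has(gate));
}
```

An empty result means the promotion may proceed; any other result lists exactly what is still outstanding.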

4. CI/CD Pipelines

4.1. Mobile CI/CD (GitHub Actions + Fastlane)

iOS Pipeline:

# .github/workflows/ios.yml
name: iOS CI/CD

on:
  push:
    branches: [main, develop, 'TechSpec-*']
  pull_request:
    branches: [main, develop]
  release:
    types: [created]

env:
  FASTLANE_XCODEBUILD_SETTINGS_TIMEOUT: 120

jobs:
  test:
    name: Unit Tests
    runs-on: macos-14
    steps:
      - uses: actions/checkout@v4

      - name: Setup Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.2'
          bundler-cache: true

      - name: Install dependencies
        run: |
          cd ios
          bundle install
          pod install

      - name: Run tests
        run: |
          cd ios
          bundle exec fastlane test

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ios/coverage/lcov.info
          flags: ios

  lint:
    name: Lint & Format
    runs-on: macos-14
    steps:
      - uses: actions/checkout@v4

      - name: SwiftLint
        run: |
          cd ios
          swiftlint lint --strict

      - name: SwiftFormat
        run: |
          cd ios
          swiftformat --lint .

  security:
    name: Security Scan
    runs-on: macos-14
    steps:
      - uses: actions/checkout@v4

      - name: Snyk Security Scan
        uses: snyk/actions/cocoapods@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      - name: OWASP Dependency Check
        run: |
          cd ios
          bundle exec fastlane security_scan

  build_testflight:
    name: Build & Deploy to TestFlight
    runs-on: macos-14
    needs: [test, lint, security]
    if: github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4

      - name: Setup certificates
        env:
          MATCH_PASSWORD: ${{ secrets.MATCH_PASSWORD }}
          FASTLANE_USER: ${{ secrets.FASTLANE_USER }}
        run: |
          cd ios
          bundle exec fastlane match development --readonly
          bundle exec fastlane match appstore --readonly

      - name: Build and upload to TestFlight
        env:
          FASTLANE_USER: ${{ secrets.FASTLANE_USER }}
          FASTLANE_PASSWORD: ${{ secrets.FASTLANE_PASSWORD }}
          FASTLANE_APPLE_APPLICATION_SPECIFIC_PASSWORD: ${{ secrets.FASTLANE_APP_PASSWORD }}
        run: |
          cd ios
          bundle exec fastlane beta

      - name: Upload build artifacts
        uses: actions/upload-artifact@v4
        with:
          name: ios-build
          path: ios/build/

  release_appstore:
    name: Release to App Store
    runs-on: macos-14
    needs: [test, lint, security]
    if: github.event_name == 'release'
    steps:
      - uses: actions/checkout@v4

      - name: Setup certificates
        env:
          MATCH_PASSWORD: ${{ secrets.MATCH_PASSWORD }}
        run: |
          cd ios
          bundle exec fastlane match appstore --readonly

      - name: Build and upload to App Store
        env:
          FASTLANE_USER: ${{ secrets.FASTLANE_USER }}
          FASTLANE_PASSWORD: ${{ secrets.FASTLANE_PASSWORD }}
        run: |
          cd ios
          bundle exec fastlane release

      - name: Create GitHub Release
        uses: softprops/action-gh-release@v1
        with:
          files: |
            ios/build/*.ipa
            ios/changelog.md

iOS Fastfile:

# ios/fastlane/Fastfile
default_platform(:ios)

platform :ios do

  desc "Run unit tests"
  lane :test do
    scan(
      scheme: "MedTime",
      devices: ["iPhone 15"],
      code_coverage: true,
      output_directory: "coverage"
    )
  end

  desc "Run security scan"
  lane :security_scan do
    # Snyk scan
    sh("snyk test --all-projects")

    # Check for hardcoded secrets
    sh("git secrets --scan")
  end

  desc "Build for TestFlight"
  lane :beta do
    # Increment build number
    increment_build_number(
      build_number: latest_testflight_build_number + 1
    )

    # Sync certificates
    match(
      type: "appstore",
      readonly: true
    )

    # Build
    build_app(
      scheme: "MedTime",
      export_method: "app-store",
      output_directory: "build"
    )

    # Upload to TestFlight
    upload_to_testflight(
      skip_waiting_for_build_processing: true,
      distribute_external: false  # Internal testers only
    )

    # Notify team
    slack(
      message: "iOS build uploaded to TestFlight!",
      channel: "#mobile-releases"
    )
  end

  desc "Release to App Store"
  lane :release do
    # Sync certificates
    match(
      type: "appstore",
      readonly: true
    )

    # Build
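    build_app(
      scheme: "MedTime",
      export_method: "app-store",
      output_directory: "build"  # Matches ios/build/*.ipa in the release workflow
    )

    # Upload to App Store
    upload_to_app_store(
      submit_for_review: false,  # Manual submission
      automatic_release: false,  # Release manually after approval
      phased_release: true,      # Phased rollout once released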
      submission_information: {
        export_compliance_uses_encryption: true,
        export_compliance_is_exempt: false,
        export_compliance_encryption_updated: false
      }
    )

    # Notify
    slack(
      message: "iOS app submitted to App Store!",
      channel: "#releases"
    )
  end

end

Android Pipeline:

# .github/workflows/android.yml
name: Android CI/CD

on:
  push:
    branches: [main, develop, 'TechSpec-*']
  pull_request:
    branches: [main, develop]
  release:
    types: [created]

jobs:
  test:
    name: Unit Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup JDK
        uses: actions/setup-java@v3
        with:
          java-version: '17'
          distribution: 'temurin'

      - name: Setup Android SDK
        uses: android-actions/setup-android@v2

      - name: Cache Gradle
        uses: actions/cache@v3
        with:
          path: |
            ~/.gradle/caches
            ~/.gradle/wrapper
          key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*') }}

      - name: Run tests
        run: ./gradlew testDebugUnitTest

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: android/app/build/reports/jacoco/testDebugUnitTestCoverage/testDebugUnitTestCoverage.xml
          flags: android

  lint:
    name: Lint & Format
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup JDK
        uses: actions/setup-java@v3
        with:
          java-version: '17'
          distribution: 'temurin'

      - name: Run ktlint
        run: ./gradlew ktlintCheck

      - name: Run Android Lint
        run: ./gradlew lintDebug

  security:
    name: Security Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Snyk Security Scan
        uses: snyk/actions/gradle@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}

      - name: MobSF Scan
        run: |
          # MobSF scan for APK
          ./gradlew assembleDebug
          # Upload to MobSF API
          curl -F "file=@app/build/outputs/apk/debug/app-debug.apk" \
               -H "Authorization: ${{ secrets.MOBSF_API_KEY }}" \
               https://mobsf-api.example.com/scan

  build_beta:
    name: Build & Deploy to Firebase App Distribution
    runs-on: ubuntu-latest
    needs: [test, lint, security]
    if: github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4

      - name: Setup JDK
        uses: actions/setup-java@v3
        with:
          java-version: '17'
          distribution: 'temurin'

      - name: Decode keystore
        run: |
          echo "${{ secrets.ANDROID_KEYSTORE_BASE64 }}" | base64 -d > keystore.jks

      - name: Build APK
        env:
          KEYSTORE_PASSWORD: ${{ secrets.KEYSTORE_PASSWORD }}
          KEY_ALIAS: ${{ secrets.KEY_ALIAS }}
          KEY_PASSWORD: ${{ secrets.KEY_PASSWORD }}
        run: ./gradlew assembleRelease

      - name: Upload to Firebase App Distribution
        uses: wzieba/Firebase-Distribution-Github-Action@v1
        with:
          appId: ${{ secrets.FIREBASE_APP_ID_ANDROID }}
          token: ${{ secrets.FIREBASE_TOKEN }}
          groups: internal-testers
          file: app/build/outputs/apk/release/app-release.apk
          releaseNotes: |
            Build from commit: ${{ github.sha }}
            Changes: See GitHub

  release_playstore:
    name: Release to Play Store
    runs-on: ubuntu-latest
    needs: [test, lint, security]
    if: github.event_name == 'release'
    steps:
      - uses: actions/checkout@v4

      - name: Setup JDK
        uses: actions/setup-java@v3
        with:
          java-version: '17'
          distribution: 'temurin'

      - name: Decode keystore
        run: |
          echo "${{ secrets.ANDROID_KEYSTORE_BASE64 }}" | base64 -d > keystore.jks

      - name: Build AAB
        env:
          KEYSTORE_PASSWORD: ${{ secrets.KEYSTORE_PASSWORD }}
          KEY_ALIAS: ${{ secrets.KEY_ALIAS }}
          KEY_PASSWORD: ${{ secrets.KEY_PASSWORD }}
        run: ./gradlew bundleRelease

      - name: Upload to Play Store
        uses: r0adkll/upload-google-play@v1
        with:
          serviceAccountJsonPlainText: ${{ secrets.PLAY_SERVICE_ACCOUNT }}
          packageName: com.medtime.app
          releaseFiles: app/build/outputs/bundle/release/app-release.aab
          track: internal  # internal -> alpha -> beta -> production
          status: completed

4.2. Backend CI/CD

# .github/workflows/backend.yml
name: Backend CI/CD

on:
  push:
    branches: [main, develop]
    paths:
      - 'backend/**'
      - 'supabase/**'
  pull_request:
    branches: [main, develop]
    paths:
      - 'backend/**'
      - 'supabase/**'

jobs:
  test:
    name: Test Supabase Functions
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Deno
        uses: denoland/setup-deno@v1
        with:
          deno-version: v1.x

      - name: Run tests
        run: |
          cd supabase/functions
          deno test --allow-all

      - name: Lint
        run: |
          cd supabase/functions
          deno lint

  db_test:
    name: Test Database
    runs-on: ubuntu-latest
    services:
      postgres:
        image: supabase/postgres:15
        env:
          POSTGRES_PASSWORD: postgres
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
    steps:
      - uses: actions/checkout@v4

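      - name: Run migrations
        run: |
          # psql -f takes a single file, so apply migrations one by one in name order
          for f in supabase/migrations/*.sql; do
            psql -v ON_ERROR_STOP=1 -h localhost -U postgres -f "$f"
          done
        env:
          PGPASSWORD: postgres

      - name: Run pgTAP tests
        run: |
          for f in supabase/tests/*.sql; do
            psql -v ON_ERROR_STOP=1 -h localhost -U postgres -f "$f"
          done
        env:
          PGPASSWORD: postgres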

  deploy_staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: [test, db_test]
    if: github.ref == 'refs/heads/develop'
    steps:
      - uses: actions/checkout@v4

      - name: Setup Supabase CLI
        uses: supabase/setup-cli@v1

      - name: Link Supabase project
        run: supabase link --project-ref ${{ secrets.SUPABASE_STAGING_PROJECT_ID }}
        env:
          SUPABASE_ACCESS_TOKEN: ${{ secrets.SUPABASE_ACCESS_TOKEN }}

      - name: Deploy database migrations
        run: supabase db push

      - name: Deploy Edge Functions
        run: |
          cd supabase/functions
          for func in */; do
            supabase functions deploy ${func%/}
          done

  deploy_production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: [test, db_test]
    if: github.ref == 'refs/heads/main'
    environment:
      name: production
      url: https://api.medtime.app
    steps:
      - uses: actions/checkout@v4

      - name: Setup Supabase CLI
        uses: supabase/setup-cli@v1

      - name: Link Supabase project
        run: supabase link --project-ref ${{ secrets.SUPABASE_PROD_PROJECT_ID }}
        env:
          SUPABASE_ACCESS_TOKEN: ${{ secrets.SUPABASE_ACCESS_TOKEN }}

      - name: Deploy database migrations
        run: supabase db push

      - name: Deploy Edge Functions
        run: |
          cd supabase/functions
          for func in */; do
            supabase functions deploy ${func%/}
          done

      - name: Verify deployment
        run: |
          # Health check
          curl -f https://api.medtime.app/health || exit 1

      - name: Notify team
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Backend deployed to production!",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "Backend deployed to production\nCommit: ${{ github.sha }}"
                  }
                }
              ]
            }
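        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK  # Needed in v1 to post a blocks payload to an incoming webhook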

4.3. Database Migrations

# Migration strategy
Database Migration Strategy:

  tool: Supabase Migrations (SQL)

  naming: YYYYMMDDHHMMSS_description.sql

  workflow:
    1. Create the migration locally:
       supabase migration new add_table_xyz

    2. Write SQL with rollback:
       - UP migration (changes)
       - DOWN migration (revert)

    3. Test locally:
       supabase db reset
       supabase test db

    4. Open a PR with the migration

    5. Deploy to staging (automatic)

    6. Verify staging

    7. Deploy to production (manual approval)

  rollback:
    strategy: manual
    procedure:
      1. Identify the problematic migration
      2. Create a rollback migration
      3. Urgent deploy
      4. Post-mortem

Example migration:
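
The `YYYYMMDDHHMMSS_description.sql` naming rule can be checked mechanically, for example in a pre-commit hook. The sketch below builds and validates such names. It assumes UTC timestamps and a lowercase snake_case description, neither of which the strategy above states; the function names are hypothetical.

```typescript
// Validates names of the form YYYYMMDDHHMMSS_description.sql.
// Lowercase snake_case descriptions are an assumption of this sketch.
const MIGRATION_NAME = /^(\d{14})_([a-z0-9]+(?:_[a-z0-9]+)*)\.sql$/;

function pad(value: number, width: number): string {
  return String(value).padStart(width, "0");
}

// Builds a migration file name from a free-text description and a timestamp.
// The timestamp is rendered in UTC (an assumption).
function migrationFileName(description: string, at: Date): string {
  const slug = description
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
  if (slug.length === 0) {
    throw new Error("description must contain letters or digits");
  }
  const stamp =
    pad(at.getUTCFullYear(), 4) +
    pad(at.getUTCMonth() + 1, 2) +
    pad(at.getUTCDate(), 2) +
    pad(at.getUTCHours(), 2) +
    pad(at.getUTCMinutes(), 2) +
    pad(at.getUTCSeconds(), 2);
  return `${stamp}_${slug}.sql`;
}

function isValidMigrationName(name: string): boolean {
  return MIGRATION_NAME.test(name);
}
```

Because the timestamp prefix has a fixed width, sorting file names alphabetically also sorts migrations chronologically.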

-- supabase/migrations/20251208120000_add_user_preferences.sql

-- UP Migration
CREATE TABLE IF NOT EXISTS srv_user_preferences (
    user_id UUID PRIMARY KEY REFERENCES srv_users(user_id) ON DELETE CASCADE,
    language VARCHAR(10) NOT NULL DEFAULT 'es',
    timezone VARCHAR(50) NOT NULL DEFAULT 'America/Mexico_City',
    notification_enabled BOOLEAN NOT NULL DEFAULT true,
    theme VARCHAR(20) NOT NULL DEFAULT 'system',
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- RLS
ALTER TABLE srv_user_preferences ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_preferences_select ON srv_user_preferences
    FOR SELECT
    USING (user_id = auth.uid());

CREATE POLICY user_preferences_update ON srv_user_preferences
    FOR UPDATE
    USING (user_id = auth.uid())
    WITH CHECK (user_id = auth.uid());

-- Indexes
-- No additional index is needed: the PRIMARY KEY on user_id already creates one.

-- Trigger for updated_at
-- (assumes update_updated_at_column() was defined in an earlier migration)
CREATE TRIGGER update_srv_user_preferences_updated_at
    BEFORE UPDATE ON srv_user_preferences
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_at_column();

-- DOWN Migration (in a separate file or commented out)
-- DROP TABLE IF EXISTS srv_user_preferences;

4.4. Release Management

Release Strategy:

  versioning: Semantic Versioning (SemVer)

  format: vMAJOR.MINOR.PATCH

  examples:
    - v1.0.0 - Initial release
    - v1.1.0 - New features (backward compatible)
    - v1.1.1 - Bug fixes
    - v2.0.0 - Breaking changes

  branches:
    main: Production releases
    develop: Development (next release)
    hotfix/*: Emergency fixes

  release_process:
    1. Create release branch from develop
       git checkout -b release/v1.2.0 develop

    2. Bump version numbers
       - iOS: Info.plist
       - Android: build.gradle
       - Backend: package.json

    3. Run full test suite

    4. Deploy to staging

    5. QA approval

    6. Merge to main

    7. Tag release
       git tag -a v1.2.0 -m "Release v1.2.0"
       git push origin v1.2.0

    8. GitHub Actions deploys to production

    9. Monitor for 24 hours

    10. Announce release

  rollout_strategy:
    ios:
      - TestFlight (internal): 100%
      - TestFlight (external): 100%
      - App Store (phased):
          - Day 1: 1%
          - Day 2: 2%
          - Day 3: 5%
          - Day 4: 10%
          - Day 5: 20%
          - Day 6: 50%
          - Day 7: 100%

    android:
      - Internal testing: 100%
      - Alpha: 100%
      - Beta (staged):
          - Day 1: 1%
          - Day 2: 5%
          - Day 3: 10%
          - Day 4: 20%
          - Day 5: 50%
          - Day 7: 100%

    backend:
      - Blue-green deployment
      - Canary release (10% -> 50% -> 100%)
      - Automatic rollback on error rate > 1%
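
The backend rollout rule above (canary steps of 10%, 50% and 100%, with rollback when the error rate exceeds 1%) amounts to a small decision function. The sketch below is one possible reading; the type `CanaryDecision` and the function `decideCanary` are hypothetical, and how request and error counts are collected is left out.

```typescript
// Canary steps and the 1% threshold come from the rollout strategy above.
const CANARY_STEPS = [10, 50, 100];
const MAX_ERROR_RATE = 0.01;

type CanaryDecision =
  | { action: "rollback" }
  | { action: "promote"; nextPercent: number }
  | { action: "done" };

// Decides the next step from the traffic share and the observed error counts.
function decideCanary(
  currentPercent: number,
  requests: number,
  errors: number,
): CanaryDecision {
  // With no traffic yet there is no evidence of failure.
  const errorRate = requests === 0 ? 0 : errors / requests;
  if (errorRate > MAX_ERROR_RATE) {
    return { action: "rollback" };
  }
  const next = CANARY_STEPS.find((percent) => percent > currentPercent);
  return next === undefined
    ? { action: "done" }
    : { action: "promote", nextPercent: next };
}
```

In practice a minimum request count per step would likely be required before promoting, so that a quiet period is not mistaken for a healthy one.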

5. Monitoring and Alerting

5.1. Zero-Knowledge Monitoring

ZERO-KNOWLEDGE MONITORING PRINCIPLE:
+------------------------------------------------------------------+
|  The monitoring system NEVER has access to PHI                   |
|                                                                  |
|  ALLOWED:                                                        |
|  - Operational metadata (counts, timings, sizes)                 |
|  - Error codes (no messages containing PHI)                      |
|  - Performance metrics (latency, throughput)                     |
|  - Availability (uptime, downtime)                               |
|                                                                  |
|  PROHIBITED:                                                     |
|  - Contents of encrypted blobs                                   |
|  - Medication names                                              |
|  - User data                                                     |
|  - Request/response bodies containing PHI                        |
|  - Stack traces containing PHI                                   |
+------------------------------------------------------------------+

5.2. Permitted Metrics

PERMITTED metrics (no PHI):

  api_metrics:
    # Counters
    - http_requests_total
      labels: [method, endpoint, status_code, tier]

    - http_errors_total
      labels: [endpoint, error_code, tier]

    # Latency
    - http_request_duration_seconds
      labels: [endpoint, method]
      percentiles: [50, 95, 99]

    # Throughput
    - http_requests_per_second
      labels: [endpoint]

  database_metrics:
    # Connections
    - db_connections_active
    - db_connections_idle
    - db_connections_waiting

    # Queries
    - db_query_duration_seconds
      percentiles: [50, 95, 99]

    - db_queries_per_second

    # Storage
    - db_size_bytes
    - db_table_size_bytes
      labels: [table_name]

  sync_metrics:
    # Synchronization (no content)
    - sync_operations_total
      labels: [operation_type, status]

    - sync_blob_size_bytes
      labels: [entity_type]

    - sync_duration_seconds
      labels: [direction]  # push/pull

  auth_metrics:
    # Authentication
    - auth_login_attempts_total
      labels: [status, tier]

    - auth_mfa_challenges_total
      labels: [method, status]

    - auth_token_refreshes_total

  mobile_metrics:
    # Crashes (no stack traces containing PHI)
    - app_crashes_total
      labels: [platform, version, error_type]

    # Performance
    - app_launch_time_seconds
      labels: [platform, launch_type]

    - screen_load_time_seconds
      labels: [screen_name]
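
Several of the metrics above report the 50th, 95th and 99th percentiles. The sketch below shows one common way to compute them, the nearest-rank method. It is an illustration rather than the method used by any particular monitoring backend, and `percentile` and `latency_summary` are hypothetical helper names.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(durations_s: list[float]) -> dict[str, float]:
    """Summarize request durations with the percentiles listed in the spec."""
    return {f"p{p}": percentile(durations_s, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    print(latency_summary([0.12, 0.08, 0.35, 0.10, 2.4, 0.09, 0.11, 0.15, 0.2, 0.1]))
```

Only durations enter the calculation, so the summary stays within the permitted metadata categories.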

5.3. Prohibited Metrics

PROHIBITED metrics (contain PHI):

  NEVER_LOG:
    # Content
    - medication_names
    - user_names
    - dose_times
    - health_conditions

    # Request details
    - request_body
    - response_body
    - query_parameters (may contain names)

    # Stack traces
    - error_messages_with_user_data
    - exception_details_with_phi

    # Patterns
    - medication_adherence_patterns
    - user_behavior_details
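
One way to enforce the prohibited list is to deny by default: a label is emitted only if its key appears on an allowlist built from the permitted metrics. The sketch below is a minimal illustration under that assumption; `ALLOWED_LABELS`, `scrub_labels` and `normalize_endpoint` are hypothetical names, not part of an existing MedTime library.

```python
# Label keys taken from the permitted-metrics list; every other key is dropped.
ALLOWED_LABELS = frozenset({
    "method", "endpoint", "status_code", "tier", "error_code",
    "operation_type", "status", "entity_type", "direction",
    "platform", "version", "error_type", "launch_type",
    "screen_name", "table_name",
})


def scrub_labels(labels: dict[str, str]) -> dict[str, str]:
    """Keep only allowlisted label keys (deny by default)."""
    return {key: value for key, value in labels.items() if key in ALLOWED_LABELS}


def normalize_endpoint(path_with_query: str) -> str:
    """Drop the query string, since query parameters may contain names."""
    return path_with_query.split("?", 1)[0]


if __name__ == "__main__":
    raw = {"endpoint": "/sync", "status_code": "200", "request_body": "..."}
    print(scrub_labels(raw))
    print(normalize_endpoint("/api/search?q=example"))
```

An allowlist fails safe: a newly added field is dropped until someone reviews it, whereas a blocklist would let it through. Path segments can also carry identifiers, so a real implementation would likely map paths to route templates as well.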

5.4. Infrastructure Alerts

Infrastructure Alerts:

  critical:
    # P1 - Immediate response

    api_error_rate_high:
      condition: error_rate > 5% for 5m
      severity: critical
      notification: pagerduty
      action: auto_rollback if deployment in last 1h

    database_down:
      condition: db_connections_active == 0 for 1m
      severity: critical
      notification: pagerduty + sms
      action: failover to backup

    disk_space_critical:
      condition: disk_usage > 90%
      severity: critical
      notification: pagerduty
      action: auto_cleanup + scale

    auth_service_down:
      condition: auth_success_rate < 50% for 5m
      severity: critical
      notification: pagerduty
      action: investigate firebase status

  high:
    # P2 - Fast response (< 1 hour)

    slow_response_time:
      condition: p99_latency > 5s for 10m
      severity: high
      notification: slack
      action: investigate slow queries

    high_sync_failures:
      condition: sync_failure_rate > 10% for 15m
      severity: high
      notification: slack
      action: check network + backend health

    memory_usage_high:
      condition: memory_usage > 80% for 10m
      severity: high
      notification: slack
      action: investigate memory leaks

  medium:
    # P3 - Daily review

    increased_crash_rate:
      condition: crash_rate > 1% for 1h
      severity: medium
      notification: email
      action: review crashlytics

    backup_failed:
      condition: last_backup_age > 25h
      severity: medium
      notification: email
      action: retry backup

    ssl_cert_expiring:
      condition: ssl_cert_days_remaining < 30
      severity: medium
      notification: email
      action: renew certificate
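
Most conditions above have the form "metric > threshold for duration". The sketch below shows one possible interpretation: the alert fires only when the data covers the whole window and every sample in it exceeds the threshold. The `Sample` type and `breached_for` function are hypothetical, and real alerting systems may define "for" differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    timestamp_s: float  # seconds since epoch
    value: float        # e.g. error rate as a fraction (0.05 == 5%)


def breached_for(samples: list[Sample], threshold: float,
                 window_s: float, now_s: float) -> bool:
    """True if every sample in the last window_s seconds is above threshold.

    Assumption: the alert also requires data reaching back to the start of
    the window, so a single fresh sample cannot trigger it.
    """
    start = now_s - window_s
    recent = [s for s in samples if start <= s.timestamp_s <= now_s]
    if not recent:
        return False
    covers_window = min(s.timestamp_s for s in samples) <= start
    return covers_window and all(s.value > threshold for s in recent)


if __name__ == "__main__":
    # api_error_rate_high: error_rate > 5% for 5m, sampled once per minute
    history = [Sample(t, 0.08) for t in range(0, 301, 60)]
    print(breached_for(history, threshold=0.05, window_s=300, now_s=300))
```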

5.5. Dashboards

Main Dashboard:

  panels:

    overview:
      - uptime (%)
      - active_users (count, no user identifiers)
      - requests_per_minute
      - error_rate (%)

    api_health:
      - request_latency (p50, p95, p99)
      - error_rate_by_endpoint
      - requests_by_status_code
      - top_slow_endpoints

    database:
      - connection_pool_usage
      - query_performance
      - database_size
      - slow_queries_count

    sync:
      - sync_operations_per_minute
      - sync_success_rate
      - avg_blob_size
      - pending_operations

    mobile:
      - crash_free_rate (iOS/Android)
      - app_launches_per_day
      - avg_session_duration
      - version_distribution

  refresh_rate: 30s

  retention: 30_days

# (DV2-P3) OPS-BAJO-001: Mobile dashboard refresh rate specified
Mobile Dashboards:

  crashlytics_dashboard:
    platform: Firebase Crashlytics
    metrics:
      - crash_free_users (%)
      - crash_free_sessions (%)
      - crash_trends
      - top_crashes
      - affected_versions
    refresh_rate: 5_minutes
    retention: 90_days

  performance_dashboard:
    platform: Firebase Performance
    metrics:
      - app_launch_time
      - screen_rendering_time
      - network_request_latency
      - custom_traces
    refresh_rate: 5_minutes
    retention: 30_days

  analytics_dashboard:
    platform: Firebase Analytics
    metrics:
      - active_users
      - session_duration
      - screen_views
      - user_engagement
    refresh_rate: 5_minutes
    retention: 60_days

  justification: |
    5 minute refresh rate balances:
    - Real-time visibility for incident response
    - Firebase API rate limits
    - Cost optimization (fewer API calls)
    - Mobile battery impact (if viewing on mobile)

6. Disaster Recovery

6.1. RTO and RPO

Recovery Objectives:

  definitions:
    RTO: Recovery Time Objective (maximum acceptable downtime)
    RPO: Recovery Point Objective (maximum acceptable data loss)

  targets:

    database:
      rto: 4_hours
      rpo: 1_hour
      justification: |
        - Blobs are encrypted (low confidentiality risk)
        - Users can operate offline
        - 1 hour of data loss is acceptable

    authentication:
      rto: 1_hour
      rpo: 0  # Firebase managed, replicated
      justification: |
        - Firebase Auth has native HA
        - Critical for new logins
        - Existing users hold valid tokens

    push_notifications:
      rto: 2_hours
      rpo: N/A  # Stateless
      justification: |
        - Local alerts are the primary mechanism
        - Push is the backup
        - No data at risk

    edge_functions:
      rto: 2_hours
      rpo: 0  # Stateless, IaC
      justification: |
        - Redeploy from Git
        - No persistent state
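
The targets above can be checked mechanically after an incident or a DR test. The following sketch encodes them as data. `TARGETS` and `meets_objectives` are hypothetical names, and the comparison treats a value exactly equal to the target as a pass, which is an assumption.

```python
from datetime import timedelta

# (RTO, RPO) per component, copied from the targets above.
# None means the RPO is not applicable (stateless service).
TARGETS = {
    "database": (timedelta(hours=4), timedelta(hours=1)),
    "authentication": (timedelta(hours=1), timedelta(0)),
    "push_notifications": (timedelta(hours=2), None),
    "edge_functions": (timedelta(hours=2), timedelta(0)),
}


def meets_objectives(component: str, downtime: timedelta,
                     data_loss: timedelta = timedelta(0)) -> bool:
    """Compare measured downtime and data loss against the component's targets."""
    rto, rpo = TARGETS[component]
    rto_ok = downtime <= rto
    rpo_ok = rpo is None or data_loss <= rpo
    return rto_ok and rpo_ok


if __name__ == "__main__":
    print(meets_objectives("database", timedelta(hours=3), timedelta(minutes=45)))
    print(meets_objectives("authentication", timedelta(minutes=30), timedelta(seconds=1)))
```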

6.2. Backup Strategy

Backup Strategy:

  database:

    automated_backups:
      frequency: hourly
      retention: 30_days
      type: full_backup
      encryption: AES-256
      storage: Supabase managed + S3 copy

    point_in_time_recovery:
      enabled: true
      window: 7_days
      granularity: 1_second

    manual_snapshots:
      before: major_migrations
      retention: 90_days
      labeled: true

  storage_blobs:

    replication:
      strategy: multi_region
      regions: [us-east-1, us-west-2]
      sync: continuous

    versioning:
      enabled: true
      retention: 30_days

    lifecycle:
      transition_to_infrequent_access: 90_days
      transition_to_glacier: 1_year
      delete: never  # Compliance requirement

  configurations:

    infrastructure_as_code:
      source: Git repository
      backup: GitHub + GitLab mirror
      retention: indefinite

    secrets:
      source: AWS Secrets Manager
      backup: encrypted_export
      frequency: weekly
      retention: 90_days

  testing:
    # Backup restore tests
    frequency: quarterly
    scope: full_restore_to_staging
    success_criteria:
      - restore_time < RTO
      - data_integrity_100%
      - application_functional
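
The lifecycle and retention rules above reduce to simple age comparisons. The sketch below illustrates them. The function names are hypothetical, `1_year` is read as 365 days, and each boundary is treated as inclusive on the older side; all three are assumptions.

```python
from datetime import date


def storage_tier(age_days: int) -> str:
    """Map a blob's age to the storage class in the lifecycle block."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= 365:   # transition_to_glacier: 1_year
        return "glacier"
    if age_days >= 90:    # transition_to_infrequent_access: 90_days
        return "infrequent_access"
    return "standard"


def backup_expired(created: date, today: date, retention_days: int = 30) -> bool:
    """True once an automated database backup is older than its retention period."""
    return (today - created).days > retention_days


if __name__ == "__main__":
    print(storage_tier(120))
    print(backup_expired(date(2025, 1, 1), date(2025, 2, 15)))
```

Blobs are never deleted (delete: never), so `storage_tier` has no terminal state. Only database backups expire.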

6.3. Failover Procedures

Failover Procedures:

  database_failover:

    trigger:
      - primary_instance_down for 5m
      - manual_trigger by on_call

    automated_steps:
      1. Health check confirms primary down
      2. Promote read replica to primary
      3. Update DNS to point to new primary
      4. Restart application connections
      5. Verify new primary is healthy

    manual_steps:
      1. Notify team (automatic)
      2. Monitor application health
      3. Investigate root cause
      4. Document incident

    estimated_time: 15_minutes

  region_failover:

    trigger:
      - region_unavailable
      - disaster_in_primary_region

    steps:
      1. Confirm region is down
      2. Update DNS to secondary region
      3. Promote secondary database
      4. Verify application health
      5. Notify users of potential data loss (if RPO > 0)

    estimated_time: 1_hour

    data_loss: Up to RPO (1 hour)

  rollback_deployment:

    trigger:
      - error_rate > 5% after deployment
      - critical_bug_discovered
      - manual_trigger

    automated_steps:
      1. Detect error rate spike
      2. Revert to previous version
      3. Restart application
      4. Verify health

    manual_steps:
      1. Investigate cause
      2. Fix and redeploy
      3. Post-mortem

    estimated_time: 5_minutes

6.4. Data Recovery

Data Recovery Procedures:

  accidental_deletion:

    user_data:
      source: database_backup
      steps:
        1. Identify backup with data
        2. Restore to staging
        3. Extract specific records
        4. Verify data integrity
        5. Import to production
      max_time: 2_hours

    table_dropped:
      source: point_in_time_recovery
      steps:
        1. Identify timestamp before drop
        2. Restore to staging (PITR)
        3. Export table
        4. Import to production
        5. Verify via count/checksums
      max_time: 4_hours

  data_corruption:

    detection:
      - checksum_mismatch
      - user_reports
      - automated_integrity_checks

    response:
      1. Isolate affected data
      2. Identify corruption scope
      3. Find last good backup
      4. Restore from backup
      5. Replay transactions if possible
      6. Notify affected users

  compliance_recovery:

    audit_log_recovery:
      retention: 6_years (HIPAA)
      storage: glacier
      encryption: yes

    phi_recovery:
      # All PHI is end-to-end (E2E) encrypted
      # The server cannot decrypt it
      # The user must restore from a local backup
      server_role: provide_encrypted_blobs
      user_role: decrypt_with_own_key
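
The "verify via count/checksums" step in the table_dropped procedure can be done without reading any plaintext, because it only compares stored (encrypted) rows. The sketch below computes an order-independent fingerprint of a table export. `table_fingerprint` is a hypothetical helper, and representing rows as Python tuples is an assumption about the export format.

```python
import hashlib
from typing import Iterable


def table_fingerprint(rows: Iterable[tuple]) -> tuple[int, str]:
    """Return (row_count, digest) for a table export, independent of row order.

    Each row is hashed on its own and the sorted row digests are hashed again,
    so two exports match only if they contain the same multiset of rows.
    """
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest() for row in rows
    )
    combined = hashlib.sha256()
    for digest in digests:
        combined.update(digest)
    return len(digests), combined.hexdigest()


if __name__ == "__main__":
    restored = [(1, b"ciphertext-a"), (2, b"ciphertext-b")]
    production = [(2, b"ciphertext-b"), (1, b"ciphertext-a")]
    print(table_fingerprint(restored) == table_fingerprint(production))
```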

6.5. Quarterly DR Testing (DV2-P2)

Added: OPS-MEDIO-002 - Detailed specification of quarterly DR testing.

6.5.1. DR Test Schedule

DR Testing Schedule:
  version: "1.0"

  quarterly_tests:
    Q1: # January-March
      month: February
      week: 2
      focus:
        - Database full restore
        - RTO validation
      environment: staging

    Q2: # April-June
      month: May
      week: 2
      focus:
        - Region failover simulation
        - RPO validation
      environment: staging

    Q3: # July-September
      month: August
      week: 2
      focus:
        - Complete disaster simulation
        - Full stack recovery
      environment: staging + limited prod

    Q4: # October-December
      month: November
      week: 2
      focus:
        - Tabletop exercise with team
        - Documentation review
        - Annual DR plan update
      environment: documentation + staging

6.5.2. Test Scenarios

DR Test Scenarios:

  scenario_1_database_restore:
    name: "Full Database Restore"
    frequency: quarterly
    duration: 4_hours
    participants:
      - DevOps Lead
      - Database Admin
      - Backend Developer

    preconditions:
      - Staging environment available
      - Latest backup identified
      - Runbook accessible

    steps:
      1:
        action: "Identify latest backup"
        command: "supabase db backups list"
        expected: "Backup list with timestamps"
        max_time: 5_minutes

      2:
        action: "Create test instance"
        command: "supabase db create --name dr-test-YYYYMMDD"
        expected: "Empty database created"
        max_time: 10_minutes

      3:
        action: "Restore backup to test instance"
        command: "supabase db restore --backup-id XXX --target dr-test"
        expected: "Restore completed successfully"
        max_time: 60_minutes

      4:
        action: "Verify data integrity"
        command: "psql -f scripts/dr-verify-integrity.sql"
        expected: "All checksums match"
        max_time: 15_minutes

      5:
        action: "Test application connectivity"
        command: "curl https://dr-test-api/health"
        expected: "HTTP 200 with healthy status"
        max_time: 10_minutes

      6:
        action: "Run smoke tests"
        command: "npm run test:smoke --env=dr-test"
        expected: "All smoke tests pass"
        max_time: 30_minutes

    success_criteria:
      - Total restore time < RTO (4 hours)
      - Data integrity 100%
      - Application functional
      - No data loss (matches backup timestamp)

    cleanup:
      - Delete test database instance
      - Remove temporary credentials
      - Document results

  scenario_2_failover:
    name: "Database Failover Simulation"
    frequency: quarterly
    duration: 2_hours
    participants:
      - DevOps Lead
      - On-Call Engineer

    steps:
      1:
        action: "Confirm replica sync status"
        command: "supabase db replicas status"
        expected: "Replica in sync, lag < 1s"

      2:
        action: "Simulate primary failure"
        command: "supabase db pause --instance primary --test-mode"
        expected: "Primary paused"

      3:
        action: "Trigger automatic failover"
        command: "Monitor: automatic or manual promote"
        expected: "Replica promoted to primary"
        max_time: 15_minutes

      4:
        action: "Verify application connectivity"
        expected: "Application reconnected to new primary"
        max_time: 5_minutes

      5:
        action: "Restore original primary"
        command: "supabase db resume --instance original-primary"
        expected: "Original primary as new replica"

    success_criteria:
      - Failover time < 15 minutes
      - No data loss
      - Application recovery automatic

  scenario_3_region_failover:
    name: "Cross-Region Disaster Recovery"
    frequency: annually (Q3)
    duration: 8_hours
    participants:
      - DevOps Lead
      - Tech Lead
      - Security Lead

    steps:
      1:
        action: "Pre-test: verify DR region ready"
        expected: "DR infrastructure operational"

      2:
        action: "Simulate primary region outage"
        command: "Block all traffic to primary region"
        expected: "Primary region unreachable"

      3:
        action: "Activate DR runbook"
        expected: "Follow documented procedures"

      4:
        action: "Promote DR database"
        command: "supabase db promote --region dr-region"
        expected: "DR database now primary"

      5:
        action: "Update DNS to DR region"
        command: "Update Route53/Cloudflare records"
        expected: "Traffic routed to DR region"

      6:
        action: "Verify full application stack"
        expected: "All services operational in DR region"

      7:
        action: "Measure data loss"
        command: "Compare last DR sync with current time"
        expected: "Data loss <= RPO (1 hour)"

    success_criteria:
      - Total recovery time < RTO (4 hours)
      - Data loss <= RPO (1 hour)
      - All services functional
      - User impact documented
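
As a quick consistency check, the per-step max_time values of scenario_1_database_restore can be summed and compared with the 4-hour database RTO. The sketch below does that arithmetic; the dictionary simply restates the step budgets listed above.

```python
# max_time per step of scenario_1_database_restore, in minutes.
STEP_BUDGET_MIN = {
    "Identify latest backup": 5,
    "Create test instance": 10,
    "Restore backup to test instance": 60,
    "Verify data integrity": 15,
    "Test application connectivity": 10,
    "Run smoke tests": 30,
}

RTO_MIN = 4 * 60  # database RTO: 4 hours

total_min = sum(STEP_BUDGET_MIN.values())
headroom_min = RTO_MIN - total_min

if __name__ == "__main__":
    print(f"worst-case step total: {total_min} min, headroom vs RTO: {headroom_min} min")
```

The step budgets add up to 130 minutes, which leaves 110 minutes of headroom inside the 240-minute RTO for coordination and retries.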

6.5.3. Runbooks and Documentation

DR Documentation:

  runbooks:
    location: "docs/runbooks/"

    database_restore:
      file: "RB-DR-001-database-restore.md"
      last_updated: null
      last_tested: null
      owner: "DevOps Lead"

    failover:
      file: "RB-DR-002-failover.md"
      last_updated: null
      last_tested: null
      owner: "DevOps Lead"

    region_failover:
      file: "RB-DR-003-region-failover.md"
      last_updated: null
      last_tested: null
      owner: "Tech Lead"

    communication:
      file: "RB-DR-004-incident-communication.md"
      last_updated: null
      last_tested: null
      owner: "Product Manager"

  templates:
    dr_test_report: "templates/dr-test-report.md"
    post_mortem: "templates/post-mortem.md"
    incident_timeline: "templates/incident-timeline.md"

6.5.4. Metrics and Reporting

DR Metrics:

  kpis:
    - name: "RTO Achievement"
      target: "< 4 hours"
      measurement: "Time from disaster declaration to full recovery"

    - name: "RPO Achievement"
      target: "< 1 hour"
      measurement: "Maximum data loss in any test"

    - name: "Test Success Rate"
      target: "> 95%"
      measurement: "Successful tests / Total tests"

    - name: "Runbook Accuracy"
      target: "100%"
      measurement: "Steps executed without deviation"

    - name: "Mean Time to Recovery"
      target: "< 2 hours"
      measurement: "Average recovery time across all tests"

  reporting:
    quarterly_report:
      recipients:
        - Engineering Lead
        - Security Lead
        - CTO
      contents:
        - Test summary
        - RTO/RPO metrics
        - Issues discovered
        - Remediation actions
        - Next quarter plan

    annual_review:
      recipients:
        - Executive Team
        - Board (if required)
      contents:
        - Year summary
        - Trend analysis
        - DR plan updates
        - Budget recommendations
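
Two of the KPIs above, Test Success Rate and Mean Time to Recovery, can be derived directly from recorded test outcomes. The sketch below shows the calculation. `dr_kpis` is a hypothetical helper and the sample results are made-up illustration values, not measured data.

```python
from statistics import mean


def dr_kpis(results: list[tuple[bool, float]]) -> dict[str, object]:
    """Compute KPIs from (succeeded, recovery_minutes) pairs, one per DR test."""
    if not results:
        raise ValueError("no test results")
    passed = sum(1 for ok, _ in results if ok)
    success_rate = 100 * passed / len(results)
    mttr_minutes = mean([minutes for _, minutes in results])
    return {
        "success_rate_pct": success_rate,
        "mttr_minutes": mttr_minutes,
        "success_target_met": success_rate > 95,  # target: > 95%
        "mttr_target_met": mttr_minutes < 120,    # target: < 2 hours
    }


if __name__ == "__main__":
    # Illustrative values only.
    print(dr_kpis([(True, 90), (True, 110), (False, 250), (True, 70)]))
```

With only a handful of tests per year, a single failed test drops the success rate well below 95%, so in practice the target requires every test to pass.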

6.5.5. CI/CD Integration

# .github/workflows/dr-test-automation.yml
name: DR Test Automation

on:
  schedule:
    # Monthly backup verification (every 1st of month at 3AM UTC)
    - cron: '0 3 1 * *'
  workflow_dispatch:
    inputs:
      test_type:
        description: 'Type of DR test to run'
        required: true
        type: choice
        options:
          - backup-verify
          - restore-staging
          - failover-test
      notify_team:
        description: 'Send notifications to team'
        required: true
        type: boolean
        default: false

jobs:
  backup-verify:
    if: github.event.inputs.test_type == 'backup-verify' || github.event_name == 'schedule'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Check Backup Status
        run: |
          # Verify backup exists and is recent
          LATEST_BACKUP=$(supabase db backups list --json | jq -r '.[0].created_at')
          BACKUP_AGE_HOURS=$(( ($(date +%s) - $(date -d "$LATEST_BACKUP" +%s)) / 3600 ))

          echo "Latest backup: $LATEST_BACKUP"
          echo "Backup age: $BACKUP_AGE_HOURS hours"

          if [ "$BACKUP_AGE_HOURS" -gt 25 ]; then
            echo "::error::Backup is more than 25 hours old!"
            exit 1
          fi

      - name: Verify Backup Integrity
        run: |
          # Download and verify backup checksum
          supabase db backups verify --latest

      - name: Report Results
        if: always()
        run: |
          # Log to monitoring system
          echo "DR Backup Verification: $(date)"
          echo "Status: ${{ job.status }}"

  restore-staging:
    if: github.event.inputs.test_type == 'restore-staging'
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4

      - name: Start DR Test
        id: start
        run: |
          echo "::notice::Starting DR Restore Test to Staging"
          # Record the start time so the restore duration can be measured later
          echo "start_epoch=$(date +%s)" >> "$GITHUB_OUTPUT"

      - name: Identify Backup
        id: backup
        run: |
          BACKUP_ID=$(supabase db backups list --json | jq -r '.[0].id')
          echo "backup_id=$BACKUP_ID" >> "$GITHUB_OUTPUT"

      - name: Restore to Staging
        run: |
          supabase db restore \
            --backup-id ${{ steps.backup.outputs.backup_id }} \
            --target staging \
            --confirm
        timeout-minutes: 120

      - name: Verify Restore
        run: |
          # Run integrity checks
          npm run test:db-integrity --env=staging

      - name: Run Smoke Tests
        run: |
          npm run test:smoke --env=staging

      - name: Calculate RTO
        run: |
          # Elapsed time since the "Start DR Test" step, reported against the 4-hour RTO
          START_EPOCH=${{ steps.start.outputs.start_epoch }}
          END_EPOCH=$(date +%s)
          ELAPSED_MIN=$(( (END_EPOCH - START_EPOCH) / 60 ))
          echo "::notice::Measured restore time: ${ELAPSED_MIN} minutes (RTO target: 240 minutes)"

      - name: Notify Team
        if: github.event.inputs.notify_team == 'true'
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "DR Restore Test Completed",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "*DR Restore Test Completed*\nStatus: ${{ job.status }}\nEnvironment: Staging"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_DR }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK

  generate-report:
    needs: [backup-verify]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Generate DR Test Report
        run: |
          cat << EOF > dr-report.md
          # DR Test Report

          **Date**: $(date -I)
          **Type**: Backup Verification
          **Status**: ${{ needs.backup-verify.result }}

          ## Metrics
          - Backup Age: threshold is 25h (see Status above)
          - Integrity: see Status above

          ## Recommendations
          - Continue quarterly testing schedule
          EOF

      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: dr-report-${{ github.run_number }}
          path: dr-report.md

6.5.6. Quarterly Checklist

## DR Test Checklist - Q[X] YYYY

### Pre-Test (1 week before)
- [ ] Schedule confirmed with all participants
- [ ] Staging environment available
- [ ] Runbooks reviewed and updated
- [ ] Monitoring alerts configured
- [ ] Communication plan ready

### Test Day
- [ ] Kickoff meeting completed
- [ ] Backup identified and verified
- [ ] Restore initiated
- [ ] Restore completed within RTO
- [ ] Data integrity verified
- [ ] Application smoke tests passed
- [ ] Failover tested (if applicable)
- [ ] Cleanup completed

### Post-Test (within 1 week)
- [ ] Test report generated
- [ ] Issues documented
- [ ] Runbooks updated with findings
- [ ] Metrics recorded
- [ ] Report shared with stakeholders
- [ ] Remediation items assigned
- [ ] Next quarter test scheduled

### Sign-off
- [ ] DevOps Lead: _________ Date: _____
- [ ] Security Lead: ________ Date: _____
- [ ] Tech Lead: ___________ Date: _____

7. Security Infrastructure

7.1. Secrets Management

Secrets Management:

  provider: GitHub Secrets + Supabase Vault

  categories:

    ci_cd_secrets:
      storage: GitHub Secrets
      secrets:
        - SUPABASE_ACCESS_TOKEN
        - FIREBASE_TOKEN
        - FASTLANE_PASSWORD
        - ANDROID_KEYSTORE_BASE64
        - SLACK_WEBHOOK
      rotation: manual (annually)
      access: GitHub Actions only

    runtime_secrets:
      storage: Supabase Vault / AWS Secrets Manager
      secrets:
        - DATABASE_PASSWORD
        - JWT_SECRET
        - TWILIO_AUTH_TOKEN
        - ENCRYPTION_SALT
      rotation: automated (30_days for DB, 90_days others)
      access: Edge Functions only

    mobile_secrets:
      storage: Keychain (iOS) / Keystore (Android)
      secrets:
        - USER_MASTER_KEY (derived from password)
        - DEVICE_ID
        - SESSION_TOKENS
      rotation: per_session (tokens)
      access: App only

  best_practices:
    - NO hardcoded secrets in code
    - NO secrets in logs
    - NO secrets in error messages
    - Rotate after team member departure
    - Audit access quarterly
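
The automated rotation schedule for runtime secrets (30 days for the database password, 90 days for the others) can be expressed as a due-date check. The sketch below is illustrative: `rotation_due` is a hypothetical helper, and it assumes a secret becomes due on the day it reaches its maximum age.

```python
from datetime import date, timedelta

# Rotation periods from runtime_secrets: 30 days for the DB, 90 days for others.
ROTATION_DAYS = {"DATABASE_PASSWORD": 30}
DEFAULT_ROTATION_DAYS = 90


def rotation_due(secret_name: str, last_rotated: date, today: date) -> bool:
    """True when the secret has reached its maximum allowed age."""
    max_age = timedelta(days=ROTATION_DAYS.get(secret_name, DEFAULT_ROTATION_DAYS))
    return today - last_rotated >= max_age


if __name__ == "__main__":
    today = date(2025, 1, 31)
    for name in ("DATABASE_PASSWORD", "JWT_SECRET", "TWILIO_AUTH_TOKEN"):
        print(name, rotation_due(name, date(2025, 1, 1), today))
```

The check reads only the secret's name and last-rotation date, never its value, which is consistent with the rule of keeping secrets out of logs.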

7.2. Certificate Management

Certificate Management:

  ssl_certificates:

    provider: Let's Encrypt (via Supabase/Firebase)

    domains:
      - api.medtime.app (Supabase)
      - medtime.app (Marketing site)

    renewal:
      method: automatic
      window: 30_days before expiry
      notification: email if renewal fails

    monitoring:
      check_frequency: daily
      alert_threshold: 30_days remaining

  code_signing:

    ios:
      type: Apple Developer Certificate
      storage: Fastlane Match (encrypted Git repo)
      rotation: automatic (Apple managed)

      certificates:
        - Development
        - Distribution (App Store)
        - Push Notifications

      provisioning_profiles:
        storage: Fastlane Match
        auto_update: enabled

    android:
      type: Keystore
      storage: GitHub Secrets (base64)
      rotation: never (breaks updates)

      backup:
        locations:
          - GitHub Secrets
          - Secure offline storage (2 physical locations)
        verification: quarterly
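
The daily SSL check with a 30-day alert threshold comes down to comparing a certificate's notAfter timestamp with the current time. The sketch below uses `ssl.cert_time_to_seconds` from the Python standard library to parse that field. The helper names are hypothetical, and fetching the certificate itself (for example with `ssl.SSLSocket.getpeercert()`) is left out.

```python
import ssl
import time

ALERT_THRESHOLD_DAYS = 30  # alert_threshold from the monitoring block above


def days_remaining(not_after: str, now_s: float) -> float:
    """Days until expiry; not_after uses the certificate notAfter text format."""
    return (ssl.cert_time_to_seconds(not_after) - now_s) / 86400


def expiring_soon(not_after: str, now_s: float) -> bool:
    """True when fewer than ALERT_THRESHOLD_DAYS remain before expiry."""
    return days_remaining(not_after, now_s) < ALERT_THRESHOLD_DAYS


if __name__ == "__main__":
    # Example notAfter string in the format returned by getpeercert().
    print(expiring_soon("Jan  5 09:34:43 2018 GMT", time.time()))
```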

7.3. Network Security

Network Security:

  ddos_protection:
    provider: Cloudflare (Free tier)
    features:
      - rate_limiting
      - challenge_page for suspicious traffic
      - ip_reputation filtering

    custom_rules:
      - block_countries: [CN, RU]  # Not supported yet
      - rate_limit_api: 100_req/min per IP
      - challenge_on_login: after 5 failures

  waf:
    provider: Cloudflare WAF
    rulesets:
      - OWASP_Core_Ruleset
      - SQLi_protection
      - XSS_protection
      - Path_traversal_protection

    custom_rules:
      - block /admin from non-whitelisted IPs
      - rate_limit /api/auth/* endpoints

  api_security:

    authentication:
      method: JWT (Firebase tokens)
      validation: every_request

    rate_limiting:
      global: 1000_req/min per IP
      per_user:
        free: 60_req/min
        pro: 300_req/min
        perfect: 600_req/min

    cors:
      allowed_origins:
        - https://app.medtime.app
        - capacitor://localhost  # Mobile apps
      allowed_methods: [GET, POST, PUT, DELETE, OPTIONS]
      allowed_headers: [Authorization, Content-Type]
      max_age: 86400

  network_isolation:

    database:
      public_access: disabled
      allowed_ips: [Supabase Edge Functions only]
      ssl_mode: require

    admin_access:
      method: bastion_host
      mfa: required
      source_ips: whitelisted
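
The per_user limits above (60, 300 and 600 requests per minute) could be enforced with a simple fixed-window counter. The sketch below is one possible approach, not the mechanism Supabase or Cloudflare actually use. `FixedWindowLimiter` is a hypothetical class that keeps its counters in process memory and never evicts old windows, which a real deployment would need to address.

```python
from collections import defaultdict

# Requests per minute by subscription tier, from rate_limiting.per_user.
TIER_LIMITS = {"free": 60, "pro": 300, "perfect": 600}


class FixedWindowLimiter:
    """Count requests per user within fixed, aligned time windows."""

    def __init__(self, window_s: int = 60) -> None:
        self.window_s = window_s
        # (user_id, window index) -> number of requests accepted so far
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, tier: str, now_s: float) -> bool:
        """Accept the request unless the user's tier limit is already used up."""
        key = (user_id, int(now_s // self.window_s))
        if self._counts[key] >= TIER_LIMITS[tier]:
            return False
        self._counts[key] += 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter()
    accepted = sum(limiter.allow("user-1", "free", now_s=10.0) for _ in range(100))
    print(accepted)
```

A fixed window admits short bursts of up to twice the limit across a window boundary. A sliding window or token bucket smooths this at the cost of more state.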

7.4. Compliance Scanning

Compliance Scanning:

  vulnerability_scanning:

    dependencies:
      tool: Snyk
      frequency: on_every_commit
      auto_fix: enabled for low/medium
      block_pr: high/critical vulnerabilities

    containers:
      tool: Trivy
      frequency: on_build
      severity_threshold: high

    code:
      sast:
        tool: SonarQube
        frequency: on_pr
        quality_gate:
          - coverage >= 80%
          - no_critical_issues
          - no_blockers

      secrets_detection:
        tool: GitGuardian
        frequency: on_commit
        action: block_push if secrets found

  compliance_checks:

    hipaa:
      checklist:
        - encryption_at_rest: enabled
        - encryption_in_transit: TLS 1.3
        - audit_logs: 6_year_retention
        - access_controls: RLS + RBAC
        - backup: tested_quarterly
      frequency: quarterly
      auditor: external (annually)

    lgpd:
      checklist:
        - data_minimization: verified
        - consent_management: implemented
        - data_portability: API available
        - right_to_deletion: implemented
        - data_breach_notification: procedure documented
      frequency: quarterly

    owasp:
      tool: OWASP ZAP
      frequency: weekly
      scope: all_api_endpoints
      report: security_team

8. Cost Management

8.1. Cost Breakdown

Estimated Monthly Costs:

  # Startup phase (0-1000 users)

  supabase:
    plan: Pro
    base: $25/month
    database: $10 (Small instance)
    storage: $5 (50GB)
    bandwidth: $5 (100GB)
    total: $45/month

  firebase:
    plan: Blaze (pay-as-you-go)
    auth: $0 (< 50k MAU)
    fcm: $0 (< 1M messages)
    crashlytics: $0
    total: $0/month

  twilio:
    sms: $0.01/msg * 500 msgs = $5
    phone_number: $1
    total: $6/month

  other:
    domain: $1/month
    sentry: $0 (free tier)
    github_actions: $0 (public repo)
    total: $1/month

  TOTAL_STARTUP: $52/month

  # Growth phase (1k-10k users)

  supabase:
    database: $40 (Medium instance)
    storage: $20 (200GB)
    bandwidth: $20 (500GB)
    total: $105/month

  firebase:
    auth: $30 (75k MAU)
    fcm: $10 (2M messages)
    total: $40/month

  twilio:
    sms: $0.01 * 5000 = $50
    total: $51/month

  TOTAL_GROWTH: $197/month

  # Scale phase (10k-100k users)

  supabase:
    database: $200 (Large instance + read replicas)
    storage: $100 (1TB)
    bandwidth: $100 (2TB)
    total: $425/month

  firebase:
    auth: $300 (150k MAU)
    fcm: $50 (10M messages)
    total: $350/month

  twilio:
    sms: $0.01 * 50000 = $500
    total: $501/month

  other:
    cloudflare_pro: $20
    sentry_team: $29
    total: $50/month

  TOTAL_SCALE: $1,326/month
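
The phase totals can be reproduced from the line items, provided some items that are listed only once carry over to later phases. The sketch below makes those assumptions explicit: the $25 Supabase base fee, the $1 Twilio phone number and the $1 domain are assumed to apply in every phase, which is what the stated subtotals ($105, $51, $50) imply.

```python
# Monthly line items in USD. Items marked "assumed" are not repeated in the
# growth and scale listings but are needed to reach the stated subtotals.
PHASES = {
    "startup": {
        "supabase": 25 + 10 + 5 + 5,
        "firebase": 0,
        "twilio": 5 + 1,
        "other": 1,
    },
    "growth": {
        "supabase": 25 + 40 + 20 + 20,  # base fee assumed to carry over
        "firebase": 30 + 10,
        "twilio": 50 + 1,               # phone number assumed to carry over
        "other": 1,                     # domain assumed to carry over
    },
    "scale": {
        "supabase": 25 + 200 + 100 + 100,
        "firebase": 300 + 50,
        "twilio": 500 + 1,
        "other": 20 + 29 + 1,           # Cloudflare Pro + Sentry Team + domain
    },
}

totals = {phase: sum(items.values()) for phase, items in PHASES.items()}

if __name__ == "__main__":
    for phase, total in totals.items():
        print(f"{phase}: ${total}/month")
```

Under these assumptions the sums match TOTAL_STARTUP, TOTAL_GROWTH and TOTAL_SCALE exactly.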

8.2. Resource Optimization

Optimization Strategies:

  database:

    query_optimization:
      - index slow queries (p99 > 1s)
      - use connection pooling
      - implement caching for catalogs
      - archive old data (> 2 years) to cold storage

    storage_optimization:
      - compress blobs (gzip)
      - deduplicate identical blobs
      - lifecycle policies (move to glacier after 1 year)

    instance_sizing:
      strategy: right_size
      review: monthly
      metrics: cpu_usage, memory_usage, iops

  bandwidth:

    reduction:
      - enable compression (gzip, brotli)
      - implement CDN for static assets
      - optimize API responses (only required fields)
      - batch operations (sync in batches)

    monitoring:
      - track bandwidth per endpoint
      - alert on spikes
      - identify optimization opportunities

  firebase:

    fcm_optimization:
      - use topics instead of individual tokens
      - batch notifications
      - reduce payload size
      - avoid unnecessary notifications

    auth_optimization:
      - cache user sessions client-side
      - use refresh tokens wisely
      - minimize custom claims

  twilio:

    sms_optimization:
      - use push notifications as primary
      - SMS only for critical alerts
      - implement user preferences (opt-out)
      - batch messages when possible

8.3. Cost Alerts

Cost Alerts:

  budget_thresholds:

    forecasted:
      threshold: 80% of monthly budget
      action: notify finance team

    actual:
      threshold: 100% of monthly budget
      action: investigate + optimize

    anomaly:
      threshold: 150% of daily average
      action: immediate investigation

  service_specific:

    supabase:
      database_size: alert at 80% of plan limit
      bandwidth: alert at 80% of plan limit

    firebase:
      mau: alert at 80% of free tier
      fcm_messages: alert at 80% of free tier

    twilio:
      sms_count: alert at budget threshold
      cost_per_day: alert if > expected

  optimization_triggers:

    high_storage:
      condition: storage_cost > $100/month
      action: review archiving strategy

    high_bandwidth:
      condition: bandwidth_cost > $50/month
      action: review API efficiency

    high_sms:
      condition: sms_cost > $200/month
      action: review notification strategy

8.4. Scaling Strategy

Scaling Strategy:

  database:

    vertical_scaling:
      trigger: cpu > 80% sustained for 1h
      action: upgrade instance size
      automation: manual (review first)

    horizontal_scaling:
      trigger: read_load > 70%
      action: add read replica
      automation: manual
      max_replicas: 3

    sharding:
      trigger: database_size > 1TB
      strategy: by user_id (hash)
      implementation: v2 (future)

  edge_functions:

    auto_scaling:
      enabled: true (Supabase managed)
      min_instances: 1
      max_instances: 10
      scale_up_trigger: cpu > 70%
      scale_down_trigger: cpu < 30% for 5m

  mobile_apps:

    # Apps scale naturally (client-side)
    considerations:
      - monitor download size (keep < 50MB)
      - optimize assets
      - lazy load features

  costs_by_scale:

    0-1k_users: $52/month
    1k-10k_users: $197/month
    10k-100k_users: $1,326/month
    100k-1M_users: $8,000/month (estimated)

    revenue_targets:
      - 1k users * $3/month (Pro) = $3,000/month
      - 10k users * $3/month = $30,000/month
      - Cost ratio target: < 20% of revenue
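The cost ratio target can be checked against the figures in `costs_by_scale` and `revenue_targets`. The sketch below makes two assumptions that the spec does not state. First, each cost tier is evaluated at its upper user bound. Second, every user pays the $3/month Pro price, which is what the revenue lines literally compute. The names `cost_ratio`, `TIERS` and `TARGET_RATIO` are illustrative.

```python
# Figures copied from costs_by_scale and revenue_targets.
# Assumption: every user counted in a tier pays the Pro price.
PRO_PRICE_USD = 3
TARGET_RATIO = 0.20  # cost ratio target: < 20% of revenue

TIERS = [
    # (users at tier upper bound, monthly infrastructure cost in USD)
    (1_000, 52),
    (10_000, 197),
]


def cost_ratio(users: int, monthly_cost: float, price: float = PRO_PRICE_USD) -> float:
    """Infrastructure cost as a fraction of monthly revenue."""
    revenue = users * price
    return monthly_cost / revenue


for users, cost in TIERS:
    ratio = cost_ratio(users, cost)
    status = "OK" if ratio < TARGET_RATIO else "OVER TARGET"
    print(f"{users:>6} users: {ratio:.2%} of revenue ({status})")
```

Under the all-Pro assumption both tiers fall far below the 20% target. The ratio is sensitive to paid conversion: with only 100 paying users at the $52 tier, `cost_ratio(100, 52)` is about 17%, close to the limit.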

9. Infrastructure as Code

Infrastructure as Code:

  philosophy: GitOps

  tools:
    - Supabase CLI (migrations, functions)
    - GitHub Actions (CI/CD)
    - Terraform (future, if multi-cloud)

  repository_structure:

    supabase/:
      migrations/: Database schema changes
      functions/: Edge Functions
      seed.sql: Initial data (catalogs)
      config.toml: Supabase config

    .github/:
      workflows/: CI/CD pipelines
      actions/: Custom actions

    ios/:
      fastlane/: iOS automation
      Podfile: Dependencies

    android/:
      gradle/: Build configs
      fastlane/: Android automation

  workflow:

    1_local_development:
       - supabase start (local instance)
       - develop + test locally
       - commit changes

    2_ci_validation:
       - GitHub Actions run tests
       - Deploy to staging

    3_manual_approval:
       - Tech lead reviews
       - Approve for production

    4_production_deploy:
       - GitHub Actions deploy
       - Monitor for issues
       - Rollback if needed

  disaster_recovery:
    # Everything is in Git
    recovery_steps:
      1. Clone repository
      2. supabase db push (restore schema)
      3. Restore data from backup
      4. supabase functions deploy (restore functions)
      5. Verify health

    estimated_recovery_time: 2_hours
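The `recovery_steps` above can be kept as a reviewable, ordered plan. The following sketch only renders the plan as text and executes nothing. `REPO_URL` is a placeholder, and the split between scripted and manual steps is an assumption. Note that `supabase db push` expects the project to be linked first, and some CLI versions may require a function name for `supabase functions deploy`.

```python
import shlex

# Placeholder repository URL (hypothetical); replace with the real one.
REPO_URL = "git@github.com:example/medtime.git"

# (description, command) pairs; command is None for manual steps.
RECOVERY_PLAN = [
    ("Clone repository", ["git", "clone", REPO_URL]),
    ("Restore schema", ["supabase", "db", "push"]),
    ("Restore data from backup", None),  # manual: follow the backup procedure
    ("Restore functions", ["supabase", "functions", "deploy"]),
    ("Verify health", None),  # manual: check dashboards and health endpoints
]


def render_plan(plan) -> list[str]:
    """Render the plan as numbered lines for review before execution."""
    lines = []
    for number, (description, command) in enumerate(plan, start=1):
        action = shlex.join(command) if command else "(manual step)"
        lines.append(f"{number}. {description}: {action}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_plan(RECOVERY_PLAN)))
```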

Example Supabase config:

# supabase/config.toml
[api]
enabled = true
port = 54321
schemas = ["public", "auth"]
extra_search_path = ["public", "extensions"]
max_rows = 1000

[db]
port = 54322
major_version = 15

[db.pooler]
enabled = true
port = 54329
pool_mode = "transaction"
default_pool_size = 20
max_client_conn = 100

[realtime]
enabled = false  # Not used in MedTime v1

[storage]
enabled = true
file_size_limit = "5MB"

[auth]
enabled = false  # We use Firebase Auth
site_url = "https://app.medtime.app"

[auth.external.google]
enabled = true
client_id = "env(GOOGLE_CLIENT_ID)"
secret = "env(GOOGLE_CLIENT_SECRET)"

[edge_runtime]
enabled = true

10. Runbooks

10.1. Runbook Template

# Runbook: [Procedure Name]

## Metadata
- **Severity**: P1 / P2 / P3 / P4
- **Component**: [Mobile App | Backend | Database]
- **Last Updated**: YYYY-MM-DD
- **Owner**: [Team/Person]

## Symptoms
- [Observable symptom, WITHOUT PHI]
- [Affected metrics]

## Impact
- **Users Affected**: [Estimate]
- **Services Affected**: [List]
- **Business Impact**: [Description]

## Diagnosis Steps
1. Check dashboard: [URL]
2. Verify metrics: [Specific metrics]
3. Check error logs (NO PHI): [Location]
4. Test manually: [Steps]

## Resolution Steps
1. [Step with specific command]
2. [Step with specific command]
3. [Verification]

## IMPORTANT: Zero-Knowledge
- Do NOT access blob contents
- Do NOT decrypt data for diagnosis
- Log metadata only
- Escalate to the user if their data needs to be viewed

## Rollback
1. [Rollback step]
2. [Rollback verification]

## Post-Incident
1. Create incident report
2. Update runbook if needed
3. Schedule post-mortem

## References
- Related docs: [Links]
- Escalation: [Contact info]
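Runbooks under `docs/runbooks/` could be linted against this template. The sketch below is a hypothetical helper, not an existing tool. It checks that every `## ` section of the template is present, and it matches the Zero-Knowledge section by substring so that the heading prefix does not matter.

```python
import re

# Section headings taken from the template above.
REQUIRED_SECTIONS = [
    "Metadata",
    "Symptoms",
    "Impact",
    "Diagnosis Steps",
    "Resolution Steps",
    "Rollback",
    "Post-Incident",
    "References",
]


def missing_sections(markdown: str) -> list[str]:
    """Return the required level-2 headings that the runbook lacks."""
    headings = re.findall(r"^##[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    missing = [name for name in REQUIRED_SECTIONS if name not in headings]
    # The Zero-Knowledge reminder is mandatory; match it by substring.
    if not any("Zero-Knowledge" in heading for heading in headings):
        missing.append("Zero-Knowledge")
    return missing
```

A runbook passes when `missing_sections(text)` returns an empty list. The check can run in the same GitHub Actions workflow that validates the rest of the repository.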

10.2. Common Runbooks

Common Runbooks:

  RB-001:
    title: High API Error Rate
    severity: P1
    trigger: error_rate > 5% for 5m
    owner: Backend Team

  RB-002:
    title: Database Connection Pool Exhausted
    severity: P1
    trigger: db_connections_waiting > 10
    owner: Backend Team

  RB-003:
    title: Slow Database Queries
    severity: P2
    trigger: p99_query_time > 5s
    owner: Backend Team

  RB-004:
    title: Failed Backup
    severity: P2
    trigger: last_backup > 25h
    owner: DevOps

  RB-005:
    title: SSL Certificate Expiring
    severity: P3
    trigger: days_until_expiry < 30
    owner: DevOps

  RB-006:
    title: High Mobile Crash Rate
    severity: P2
    trigger: crash_rate > 2%
    owner: Mobile Team

  RB-007:
    title: Deployment Rollback
    severity: P1
    trigger: Manual or automated
    owner: DevOps

  RB-008:
    title: Firebase Auth Outage
    severity: P1
    trigger: auth_failure_rate > 50%
    owner: Backend Team

  location: docs/runbooks/
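The triggers above can be evaluated against a metrics snapshot to decide which runbook to open. This is a minimal sketch under stated assumptions. The metric key names and units are hypothetical (rates as fractions, `p99_query_time_s` in seconds, `hours_since_last_backup` in hours). The "for 5m" sustain window of RB-001 is not modeled, and RB-007 is omitted because it is started manually or by the deploy pipeline.

```python
# Thresholds copied from the Common Runbooks list.
# Metric key names and units are assumptions made for this sketch.
RUNBOOK_TRIGGERS = {
    "RB-001": lambda m: m.get("error_rate", 0) > 0.05,
    "RB-002": lambda m: m.get("db_connections_waiting", 0) > 10,
    "RB-003": lambda m: m.get("p99_query_time_s", 0) > 5,
    "RB-004": lambda m: m.get("hours_since_last_backup", 0) > 25,
    "RB-005": lambda m: m.get("days_until_expiry", float("inf")) < 30,
    "RB-006": lambda m: m.get("crash_rate", 0) > 0.02,
    "RB-008": lambda m: m.get("auth_failure_rate", 0) > 0.50,
}


def triggered_runbooks(metrics: dict) -> list[str]:
    """Return the IDs of runbooks whose trigger fires for this snapshot."""
    return [rb_id for rb_id, fires in RUNBOOK_TRIGGERS.items() if fires(metrics)]
```

Missing metrics never fire a trigger, so a partial snapshot yields only the runbooks it can justify. All inputs are aggregate metadata, in line with the Zero-Knowledge monitoring rules.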

11. References

Document	Purpose
01-arquitectura-tecnica.md	General architecture
02-arquitectura-cliente-servidor.md	Dual architecture
05-seguridad-servidor.md	Security requirements
07-testing-strategy.md	CI/CD integration with testing
Supabase Docs	Database platform
Firebase Docs	Auth/Push platform
Fastlane Docs	Mobile CI/CD
GitHub Actions	CI/CD platform

Document generated by DevOpsDrone (MTS-DRN-OPS-001) / SpecQueen Technical Division - IT-07 "Minimalist infrastructure. The server is simple, the client is powerful, the monitoring is blind."