06 - MedTime Infrastructure
Identifier: TECH-INF-001
Version: 1.0.1
Date: 2025-12-08
Last revision: DV2-P3 - Mobile dashboard refresh rate specification
Author: DevOpsDrone (MTS-DRN-OPS-001) / SpecQueen Technical Division
Architecture refs: TECH-ARC-001, TECH-CS-001
Security refs: TECH-SEC-SERVER-001, TECH-SEC-CLIENT-001
Testing refs: TECH-TST-001
Status: Draft
- 1. Introduction
- 1.1. Purpose
- 1.2. Scope
- 1.3. Infrastructure Principles
- 2. Cloud Architecture
- 2.1. Technology Stack
- 2.2. Supabase Configuration
- 2.3. Firebase Services
- 2.4. Twilio Integration
- 2.5. Network Topology
- 3. Environments
- 3.1. Development
- 3.2. Staging
- 3.3. Production
- 3.4. Feature Branches
- 3.5. Environment Promotion
- 4. CI/CD Pipelines
- 4.1. Mobile CI/CD (GitHub Actions + Fastlane)
- 4.2. Backend CI/CD
- 4.3. Database Migrations
- 4.4. Release Management
- 5. Monitoring and Alerting
- 5.1. Zero-Knowledge Monitoring
- 5.2. Allowed Metrics
- 5.3. Prohibited Metrics
- 5.4. Infrastructure Alerts
- 5.5. Dashboards
- 6. Disaster Recovery
- 6.1. RTO and RPO
- 6.2. Backup Strategy
- 6.3. Failover Procedures
- 6.4. Data Recovery
- 7. Security Infrastructure
- 7.1. Secrets Management
- 7.2. Certificate Management
- 7.3. Network Security
- 7.4. Compliance Scanning
- 8. Cost Management
- 8.1. Cost Breakdown
- 8.2. Resource Optimization
- 8.3. Cost Alerts
- 8.4. Scaling Strategy
- 9. Infrastructure as Code
- 10. Runbooks
- 11. References
1. Introduction
1.1. Purpose
This document defines MedTime's complete infrastructure, covering:
- Cloud services (Supabase, Firebase, Twilio)
- CI/CD pipelines for mobile and backend
- Zero-Knowledge monitoring and alerting
- Disaster recovery and backups
- Environments and deployment strategy
1.2. Scope
| Component | Included | Provider |
|---|---|---|
| Database | Yes | Supabase (PostgreSQL) |
| Authentication | Yes | Firebase Auth |
| Push Notifications | Yes | Firebase FCM |
| SMS Notifications | Yes | Twilio |
| Storage (blobs) | Yes | Supabase Storage |
| Mobile CI/CD | Yes | GitHub Actions + Fastlane |
| Backend CI/CD | Yes | GitHub Actions |
| Monitoring | Yes | Supabase Dashboard + Custom |
| Error Tracking | Yes | Firebase Crashlytics + Sentry |
1.3. Infrastructure Principles
CORE PRINCIPLES:
+------------------------------------------------------------------+
| 1. Zero-Knowledge Monitoring |
| - Monitoring NEVER sees PHI in plaintext |
| - Operational metadata only |
| |
| 2. Serverless-First |
| - Minimize the infrastructure we have to maintain |
| - Managed services over custom servers |
| |
| 3. Mobile-First |
| - CI/CD optimized for iOS/Android |
| - Minimal backend (5% of processing) |
| |
| 4. Compliance-Ready |
| - HIPAA, LGPD, FDA built in from the design stage |
| - Audit logs, backups, and encryption by default |
+------------------------------------------------------------------+
2. Cloud Architecture
2.1. Technology Stack
graph TB
subgraph MOBILE["Mobile Apps (95%)"]
direction LR
IOS["iOS App<br/>Swift/SwiftUI<br/>Realm"]
ANDROID["Android App<br/>Kotlin/Compose<br/>Room"]
end
subgraph BACKEND["Backend Services (5%)"]
direction TB
subgraph SUPABASE["Supabase"]
PG["PostgreSQL<br/>(Blobs E2E)"]
AUTH_SB["Auth Helper"]
STORAGE["Storage<br/>(Files)"]
EDGE["Edge Functions<br/>(Deno/TypeScript)"]
end
subgraph FIREBASE["Firebase"]
AUTH_FB["Authentication"]
FCM["Cloud Messaging"]
CRASH["Crashlytics"]
end
subgraph EXTERNAL["External Services"]
TWILIO["Twilio SMS"]
SENTRY["Sentry<br/>(Error Tracking)"]
end
end
IOS -->|HTTPS/REST| EDGE
ANDROID -->|HTTPS/REST| EDGE
IOS -->|Auth| AUTH_FB
ANDROID -->|Auth| AUTH_FB
EDGE --> PG
EDGE --> STORAGE
EDGE --> AUTH_SB
EDGE --> TWILIO
AUTH_FB --> FCM
IOS -.->|Crashes| CRASH
ANDROID -.->|Crashes| CRASH
EDGE -.->|Errors| SENTRY
style MOBILE fill:#99ff99
style BACKEND fill:#99ccff
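2.2. Supabase Configuration
Tier: Pro (intended minimum for HIPAA compliance; HIPAA workloads also require a signed BAA with Supabase, so confirm which plan and add-on include one)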
Supabase Configuration:
project:
name: medtime-prod
region: us-east-1 # or sa-east-1 for Brazil
organization: MedTime Inc.
database:
plan: Pro
version: PostgreSQL 15
instance_size: Small (initial) -> Medium (at scale)
storage: 8GB (initial) -> auto-scale
extensions:
- uuid-ossp # UUIDs
- pgcrypto # Funciones crypto
- pg_stat_statements # Monitoring queries
connection_pooling:
mode: transaction
pool_size: 15
timeout: 10s
backups:
enabled: true
schedule: daily
retention: 30_days
point_in_time_recovery: 7_days
auth:
# Supabase Auth complements Firebase
# It is used only to provide the RLS context
enabled: true
jwt_expiry: 3600
storage:
# For large blobs (prescription images)
enabled: true
file_size_limit: 5MB
public_buckets: [] # All buckets private
edge_functions:
# Deno/TypeScript for APIs (Supabase Edge Functions run on Deno, not Node.js)
runtime: deno
timeout: 60s
memory: 256MB
security:
ssl_enforcement: required
rls_enabled: true # Mandatory
row_level_security: enforced
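PostgreSQL configuration (on managed Supabase the postgres role is not a superuser, so ALTER SYSTEM may be rejected and these values may need to be applied through Supabase's own configuration tooling instead):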
-- Security settings
ALTER SYSTEM SET ssl = 'on';
ALTER SYSTEM SET ssl_min_protocol_version = 'TLSv1.3';
-- Performance tuning (adjust to actual load)
ALTER SYSTEM SET shared_buffers = '256MB';
ALTER SYSTEM SET effective_cache_size = '1GB';
ALTER SYSTEM SET maintenance_work_mem = '64MB';
ALTER SYSTEM SET work_mem = '16MB';
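-- Logging (no PHI)
ALTER SYSTEM SET log_statement = 'ddl'; -- DDL only
-- Slow queries: log_min_duration_statement would write the full statement text
-- (literals included) to the log, so it stays disabled; the normalized
-- statistics in pg_stat_statements (extension enabled above) cover this instead
ALTER SYSTEM SET log_min_duration_statement = -1;
ALTER SYSTEM SET log_parameter_max_length = 0; -- never log bind parameter values
ALTER SYSTEM SET log_min_error_statement = 'panic'; -- do not echo failing statements
ALTER SYSTEM SET log_connections = 'on';
ALTER SYSTEM SET log_disconnections = 'on';
-- The prefix adds connection metadata only; query content is never logged
ALTER SYSTEM SET log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h ';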
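Logging settings like these can silently regress when PostgreSQL is retuned later. The following is a minimal TypeScript sketch of a CI check for that. It reads raw name/setting pairs (for example the `name` and `setting` columns of the `pg_settings` view; how they are fetched is assumed and omitted) and reports every setting that would copy statement text or bind parameters, and therefore potential PHI, into the server log. The function name and the idea of running it in CI are illustrative and are not part of the existing pipeline.

```typescript
// Hypothetical CI helper: flags PostgreSQL logging settings that would copy
// statement text or bind-parameter values (potential PHI) into the server log.
// Input values are raw pg_settings.setting strings (durations in ms, no units).
type PgSettings = Record<string, string>;

function phiLoggingRisks(settings: PgSettings): string[] {
  const risks: string[] = [];

  // 'mod' and 'all' log the full text of data-modifying statements.
  const logStatement = (settings["log_statement"] ?? "none").toLowerCase();
  if (logStatement === "mod" || logStatement === "all") {
    risks.push("log_statement");
  }

  // Any value >= 0 (milliseconds) logs the text of statements slower than it.
  if (Number(settings["log_min_duration_statement"] ?? "-1") >= 0) {
    risks.push("log_min_duration_statement");
  }

  // The default (-1) logs bind parameters in full; 0 disables them.
  if (Number(settings["log_parameter_max_length"] ?? "-1") !== 0) {
    risks.push("log_parameter_max_length");
  }

  // Statements failing at or above this severity are logged verbatim
  // (default 'error'); only 'panic' effectively switches this off.
  if ((settings["log_min_error_statement"] ?? "error").toLowerCase() !== "panic") {
    risks.push("log_min_error_statement");
  }
  return risks;
}
```

Unset keys fall back to PostgreSQL's defaults. A check that passes only an explicit subset of settings still catches risky defaults.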
2.3. Firebase Services
Tier: Blaze (pay-as-you-go)
Firebase Configuration:
authentication:
providers:
- email_password
- google
- apple # Obligatorio para iOS
password_policy:
min_length: 12
require_uppercase: true
require_lowercase: true
require_number: true
mfa:
enforcement:
free: optional
pro: required
perfect: required
providers:
- totp
- sms
session_management:
id_token_expiry: 3600 # 1 hora
refresh_token_expiry: 2592000 # 30 dias
concurrent_sessions:
free: 2
pro: 5
perfect: 10
cloud_messaging:
# FCM for push notifications (backup for local notifications)
platforms:
- ios
- android
message_types:
- alert_backup # Backup for local alerts
- emergency_notification
- caregiver_notification
rate_limiting:
per_device: 100/hour
per_topic: 1000/hour
# IMPORTANT: payloads NEVER contain PHI
payload_sanitization: enforced
crashlytics:
enabled: true
symbolication: automatic
# NEVER send PHI in crash reports
custom_keys_allowed: false
user_id_collection: hashed_only
retention: 90_days
analytics:
# Minimal analytics (Zero-Knowledge)
enabled: true
events:
# Usage events only, no content
- app_open
- screen_view
- user_engagement
# NEVER send sensitive data
automatically_collected: false
user_properties: [tier, role] # Metadata only
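The password, MFA and session limits above can be mirrored in app or backend code, so users get an early error that reflects their tier. This sketch uses hypothetical names. Firebase still applies its own server-side password policy. The concurrent-session cap is assumed to be enforced by MedTime's own backend rather than by a Firebase setting.

```typescript
// Sketch mirroring password_policy, mfa.enforcement and concurrent_sessions above.
type Tier = "free" | "pro" | "perfect";

const PASSWORD_MIN_LENGTH = 12;

// Returns the names of the violated rules (empty array when the password passes).
function passwordPolicyViolations(password: string): string[] {
  const violations: string[] = [];
  if (password.length < PASSWORD_MIN_LENGTH) violations.push("min_length");
  if (!/[A-Z]/.test(password)) violations.push("require_uppercase");
  if (!/[a-z]/.test(password)) violations.push("require_lowercase");
  if (!/[0-9]/.test(password)) violations.push("require_number");
  return violations;
}

// false means MFA is optional for that tier.
const MFA_REQUIRED: Record<Tier, boolean> = { free: false, pro: true, perfect: true };

const MAX_CONCURRENT_SESSIONS: Record<Tier, number> = { free: 2, pro: 5, perfect: 10 };

// Assumed to be checked by MedTime's backend before issuing a new session.
function canOpenSession(tier: Tier, activeSessions: number): boolean {
  return activeSessions < MAX_CONCURRENT_SESSIONS[tier];
}
```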
Firebase Security Rules:
// firestore.rules (if Firestore is used as a cache)
rules_version = '2';
service cloud.firestore {
match /databases/{database}/documents {
// Encrypted blobs: owner only. Rules match documents, not collections,
// so a recursive wildcard covers the user's document and its subcollections
match /encrypted_blobs/{userId}/{blobPath=**} {
allow read, write: if request.auth != null
&& request.auth.uid == userId;
}
// Public catalogs: any authenticated user may read
match /public_catalogs/{catalog}/{itemPath=**} {
allow read: if request.auth != null;
allow write: if false; // Admins only, via backend (the Admin SDK bypasses rules)
}
// Default: deny
match /{document=**} {
allow read, write: if false;
}
}
}
// storage.rules
rules_version = '2';
service firebase.storage {
match /b/{bucket}/o {
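// Prescription images (anonymized)
match /user_uploads/{userId}/{fileName} {
allow read, delete: if request.auth != null
&& request.auth.uid == userId;
// request.resource exists only on create/update, so the size limit
// is checked there and not on reads or deletes
allow create, update: if request.auth != null
&& request.auth.uid == userId
&& request.resource.size < 5 * 1024 * 1024; // 5MB
}
// Default: deny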
match /{allPaths=**} {
allow read, write: if false;
}
}
}
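The rules language is hard to unit-test in isolation. One option is to write the intended decision for `/user_uploads/{userId}/{fileName}` as a pure function and review it side by side with the rules. The sketch below is a hypothetical model, not Firebase's evaluator. It encodes two things: only the owner may access the path, and the 5 MB limit applies only to writes that carry content, because `request.resource` is absent on reads and deletes.

```typescript
// Hypothetical model of the intended storage.rules decision (not Firebase's engine).
type StorageOp = "read" | "create" | "update" | "delete";

interface UploadRequest {
  authUid: string | null; // request.auth.uid, or null when unauthenticated
  op: StorageOp;
  pathUserId: string; // {userId} from /user_uploads/{userId}/{fileName}
  newSizeBytes?: number; // request.resource.size; only set on create/update
}

const MAX_UPLOAD_BYTES = 5 * 1024 * 1024; // 5MB

function uploadAllowed(req: UploadRequest): boolean {
  const isOwner = req.authUid !== null && req.authUid === req.pathUserId;
  if (!isOwner) return false;
  if (req.op === "create" || req.op === "update") {
    // Writes that add content must declare a size below the limit.
    return req.newSizeBytes !== undefined && req.newSizeBytes < MAX_UPLOAD_BYTES;
  }
  // Reads and deletes only need ownership.
  return true;
}
```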
2.4. Twilio Integration
Twilio Configuration:
service: SMS (Voice in v2)
usage:
- MFA verification codes
- Emergency alerts (Pro/Perfect)
- Caregiver notifications
phone_numbers:
quantity: 1 (initial) -> scaled per region
type: local_number
capabilities: [SMS]
rate_limiting:
per_user: 10/day
per_number: 100/day
cooldown: 60s between messages
message_templates:
mfa_code: |
MedTime: Tu codigo de verificacion es {code}.
Valido por 5 minutos. No compartas este codigo.
emergency_alert: |
MedTime: ALERTA - {patient_name} ha activado alerta de emergencia.
Ubicacion: {location_link}
caregiver_reminder: |
MedTime: {patient_name} no ha confirmado toma de medicamento.
Programado: {scheduled_time}
# IMPORTANT: NEVER include medication names in SMS
phi_handling: sanitized
monitoring:
delivery_status: tracked
error_codes: logged
retry_policy: 3_attempts
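The following is a minimal sketch of how an Edge Function could apply these Twilio rules before calling Twilio. Each template declares an explicit list of placeholders, so extra fields such as a medication name are rejected instead of being interpolated. A limiter enforces `per_user: 10/day` and the 60 s cooldown. All names are hypothetical. The limiter keeps its state in memory (a real deployment would need shared storage), reads "per day" as a rolling 24-hour window, and does not model `per_number`.

```typescript
// Sketch: allowlisted SMS templates plus the per_user and cooldown limits above.
// Template texts are copied from the config; everything else is hypothetical.
interface SmsTemplate { text: string; placeholders: string[]; }

const SMS_TEMPLATES: Record<string, SmsTemplate> = {
  mfa_code: {
    text: "MedTime: Tu codigo de verificacion es {code}.\nValido por 5 minutos. No compartas este codigo.",
    placeholders: ["code"],
  },
  caregiver_reminder: {
    text: "MedTime: {patient_name} no ha confirmado toma de medicamento.\nProgramado: {scheduled_time}",
    placeholders: ["patient_name", "scheduled_time"],
  },
};

function renderSms(templateId: string, values: Record<string, string>): string {
  const tpl = SMS_TEMPLATES[templateId];
  if (tpl === undefined) throw new Error(`unknown template: ${templateId}`);
  // Reject any extra field (e.g. a medication name) instead of silently dropping it.
  for (const key of Object.keys(values)) {
    if (!tpl.placeholders.includes(key)) throw new Error(`field not allowed: ${key}`);
  }
  return tpl.text.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = values[key];
    if (value === undefined) throw new Error(`missing field: ${key}`);
    return value;
  });
}

class SmsRateLimiter {
  private readonly perDay: number;
  private readonly cooldownMs: number;
  private readonly sentAt = new Map<string, number[]>(); // userId -> send times (ms)

  constructor(perDay = 10, cooldownMs = 60_000) {
    this.perDay = perDay;
    this.cooldownMs = cooldownMs;
  }

  // Returns true and records the send when both limits allow it.
  tryAcquire(userId: string, nowMs: number): boolean {
    const windowStart = nowMs - 24 * 60 * 60 * 1000;
    const recent = (this.sentAt.get(userId) ?? []).filter((t) => t > windowStart);
    const last = recent.length > 0 ? recent[recent.length - 1] : undefined;
    if (recent.length >= this.perDay) return false;
    if (last !== undefined && nowMs - last < this.cooldownMs) return false;
    recent.push(nowMs);
    this.sentAt.set(userId, recent);
    return true;
  }
}
```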
2.5. Network Topology
NETWORK ARCHITECTURE:
+------------------------------------------------------------------+
INTERNET
|
v
┌────────────────────────────────────────────────────────────────┐
│ CDN / WAF (Cloudflare) │
│ - DDoS protection │
│ - Rate limiting │
│ - TLS termination │
└────────────────────────────────────────────────────────────────┘
|
v
┌────────────────────────────────────────────────────────────────┐
│ Load Balancer (Supabase Managed) │
│ - Health checks │
│ - SSL/TLS 1.3 │
└────────────────────────────────────────────────────────────────┘
|
+--- Edge Functions (Supabase) ---+--- Firebase Auth
| |
v v
┌─────────────────────┐ ┌──────────────────┐
│ PostgreSQL │ │ Firebase │
│ (Private subnet) │ │ (Google Cloud) │
│ - RLS enabled │ │ - FCM │
│ - Backups │ │ - Crashlytics │
└─────────────────────┘ └──────────────────┘
|
v
┌─────────────────────┐
│ Supabase Storage │
│ (Encrypted blobs) │
└─────────────────────┘
SECURITY LAYERS (defense-in-depth layers, outermost first; not OSI layers):
- Layer 7: WAF (SQL injection, XSS, DDoS)
- Layer 6: TLS 1.3 (encryption in transit)
- Layer 5: JWT Auth (Firebase tokens)
- Layer 4: RLS (Row Level Security)
- Layer 3: Encryption at rest (AES-256)
- Layer 2: E2E encryption (client-side)
- Layer 1: Keychain/Keystore (device)
3. Environments
3.1. Development
Development Environment:
purpose: Local development and feature branches
supabase:
project: medtime-dev
database:
instance: Micro (free tier)
data: Test fixtures
url: https://dev.medtime.supabase.co
firebase:
project: medtime-dev
config: firebase-dev-config.json
access:
developers: all
testers: read-only
data:
# NEVER production data
source: fixtures + factories
phi: synthetic_only
deployment:
trigger: manual
approval: none
3.2. Staging
Staging Environment:
purpose: QA and pre-production testing
supabase:
project: medtime-staging
database:
instance: Small
data: Anonymized replicas of production
url: https://staging.medtime.supabase.co
firebase:
project: medtime-staging
config: firebase-staging-config.json
access:
developers: read-write
testers: read-write
stakeholders: read-only
data:
# Anonymized production data (server-side metadata only; E2E-encrypted blobs cannot be decrypted or anonymized on the server)
source: prod_anonymized + synthetic
refresh: weekly
deployment:
trigger: push to develop branch
approval: automatic
testing:
- E2E tests
- Load tests
- Security scans
3.3. Production
Production Environment:
purpose: Real users
supabase:
project: medtime-prod
database:
instance: Medium (initial) -> auto-scale
data: Real user data (encrypted)
url: https://api.medtime.app
custom_domain: api.medtime.app
firebase:
project: medtime-prod
config: firebase-prod-config.json
access:
developers: read-only (with audit)
on_call: read-write (emergency only)
admins: full (with approval)
data:
source: user_generated
phi: encrypted_e2e
backups:
frequency: hourly
retention: 30_days
deployment:
trigger: git tag (vX.Y.Z)
approval: manual (tech lead + security)
strategy: blue_green
rollback: automatic on errors
monitoring:
uptime_target: 99.9%
alerts: pagerduty
zero_knowledge: enforced
3.4. Feature Branches
Feature Branch Environments:
# Ephemeral environments for large PRs
creation:
trigger: PR con label "needs-preview"
naming: feat-PR-{number}
resources:
supabase: Micro instance
firebase: Shared dev project
lifetime: until the PR is merged or closed
cleanup:
trigger: PR merged o closed
retention: 7_days after close
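The feature-branch lifecycle comes down to two small decisions: whether a PR gets a preview environment, and when that environment can be deleted. This sketch assumes PR metadata comes from the GitHub API, where merged PRs also report state `closed`, and that timestamps are in milliseconds. The interface and function names are hypothetical.

```typescript
// Sketch of the preview-environment lifecycle; types and names are hypothetical.
interface PullRequestInfo {
  number: number;
  labels: string[];
  state: "open" | "closed"; // merged PRs are reported as "closed" too
  closedAtMs?: number;
}

const PREVIEW_LABEL = "needs-preview";
const PREVIEW_RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7_days after close

// Returns the environment name, or null when the PR did not ask for a preview.
function previewEnvName(pr: PullRequestInfo): string | null {
  return pr.labels.includes(PREVIEW_LABEL) ? `feat-PR-${pr.number}` : null;
}

// A preview is deleted once the PR has been closed (or merged) for the retention period.
function shouldDeletePreview(pr: PullRequestInfo, nowMs: number): boolean {
  if (pr.state !== "closed" || pr.closedAtMs === undefined) return false;
  return nowMs - pr.closedAtMs >= PREVIEW_RETENTION_MS;
}
```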
3.5. Environment Promotion
PROMOTION FLOW:
+------------------------------------------------------------------+
feature/* ─┐
├──> develop ──> staging ──> main ──> production
feature/* ─┘ │ │ │
│ │ │
Auto Deploy Auto Deploy Manual Deploy
+ E2E Tests + Approval
GATES PER ENVIRONMENT:
develop -> staging:
- Unit tests pass
- Lint pass
- Security scan (SAST)
staging -> main:
- Integration tests pass
- E2E tests pass
- Load tests pass
- Security scan (DAST)
- Manual QA approval
main -> production:
- All previous gates
- Tech lead approval
- Security team approval (for releases with security changes)
- Changelog generated
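The gate lists can be encoded as data, so the pipeline can report exactly which checks are still missing before a promotion. This sketch uses hypothetical gate identifiers. "All previous gates" is modelled by accumulating the gates of every earlier step. The security-team approval is added only when the release carries security changes.

```typescript
// Sketch: promotion gates as data. Gate identifiers are hypothetical.
type Promotion = "develop->staging" | "staging->main" | "main->production";

const PROMOTION_ORDER: Promotion[] = ["develop->staging", "staging->main", "main->production"];

const GATES: Record<Promotion, string[]> = {
  "develop->staging": ["unit_tests", "lint", "sast"],
  "staging->main": ["integration_tests", "e2e_tests", "load_tests", "dast", "qa_approval"],
  "main->production": ["tech_lead_approval", "changelog"],
};

// Returns the gates that have not passed yet for the requested promotion.
function missingGates(step: Promotion, passed: Set<string>, hasSecurityChanges: boolean): string[] {
  const required: string[] = [];
  // Accumulate every step up to and including this one ("All previous gates").
  for (const s of PROMOTION_ORDER.slice(0, PROMOTION_ORDER.indexOf(step) + 1)) {
    required.push(...GATES[s]);
  }
  if (step === "main->production" && hasSecurityChanges) {
    required.push("security_team_approval");
  }
  return required.filter((gate) => !passed.has(gate));
}
```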
4. CI/CD Pipelines
4.1. Mobile CI/CD (GitHub Actions + Fastlane)
iOS Pipeline:
# .github/workflows/ios.yml
name: iOS CI/CD
on:
push:
branches: [main, develop, 'TechSpec-*']
pull_request:
branches: [main, develop]
release:
types: [created]
env:
FASTLANE_XCODEBUILD_SETTINGS_TIMEOUT: 120
jobs:
test:
name: Unit Tests
runs-on: macos-14
steps:
- uses: actions/checkout@v4
- name: Setup Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: '3.2'
bundler-cache: true
- name: Install dependencies
run: |
cd ios
bundle install
pod install
- name: Run tests
run: |
cd ios
bundle exec fastlane test
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ios/coverage/lcov.info
flags: ios
lint:
name: Lint & Format
runs-on: macos-14
steps:
- uses: actions/checkout@v4
- name: SwiftLint
run: |
cd ios
swiftlint lint --strict
- name: SwiftFormat
run: |
cd ios
swiftformat --lint .
security:
name: Security Scan
runs-on: ubuntu-latest # Snyk actions are Docker container actions, which GitHub runs only on Linux runners
steps:
- uses: actions/checkout@v4
- name: Snyk Security Scan
uses: snyk/actions/cocoapods@master
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
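- name: Setup Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: '3.2'
bundler-cache: true
- name: Secrets & dependency scan (fastlane security_scan)
# Assumes the snyk and git-secrets CLIs are available on the runner
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
run: |
cd ios
bundle install
bundle exec fastlane security_scan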
build_testflight:
name: Build & Deploy to TestFlight
runs-on: macos-14
needs: [test, lint, security]
if: github.ref == 'refs/heads/develop'
steps:
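- uses: actions/checkout@v4
- name: Setup Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: '3.2'
bundler-cache: true
- name: Install dependencies
run: |
cd ios
bundle install
pod install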
- name: Setup certificates
env:
MATCH_PASSWORD: ${{ secrets.MATCH_PASSWORD }}
FASTLANE_USER: ${{ secrets.FASTLANE_USER }}
run: |
cd ios
bundle exec fastlane match development --readonly
bundle exec fastlane match appstore --readonly
- name: Build and upload to TestFlight
env:
FASTLANE_USER: ${{ secrets.FASTLANE_USER }}
FASTLANE_PASSWORD: ${{ secrets.FASTLANE_PASSWORD }}
FASTLANE_APPLE_APPLICATION_SPECIFIC_PASSWORD: ${{ secrets.FASTLANE_APP_PASSWORD }}
run: |
cd ios
bundle exec fastlane beta
- name: Upload build artifacts
uses: actions/upload-artifact@v4
with:
name: ios-build
path: ios/build/
release_appstore:
name: Release to App Store
runs-on: macos-14
needs: [test, lint, security]
if: github.event_name == 'release'
steps:
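- uses: actions/checkout@v4
- name: Setup Ruby
uses: ruby/setup-ruby@v1
with:
ruby-version: '3.2'
bundler-cache: true
- name: Install dependencies
run: |
cd ios
bundle install
pod install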
- name: Setup certificates
env:
MATCH_PASSWORD: ${{ secrets.MATCH_PASSWORD }}
run: |
cd ios
bundle exec fastlane match appstore --readonly
- name: Build and upload to App Store
env:
FASTLANE_USER: ${{ secrets.FASTLANE_USER }}
FASTLANE_PASSWORD: ${{ secrets.FASTLANE_PASSWORD }}
run: |
cd ios
bundle exec fastlane release
- name: Create GitHub Release
uses: softprops/action-gh-release@v1
with:
files: |
ios/build/*.ipa
ios/changelog.md
iOS Fastfile:
# ios/fastlane/Fastfile
default_platform(:ios)
platform :ios do
desc "Run unit tests"
lane :test do
scan(
scheme: "MedTime",
devices: ["iPhone 15"],
code_coverage: true,
output_directory: "coverage"
)
end
desc "Run security scan"
lane :security_scan do
# Snyk scan
sh("snyk test --all-projects")
# Check for hardcoded secrets
sh("git secrets --scan")
end
desc "Build for TestFlight"
lane :beta do
# Increment build number
increment_build_number(
build_number: latest_testflight_build_number + 1
)
# Sync certificates
match(
type: "appstore",
readonly: true
)
# Build
build_app(
scheme: "MedTime",
export_method: "app-store",
output_directory: "build"
)
# Upload to TestFlight
upload_to_testflight(
skip_waiting_for_build_processing: true,
distribute_external: false # Internal testers only
)
# Notify team
slack(
message: "iOS build uploaded to TestFlight!",
channel: "#mobile-releases"
)
end
desc "Release to App Store"
lane :release do
# Sync certificates
match(
type: "appstore",
readonly: true
)
# Build
build_app(
scheme: "MedTime",
export_method: "app-store",
output_directory: "build" # the GitHub Release step uploads ios/build/*.ipa
)
# Upload to App Store
upload_to_app_store(
submit_for_review: false, # Manual submission
automatic_release: false, # Release manually after Apple approval
phased_release: true, # 7-day phased release
submission_information: {
export_compliance_uses_encryption: true,
export_compliance_is_exempt: false,
export_compliance_encryption_updated: false
}
)
# Notify
slack(
message: "iOS app submitted to App Store!",
channel: "#releases"
)
end
end
Android Pipeline:
# .github/workflows/android.yml
name: Android CI/CD
on:
push:
branches: [main, develop, 'TechSpec-*']
pull_request:
branches: [main, develop]
release:
types: [created]
jobs:
test:
name: Unit Tests
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup JDK
uses: actions/setup-java@v3
with:
java-version: '17'
distribution: 'temurin'
- name: Setup Android SDK
uses: android-actions/setup-android@v2
- name: Cache Gradle
uses: actions/cache@v3
with:
path: |
~/.gradle/caches
~/.gradle/wrapper
key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle*') }}
- name: Run tests
run: ./gradlew testDebugUnitTest
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: android/app/build/reports/jacoco/testDebugUnitTestCoverage/testDebugUnitTestCoverage.xml
flags: android
lint:
name: Lint & Format
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup JDK
uses: actions/setup-java@v3
with:
java-version: '17'
distribution: 'temurin'
- name: Run ktlint
run: ./gradlew ktlintCheck
- name: Run Android Lint
run: ./gradlew lintDebug
security:
name: Security Scan
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Snyk Security Scan
uses: snyk/actions/gradle@master
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
- name: MobSF Scan
run: |
# MobSF scan for APK
./gradlew assembleDebug
# Upload to MobSF API
curl -F "file=@app/build/outputs/apk/debug/app-debug.apk" \
-H "Authorization: ${{ secrets.MOBSF_API_KEY }}" \
https://mobsf-api.example.com/scan
build_beta:
name: Build & Deploy to Firebase App Distribution
runs-on: ubuntu-latest
needs: [test, lint, security]
if: github.ref == 'refs/heads/develop'
steps:
- uses: actions/checkout@v4
- name: Setup JDK
uses: actions/setup-java@v3
with:
java-version: '17'
distribution: 'temurin'
- name: Decode keystore
run: |
echo "${{ secrets.ANDROID_KEYSTORE_BASE64 }}" | base64 -d > keystore.jks
- name: Build APK
env:
KEYSTORE_PASSWORD: ${{ secrets.KEYSTORE_PASSWORD }}
KEY_ALIAS: ${{ secrets.KEY_ALIAS }}
KEY_PASSWORD: ${{ secrets.KEY_PASSWORD }}
run: ./gradlew assembleRelease
- name: Upload to Firebase App Distribution
uses: wzieba/Firebase-Distribution-Github-Action@v1
with:
appId: ${{ secrets.FIREBASE_APP_ID_ANDROID }}
token: ${{ secrets.FIREBASE_TOKEN }}
groups: internal-testers
file: app/build/outputs/apk/release/app-release.apk
releaseNotes: |
Build from commit: ${{ github.sha }}
Changes: See GitHub
release_playstore:
name: Release to Play Store
runs-on: ubuntu-latest
needs: [test, lint, security]
if: github.event_name == 'release'
steps:
- uses: actions/checkout@v4
- name: Setup JDK
uses: actions/setup-java@v3
with:
java-version: '17'
distribution: 'temurin'
- name: Decode keystore
run: |
echo "${{ secrets.ANDROID_KEYSTORE_BASE64 }}" | base64 -d > keystore.jks
- name: Build AAB
env:
KEYSTORE_PASSWORD: ${{ secrets.KEYSTORE_PASSWORD }}
KEY_ALIAS: ${{ secrets.KEY_ALIAS }}
KEY_PASSWORD: ${{ secrets.KEY_PASSWORD }}
run: ./gradlew bundleRelease
- name: Upload to Play Store
uses: r0adkll/upload-google-play@v1
with:
serviceAccountJsonPlainText: ${{ secrets.PLAY_SERVICE_ACCOUNT }}
packageName: com.medtime.app
releaseFiles: app/build/outputs/bundle/release/app-release.aab
track: internal # internal -> alpha -> beta -> production
status: completed
4.2. Backend CI/CD¶
# .github/workflows/backend.yml
name: Backend CI/CD
on:
push:
branches: [main, develop]
paths:
- 'backend/**'
- 'supabase/**'
pull_request:
branches: [main, develop]
paths:
- 'backend/**'
- 'supabase/**'
jobs:
test:
name: Test Supabase Functions
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Deno
uses: denoland/setup-deno@v1
with:
deno-version: v1.x
- name: Run tests
run: |
cd supabase/functions
deno test --allow-all
- name: Lint
run: |
cd supabase/functions
deno lint
db_test:
name: Test Database
runs-on: ubuntu-latest
services:
postgres:
image: supabase/postgres:15
env:
POSTGRES_PASSWORD: postgres
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
- 5432:5432
steps:
- uses: actions/checkout@v4
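- name: Run migrations
run: |
# psql -f accepts a single file: apply each migration in filename
# (timestamp) order and stop at the first error
for f in supabase/migrations/*.sql; do
psql -v ON_ERROR_STOP=1 -h localhost -U postgres -f "$f"
done
env:
PGPASSWORD: postgres
- name: Run pgTAP tests
run: |
for f in supabase/tests/*.sql; do
psql -v ON_ERROR_STOP=1 -h localhost -U postgres -f "$f"
done
env:
PGPASSWORD: postgres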
deploy_staging:
name: Deploy to Staging
runs-on: ubuntu-latest
needs: [test, db_test]
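if: github.ref == 'refs/heads/develop'
env:
# Needed by every Supabase CLI step, not only link
SUPABASE_ACCESS_TOKEN: ${{ secrets.SUPABASE_ACCESS_TOKEN }}
# Read by supabase db push; the secret name is illustrative
SUPABASE_DB_PASSWORD: ${{ secrets.SUPABASE_STAGING_DB_PASSWORD }}
steps:
- uses: actions/checkout@v4
- name: Setup Supabase CLI
uses: supabase/setup-cli@v1
- name: Link Supabase project
run: supabase link --project-ref ${{ secrets.SUPABASE_STAGING_PROJECT_ID }}
- name: Deploy database migrations
run: supabase db push
- name: Deploy Edge Functions
run: |
# Run from the repository root so the CLI finds supabase/config.toml,
# and skip shared-code folders such as _shared
for dir in supabase/functions/*/; do
name=$(basename "$dir")
case "$name" in _*) continue ;; esac
supabase functions deploy "$name"
done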
deploy_production:
name: Deploy to Production
runs-on: ubuntu-latest
needs: [test, db_test]
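if: github.ref == 'refs/heads/main'
environment:
name: production
url: https://api.medtime.app
env:
# Needed by every Supabase CLI step, not only link
SUPABASE_ACCESS_TOKEN: ${{ secrets.SUPABASE_ACCESS_TOKEN }}
# Read by supabase db push; the secret name is illustrative
SUPABASE_DB_PASSWORD: ${{ secrets.SUPABASE_PROD_DB_PASSWORD }}
steps:
- uses: actions/checkout@v4
- name: Setup Supabase CLI
uses: supabase/setup-cli@v1
- name: Link Supabase project
run: supabase link --project-ref ${{ secrets.SUPABASE_PROD_PROJECT_ID }}
- name: Deploy database migrations
run: supabase db push
- name: Deploy Edge Functions
run: |
# Run from the repository root so the CLI finds supabase/config.toml,
# and skip shared-code folders such as _shared
for dir in supabase/functions/*/; do
name=$(basename "$dir")
case "$name" in _*) continue ;; esac
supabase functions deploy "$name"
done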
- name: Verify deployment
run: |
# Health check
curl -f https://api.medtime.app/health || exit 1
- name: Notify team
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "Backend deployed to production!",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "Backend deployed to production\nCommit: ${{ github.sha }}"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK # required by v1 of this action for incoming webhooks
4.3. Database Migrations
# Migration strategy
Database Migration Strategy:
tool: Supabase Migrations (SQL)
naming: YYYYMMDDHHMMSS_description.sql
workflow:
1. Create a local migration:
supabase migration new add_table_xyz
2. Write the SQL together with its rollback:
- UP migration (changes)
- DOWN migration (revert; the Supabase CLI only applies migrations forward, so this is kept as a separate or commented-out script)
3. Test locally:
supabase db reset
supabase test db
4. Open a PR with the migration
5. Deploy to staging (automatic)
6. Verify staging
7. Deploy to production (manual approval)
rollback:
strategy: manual
procedure:
1. Identify the problematic migration
2. Create a new migration that reverts it
3. Urgent deploy
4. Post-mortem
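A pre-merge check can enforce the naming convention and reject duplicate timestamps, which would otherwise make the apply order ambiguous. This is a sketch. Restricting the description to lowercase snake_case is an assumption that goes beyond the documented `YYYYMMDDHHMMSS_description.sql` pattern.

```typescript
// Sketch: enforce YYYYMMDDHHMMSS_description.sql and return files in apply order.
const MIGRATION_NAME = /^(\d{14})_([a-z0-9_]+)\.sql$/;

interface Migration { version: string; description: string; fileName: string; }

function parseMigration(fileName: string): Migration {
  const match = MIGRATION_NAME.exec(fileName);
  if (match === null) throw new Error(`invalid migration file name: ${fileName}`);
  return { version: match[1] ?? "", description: match[2] ?? "", fileName };
}

function migrationsInApplyOrder(fileNames: string[]): Migration[] {
  const parsed = fileNames.map(parseMigration);
  const versions = new Set<string>();
  for (const m of parsed) {
    // Two files with the same timestamp have no defined relative order.
    if (versions.has(m.version)) throw new Error(`duplicate version ${m.version}`);
    versions.add(m.version);
  }
  // Fixed-width numeric timestamps sort correctly as strings.
  return parsed.sort((a, b) => (a.version < b.version ? -1 : a.version > b.version ? 1 : 0));
}
```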
Example migration:
-- supabase/migrations/20251208120000_add_user_preferences.sql
-- UP Migration
CREATE TABLE IF NOT EXISTS srv_user_preferences (
user_id UUID PRIMARY KEY REFERENCES srv_users(user_id) ON DELETE CASCADE,
language VARCHAR(10) NOT NULL DEFAULT 'es',
timezone VARCHAR(50) NOT NULL DEFAULT 'America/Mexico_City',
notification_enabled BOOLEAN NOT NULL DEFAULT true,
theme VARCHAR(20) NOT NULL DEFAULT 'system',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- RLS
ALTER TABLE srv_user_preferences ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_preferences_select ON srv_user_preferences
FOR SELECT
USING (user_id = auth.uid());
CREATE POLICY user_preferences_update ON srv_user_preferences
FOR UPDATE
USING (user_id = auth.uid())
WITH CHECK (user_id = auth.uid());
-- No separate index on user_id: the PRIMARY KEY already creates a unique index on it
-- Trigger para updated_at
CREATE TRIGGER update_srv_user_preferences_updated_at
BEFORE UPDATE ON srv_user_preferences
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
-- DOWN Migration (in a separate file or commented out)
-- DROP TABLE IF EXISTS srv_user_preferences;
4.4. Release Management
Release Strategy:
versioning: Semantic Versioning (SemVer)
format: vMAJOR.MINOR.PATCH
examples:
- v1.0.0 - Initial release
- v1.1.0 - New features (backward compatible)
- v1.1.1 - Bug fixes
- v2.0.0 - Breaking changes
branches:
main: Production releases
develop: Development (next release)
hotfix/*: Emergency fixes
release/*: Release preparation (branched from develop)
release_process:
1. Create release branch from develop
git checkout -b release/v1.2.0 develop
2. Bump version numbers
- iOS: Info.plist
- Android: build.gradle
- Backend: package.json
3. Run full test suite
4. Deploy to staging
5. QA approval
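6. Merge to main (and merge back into develop)
7. Tag release
git tag -a v1.2.0 -m "Release v1.2.0"
git push origin v1.2.0
8. GitHub Actions deploys to production: the backend on the push to main, the mobile apps when a GitHub Release is created from the tag (the mobile workflows trigger on release: created, not on tag pushes)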
9. Monitor for 24 hours
10. Announce release
rollout_strategy:
ios:
- TestFlight (internal): 100%
- TestFlight (external): 100%
- App Store (phased):
- Day 1: 1%
- Day 2: 2%
- Day 3: 5%
- Day 4: 10%
- Day 5: 20%
- Day 6: 50%
- Day 7: 100%
android:
- Internal testing: 100%
- Alpha: 100%
- Beta (staged):
- Day 1: 1%
- Day 2: 5%
- Day 3: 10%
- Day 4: 20%
- Day 5: 50%
- Day 7: 100%
backend:
- Blue-green deployment
- Canary release (10% -> 50% -> 100%)
- Automatic rollback on error rate > 1%
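Versioning and rollout are mechanical enough to script. Below is a sketch of a SemVer bump for `vMAJOR.MINOR.PATCH` tags, plus a lookup of the Android staged-rollout percentage from the schedule above. Day 6 has no entry in that schedule, so the sketch assumes the day-5 percentage holds until day 7. Function names are hypothetical.

```typescript
// Sketch: SemVer bump and Android staged-rollout lookup. Names are hypothetical.
type Bump = "major" | "minor" | "patch";

function bumpVersion(tag: string, bump: Bump): string {
  const m = /^v(\d+)\.(\d+)\.(\d+)$/.exec(tag);
  if (m === null) throw new Error(`not a vMAJOR.MINOR.PATCH tag: ${tag}`);
  let major = Number(m[1]);
  let minor = Number(m[2]);
  let patch = Number(m[3]);
  if (bump === "major") {
    major += 1; minor = 0; patch = 0; // breaking change
  } else if (bump === "minor") {
    minor += 1; patch = 0; // backward-compatible feature
  } else {
    patch += 1; // bug fix
  }
  return `v${major}.${minor}.${patch}`;
}

// [day, percent] pairs from the Android schedule above.
const ANDROID_STAGED_ROLLOUT: Array<[number, number]> = [
  [1, 1], [2, 5], [3, 10], [4, 20], [5, 50], [7, 100],
];

// Percentage of users on a given rollout day; days without an entry keep
// the previous percentage.
function androidRolloutPercent(day: number): number {
  let percent = 0;
  for (const [startDay, pct] of ANDROID_STAGED_ROLLOUT) {
    if (day >= startDay) percent = pct;
  }
  return percent;
}
```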
5. Monitoring and Alerting
5.1. Zero-Knowledge Monitoring
ZERO-KNOWLEDGE MONITORING PRINCIPLE:
+------------------------------------------------------------------+
| The monitoring system NEVER has access to PHI |
| |
| ALLOWED: |
| - Operational metadata (counts, timings, sizes) |
| - Error codes (no messages containing PHI) |
| - Performance metrics (latency, throughput) |
| - Availability (uptime, downtime) |
| |
| PROHIBITED: |
| - Contents of encrypted blobs |
| - Medication names |
| - User data |
| - Request/response bodies containing PHI |
| - Stack traces containing PHI |
+------------------------------------------------------------------+
5.2. Allowed Metrics
ALLOWED metrics (no PHI):
api_metrics:
# Counters
- http_requests_total
labels: [method, endpoint, status_code, tier]
- http_errors_total
labels: [endpoint, error_code, tier]
# Latency
- http_request_duration_seconds
labels: [endpoint, method]
percentiles: [50, 95, 99]
# Throughput
- http_requests_per_second
labels: [endpoint]
database_metrics:
# Connections
- db_connections_active
- db_connections_idle
- db_connections_waiting
# Queries
- db_query_duration_seconds
percentiles: [50, 95, 99]
- db_queries_per_second
# Storage
- db_size_bytes
- db_table_size_bytes
labels: [table_name]
sync_metrics:
# Sync (no content)
- sync_operations_total
labels: [operation_type, status]
- sync_blob_size_bytes
labels: [entity_type]
- sync_duration_seconds
labels: [direction] # push/pull
auth_metrics:
# Authentication
- auth_login_attempts_total
labels: [status, tier]
- auth_mfa_challenges_total
labels: [method, status]
- auth_token_refreshes_total
mobile_metrics:
# Crashes (no stack traces containing PHI)
- app_crashes_total
labels: [platform, version, error_type]
# Performance
- app_launch_time_seconds
labels: [platform, launch_type]
- screen_load_time_seconds
labels: [screen_name]
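Labels such as `endpoint` are free of PHI only if they carry route templates rather than raw paths. Raw paths can embed user or record IDs and query strings. This sketch assumes IDs appear as UUID or all-digit path segments. The example paths in the tests are hypothetical.

```typescript
// Sketch: turn a concrete request path into a low-cardinality route template
// for the `endpoint` label, so IDs and query strings never reach metrics.
// Treating UUID and all-digit segments as IDs is an assumption about the API.
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_SEGMENT = /^\d+$/;

function endpointLabel(rawPath: string): string {
  const pathOnly = rawPath.split("?")[0] ?? ""; // drop query parameters entirely
  return pathOnly
    .split("/")
    .map((segment) => (UUID_SEGMENT.test(segment) || NUMERIC_SEGMENT.test(segment) ? ":id" : segment))
    .join("/");
}
```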
5.3. Prohibited Metrics
PROHIBITED metrics (contain PHI):
NEVER_LOG:
# Content
- medication_names
- user_names
- dose_times
- health_conditions
# Request details
- request_body
- response_body
- query_parameters (may contain names)
# Stack traces
- error_messages_with_user_data
- exception_details_with_phi
# Patterns
- medication_adherence_patterns
- user_behavior_details
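The prohibited list is easiest to enforce at the point where structured events leave the process. The sketch below recursively drops the listed keys from a JSON event. Matching on exact key names is an assumption about how MedTime structures its logs. Free-text fields, such as exception messages, would still need separate handling.

```typescript
// Sketch: strip prohibited keys (mirroring the list above) from a structured
// event, at any nesting depth, before it is logged or exported.
const PROHIBITED_KEYS = new Set([
  "medication_names", "user_names", "dose_times", "health_conditions",
  "request_body", "response_body", "query_parameters",
]);

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function sanitizeEvent(value: Json): Json {
  if (Array.isArray(value)) return value.map(sanitizeEvent);
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      // Drop the whole subtree under a prohibited key.
      if (!PROHIBITED_KEYS.has(key)) out[key] = sanitizeEvent(child);
    }
    return out;
  }
  return value;
}
```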
5.4. Infrastructure Alerts
Infrastructure Alerts:
critical:
# P1 - Immediate response
api_error_rate_high:
condition: error_rate > 5% for 5m
severity: critical
notification: pagerduty
action: auto_rollback if deployment in last 1h
database_down:
condition: db_connections_active == 0 for 1m
severity: critical
notification: pagerduty + sms
action: failover to backup
disk_space_critical:
condition: disk_usage > 90%
severity: critical
notification: pagerduty
action: auto_cleanup + scale
auth_service_down:
condition: auth_success_rate < 50% for 5m
severity: critical
notification: pagerduty
action: investigate firebase status
high:
# P2 - Fast response (< 1 hour)
slow_response_time:
condition: p99_latency > 5s for 10m
severity: high
notification: slack
action: investigate slow queries
high_sync_failures:
condition: sync_failure_rate > 10% for 15m
severity: high
notification: slack
action: check network + backend health
memory_usage_high:
condition: memory_usage > 80% for 10m
severity: high
notification: slack
action: investigate memory leaks
medium:
# P3 - Daily review
increased_crash_rate:
condition: crash_rate > 1% for 1h
severity: medium
notification: email
action: review crashlytics
backup_failed:
condition: last_backup_age > 25h
severity: medium
notification: email
action: retry backup
ssl_cert_expiring:
condition: ssl_cert_days_remaining < 30
severity: medium
notification: email
action: renew certificate
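Conditions of the form "metric > threshold for D" should fire only when the breach is sustained. This sketch evaluates that over timestamped samples. It also includes the "auto_rollback if deployment in last 1h" check from `api_error_rate_high`. How samples are collected and how they are spaced are assumptions. A real alerting backend applies its own evaluation semantics.

```typescript
// Sketch of "metric > threshold for D" evaluation, e.g. api_error_rate_high
// (error_rate > 5% for 5m), and of its auto_rollback condition.
interface MetricSample { tMs: number; value: number; }

const MINUTE_MS = 60_000;

// True when the most recent consecutive run of breaching samples started
// at least `forMs` before `nowMs`.
function breachedFor(samples: MetricSample[], threshold: number, forMs: number, nowMs: number): boolean {
  const ordered = samples.filter((s) => s.tMs <= nowMs).sort((a, b) => a.tMs - b.tMs);
  let breachStartMs: number | null = null;
  for (const s of ordered.reverse()) {
    if (s.value <= threshold) break; // the run of breaches ends here
    breachStartMs = s.tMs;
  }
  return breachStartMs !== null && nowMs - breachStartMs >= forMs;
}

// auto_rollback applies only if a deployment happened within the last hour.
function shouldAutoRollback(breached: boolean, lastDeployMs: number | null, nowMs: number): boolean {
  return breached && lastDeployMs !== null && nowMs - lastDeployMs <= 60 * MINUTE_MS;
}
```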
5.5. Dashboards
Main dashboard:
Main Dashboard:
panels:
overview:
- uptime (%)
- active_users (count, no identification)
- requests_per_minute
- error_rate (%)
api_health:
- request_latency (p50, p95, p99)
- error_rate_by_endpoint
- requests_by_status_code
- top_slow_endpoints
database:
- connection_pool_usage
- query_performance
- database_size
- slow_queries_count
sync:
- sync_operations_per_minute
- sync_success_rate
- avg_blob_size
- pending_operations
mobile:
- crash_free_rate (iOS/Android)
- app_launches_per_day
- avg_session_duration
- version_distribution
refresh_rate: 30s
retention: 30_days
# (DV2-P3) OPS-BAJO-001: Mobile dashboard refresh rate specified
Mobile Dashboards:
crashlytics_dashboard:
platform: Firebase Crashlytics
metrics:
- crash_free_users (%)
- crash_free_sessions (%)
- crash_trends
- top_crashes
- affected_versions
refresh_rate: 5_minutes
retention: 90_days
performance_dashboard:
platform: Firebase Performance
metrics:
- app_launch_time
- screen_rendering_time
- network_request_latency
- custom_traces
refresh_rate: 5_minutes
retention: 30_days
analytics_dashboard:
platform: Firebase Analytics
metrics:
- active_users
- session_duration
- screen_views
- user_engagement
refresh_rate: 5_minutes
retention: 60_days
justification: |
5 minute refresh rate balances:
- Real-time visibility for incident response
- Firebase API rate limits
- Cost optimization (fewer API calls)
- Mobile battery impact (if viewing on mobile)
6. Disaster Recovery
6.1. RTO and RPO
Recovery Objectives:
definitions:
RTO: Recovery Time Objective (maximum tolerable downtime)
RPO: Recovery Point Objective (maximum tolerable data loss, measured in time)
targets:
database:
rto: 4_hours
rpo: 1_hour
justification: |
- Blobs are encrypted (low confidentiality impact if exposed)
- Users can keep operating offline
- Losing up to 1 hour of data is acceptable
authentication:
rto: 1_hour
rpo: 0 # Firebase managed, replicated
justification: |
- Firebase Auth has built-in high availability
- Critical for new logins
- Existing users keep working while their ID tokens remain valid (up to 1 hour, per id_token_expiry)
push_notifications:
rto: 2_hours
rpo: N/A # Stateless
justification: |
- Local alerts are the primary mechanism
- Push is a backup
- No data at risk
edge_functions:
rto: 2_hours
rpo: 0 # Stateless, IaC
justification: |
- Redeploy from Git
- No persistent state
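Checking whether a backup setup honors an RPO is arithmetic on the time since the last recoverable point. This sketch works under stated assumptions. With PITR, the worst-case loss is bounded by the WAL archiving delay, which is a measured input and not a figure from this document. Without PITR, the bound is the interval between full backups. The RTO check mirrors the `restore_time < RTO` criterion used for restore tests.

```typescript
// Sketch: worst-case data loss vs. RPO, and restore-drill time vs. RTO.
interface BackupSetup {
  fullBackupIntervalMs: number;
  pitrEnabled: boolean;
  walArchiveDelayMs: number; // measured input; only relevant with PITR
}

const HOUR_MS = 60 * 60 * 1000;

function worstCaseDataLossMs(b: BackupSetup): number {
  // With PITR, recovery can target any archived point; otherwise only the last full backup.
  return b.pitrEnabled ? Math.min(b.walArchiveDelayMs, b.fullBackupIntervalMs) : b.fullBackupIntervalMs;
}

function meetsRpo(b: BackupSetup, rpoMs: number): boolean {
  return worstCaseDataLossMs(b) <= rpoMs;
}

// Mirrors the restore-test criterion restore_time < RTO.
function meetsRto(measuredRestoreMs: number, rtoMs: number): boolean {
  return measuredRestoreMs < rtoMs;
}
```

Against the 1-hour database RPO above, daily full backups alone would not qualify, while hourly backups or PITR would.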
6.2. Backup Strategy
Backup Strategy:
database:
automated_backups:
frequency: hourly
retention: 30_days
type: full_backup
encryption: AES-256
storage: Supabase managed + S3 copy
point_in_time_recovery:
enabled: true
window: 7_days
granularity: 1_second
manual_snapshots:
before: major_migrations
retention: 90_days
labeled: true
storage_blobs:
replication:
strategy: multi_region
regions: [us-east-1, us-west-2]
sync: continuous
versioning:
enabled: true
retention: 30_days
lifecycle:
transition_to_infrequent_access: 90_days
transition_to_glacier: 1_year
delete: never # Compliance requirement
configurations:
infrastructure_as_code:
source: Git repository
backup: GitHub + GitLab mirror
retention: indefinite
secrets:
source: AWS Secrets Manager
backup: encrypted_export
frequency: weekly
retention: 90_days
testing:
# Backup restore tests
frequency: quarterly
scope: full_restore_to_staging
success_criteria:
- restore_time < RTO
- data_integrity_100%
- application_functional
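The `storage_blobs` lifecycle rules can be read as a function of blob age. A minimal sketch, assuming `1_year` means 365 days and that the 30-day versioning retention applies only to noncurrent (superseded) versions, since current versions are never deleted; `storage_tier` and `noncurrent_version_expired` are hypothetical names.

```python
from datetime import date

# Thresholds transcribed from the storage_blobs lifecycle block above.
INFREQUENT_ACCESS_AFTER_DAYS = 90
GLACIER_AFTER_DAYS = 365  # "1_year", assumed to be 365 days
NONCURRENT_VERSION_RETENTION_DAYS = 30


def storage_tier(created: date, today: date) -> str:
    """Tier a current blob version should occupy; it is never deleted."""
    age_days = (today - created).days
    if age_days >= GLACIER_AFTER_DAYS:
        return "glacier"
    if age_days >= INFREQUENT_ACCESS_AFTER_DAYS:
        return "infrequent_access"
    return "standard"


def noncurrent_version_expired(superseded: date, today: date) -> bool:
    """Assumed reading: old versions are kept 30 days after being superseded."""
    return (today - superseded).days > NONCURRENT_VERSION_RETENTION_DAYS


print(storage_tier(date(2025, 1, 1), date(2025, 4, 1)))  # infrequent_access
```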
6.3. Failover Procedures
Failover Procedures:
database_failover:
trigger:
- primary_instance_down for 5m
- manual_trigger by on_call
automated_steps:
1. Health check confirms primary down
2. Promote read replica to primary
3. Update DNS to point to new primary
4. Restart application connections
5. Verify new primary is healthy
manual_steps:
1. Notify team (automatic)
2. Monitor application health
3. Investigate root cause
4. Document incident
estimated_time: 15_minutes
region_failover:
trigger:
- region_unavailable
- disaster_in_primary_region
steps:
1. Confirm region is down
2. Update DNS to secondary region
3. Promote secondary database
4. Verify application health
5. Notify users of potential data loss (if RPO > 0)
estimated_time: 1_hour
data_loss: Up to RPO (1 hour)
rollback_deployment:
trigger:
- error_rate > 5% after deployment
- critical_bug_discovered
- manual_trigger
automated_steps:
1. Detect error rate spike
2. Revert to previous version
3. Restart application
4. Verify health
manual_steps:
1. Investigate cause
2. Fix and redeploy
3. Post-mortem
estimated_time: 5_minutes
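The two automated triggers above (primary down for 5 minutes, error rate above 5% after a deployment) can be expressed as small predicates. This is a sketch under the assumption that health checks produce timestamped up/down samples; requiring an unbroken unhealthy streak means a flapping primary does not trigger failover. The function names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

FAILOVER_AFTER = timedelta(minutes=5)  # primary_instance_down for 5m
ROLLBACK_ERROR_RATE = 0.05             # error_rate > 5% after deployment


def should_failover(samples: Sequence[Tuple[datetime, bool]], now: datetime) -> bool:
    """samples: (timestamp, healthy) health-check results, oldest first.

    Fails over only if the primary has been continuously unhealthy for at
    least FAILOVER_AFTER; any healthy sample restarts the count.
    """
    down_since: Optional[datetime] = None
    for ts, healthy in samples:
        if healthy:
            down_since = None
        elif down_since is None:
            down_since = ts
    return down_since is not None and now - down_since >= FAILOVER_AFTER


def should_rollback(errors: int, requests: int) -> bool:
    """Automated rollback when the post-deploy error rate exceeds 5%."""
    return requests > 0 and errors / requests > ROLLBACK_ERROR_RATE
```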
6.4. Data Recovery
Data Recovery Procedures:
accidental_deletion:
user_data:
source: database_backup
steps:
1. Identify a backup that contains the data
2. Restore to staging
3. Extract specific records
4. Verify data integrity
5. Import to production
max_time: 2_hours
table_dropped:
source: point_in_time_recovery
steps:
1. Identify timestamp before drop
2. Restore to staging (PITR)
3. Export table
4. Import to production
5. Verify via count/checksums
max_time: 4_hours
data_corruption:
detection:
- checksum_mismatch
- user_reports
- automated_integrity_checks
response:
1. Isolate affected data
2. Identify corruption scope
3. Find last good backup
4. Restore from backup
5. Replay transactions if possible
6. Notify affected users
compliance_recovery:
audit_log_recovery:
retention: 6_years (HIPAA)
storage: glacier
encryption: yes
phi_recovery:
# All PHI is end-to-end encrypted
# The server cannot decrypt it
# The user must restore from their local backup
server_role: provide_encrypted_blobs
user_role: decrypt_with_own_key
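Both recovery paths start by choosing a restore point. A hedged sketch, assuming the 7-day PITR window and 30-day backup retention from section 6.2, plus a one-second margin before the incident (matching the 1-second PITR granularity); `pitr_target` and `latest_backup_before` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

PITR_WINDOW = timedelta(days=7)
BACKUP_RETENTION = timedelta(days=30)


def pitr_target(incident: datetime, now: datetime,
                margin: timedelta = timedelta(seconds=1)) -> Optional[datetime]:
    """Timestamp to restore to (e.g. a table drop), or None if outside the PITR window."""
    target = incident - margin
    return target if now - target <= PITR_WINDOW else None


def latest_backup_before(backups: Iterable[datetime], incident: datetime,
                         now: datetime) -> Optional[datetime]:
    """Most recent retained backup taken strictly before the incident."""
    candidates = [b for b in backups if b < incident and now - b <= BACKUP_RETENTION]
    return max(candidates, default=None)
```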
6.5. Quarterly DR Testing (DV2-P2)
Added: OPS-MEDIO-002 - Detailed specification for quarterly DR testing.
6.5.1. DR Test Calendar
DR Testing Schedule:
version: "1.0"
quarterly_tests:
Q1: # January-March
month: February
week: 2
focus:
- Database full restore
- RTO validation
environment: staging
Q2: # April-June
month: May
week: 2
focus:
- Region failover simulation
- RPO validation
environment: staging
Q3: # July-September
month: August
week: 2
focus:
- Complete disaster simulation
- Full stack recovery
environment: staging + limited prod
Q4: # October-December
month: November
week: 2
focus:
- Tabletop exercise with team
- Documentation review
- Annual DR plan update
environment: documentation + staging
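The calendar fixes a month and a week per quarter but does not define what "week 2" means. One possible reading, assumed here: week N starts on the first Monday on or after day 7(N-1)+1 of the month, so week 2 opens on the first Monday on or after the 8th. `dr_test_date` is a hypothetical helper for scheduling automation.

```python
from datetime import date, timedelta

# Scheduled month per quarter, from the calendar above (all tests run in week 2).
TEST_MONTH = {"Q1": 2, "Q2": 5, "Q3": 8, "Q4": 11}


def dr_test_date(year: int, quarter: str, week: int = 2) -> date:
    """Monday that opens `week` of the quarter's test month.

    Assumption: week N starts on the first Monday on or after day 7*(N-1)+1.
    """
    anchor = date(year, TEST_MONTH[quarter], 7 * (week - 1) + 1)
    return anchor + timedelta(days=(0 - anchor.weekday()) % 7)  # weekday(): Monday == 0


print({q: dr_test_date(2026, q).isoformat() for q in TEST_MONTH})
```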
6.5.2. Test Scenarios
DR Test Scenarios:
scenario_1_database_restore:
name: "Full Database Restore"
frequency: quarterly
duration: 4_hours
participants:
- DevOps Lead
- Database Admin
- Backend Developer
preconditions:
- Staging environment available
- Latest backup identified
- Runbook accessible
steps:
1:
action: "Identify latest backup"
command: "supabase db backups list"
expected: "Backup list with timestamps"
max_time: 5_minutes
2:
action: "Create test instance"
command: "supabase db create --name dr-test-YYYYMMDD"
expected: "Empty database created"
max_time: 10_minutes
3:
action: "Restore backup to test instance"
command: "supabase db restore --backup-id XXX --target dr-test"
expected: "Restore completed successfully"
max_time: 60_minutes
4:
action: "Verify data integrity"
command: "psql -f scripts/dr-verify-integrity.sql"
expected: "All checksums match"
max_time: 15_minutes
5:
action: "Test application connectivity"
command: "curl https://dr-test-api/health"
expected: "HTTP 200 with healthy status"
max_time: 10_minutes
6:
action: "Run smoke tests"
command: "npm run test:smoke -- --env=dr-test"
expected: "All smoke tests pass"
max_time: 30_minutes
success_criteria:
- Total restore time < RTO (4 hours)
- Data integrity 100%
- Application functional
- No data loss (matches backup timestamp)
cleanup:
- Delete test database instance
- Remove temporary credentials
- Document results
scenario_2_failover:
name: "Database Failover Simulation"
frequency: quarterly
duration: 2_hours
participants:
- DevOps Lead
- On-Call Engineer
steps:
1:
action: "Confirm replica sync status"
command: "supabase db replicas status"
expected: "Replica in sync, lag < 1s"
2:
action: "Simulate primary failure"
command: "supabase db pause --instance primary --test-mode"
expected: "Primary paused"
3:
action: "Trigger automatic failover"
command: "Monitor: automatic or manual promote"
expected: "Replica promoted to primary"
max_time: 15_minutes
4:
action: "Verify application connectivity"
expected: "Application reconnected to new primary"
max_time: 5_minutes
5:
action: "Restore original primary"
command: "supabase db resume --instance original-primary"
expected: "Original primary rejoins as a replica"
success_criteria:
- Failover time < 15 minutes
- No data loss
- Application recovers automatically
scenario_3_region_failover:
name: "Cross-Region Disaster Recovery"
frequency: annually (Q3)
duration: 8_hours
participants:
- DevOps Lead
- Tech Lead
- Security Lead
steps:
1:
action: "Pre-test: verify DR region ready"
expected: "DR infrastructure operational"
2:
action: "Simulate primary region outage"
command: "Block all traffic to primary region"
expected: "Primary region unreachable"
3:
action: "Activate DR runbook"
expected: "Follow documented procedures"
4:
action: "Promote DR database"
command: "supabase db promote --region dr-region"
expected: "DR database now primary"
5:
action: "Update DNS to DR region"
command: "Update Route53/Cloudflare records"
expected: "Traffic routed to DR region"
6:
action: "Verify full application stack"
expected: "All services operational in DR region"
7:
action: "Measure data loss"
command: "Compare last DR sync with current time"
expected: "Data loss <= RPO (1 hour)"
success_criteria:
- Total recovery time < RTO (4 hours)
- Data loss <= RPO (1 hour)
- All services functional
- User impact documented
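The per-step `max_time` values of scenario 1 add up to 130 minutes (5 + 10 + 60 + 15 + 10 + 30), well inside the 4-hour RTO, which leaves slack for overruns. The sketch below checks a real run against those budgets; the step keys are hypothetical short names for the six steps listed in scenario 1.

```python
from datetime import timedelta
from typing import Dict, List, Tuple

# max_time per step of scenario_1_database_restore (hypothetical step keys).
SCENARIO_1_BUDGET: Dict[str, timedelta] = {
    "identify_backup": timedelta(minutes=5),
    "create_test_instance": timedelta(minutes=10),
    "restore_backup": timedelta(minutes=60),
    "verify_integrity": timedelta(minutes=15),
    "test_connectivity": timedelta(minutes=10),
    "smoke_tests": timedelta(minutes=30),
}
RTO = timedelta(hours=4)


def evaluate_run(actual: Dict[str, timedelta]) -> Tuple[List[str], bool]:
    """Steps that overran their max_time, and whether the whole run beat the RTO."""
    overruns = [step for step, spent in actual.items() if spent > SCENARIO_1_BUDGET[step]]
    total = sum(actual.values(), timedelta())
    return overruns, total < RTO


# Restore took 95 minutes instead of 60; every other step hit its budget exactly.
run = dict(SCENARIO_1_BUDGET, restore_backup=timedelta(minutes=95))
print(evaluate_run(run))  # (['restore_backup'], True)
```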
6.5.3. Runbooks and Documentation
DR Documentation:
runbooks:
location: "docs/runbooks/"
database_restore:
file: "RB-DR-001-database-restore.md"
last_updated: null
last_tested: null
owner: "DevOps Lead"
failover:
file: "RB-DR-002-failover.md"
last_updated: null
last_tested: null
owner: "DevOps Lead"
region_failover:
file: "RB-DR-003-region-failover.md"
last_updated: null
last_tested: null
owner: "Tech Lead"
communication:
file: "RB-DR-004-incident-communication.md"
last_updated: null
last_tested: null
owner: "Product Manager"
templates:
dr_test_report: "templates/dr-test-report.md"
post_mortem: "templates/post-mortem.md"
incident_timeline: "templates/incident-timeline.md"
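Every runbook above still has `last_tested: null`. A check like the following could flag runbooks that were never tested or not tested within roughly a quarter. The 92-day threshold is an assumption (the annual region-failover runbook may warrant a longer one), and `stale_runbooks` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

MAX_UNTESTED = timedelta(days=92)  # assumption: about one quarter


def stale_runbooks(last_tested: Dict[str, Optional[date]], today: date) -> List[str]:
    """Runbooks never tested (null) or last tested more than MAX_UNTESTED ago."""
    return sorted(
        name
        for name, tested in last_tested.items()
        if tested is None or today - tested > MAX_UNTESTED
    )
```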
6.5.4. Metrics and Reporting
DR Metrics:
kpis:
- name: "RTO Achievement"
target: "< 4 hours"
measurement: "Time from disaster declaration to full recovery"
- name: "RPO Achievement"
target: "< 1 hour"
measurement: "Maximum data loss in any test"
- name: "Test Success Rate"
target: "> 95%"
measurement: "Successful tests / Total tests"
- name: "Runbook Accuracy"
target: "100%"
measurement: "Steps executed without deviation"
- name: "Mean Time to Recovery"
target: "< 2 hours"
measurement: "Average recovery time across all tests"
reporting:
quarterly_report:
recipients:
- Engineering Lead
- Security Lead
- CTO
contents:
- Test summary
- RTO/RPO metrics
- Issues discovered
- Remediation actions
- Next quarter plan
annual_review:
recipients:
- Executive Team
- Board (if required)
contents:
- Year summary
- Trend analysis
- DR plan updates
- Budget recommendations
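The KPIs can be computed directly from per-test records. A minimal sketch, assuming each test records success, recovery time, measured data loss and the number of runbook deviations (the `DrTestResult` fields and `dr_kpis` are hypothetical); runbook accuracy is taken as the share of tests executed with zero deviations.

```python
from dataclasses import dataclass
from datetime import timedelta
from statistics import mean
from typing import Dict, List, Union

RTO = timedelta(hours=4)
RPO = timedelta(hours=1)


@dataclass
class DrTestResult:
    succeeded: bool
    recovery_time: timedelta
    data_loss: timedelta
    deviations: int  # runbook steps not executed as written


def dr_kpis(results: List[DrTestResult]) -> Dict[str, Union[bool, float, timedelta]]:
    """Compute the KPIs above from a non-empty list of test results."""
    n = len(results)
    return {
        "rto_achieved": max(r.recovery_time for r in results) < RTO,
        "rpo_achieved": max(r.data_loss for r in results) < RPO,
        "test_success_rate": sum(r.succeeded for r in results) / n,
        "runbook_accuracy": sum(r.deviations == 0 for r in results) / n,
        "mttr": timedelta(seconds=mean(r.recovery_time.total_seconds() for r in results)),
    }
```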
6.5.5. CI/CD Integration
# .github/workflows/dr-test-automation.yml
name: DR Test Automation
on:
schedule:
# Monthly backup verification (every 1st of month at 3AM UTC)
- cron: '0 3 1 * *'
workflow_dispatch:
inputs:
test_type:
description: 'Type of DR test to run'
required: true
type: choice
options:
- backup-verify
- restore-staging
- failover-test
notify_team:
description: 'Send notifications to team'
required: true
type: boolean
default: false
jobs:
backup-verify:
if: github.event.inputs.test_type == 'backup-verify' || github.event_name == 'schedule'
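runs-on: ubuntu-latest
env:
SUPABASE_ACCESS_TOKEN: ${{ secrets.SUPABASE_ACCESS_TOKEN }}
steps:
- uses: actions/checkout@v4
- uses: supabase/setup-cli@v1
with:
version: latest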
- name: Check Backup Status
run: |
# Verify backup exists and is recent
LATEST_BACKUP=$(supabase db backups list --json | jq -r '.[0].created_at')
BACKUP_AGE_HOURS=$(( ($(date +%s) - $(date -d "$LATEST_BACKUP" +%s)) / 3600 ))
echo "Latest backup: $LATEST_BACKUP"
echo "Backup age: $BACKUP_AGE_HOURS hours"
if [ "$BACKUP_AGE_HOURS" -gt 25 ]; then
echo "::error::Backup is more than 25 hours old!"
exit 1
fi
- name: Verify Backup Integrity
run: |
# Download and verify backup checksum
supabase db backups verify --latest
- name: Report Results
if: always()
run: |
# Log to monitoring system
echo "DR Backup Verification: $(date)"
echo "Status: ${{ job.status }}"
restore-staging:
if: github.event.inputs.test_type == 'restore-staging'
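runs-on: ubuntu-latest
environment: staging
env:
SUPABASE_ACCESS_TOKEN: ${{ secrets.SUPABASE_ACCESS_TOKEN }}
steps:
- uses: actions/checkout@v4
- uses: supabase/setup-cli@v1
with:
version: latest
- name: Start DR Test
id: start
run: |
echo "::notice::Starting DR Restore Test to Staging"
# Record the start time so the RTO can be measured at the end of the job
echo "start_epoch=$(date +%s)" >> "$GITHUB_OUTPUT"
- name: Identify Backup
id: backup
run: |
BACKUP_ID=$(supabase db backups list --json | jq -r '.[0].id')
echo "backup_id=$BACKUP_ID" >> "$GITHUB_OUTPUT"
- name: Restore to Staging
run: |
supabase db restore \
--backup-id ${{ steps.backup.outputs.backup_id }} \
--target staging \
--confirm
timeout-minutes: 120
- name: Verify Restore
run: |
# Run integrity checks ("--" forwards the flag to the npm script)
npm run test:db-integrity -- --env=staging
- name: Run Smoke Tests
run: |
npm run test:smoke -- --env=staging
- name: Calculate RTO
run: |
END_EPOCH=$(date +%s)
ELAPSED_MIN=$(( (END_EPOCH - ${{ steps.start.outputs.start_epoch }}) / 60 ))
echo "Measured recovery time: ${ELAPSED_MIN} minutes (RTO target: < 240)"
if [ "$ELAPSED_MIN" -ge 240 ]; then
echo "::error::Restore exceeded the 4-hour RTO"
exit 1
fi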
- name: Notify Team
if: always() && github.event.inputs.notify_team == 'true'
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "DR Restore Test Completed",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*DR Restore Test Completed*\nStatus: ${{ job.status }}\nEnvironment: Staging"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_DR }}
SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
generate-report:
needs: [backup-verify]
if: always()
runs-on: ubuntu-latest
steps:
- name: Generate DR Test Report
run: |
cat << EOF > dr-report.md
# DR Test Report
**Date**: $(date -I)
**Type**: Backup Verification
**Status**: ${{ needs.backup-verify.result }}
## Metrics
- Backup age: checked against the 25h threshold
- Integrity: checked (see Status above)
## Recommendations
- Continue quarterly testing schedule
EOF
- name: Upload Report
uses: actions/upload-artifact@v4
with:
name: dr-report-${{ github.run_number }}
path: dr-report.md
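For local debugging, the backup-age check in the `backup-verify` job can be mirrored in Python. This sketch assumes the same `created_at` ISO-8601 field that the workflow reads with `jq` and keeps the 25-hour threshold; like the shell arithmetic, it truncates to whole hours. The function names are hypothetical.

```python
from datetime import datetime, timezone

MAX_BACKUP_AGE_HOURS = 25  # same threshold as the workflow and RB-004


def backup_age_hours(created_at: str, now: datetime) -> int:
    """Whole hours since an ISO-8601 timestamp such as '2025-12-07T03:30:00Z'."""
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return int((now - created).total_seconds() // 3600)


def backup_is_fresh(created_at: str, now: datetime) -> bool:
    # The workflow fails only when the age is strictly greater than 25 hours.
    return backup_age_hours(created_at, now) <= MAX_BACKUP_AGE_HOURS


print(backup_is_fresh("2025-12-07T03:30:00Z", datetime(2025, 12, 8, 3, 0, tzinfo=timezone.utc)))  # True
```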
6.5.6. Quarterly Checklist
## DR Test Checklist - Q[X] YYYY
### Pre-Test (1 week before)
- [ ] Schedule confirmed with all participants
- [ ] Staging environment available
- [ ] Runbooks reviewed and updated
- [ ] Monitoring alerts configured
- [ ] Communication plan ready
### Test Day
- [ ] Kickoff meeting completed
- [ ] Backup identified and verified
- [ ] Restore initiated
- [ ] Restore completed within RTO
- [ ] Data integrity verified
- [ ] Application smoke tests passed
- [ ] Failover tested (if applicable)
- [ ] Cleanup completed
### Post-Test (within 1 week)
- [ ] Test report generated
- [ ] Issues documented
- [ ] Runbooks updated with findings
- [ ] Metrics recorded
- [ ] Report shared with stakeholders
- [ ] Remediation items assigned
- [ ] Next quarter test scheduled
### Sign-off
- [ ] DevOps Lead: _________ Date: _____
- [ ] Security Lead: ________ Date: _____
- [ ] Tech Lead: ___________ Date: _____
7. Security Infrastructure
7.1. Secrets Management
Secrets Management:
provider: GitHub Secrets + Supabase Vault
categories:
ci_cd_secrets:
storage: GitHub Secrets
secrets:
- SUPABASE_ACCESS_TOKEN
- FIREBASE_TOKEN
- FASTLANE_PASSWORD
- ANDROID_KEYSTORE_BASE64
- SLACK_WEBHOOK
rotation: manual (annually)
access: GitHub Actions only
runtime_secrets:
storage: Supabase Vault / AWS Secrets Manager
secrets:
- DATABASE_PASSWORD
- JWT_SECRET
- TWILIO_AUTH_TOKEN
- ENCRYPTION_SALT
rotation: automated (30_days for DB, 90_days others)
access: Edge Functions only
mobile_secrets:
storage: Keychain (iOS) / Keystore (Android)
secrets:
- USER_MASTER_KEY (derived from password)
- DEVICE_ID
- SESSION_TOKENS
rotation: per_session (tokens)
access: App only
best_practices:
- NO hardcoded secrets in code
- NO secrets in logs
- NO secrets in error messages
- Rotate after team member departure
- Audit access quarterly
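Rotation policy is easier to audit when each secret has an explicit period. A sketch, assuming "annually" means 365 days and that any secret last rotated before the most recent team-member departure is also due. `ANDROID_KEYSTORE_BASE64` is deliberately left out because section 7.2 states the Android keystore is never rotated; `rotation_due` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

DAY = timedelta(days=1)
ROTATION_PERIOD: Dict[str, timedelta] = {
    # runtime secrets: automated, 30 days for DB, 90 days for the others
    "DATABASE_PASSWORD": 30 * DAY,
    "JWT_SECRET": 90 * DAY,
    "TWILIO_AUTH_TOKEN": 90 * DAY,
    "ENCRYPTION_SALT": 90 * DAY,
    # CI/CD secrets: manual, annually
    "SUPABASE_ACCESS_TOKEN": 365 * DAY,
    "FIREBASE_TOKEN": 365 * DAY,
    "FASTLANE_PASSWORD": 365 * DAY,
    "SLACK_WEBHOOK": 365 * DAY,
}


def rotation_due(last_rotated: Dict[str, date], today: date,
                 last_departure: Optional[date] = None) -> List[str]:
    """Secrets past their rotation period or older than the last departure."""
    due = []
    for name, rotated in last_rotated.items():
        too_old = today - rotated > ROTATION_PERIOD[name]
        predates_departure = last_departure is not None and rotated < last_departure
        if too_old or predates_departure:
            due.append(name)
    return sorted(due)
```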
7.2. Certificate Management
Certificate Management:
ssl_certificates:
provider: Let's Encrypt (via Supabase/Firebase)
domains:
- api.medtime.app (Supabase)
- medtime.app (Marketing site)
renewal:
method: automatic
window: 30_days before expiry
notification: email if renewal fails
monitoring:
check_frequency: daily
alert_threshold: 30_days remaining
code_signing:
ios:
type: Apple Developer Certificate
storage: Fastlane Match (encrypted Git repo)
rotation: annual (Apple certificates expire after one year; regenerated via Fastlane Match)
certificates:
- Development
- Distribution (App Store)
- Push Notifications
provisioning_profiles:
storage: Fastlane Match
auto_update: enabled
android:
type: Keystore
storage: GitHub Secrets (base64)
rotation: never (breaks updates)
backup:
locations:
- GitHub Secrets
- Secure offline storage (2 physical locations)
verification: quarterly
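The daily expiry check with its 30-day threshold reduces to date arithmetic on the certificate's `notAfter` field. This sketch uses the standard-library `ssl.cert_time_to_seconds`, which parses the string format returned by `SSLSocket.getpeercert()`; fetching the certificate is left out, and the helper names are hypothetical.

```python
import ssl
import time
from typing import Optional

ALERT_THRESHOLD_DAYS = 30  # alert_threshold: 30_days remaining


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days left before `notAfter`, e.g. 'Jan 31 00:00:00 2026 GMT' (UTC)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // 86400)


def needs_alert(not_after: str, now: Optional[float] = None) -> bool:
    """Same condition as RB-005: days_until_expiry < 30."""
    return days_until_expiry(not_after, now) < ALERT_THRESHOLD_DAYS
```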
7.3. Network Security
Network Security:
ddos_protection:
provider: Cloudflare (Free tier)
features:
- rate_limiting
- challenge_page for suspicious traffic
- ip_reputation filtering
custom_rules:
- block_countries: [CN, RU] # Not supported yet
- rate_limit_api: 100_req/min per IP
- challenge_on_login: after 5 failures
waf:
provider: Cloudflare WAF
rulesets:
- OWASP_Core_Ruleset
- SQLi_protection
- XSS_protection
- Path_traversal_protection
custom_rules:
- block /admin from non-whitelisted IPs
- rate_limit /api/auth/* endpoints
api_security:
authentication:
method: JWT (Firebase tokens)
validation: every_request
rate_limiting:
global: 1000_req/min per IP
per_user:
free: 60_req/min
pro: 300_req/min
perfect: 600_req/min
cors:
allowed_origins:
- https://app.medtime.app
- capacitor://localhost # Mobile apps
allowed_methods: [GET, POST, PUT, DELETE, OPTIONS]
allowed_headers: [Authorization, Content-Type]
max_age: 86400
network_isolation:
database:
public_access: disabled
allowed_ips: [Supabase Edge Functions only]
ssl_mode: require
admin_access:
method: bastion_host
mfa: required
source_ips: whitelisted
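The per-user limits (60/300/600 requests per minute) could be enforced with a token bucket, one common technique; the spec does not say which algorithm or which layer applies them. In this sketch each (user, tier) pair gets a bucket holding one minute's worth of tokens that refills continuously. State lives in process memory, so a deployment spread across serverless instances would need a shared store; the class names are illustrative.

```python
import time
from typing import Callable, Dict, Tuple

# Per-user limits (requests per minute), transcribed from rate_limiting.per_user.
PER_USER_LIMITS: Dict[str, int] = {"free": 60, "pro": 300, "perfect": 600}


class TokenBucket:
    """Holds up to `limit_per_minute` tokens and refills them evenly over a minute."""

    def __init__(self, limit_per_minute: int, clock: Callable[[], float]) -> None:
        self.capacity = float(limit_per_minute)
        self.refill_per_second = limit_per_minute / 60.0
        self.tokens = self.capacity
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerUserRateLimiter:
    """One bucket per (user, tier); state is in-process memory only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, user_id: str, tier: str) -> bool:
        key = (user_id, tier)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(PER_USER_LIMITS[tier], self.clock)
        return self.buckets[key].allow()
```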
7.4. Compliance Scanning
Compliance Scanning:
vulnerability_scanning:
dependencies:
tool: Snyk
frequency: on_every_commit
auto_fix: enabled for low/medium
block_pr: high/critical vulnerabilities
containers:
tool: Trivy
frequency: on_build
severity_threshold: high
code:
sast:
tool: SonarQube
frequency: on_pr
quality_gate:
- coverage >= 80%
- no_critical_issues
- no_blockers
secrets_detection:
tool: GitGuardian
frequency: on_commit
action: block_push if secrets found
compliance_checks:
hipaa:
checklist:
- encryption_at_rest: enabled
- encryption_in_transit: TLS 1.3
- audit_logs: 6_year_retention
- access_controls: RLS + RBAC
- backup: tested_quarterly
frequency: quarterly
auditor: external (annually)
lgpd:
checklist:
- data_minimization: verified
- consent_management: implemented
- data_portability: API available
- right_to_deletion: implemented
- data_breach_notification: procedure documented
frequency: quarterly
owasp:
tool: OWASP ZAP
frequency: weekly
scope: all_api_endpoints
report: security_team
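The blocking conditions spread across the scanning tools can be combined into one merge decision. A sketch with a hypothetical `ScanResult` record standing in for the tools' outputs (no Snyk, SonarQube or GitGuardian API is used here): it blocks on high/critical dependency findings, coverage under 80%, critical or blocker issues, and detected secrets.

```python
from dataclasses import dataclass, field
from typing import List

BLOCKING_SEVERITIES = {"high", "critical"}
MIN_COVERAGE = 80.0  # percent, from the SonarQube quality gate


@dataclass
class ScanResult:
    dependency_severities: List[str] = field(default_factory=list)
    coverage: float = 100.0
    critical_issues: int = 0
    blocker_issues: int = 0
    secrets_found: int = 0


def merge_blockers(result: ScanResult) -> List[str]:
    """Reasons to block the change; an empty list means it may proceed."""
    reasons: List[str] = []
    if BLOCKING_SEVERITIES & {s.lower() for s in result.dependency_severities}:
        reasons.append("high/critical vulnerability")
    if result.coverage < MIN_COVERAGE:
        reasons.append("coverage below 80%")
    if result.critical_issues:
        reasons.append("critical code issues")
    if result.blocker_issues:
        reasons.append("blocker issues")
    if result.secrets_found:
        reasons.append("secrets detected")
    return reasons
```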
8. Cost Management
8.1. Cost Breakdown
Estimated Monthly Costs:
# Startup phase (0-1000 users)
supabase:
plan: Pro
base: $25/month
database: $10 (Small instance)
storage: $5 (50GB)
bandwidth: $5 (100GB)
total: $45/month
firebase:
plan: Blaze (pay-as-you-go)
auth: $0 (< 50k MAU)
fcm: $0 (< 1M messages)
crashlytics: $0
total: $0/month
twilio:
sms: $0.01/msg * 500 msgs = $5
phone_number: $1
total: $6/month
other:
domain: $1/month
sentry: $0 (free tier)
github_actions: $0 (public repo)
total: $1/month
TOTAL_STARTUP: $52/month
# Growth phase (1k-10k users)
supabase:
base: $25/month (Pro)
database: $40 (Medium instance)
storage: $20 (200GB)
bandwidth: $20 (500GB)
total: $105/month
firebase:
auth: $30 (75k MAU)
fcm: $10 (2M messages)
total: $40/month
twilio:
sms: $0.01 * 5000 = $50
phone_number: $1
total: $51/month
other:
domain: $1/month
total: $1/month
TOTAL_GROWTH: $197/month
# Scale phase (10k-100k users)
supabase:
base: $25/month (Pro)
database: $200 (Large instance + read replicas)
storage: $100 (1TB)
bandwidth: $100 (2TB)
total: $425/month
firebase:
auth: $300 (150k MAU)
fcm: $50 (10M messages)
total: $350/month
twilio:
sms: $0.01 * 50000 = $500
phone_number: $1
total: $501/month
other:
domain: $1
cloudflare_pro: $20
sentry_team: $29
total: $50/month
TOTAL_SCALE: $1,326/month
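The phase totals can be recomputed from their line items. The stated totals only add up when the Supabase Pro base fee ($25), the Twilio phone number ($1) and the domain ($1) carry over from the startup phase into the growth and scale phases, so this sketch assumes they do.

```python
from typing import Dict

# Monthly line items (USD) per phase, transcribed from the breakdown above.
PHASES: Dict[str, Dict[str, Dict[str, float]]] = {
    "startup": {
        "supabase": {"base": 25, "database": 10, "storage": 5, "bandwidth": 5},
        "firebase": {"auth": 0, "fcm": 0, "crashlytics": 0},
        "twilio": {"sms": 0.01 * 500, "phone_number": 1},
        "other": {"domain": 1, "sentry": 0, "github_actions": 0},
    },
    "growth": {
        "supabase": {"base": 25, "database": 40, "storage": 20, "bandwidth": 20},
        "firebase": {"auth": 30, "fcm": 10},
        "twilio": {"sms": 0.01 * 5000, "phone_number": 1},
        "other": {"domain": 1},
    },
    "scale": {
        "supabase": {"base": 25, "database": 200, "storage": 100, "bandwidth": 100},
        "firebase": {"auth": 300, "fcm": 50},
        "twilio": {"sms": 0.01 * 50000, "phone_number": 1},
        "other": {"domain": 1, "cloudflare_pro": 20, "sentry_team": 29},
    },
}


def phase_total(phase: str) -> float:
    """Sum every line item of a phase, rounded to cents."""
    return round(sum(sum(items.values()) for items in PHASES[phase].values()), 2)


for name in PHASES:
    print(name, phase_total(name))
```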
8.2. Resource Optimization
Optimization Strategies:
database:
query_optimization:
- index slow queries (p99 > 1s)
- use connection pooling
- implement caching for catalogs
- archive old data (> 2 years) to cold storage
storage_optimization:
- compress blobs (gzip)
- deduplicate identical blobs
- lifecycle policies (move to glacier after 1 year)
instance_sizing:
strategy: right_size
review: monthly
metrics: cpu_usage, memory_usage, iops
bandwidth:
reduction:
- enable compression (gzip, brotli)
- implement CDN for static assets
- optimize API responses (only required fields)
- batch operations (sync in batches)
monitoring:
- track bandwidth per endpoint
- alert on spikes
- identify optimization opportunities
firebase:
fcm_optimization:
- use topics instead of individual tokens
- batch notifications
- reduce payload size
- avoid unnecessary notifications
auth_optimization:
- cache user sessions client-side
- use refresh tokens wisely
- minimize custom claims
twilio:
sms_optimization:
- use push notifications as primary
- SMS only for critical alerts
- implement user preferences (opt-out)
- batch messages when possible
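Two caveats apply to the blob items in a zero-knowledge design, assuming blobs are encrypted with a randomized scheme (a random nonce or IV per encryption) before upload. Ciphertext is statistically random, so server-side gzip gains essentially nothing; compression would have to happen on the client before encryption. Deduplication can only catch byte-identical ciphertexts (for example a retried upload), because two encryptions of the same plaintext differ. The sketch shows a content-addressed store keyed by SHA-256 and measures gzip on random bytes as a stand-in for ciphertext; `BlobStore` is a hypothetical name.

```python
import gzip
import hashlib
import os
from typing import Dict


class BlobStore:
    """Content-addressed store: byte-identical blobs are stored once."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        key = hashlib.sha256(blob).hexdigest()
        self._blobs.setdefault(key, blob)  # a duplicate upload is a no-op
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means savings)."""
    return len(gzip.compress(data)) / len(data)


# Random bytes stand in for ciphertext; repetitive text stands in for plaintext.
print(round(gzip_ratio(os.urandom(4096)), 3), round(gzip_ratio(b"dose:1 " * 1000), 3))
```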
8.3. Cost Alerts
Cost Alerts:
budget_thresholds:
forecasted:
threshold: 80% of monthly budget
action: notify finance team
actual:
threshold: 100% of monthly budget
action: investigate + optimize
anomaly:
threshold: 150% of daily average
action: immediate investigation
service_specific:
supabase:
database_size: alert at 80% of plan limit
bandwidth: alert at 80% of plan limit
firebase:
mau: alert at 80% of free tier
fcm_messages: alert at 80% of free tier
twilio:
sms_count: alert at budget threshold
cost_per_day: alert if > expected
optimization_triggers:
high_storage:
condition: storage_cost > $100/month
action: review archiving strategy
high_bandwidth:
condition: bandwidth_cost > $50/month
action: review API efficiency
high_sms:
condition: sms_cost > $200/month
action: review notification strategy
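The three budget thresholds can be evaluated from daily cost data. A sketch assuming one cost entry per elapsed day of the current month and a simple linear month-end forecast (the providers' own forecasting may differ); `forecast_month_end` and `budget_alerts` are hypothetical names.

```python
import calendar
from datetime import date
from typing import List


def forecast_month_end(spent_to_date: float, today: date) -> float:
    """Linear projection of month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spent_to_date / today.day * days_in_month


def budget_alerts(daily_costs: List[float], monthly_budget: float, today: date) -> List[str]:
    """daily_costs: one entry per day of the current month, ending with today."""
    alerts: List[str] = []
    spent = sum(daily_costs)
    if forecast_month_end(spent, today) >= 0.8 * monthly_budget:
        alerts.append("forecast >= 80% of budget: notify finance")
    if spent >= monthly_budget:
        alerts.append("actual >= 100% of budget: investigate and optimize")
    history = daily_costs[:-1]
    if history and daily_costs[-1] > 1.5 * (sum(history) / len(history)):
        alerts.append("daily cost > 150% of average: investigate now")
    return alerts
```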
8.4. Scaling Strategy
Scaling Strategy:
database:
vertical_scaling:
trigger: cpu > 80% sustained for 1h
action: upgrade instance size
automation: manual (review first)
horizontal_scaling:
trigger: read_load > 70%
action: add read replica
automation: manual
max_replicas: 3
sharding:
trigger: database_size > 1TB
strategy: by user_id (hash)
implementation: v2 (future)
edge_functions:
auto_scaling:
enabled: true (Supabase managed)
min_instances: 1
max_instances: 10
scale_up_trigger: cpu > 70%
scale_down_trigger: cpu < 30% for 5m
mobile_apps:
# Apps scale naturally (the work runs client-side)
considerations:
- monitor download size (keep < 50MB)
- optimize assets
- lazy load features
costs_by_scale:
0-1k_users: $52/month
1k-10k_users: $197/month
10k-100k_users: $1,326/month
100k-1M_users: $8,000/month (estimated)
revenue_targets:
- 1k users * $3/month (Pro) = $3,000/month
- 10k users * $3/month = $30,000/month
- Cost ratio target: < 20% of revenue
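The planned sharding strategy, "by user_id (hash)", needs a hash that is stable across processes. Python's built-in `hash()` is randomized per process for strings, so this sketch uses SHA-256 instead. It also shows the main drawback of plain modulo placement: changing the shard count moves most users, which is why consistent hashing or a fixed set of logical shards mapped to physical nodes is often preferred. `shard_for` is a hypothetical name.

```python
import hashlib


def shard_for(user_id: str, shard_count: int) -> int:
    """Stable shard index for a user, identical across processes and hosts."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Fraction of users that change shard when growing from 4 to 5 shards.
users = [f"user-{i}" for i in range(1000)]
moved = sum(shard_for(u, 4) != shard_for(u, 5) for u in users) / len(users)
print(f"{moved:.0%} of users move")
```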
9. Infrastructure as Code
Infrastructure as Code:
philosophy: GitOps
tools:
- Supabase CLI (migrations, functions)
- GitHub Actions (CI/CD)
- Terraform (future, if multi-cloud)
repository_structure:
supabase/:
migrations/: Database schema changes
functions/: Edge Functions
seed.sql: Initial data (catalogs)
config.toml: Supabase config
.github/:
workflows/: CI/CD pipelines
actions/: Custom actions
ios/:
fastlane/: iOS automation
Podfile: Dependencies
android/:
gradle/: Build configs
fastlane/: Android automation
workflow:
1_local_development:
- supabase start (local instance)
- develop + test locally
- commit changes
2_ci_validation:
- GitHub Actions run tests
- Deploy to staging
3_manual_approval:
- Tech lead reviews
- Approve for production
4_production_deploy:
- GitHub Actions deploy
- Monitor for issues
- Rollback if needed
disaster_recovery:
# Everything lives in Git
recovery_steps:
1. Clone repository
2. supabase db push (restore schema)
3. Restore data from backup
4. supabase functions deploy (restore functions)
5. Verify health
estimated_recovery_time: 2_hours
Example Supabase config:
# supabase/config.toml
[api]
enabled = true
port = 54321
schemas = ["public"] # do not expose the auth schema through the API
extra_search_path = ["public", "extensions"]
max_rows = 1000
[db]
port = 54322
major_version = 15
[db.pooler]
enabled = true
port = 54329
pool_mode = "transaction"
default_pool_size = 20
max_client_conn = 100
[realtime]
enabled = false # Not used in MedTime v1
[storage]
enabled = true
file_size_limit = "5MB"
[auth]
enabled = false # Firebase Auth is used instead
site_url = "https://app.medtime.app"
[auth.external.google]
enabled = false # Google Sign-In is handled by Firebase Auth
client_id = "env(GOOGLE_CLIENT_ID)"
secret = "env(GOOGLE_CLIENT_SECRET)"
[edge_functions]
enabled = true
10. Runbooks
10.1. Runbook Template
# Runbook: [Procedure Name]
## Metadata
- **Severity**: P1 / P2 / P3 / P4
- **Component**: [Mobile App | Backend | Database]
- **Last Updated**: YYYY-MM-DD
- **Owner**: [Team/Person]
## Symptoms
- [Observable symptom, no PHI]
- [Affected metrics]
## Impact
- **Users Affected**: [Estimate]
- **Services Affected**: [List]
- **Business Impact**: [Description]
## Diagnosis Steps
1. Check dashboard: [URL]
2. Verify metrics: [Specific metrics]
3. Check error logs (NO PHI): [Location]
4. Test manually: [Steps]
## Resolution Steps
1. [Step with a specific command]
2. [Step with a specific command]
3. [Verification]
## IMPORTANT: Zero-Knowledge
- DO NOT access blob contents
- DO NOT decrypt data for diagnosis
- Log metadata only
- Escalate to the user if their data needs to be viewed
## Rollback
1. [Rollback step]
2. [Rollback verification]
## Post-Incident
1. Create incident report
2. Update runbook if needed
3. Schedule post-mortem
## References
- Related docs: [Links]
- Escalation: [Contact info]
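The zero-knowledge logging rule (metadata only) is easiest to enforce with an allow-list rather than by trying to detect PHI. A sketch using the standard `logging` module: callers pass metadata through `extra={"meta": {...}}`, and the filter replaces the whole message with the allow-listed keys, so free-text messages and payloads never reach a handler. The key list and class name are assumptions.

```python
import json
import logging
from typing import Any, Dict

# Assumption: the only metadata keys operators may see while running a runbook.
ALLOWED_KEYS = {"event", "endpoint", "status_code", "latency_ms",
                "blob_size_bytes", "app_version", "request_id"}


def scrub(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Keep allow-listed metadata keys only; anything else is dropped."""
    return {k: v for k, v in meta.items() if k in ALLOWED_KEYS}


class MetadataOnlyFilter(logging.Filter):
    """Replaces every message with its scrubbed `meta` dict (passed via `extra`)."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = json.dumps(scrub(getattr(record, "meta", {})), sort_keys=True)
        record.args = ()  # the original format arguments may contain PHI
        return True


logger = logging.getLogger("medtime.ops")
logger.addFilter(MetadataOnlyFilter())
logger.warning("sync failed: %s", "free text", extra={"meta": {"event": "sync_failed", "status_code": 500}})
```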
10.2. Common Runbooks
Common Runbooks:
RB-001: High API Error Rate
severity: P1
trigger: error_rate > 5% for 5m
owner: Backend Team
RB-002: Database Connection Pool Exhausted
severity: P1
trigger: db_connections_waiting > 10
owner: Backend Team
RB-003: Slow Database Queries
severity: P2
trigger: p99_query_time > 5s
owner: Backend Team
RB-004: Failed Backup
severity: P2
trigger: last_backup > 25h
owner: DevOps
RB-005: SSL Certificate Expiring
severity: P3
trigger: days_until_expiry < 30
owner: DevOps
RB-006: High Mobile Crash Rate
severity: P2
trigger: crash_rate > 2%
owner: Mobile Team
RB-007: Deployment Rollback
severity: P1
trigger: Manual or automated
owner: DevOps
RB-008: Firebase Auth Outage
severity: P1
trigger: auth_failure_rate > 50%
owner: Backend Team
location: docs/runbooks/
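The automatic triggers can be evaluated against a metrics snapshot to decide which runbooks to open, most severe first. A sketch with assumed metric names: RB-001's 5-minute window is assumed to be pre-aggregated into `error_rate_5m`, and RB-007 is omitted because it is triggered manually or by the deployment pipeline. `triggered_runbooks` is a hypothetical name.

```python
from typing import Callable, Dict, List

# Trigger conditions transcribed from the list above (metric names assumed).
TRIGGERS: Dict[str, Callable[[dict], bool]] = {
    "RB-001": lambda m: m.get("error_rate_5m", 0) > 0.05,
    "RB-002": lambda m: m.get("db_connections_waiting", 0) > 10,
    "RB-003": lambda m: m.get("p99_query_time_s", 0) > 5,
    "RB-004": lambda m: m.get("hours_since_last_backup", 0) > 25,
    "RB-005": lambda m: m.get("days_until_expiry", float("inf")) < 30,
    "RB-006": lambda m: m.get("crash_rate", 0) > 0.02,
    "RB-008": lambda m: m.get("auth_failure_rate", 0) > 0.50,
}
SEVERITY = {"RB-001": "P1", "RB-002": "P1", "RB-003": "P2", "RB-004": "P2",
            "RB-005": "P3", "RB-006": "P2", "RB-008": "P1"}


def triggered_runbooks(metrics: dict) -> List[str]:
    """Runbooks whose trigger fires, ordered by severity (P1 first), then id."""
    hits = [rb for rb, fires in TRIGGERS.items() if fires(metrics)]
    return sorted(hits, key=lambda rb: (SEVERITY[rb], rb))
```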
11. References
| Document | Purpose |
|---|---|
| 01-arquitectura-tecnica.md | General architecture |
| 02-arquitectura-cliente-servidor.md | Dual (client-server) architecture |
| 05-seguridad-servidor.md | Security requirements |
| 07-testing-strategy.md | CI/CD integration con testing |
| Supabase Docs | Database platform |
| Firebase Docs | Auth/Push platform |
| Fastlane Docs | Mobile CI/CD |
| GitHub Actions | CI/CD platform |
Document generated by DevOpsDrone (MTS-DRN-OPS-001) / SpecQueen Technical Division - IT-07 "Minimalist infrastructure. The server is simple, the client is powerful, the monitoring is blind."