Disaster Recovery (DR) Paradigms: RTO, RPO & High Availability Limits

Designing production applications in the cloud requires abandoning the assumption that hardware is infallible. Instead, we architect systems to survive localized outages, regional disasters, and system-wide failures, and we define how quickly and how completely they must recover using two core metrics:

Key Disaster Recovery Targets:
  • Recovery Time Objective (RTO): The maximum acceptable delay between service interruption and service restoration. (How long can your website be down?)
  • Recovery Point Objective (RPO): The maximum acceptable amount of data loss, measured as the time between the last recoverable copy of the data and the outage. (How much recent data can you afford to lose from your database?)
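To make the two targets concrete, here is a minimal bash sketch that checks a recovery plan against them. It assumes the plan is summarized by two estimates: how long a restore takes, and how often a recoverable copy of the data is made (in the worst case the outage strikes just before the next copy, so the data-loss window equals that interval). The function name `check_dr_plan` and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# check_dr_plan RTO_TARGET RPO_TARGET RESTORE_TIME BACKUP_INTERVAL (all in minutes)
# Hypothetical helper: prints each violated target and returns 1, or confirms the plan.
check_dr_plan() {
  local rto_target=$1 rpo_target=$2 restore_time=$3 backup_interval=$4
  local ok=0
  # RTO: how long the service is down while we restore
  if (( restore_time > rto_target )); then
    echo "RTO violated: restore takes ${restore_time}m, target is ${rto_target}m"
    ok=1
  fi
  # RPO: worst-case data loss is one full interval between recoverable copies
  if (( backup_interval > rpo_target )); then
    echo "RPO violated: up to ${backup_interval}m of data lost, target is ${rpo_target}m"
    ok=1
  fi
  if (( ok == 0 )); then
    echo "plan meets RTO and RPO"
  fi
  return $ok
}
```

For example, daily backups with a four-hour restore (`check_dr_plan 60 15 240 1440`) fail both a one-hour RTO and a 15-minute RPO.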
The Four DR Design Patterns (ordered from lowest cost to lowest RTO/RPO):
  • Backup and Restore (RTO: Hours, RPO: the backup interval, e.g. 24h for daily dumps): Regular database dumps are shipped to S3/GCS. Cheap, but restoring from them is slow.
  • Pilot Light (RTO: Tens of mins, RPO: Mins): Only the core of the app, typically a continuously replicated database such as a read replica, runs in the recovery region. Application servers stay off and are spun up only during failover.
  • Warm Standby (RTO: Mins, RPO: Secs): A scaled-down but fully functional copy of the environment runs 24/7. Once DNS is redirected to it, auto-scaling grows it to full production capacity.
  • Multi-Site / Active-Active (RTO/RPO: Near zero): Production traffic is served by two or more regions at the same time. If one region fails, DNS health checks route all users to the surviving regions, subject to DNS record TTLs.
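The patterns trade cost for tighter targets. The bash sketch below picks the cheapest pattern whose typical recovery characteristics fit a given RTO and RPO. The numeric cut-offs are illustrative assumptions that translate the rough ranges above (hours, tens of minutes, minutes, seconds) into seconds. They are not provider figures, and real values should come from your own failover drills. The function name `pick_dr_pattern` is hypothetical.

```bash
#!/usr/bin/env bash
# pick_dr_pattern RTO_SECONDS RPO_SECONDS
# Prints the cheapest DR pattern whose (assumed) typical RTO/RPO fits the targets.
# Cut-offs below are illustrative placeholders, not published figures.
pick_dr_pattern() {
  local rto=$1 rpo=$2
  if (( rto >= 4 * 3600 && rpo >= 24 * 3600 )); then
    echo "backup-and-restore"        # hours of downtime, up to a day of data loss
  elif (( rto >= 30 * 60 && rpo >= 10 * 60 )); then
    echo "pilot-light"               # tens of minutes / minutes
  elif (( rto >= 5 * 60 && rpo >= 30 )); then
    echo "warm-standby"              # minutes / seconds
  else
    echo "multi-site-active-active"  # near-zero targets
  fi
}
```

Under these assumptions, a one-hour RTO with a 15-minute RPO (`pick_dr_pattern 3600 900`) lands on pilot light.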

Database & Workload Migrations: Homogeneous vs Heterogeneous Pipelines

Moving existing databases and file shares from on-premises data centers to the cloud is a core task for enterprise cloud engineers. Migrations fall into two major archetypes:

Homogeneous Migrations: The source database engine matches the target engine (e.g. migrating an on-premises PostgreSQL database to Amazon RDS for PostgreSQL). The schema carries over without conversion, which makes the transition comparatively simple and low-risk.

Heterogeneous Migrations: The source and target engines differ (e.g. migrating Oracle or Microsoft SQL Server to Amazon Aurora PostgreSQL). This requires a two-step approach: first, a **Schema Conversion Tool (SCT)** converts SQL schemas, stored procedures, and triggers to the target dialect; second, a replication tool such as **AWS Database Migration Service (DMS)** copies the existing data and then replicates ongoing changes (deltas) until cutover.
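The distinction comes down to whether the source and target belong to the same engine family. A minimal bash sketch, assuming DMS-style engine names (`aurora-postgresql`, `postgres`, and `aurora` for Aurora MySQL); the helper names and the family mapping are hypothetical and cover only a few engines.

```bash
#!/usr/bin/env bash
# engine_family NAME: collapse managed variants onto the engine they are compatible with
# (hypothetical mapping; extend it for the engines you actually use)
engine_family() {
  case "$1" in
    postgres|postgresql|aurora-postgresql) echo "postgresql" ;;
    mysql|aurora|aurora-mysql)             echo "mysql" ;;
    *)                                     echo "$1" ;;
  esac
}

# migration_plan SOURCE TARGET: classify the migration and name the tools it needs
migration_plan() {
  local src tgt
  src=$(engine_family "$1")
  tgt=$(engine_family "$2")
  if [[ "$src" == "$tgt" ]]; then
    echo "homogeneous: schema carries over; replicate data with DMS or native tools"
  else
    echo "heterogeneous: convert schema with SCT, then replicate data with DMS"
  fi
}
```

With this mapping, `migration_plan postgres aurora-postgresql` is classified as homogeneous, while `migration_plan oracle aurora-postgresql` requires the SCT step.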

Offline Data Transfer (Snowball / Transfer Appliance): When migrating petabytes of data, copying over WAN connections can take months. Instead, companies load the data locally onto a hardware appliance (AWS Snowball or GCP Transfer Appliance), ship the physical device to the provider for import, and then sync the changes made in the meantime over the network once the bulk import completes.
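A back-of-the-envelope calculation shows why. Moving 1 PB (8 × 10^15 bits) over a fully utilized 1 Gbps link takes 8 × 10^6 seconds, roughly 92 days, before any protocol overhead or contention. The bash helper below (a hypothetical name) repeats that arithmetic for other sizes, link speeds, and utilization levels, using decimal units.

```bash
#!/usr/bin/env bash
# transfer_days TERABYTES LINK_MBPS UTILIZATION_PERCENT
# Prints whole days (rounded down) needed to copy the data over the network.
transfer_days() {
  local tb=$1 mbps=$2 util=$3
  local bits=$(( tb * 8 * 1000**4 ))             # decimal terabytes -> bits
  local bps=$(( mbps * 1000**2 * util / 100 ))   # effective throughput, bits/second
  echo $(( bits / bps / 86400 ))                 # seconds -> days
}

transfer_days 1000 1000 100   # 1 PB over 1 Gbps at full utilization: prints 92
transfer_days 1000 1000 80    # same link at 80% utilization: prints 115
```

Even a fully utilized 10 Gbps link (`transfer_days 1000 10000 100`) needs about nine days for the same petabyte.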

Migration Pipeline: AWS SCT / DMS Stages

The stages below show how enterprise data moves from a local data center to a managed cloud database with minimal downtime. First, the Schema Conversion Tool converts the schema and code objects; AWS DMS then performs a full load of the selected tables, catches up on changes by reading the source's transaction logs (change data capture, CDC), and the cutover completes by switching clients to the new database endpoint via DNS.

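  • SCT (Convert Schema): rewrite the Oracle schema and SQL code for PostgreSQL.
  • Load (DMS Full Load): bulk-copy the existing rows.
  • Sync (Delta CDC Catchup): replicate live transactions until the target catches up.
  • Cutover (DNS Endpoint Switch): repoint clients to the new database via DNS with minimal downtime.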
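The ordering matters most at the last step: cutover is only safe once CDC has caught up. A minimal bash sketch of that gating logic, assuming the caller supplies the current replication lag in seconds (in practice it could come from DMS's CloudWatch task metrics such as `CDCLatencyTarget`). The stage labels, the function name `next_stage`, and the default 5-second threshold are hypothetical.

```bash
#!/usr/bin/env bash
# Ordered migration stages (hypothetical labels for SCT, Load, Sync, Cutover)
STAGES=(sct full-load cdc-sync cutover)

# next_stage CURRENT [CDC_LAG_SECONDS] [MAX_LAG_SECONDS]
# Prints the stage to run next. Moving from cdc-sync to cutover is gated on
# the replication lag being at or below MAX_LAG_SECONDS (assumed default: 5).
next_stage() {
  local current=$1 lag=${2:-0} max_lag=${3:-5}
  local i
  for i in "${!STAGES[@]}"; do
    if [[ "${STAGES[$i]}" == "$current" ]]; then
      if [[ "$current" == cdc-sync ]] && (( lag > max_lag )); then
        echo "cdc-sync"    # keep replicating until the target catches up
      elif (( i + 1 < ${#STAGES[@]} )); then
        echo "${STAGES[$((i + 1))]}"
      else
        echo "done"
      fi
      return 0
    fi
  done
  echo "unknown stage: $current" >&2
  return 1
}
```

For example, `next_stage cdc-sync 30` keeps the pipeline in the sync stage, while `next_stage cdc-sync 2` allows the DNS cutover.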

Cloud Migration & Replication Command Reference

The following AWS CLI commands provision a DMS replication instance, configure database endpoints, and create a replication task that performs a full load followed by ongoing change data capture (CDC):

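# Provision a DMS replication instance and capture its ARN for later commands
# (sg-dms-security-rules is a placeholder; real security group IDs look like sg-0123abcd...)
REP_INSTANCE_ARN=$(aws dms create-replication-instance \
    --replication-instance-identifier prod-dms-instance \
    --replication-instance-class dms.t3.medium \
    --allocated-storage 50 \
    --vpc-security-group-ids sg-dms-security-rules \
    --query 'ReplicationInstance.ReplicationInstanceArn' --output text)

# Block until the instance is ready to run tasks
aws dms wait replication-instance-available \
    --filters Name=replication-instance-arn,Values="$REP_INSTANCE_ARN"

# Create the source Oracle endpoint, reading secrets from the environment instead of
# hard-coding them (ORACLE_PASSWORD and ORACLE_SID are placeholder variables)
SOURCE_ARN=$(aws dms create-endpoint \
    --endpoint-identifier local-oracle-source \
    --endpoint-type source \
    --engine-name oracle \
    --username migration_user \
    --password "$ORACLE_PASSWORD" \
    --server-name 192.168.12.50 \
    --port 1521 \
    --database-name "$ORACLE_SID" \
    --query 'Endpoint.EndpointArn' --output text)

# Create the Aurora PostgreSQL target endpoint
# (AURORA_HOST, AURORA_DB and AURORA_PASSWORD are placeholder variables)
TARGET_ARN=$(aws dms create-endpoint \
    --endpoint-identifier aurora-target \
    --endpoint-type target \
    --engine-name aurora-postgresql \
    --username migration_user \
    --password "$AURORA_PASSWORD" \
    --server-name "$AURORA_HOST" \
    --port 5432 \
    --database-name "$AURORA_DB" \
    --query 'Endpoint.EndpointArn' --output text)

# Define the replication task (Full Load + CDC) for every table in the SALES schema
aws dms create-replication-task \
    --replication-task-identifier prod-oracle-to-aurora \
    --source-endpoint-arn "$SOURCE_ARN" \
    --target-endpoint-arn "$TARGET_ARN" \
    --replication-instance-arn "$REP_INSTANCE_ARN" \
    --migration-type full-load-and-cdc \
    --table-mappings '{"rules":[{"rule-type":"selection","rule-id":"1","rule-name":"1","object-locator":{"schema-name":"SALES","table-name":"%"},"rule-action":"include"}]}'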
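Creating the task does not start it. A minimal sketch of the next step, wrapped in a hypothetical `start_migration` function: it waits for the new task to reach the ready state, starts it, waits until it is running, and prints its status and full-load progress. It assumes the task ARN is passed in, for example captured from `create-replication-task` with `--query 'ReplicationTask.ReplicationTaskArn' --output text`.

```bash
#!/usr/bin/env bash
# start_migration TASK_ARN
# Starts a full-load-and-cdc DMS task and reports its progress (hypothetical wrapper).
start_migration() {
  local task_arn=$1
  local filter="Name=replication-task-arn,Values=$task_arn"

  # A freshly created task must reach the "ready" state before it can start
  aws dms wait replication-task-ready --filters "$filter" || return 1

  aws dms start-replication-task \
      --replication-task-arn "$task_arn" \
      --start-replication-task-type start-replication || return 1

  # Block until DMS reports the task as running
  aws dms wait replication-task-running --filters "$filter" || return 1

  # Status and full-load progress; CDC keeps running after the full load finishes
  aws dms describe-replication-tasks \
      --filters "$filter" \
      --query 'ReplicationTasks[0].[Status,ReplicationTaskStats.FullLoadProgressPercent]' \
      --output text
}
```

Because the task uses `full-load-and-cdc`, it stays in the running state after the full load reaches 100% and keeps applying changes until you stop it after cutover.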