## 1. Security & Identity Management
Q1: A developer accidentally deletes an S3 bucket in the dev account. How would you prevent such incidents across all accounts without impacting productivity?
Testing: Knowledge of SCPs, IAM best practices, and least privilege.
Answer:
Use AWS Organizations → Service Control Policies (SCPs) to deny destructive actions (e.g., `s3:DeleteBucket`) across member accounts, exempting a dedicated admin role so legitimate cleanup stays possible.
Enable versioning with MFA Delete on critical buckets so object versions cannot be permanently removed without a second factor.
Configure CloudTrail + EventBridge alerts for bucket deletion events.
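As a rough sketch of the SCP and alerting points above, the snippet below builds an SCP document that denies `s3:DeleteBucket` and an EventBridge event pattern that matches the corresponding CloudTrail record. The function names and the exempted `BreakGlassAdmin` role are hypothetical; attaching the policy and creating the rule are left out.

```python
import json


def deny_bucket_deletion_scp(exempt_principal_arn):
    """Build an SCP that denies s3:DeleteBucket except for one role pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyS3BucketDeletion",
                "Effect": "Deny",
                "Action": "s3:DeleteBucket",
                "Resource": "*",
                # Hypothetical break-glass role that may still delete buckets.
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": exempt_principal_arn}
                },
            }
        ],
    }


def bucket_deletion_event_pattern():
    """EventBridge pattern matching DeleteBucket calls recorded by CloudTrail."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["DeleteBucket"],
        },
    }


if __name__ == "__main__":
    scp = deny_bucket_deletion_scp("arn:aws:iam::*:role/BreakGlassAdmin")
    print(json.dumps(scp, indent=2))
    print(json.dumps(bucket_deletion_event_pattern(), indent=2))
```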
Q2: How do you implement least privilege for a team of developers?
Testing: IAM role & policy design.
Answer:
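Assign IAM roles scoped to specific services and resources instead of sharing root or administrator access.
Start from AWS managed policies, then copy them into customer managed policies with unused actions removed (AWS managed policies themselves cannot be edited).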
Monitor with IAM Access Advisor to tighten permissions over time.
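One way to picture the tightening step is sketched below: given the actions a policy grants and the actions a team was actually seen using, keep only the overlap. The helper names, the bucket ARN, and the sample action lists are hypothetical, and wildcard actions such as `s3:*` are not expanded.

```python
def tighten_actions(granted_actions, used_actions):
    """Return the granted actions that were actually used (case-insensitive)."""
    used = {action.lower() for action in used_actions}
    return sorted(a for a in granted_actions if a.lower() in used)


def build_allow_policy(actions, resource_arn):
    """Wrap a list of actions in a single-statement IAM Allow policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resource_arn}
        ],
    }


granted = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
used = ["s3:getobject", "s3:PutObject"]  # e.g. taken from last-accessed data
policy = build_allow_policy(
    tighten_actions(granted, used),
    "arn:aws:s3:::example-dev-bucket/*",  # hypothetical bucket
)
```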
---
## 2. Networking
Q1: Your application in VPC A needs to connect to a database in VPC B in the same region without using public internet. How would you do it?
Testing: VPC connectivity, security best practices.
Answer:
Use VPC Peering if it’s a simple one-to-one link.
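Use PrivateLink if VPC A only needs to reach a specific service (here, the database endpoint) rather than the whole of VPC B.
Avoid routing through an internet gateway (IGW) or NAT gateway so traffic stays on the AWS private network.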
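VPC Peering only works when the two VPCs have non-overlapping CIDR blocks, so a quick pre-check is worth doing before choosing it. The sketch below uses Python's standard `ipaddress` module; the function name and the sample CIDRs are illustrative.

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Return True when the two VPC CIDR blocks do not overlap."""
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return not net_a.overlaps(net_b)


print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # True: disjoint ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # False: second is inside first
```

When the ranges do overlap, PrivateLink is usually the easier option because it does not rely on routing between the two address spaces.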
Q2: You need to connect 10 VPCs across 3 regions. How would you design the network to minimize complexity?
Testing: Multi-VPC & multi-region architecture.
Answer:
Use AWS Transit Gateway to centralize connectivity.
Enable inter-region peering between TGWs.
Configure route tables carefully per environment to control traffic flow.
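The complexity argument can be made concrete with a little arithmetic. The sketch below compares a full mesh of VPC peering connections with a hub design that uses one Transit Gateway per region; the function names are illustrative.

```python
def full_mesh_links(node_count):
    """Number of pairwise connections needed to fully mesh node_count nodes."""
    return node_count * (node_count - 1) // 2


def transit_gateway_links(vpc_count, region_count):
    """One attachment per VPC plus a full mesh of TGW peerings across regions."""
    return vpc_count + full_mesh_links(region_count)


print(full_mesh_links(10))           # 45 peering connections
print(transit_gateway_links(10, 3))  # 13 = 10 attachments + 3 TGW peerings
```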
---
## 3. Compute Layer
Q1: During a flash sale, EC2 instances behind an ALB reach 90% CPU. What’s your immediate plan?
Testing: Auto Scaling & load management.
Answer:
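Check that the Auto Scaling policies are firing and that the group's maximum capacity is not capping scale-out; raise desired capacity manually if needed.
Temporarily add Spot or On-Demand instances.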
Implement caching (CloudFront/ElastiCache) for future spikes.
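A minimal sketch of the scale-out step, assuming a target tracking policy on average CPU. The dictionary mirrors the parameters such a policy takes; the capacity estimate is a simple proportional approximation that assumes load spreads evenly, not the exact algorithm Auto Scaling uses. Function names and the sample numbers are illustrative.

```python
import math


def target_tracking_config(target_cpu_percent):
    """Parameters for a target tracking policy on average group CPU."""
    return {
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


def estimate_desired_capacity(current_instances, current_cpu, target_cpu):
    """Proportional estimate of the fleet size needed to reach target_cpu."""
    return math.ceil(current_instances * current_cpu / target_cpu)


# 4 instances at 90% CPU with a 60% target suggests roughly 6 instances.
print(estimate_desired_capacity(4, 90, 60))
```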
Q2: How do you ensure zero downtime during EC2 app updates?
Testing: Deployment strategies.
Answer:
Use Auto Scaling group rolling updates.
Implement blue/green deployment via CodeDeploy.
Gradually switch traffic with weighted ALB target groups.
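The weighted shift can be sketched as a sequence of ALB `forward` actions, each splitting traffic between the old (blue) and new (green) target groups. The function names, the ARNs, and the 10/50/100 step schedule are assumptions for illustration; applying each step to the listener is left out.

```python
def weighted_forward_action(blue_tg_arn, green_tg_arn, green_weight):
    """Build a forward action sending green_weight percent to the new fleet."""
    if not 0 <= green_weight <= 100:
        raise ValueError("green_weight must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_tg_arn, "Weight": 100 - green_weight},
                {"TargetGroupArn": green_tg_arn, "Weight": green_weight},
            ]
        },
    }


def shift_plan(blue_tg_arn, green_tg_arn, steps=(10, 50, 100)):
    """Return one forward action per step of the gradual cutover."""
    return [weighted_forward_action(blue_tg_arn, green_tg_arn, w) for w in steps]


plan = shift_plan("arn:blue-placeholder", "arn:green-placeholder")
```

Pausing between steps to check health metrics gives a natural rollback point: reapply the previous action if errors rise.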
---
## 4. Database Layer
Q1: RDS MySQL is experiencing slow queries at peak traffic. How do you troubleshoot and optimize?
Testing: Performance tuning.
Answer:
Check CloudWatch metrics for CPU, memory, IOPS.
Enable Performance Insights to identify slow queries.
Add indexes and use Read Replicas for read-heavy workloads.
Consider Provisioned IOPS if storage is the bottleneck.
Q2: How do you migrate a large production database from on-prem to AWS with minimal downtime?
Testing: Database migration strategy.
Answer:
Use AWS DMS with Change Data Capture (CDC).
Pre-create schema on target RDS/Aurora.
Perform full load first, then replicate ongoing changes until cutover.
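As a sketch of the full-load-plus-CDC setup, the snippet below builds the table-mapping document a DMS task takes, selecting every table in one schema, alongside the migration type that combines the initial load with ongoing replication. The function name, the rule name, and the `sales` schema are hypothetical.

```python
import json

# DMS migration type that runs a full load and then keeps replicating changes.
MIGRATION_TYPE = "full-load-and-cdc"


def table_mappings(schema_name):
    """Selection rule that includes every table in the given schema."""
    return {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                "object-locator": {
                    "schema-name": schema_name,
                    "table-name": "%",  # wildcard: all tables
                },
                "rule-action": "include",
            }
        ]
    }


mappings_json = json.dumps(table_mappings("sales"))
```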
---
## 5. Automation & DevOps
Q1: Manual deployments lead to wrong versions being deployed. How would you automate?
Testing: CI/CD implementation.
Answer:
Use CodePipeline + CodeDeploy or integrate Jenkins/GitHub Actions.
Bake versioned AMIs using Packer.
Automate rollback strategies in deployment configs.
Q2: How would you automate IAM key rotation?
Testing: Security automation.
Answer:
Store keys in AWS Secrets Manager.
Rotate keys automatically using Lambda triggered via EventBridge.
Send notifications on rotation completion.
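The core of such a rotation Lambda is deciding which keys are due. The sketch below works on records shaped like IAM access key metadata (`AccessKeyId`, `Status`, `CreateDate`); the 90-day limit, the function name, and the sample key IDs are assumptions, and the actual create/deactivate/notify calls are omitted.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(access_keys, max_age_days=90, now=None):
    """Return IDs of active keys older than max_age_days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        key["AccessKeyId"]
        for key in access_keys
        if key["Status"] == "Active" and key["CreateDate"] < cutoff
    ]


sample_now = datetime(2024, 6, 1, tzinfo=timezone.utc)
sample_keys = [
    {"AccessKeyId": "KEY-OLD", "Status": "Active",
     "CreateDate": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"AccessKeyId": "KEY-NEW", "Status": "Active",
     "CreateDate": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"AccessKeyId": "KEY-OFF", "Status": "Inactive",
     "CreateDate": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]
due = keys_due_for_rotation(sample_keys, now=sample_now)
```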
---
## 6. Serverless
Q1: Your Lambda function has cold start issues affecting response time. How do you mitigate?
Testing: Serverless performance optimization.
Answer:
Use Provisioned Concurrency.
Keep function code and dependencies small.
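Attach the function to a VPC only when it must reach VPC resources, and use VPC endpoints for AWS service calls so they avoid a NAT hop; this trims network latency rather than cold start time itself.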
Q2: How would you secure a serverless REST API using API Gateway + Lambda?
Testing: Serverless security best practices.
Answer:
Enable IAM authorization or Cognito user pools.
Enable WAF to filter malicious requests.
Use usage plans with API keys and throttling to limit abuse; API keys identify clients and are not a substitute for authorization.
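API Gateway throttling is based on a token bucket with a steady rate and a burst size. The toy model below illustrates how those two settings interact; it is not API Gateway's implementation, and the class name and sample limits are illustrative. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Toy token bucket: refills at rate_per_second up to burst tokens."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` is admitted."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_second=1, burst=2)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
# The burst admits two requests at once, the third is throttled,
# and one more is admitted after a second of refill.
```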
---
## 7. Monitoring & Logging
Q1: How do you monitor Lambda functions for performance and errors?
Testing: CloudWatch and observability.
Answer:
Enable CloudWatch Logs for function output.
Use CloudWatch Metrics & Alarms for errors and duration.
Use X-Ray for distributed tracing.
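A minimal sketch of the alarm point: the function below assembles the parameters for a CloudWatch alarm on a Lambda function's `Errors` metric. The alarm name format, threshold, and `checkout-handler` function name are hypothetical; creating the alarm and wiring up a notification target are left out.

```python
def lambda_error_alarm(function_name, threshold=1, period_seconds=60):
    """Alarm parameters: fire when the function logs errors within one period."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No invocations means no data points; do not treat that as a failure.
        "TreatMissingData": "notBreaching",
    }


alarm = lambda_error_alarm("checkout-handler")
```

The same shape works for latency by switching `MetricName` to `Duration` and choosing a statistic and threshold that fit the function's timeout.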
Q2: How do you set up centralized logging for multiple AWS accounts?
Testing: Cross-account logging & monitoring.
Answer:
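Stream logs from each account with CloudWatch Logs subscription filters to a Kinesis Data Streams or Data Firehose destination in a central logging account, and deliver them to a central S3 bucket.
Enable an AWS Organizations trail in CloudTrail so every member account logs to the same bucket.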
Integrate with SIEM tools like Splunk.
---
## 8. Storage
Q1: Your S3 bucket stores critical backups. How do you ensure data durability, compliance, and disaster recovery?
Testing: Storage architecture & DR planning.
Answer:
Enable versioning and MFA Delete.
Use S3 Cross-Region Replication (CRR) for DR.
Apply server-side encryption (SSE-KMS).
Set lifecycle policies to move aging data to S3 Glacier/Deep Archive to control cost, and use S3 Object Lock where compliance requires write-once retention.
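The lifecycle transitions above might look like the configuration sketched below. The rule ID, the 30-day and 180-day thresholds, and the function name are assumptions chosen for illustration; real values depend on retention requirements.

```python
def backup_lifecycle_rules(glacier_after_days=30, deep_archive_after_days=180):
    """Lifecycle configuration tiering backups to Glacier, then Deep Archive."""
    if deep_archive_after_days <= glacier_after_days:
        raise ValueError("Deep Archive transition must come after Glacier")
    return {
        "Rules": [
            {
                "ID": "archive-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: whole bucket
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


lifecycle = backup_lifecycle_rules()
```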
Q2: EBS volume performance is low for a high-I/O database. What’s your solution?
Testing: Block storage optimization.
Answer:
Use Provisioned IOPS SSD (io2/io1).
Stripe multiple volumes with RAID 0 if necessary.
Enable EBS optimization on EC2.
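Striping only helps up to what the instance itself can push to EBS, which the arithmetic below makes explicit. The function name and all IOPS figures are illustrative placeholders, not published limits for any particular volume or instance type.

```python
def striped_iops(volume_iops, instance_iops_limit):
    """Aggregate IOPS of a RAID 0 set, capped by the instance's EBS limit."""
    return min(sum(volume_iops), instance_iops_limit)


# Four volumes at 16,000 IOPS each on an instance capped at 40,000 IOPS:
print(striped_iops([16000, 16000, 16000, 16000], 40000))  # 40000
# The same volumes on an instance capped at 80,000 IOPS:
print(striped_iops([16000, 16000, 16000, 16000], 80000))  # 64000
```

RAID 0 also has no redundancy, so losing one volume loses the whole set; snapshots or database-level replication need to cover that risk.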
Q3: When would you use EFS over S3?
Answer:
For shared file system access by multiple EC2 instances.
When NFS file semantics are required.
For low-latency file storage rather than object storage.
---
## 9. Databases
Q1: Your read-heavy RDS MySQL instance is under load. How do you scale reads efficiently?
Answer:
Implement Read Replicas.
Use cross-region Read Replicas for multi-region reads, or Aurora Global Database if you migrate to Aurora.
Offload repeated reads to a cache such as ElastiCache.
Q2: How would you optimize DynamoDB for high traffic spikes?
Answer:
Use on-demand capacity mode, or provisioned capacity with auto scaling; on-demand absorbs sudden spikes more readily.
Use partition keys with high cardinality.
Enable DAX (DynamoDB Accelerator) for caching.
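When a naturally low-cardinality key (for example, a date) is unavoidable, one common workaround is write sharding: append a calculated suffix so writes spread over several partition key values. The sketch below is one way to do that; the function names, the `#` separator, and the shard count of 10 are assumptions.

```python
import hashlib


def sharded_partition_key(base_key, item_id, shard_count=10):
    """Derive a stable shard suffix from item_id and append it to base_key."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"


def all_shard_keys(base_key, shard_count=10):
    """Every key a reader must query to collect all items for base_key."""
    return [f"{base_key}#{i}" for i in range(shard_count)]


key = sharded_partition_key("2024-06-01", "order-12345")
```

The trade-off is on the read side: fetching everything for one base key means querying each shard key and merging the results.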
Q3: What is the difference between Multi-AZ and Read Replica in RDS?
Answer:
Multi-AZ: High availability and failover, synchronous replication.
Read Replica: Scale reads, asynchronous replication.
---
## 10. DevOps Integration
Q1: How do you implement zero-downtime deployments with AWS CodeDeploy?
Answer:
Use Blue/Green deployment to switch traffic to the new version gradually.
Validate using health checks and rollback on failure.
Integrate ALB target group weighting for incremental traffic shift.
Q2: How would you automate AMI creation for multiple environments?
Answer:
Use Packer templates to build AMIs with required packages/config.
Automate builds with CodePipeline or Lambda triggers.
Version AMIs for dev/test/prod environments.
Q3: How do you integrate CI/CD with containerized apps in ECS/EKS?
Answer:
Build Docker images using CodeBuild or Jenkins.
Push to ECR.
Deploy to ECS/Fargate or EKS using CodePipeline or ArgoCD.
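To tie the build, push, and deploy steps to one immutable artifact, pipelines often tag each image with the commit it was built from. The helper below is a small sketch of that convention using the standard ECR registry URI layout; the function name, account ID, repository, and 12-character tag length are illustrative.

```python
def ecr_image_uri(account_id, region, repository, git_sha):
    """Build an ECR image URI tagged with a shortened commit SHA."""
    tag = git_sha[:12]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


uri = ecr_image_uri("123456789012", "us-east-1", "web", "0123456789abcdef0123")
print(uri)  # 123456789012.dkr.ecr.us-east-1.amazonaws.com/web:0123456789ab
```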
---
## 11. Migration & Hybrid
Q1: You need to migrate petabytes of on-premises data to AWS quickly. What options do you use?
Answer:
AWS Snow Family devices such as Snowball for large-scale offline transfer (Snowmobile has been retired, so confirm which devices are currently offered).
Direct Connect for high-bandwidth online transfer.
Use AWS DataSync for incremental sync.
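The choice between offline and online transfer usually comes down to simple arithmetic on data size and link speed. The sketch below estimates transfer time; the function name and the 80% default utilization are assumptions, and protocol overhead is ignored.

```python
def transfer_days(data_bytes, link_bits_per_second, utilization=0.8):
    """Days needed to move data_bytes over a link at the given utilization."""
    seconds = data_bytes * 8 / (link_bits_per_second * utilization)
    return seconds / 86400


one_petabyte = 10**15
ten_gbps = 10**10

# 1 PB over a fully used 10 Gbps link takes a little over 9 days.
print(round(transfer_days(one_petabyte, ten_gbps, utilization=1.0), 2))  # 9.26
```

At several petabytes, or on slower links, the online estimate quickly stretches to weeks or months, which is where offline devices become attractive.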
Q2: How do you design a hybrid cloud setup for on-prem apps using AWS?
Answer:
Establish VPN or Direct Connect for secure connectivity.
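Use Transit Gateway for centralized routing between on-prem and multiple VPCs (VPC Peering is not transitive, so it cannot route on-prem traffic onward).
Extend Active Directory via AWS Managed Microsoft AD or AD Connector for authentication.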
---
## 12. Cost Optimization
Q1: Your AWS monthly bill is high. How would you identify and optimize costs?
Answer:
Use AWS Cost Explorer & Trusted Advisor for resource recommendations.
Analyze underutilized EC2 instances and EBS volumes.
Switch to Reserved Instances or Savings Plans for predictable workloads.
Use Spot Instances for non-critical workloads.
Q2: How would you reduce costs in a multi-region deployment?
Answer:
Review cross-region replication usage; replicate only necessary data.
Optimize data transfer costs by using CloudFront for caching.
Right-size instances and remove idle resources.
Q3: How do you manage cost in serverless applications?
Answer:
Monitor Lambda invocation metrics.
Optimize memory allocation and execution time.
Use DynamoDB on-demand mode for unpredictable workloads.
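Lambda cost is driven by request count and by GB-seconds (duration multiplied by configured memory), so the effect of tuning either can be estimated up front. In the sketch below the function name is illustrative and both prices are caller-supplied placeholders, not current AWS rates; the free tier is ignored.

```python
def lambda_monthly_cost(invocations, avg_duration_ms, memory_mb,
                        price_per_gb_second, price_per_million_requests):
    """Estimate monthly Lambda cost from usage and caller-supplied prices."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return compute_cost + request_cost


# Placeholder prices purely for illustration.
baseline = lambda_monthly_cost(1_000_000, 1000, 1024, 0.00001, 0.2)
tuned = lambda_monthly_cost(1_000_000, 500, 1024, 0.00001, 0.2)
```

Halving the average duration halves the compute portion, while the request portion stays the same.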
---
## 13. Advanced Architecture / High Availability
Q1: Design a multi-region highly available web application. What’s your approach?
Answer:
Deploy ALB + EC2/ECS in multiple AZs per region.
Use Route 53 latency-based routing across regions.
Replicate databases using Aurora Global DB or DynamoDB global tables.
Use CloudFront for global content caching.
Q2: How do you design for Disaster Recovery (DR) in AWS?
Answer:
Backup & restore: Snapshots, S3/Glacier backups.
Pilot light: Minimal resources in DR region, scale when needed.
Warm standby: Scaled-down environment running continuously.
Multi-site: Fully active-active in multiple regions.
Q3: Define RTO and RPO.
Answer:
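RTO (Recovery Time Objective): Maximum tolerable downtime before service must be restored.
RPO (Recovery Point Objective): Maximum tolerable data loss, measured as the time between the last recoverable copy and the failure.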
Choose DR strategy based on business requirements.
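The mapping from objectives to strategy can be sketched as a simple lookup. The thresholds below are hypothetical and exist only to show the shape of the decision; real cut-offs depend on the workload, budget, and how quickly each environment can actually be brought up.

```python
def choose_dr_strategy(rto_minutes, rpo_minutes):
    """Pick a DR strategy from the stricter objective (hypothetical thresholds)."""
    tightest = min(rto_minutes, rpo_minutes)
    if tightest <= 1:
        return "multi-site"
    if tightest <= 15:
        return "warm standby"
    if tightest <= 240:
        return "pilot light"
    return "backup & restore"


print(choose_dr_strategy(1440, 1440))  # backup & restore
print(choose_dr_strategy(60, 30))      # pilot light
print(choose_dr_strategy(10, 5))       # warm standby
```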