GWU Online Engineering Course Takeaways

Cloud Application Architecture

Architecting the Cloud: Design Decisions for Cloud Computing Service Models (SaaS, PaaS, and IaaS)

Objectives

1. Understand the differences between traditional computing and cloud computing
2. Learn about different cloud service models and when to choose which one
3. Learn the importance of monitoring and disaster recovery in cloud architectures
4. See how cloud computing changes your perspective of application scaling
5. Determine whether moving traditional applications to the cloud makes business sense

Takeaways

Managing SLA:
- Published SLAs for enterprise are more than just getting a refund.
- Consider the trade-offs of going on your own.
- Architect across Zones / Regions.
- Auditing/compliance: it doesn’t matter where the data resides - the threats are the same.
- The success rate of penetrations from outside threats is much higher in enterprise data centers than in CSP environments.

Regulation in cloud:
- The CSP and the company building the applications share the responsibility.
- A common misunderstanding of regulations: none of them declare where the data can and cannot reside.
- Government seizure of data can be mitigated with encryption.

Security design:
- How much security do you need? Target industry - healthcare and financial services set the highest bar.
- Customer expectations - end users' expectations can affect security and hosting decisions.
- Data sensitivity - social media data is low sensitivity, while companies processing medical or payment data are required to encrypt data at rest and in transit.
- Risk tolerance - startups and enterprises have different standards.
- Product maturity - security needs grow as the product evolves; strong security may not be needed at the very beginning.
- Transmission boundaries refer to which endpoints the data travels to and in what form.

Responsibility for each model:
- IaaS: The CSP handles data centers, server-related hardware and peripherals, network, infrastructure, and storage devices, along with the security of all of them. The customer handles the application stack: OS, programming language, app server, middleware, database, and monitoring.
- PaaS: The CSP handles the infrastructure and the application stack (as above). The service customer handles the application and its users.
- SaaS: The CSP handles the infrastructure, application stack, and application. The service customer handles only its users.

DR:
- RTO (about time) - the time within which the business requires that the service is back up and running.
- RPO (about data) - the amount of time in which data loss can be tolerated.
- Value of mitigating disaster - a measurement of how much it is worth to the company to mitigate disaster situations.
- IaaS DR: build redundancy across many zones, build redundancy across regions, hybrid cloud, leverage multiple public cloud vendors.
- Recovering from DR in data centers: classic backup and restore method, redundant data centers - active-passive cold/warm, redundant data centers - active(DB)-active(web apps) hot disaster recovery.
- PaaS DR: the vendor is responsible for the application stack and infrastructure; the customer is responsible for the applications built on top of that.
- SaaS DR: software escrow, multiple SaaS vendors, data extracts.
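
As a rough illustration of how RTO and RPO constrain a DR design, the sketch below checks a simple backup-and-restore plan against both objectives. The class name, function name, and sample figures are hypothetical and not taken from the course material. It assumes the worst-case data loss equals the interval between backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum tolerated time to get the service running again
    rpo: timedelta  # maximum tolerated window of lost data


def check_plan(objectives, backup_interval, estimated_restore_time):
    """Return (meets_rto, meets_rpo) for a simple backup-and-restore plan.

    Assumption: in the worst case a disaster strikes just before the next
    backup, so the data-loss window equals the backup interval.
    """
    meets_rto = estimated_restore_time <= objectives.rto
    meets_rpo = backup_interval <= objectives.rpo
    return meets_rto, meets_rpo


# Hypothetical targets: service back within 4 hours, at most 1 hour of data lost.
targets = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))

# Daily backups restore quickly enough but can lose up to 24 hours of data.
daily = check_plan(targets, timedelta(hours=24), timedelta(hours=3))

# Backups every 30 minutes satisfy both objectives.
frequent = check_plan(targets, timedelta(minutes=30), timedelta(hours=3))
```

A plan that fails the RPO check usually needs more frequent backups or replication; a plan that fails the RTO check usually needs warm or hot standby capacity.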

- Identify project constraints such as time, seasonality, management, and customer needs.
- Assess organizational readiness for cloud migration.
- Understand Service Level Agreements (SLAs), including the composite SLA of a web application combined with a SQL DB or queue.
- Choose the right service model based on technical, financial, strategic, organizational, and risk considerations.
- SaaS is the most mature service model and provides features such as security updates and patches, infrastructure, mobile compatibility, compatibility across major browsers, frequently updated features, and databases.
- PaaS is the least mature model. Providers impose throttling limits to manage performance, reliability, and scalability; workarounds include error trapping and breaking work into smaller chunks.
- IaaS provides granular control, is cost-effective for large data volumes, and mitigates downtime.
- RESTful APIs use a client-server architecture, uniform interface, statelessness, layered system, cacheability, and code on demand.
- ACID and BASE transactions have differences in availability, consistency, isolation, and durability.
- Migration of legacy apps poses challenges for cloud-based architecture.
- Data considerations include physical characteristics such as data location, ownership, performance requirements, volatility, regulatory requirements, transaction boundaries, retention period, and single vs. multi-tenancy.
- SQL and NoSQL databases have differences in schema, scalability, and ACID support.
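
The composite SLA mentioned above is commonly estimated by multiplying the availabilities of the services a request depends on. The sketch below shows that arithmetic, plus the effect of running redundant copies. The function names and the availability figures are illustrative assumptions, not any vendor's published SLA, and both formulas assume failures are independent.

```python
def composite_sla(*availabilities):
    """Availability of a chain of services that must all be up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def redundant_sla(availability, copies):
    """Availability when any one of several independent copies is enough."""
    return 1.0 - (1.0 - availability) ** copies


def monthly_downtime_minutes(availability, days=30):
    """Expected downtime per month implied by an availability figure."""
    return (1.0 - availability) * days * 24 * 60


# Hypothetical figures: web app at 99.95%, SQL database at 99.99%.
web_plus_db = composite_sla(0.9995, 0.9999)

# Deploying the whole stack to two regions with failover between them.
two_regions = redundant_sla(web_plus_db, copies=2)
```

The composite figure is always lower than the weakest individual SLA, which is one reason the notes recommend architecting across zones and regions.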

Managing SLA:
- Review published SLAs carefully as they are more than just getting a refund.
- Consider the trade-offs of going on your own.
- Architect across zones/regions.

Auditing/Compliance:
- Threats to data are the same regardless of where it resides.
- Outside threats have a higher success rate in enterprise data centers than in CSP environments, where security is a core competency.
- Focus on real issues and constraints around auditing, laws and compliance, customer requirements, and risks.

Regulation in Cloud:
- The CSP and the company building the applications share responsibility.
- A common misunderstanding of regulations: none of them declare where the data can and cannot reside.
- Government seizure of data can be mitigated with encryption.

Security Design:
- Target industries like healthcare and finance require higher security.
- End user expectations can affect security hosting.
- Data sensitivity, risk tolerance, and product maturity affect security needs.
- Transmission boundaries refer to which endpoints the data travels to and in what form.
- Responsibility for security differs for each model.

DR:
- RTO and RPO are critical factors.
- Value of mitigating disaster depends on customers and criticality of the service.
- IaaS DR can be designed by building redundancy across many zones/regions and leveraging multiple public cloud vendors.
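- Recovery in data centers can be done through classic backup and restore or redundant data centers (active-passive cold/warm, active-active).
- PaaS DR: the vendor is responsible for the application stack and infrastructure; the customer is responsible for the applications built on top.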
- SaaS DR can be done through software escrow, multiple SaaS vendors, or data extracts.

- DevOps is a culture shift that emphasizes collaboration between development, operations, and quality assurance in software development and release.
- Traditional software delivery methods suffer from lack of communication, technical debt, and shortcuts that affect software quality.
- The DevOps CAMS principles focus on culture, automation, measurement, and sharing.
- The goal of DevOps is to build systems with an understanding that the needs of development, operations, and quality assurance are interrelated and part of a collaborative process.
- DevOps principles include understanding the flow of work, seeking to increase flow, not passing defects downstream, and achieving a profound understanding of the system.
- DevOps mindsets include automating infrastructure, automating deployments, designing feature flags, measuring, monitoring and experimenting, and continuous integration and continuous delivery.
- DevOps is a grassroots cultural movement driven mostly by operations practitioners and lean manufacturing principles.
- DevOps requires the right culture (people), processes like CI/CD, and technology to deliver on agility.
- Architecture challenges and benefits for DevOps include complexity, asynchronous messaging and eventual consistency, inter-service communication, and manageability.
- N-tier style architecture divides an application into logical layers and physical tiers and is a natural fit for migrating existing workloads to the cloud from data centers.
- Web-Queue Worker Style architecture separates the web front end from the worker using asynchronous messaging and is relatively simple to understand, deploy, and manage.
- Microservices style architecture consists of small, autonomous services that are independently deployed and scaled, but requires careful design, governance, and management due to complexity and data integrity challenges.
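
The Web-Queue-Worker style described above can be sketched in a single process with Python's standard `queue` and `threading` modules. This is a toy model under stated assumptions: a real deployment would use a managed message queue and separate web and worker hosts, and the function names and job payloads here are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to shut down


def handle_request(jobs, payload):
    """Web front end: enqueue the work and return immediately."""
    jobs.put(payload)
    return "accepted"


def worker_loop(jobs, results):
    """Worker: pull jobs off the queue and do the slow processing."""
    while True:
        job = jobs.get()
        try:
            if job is STOP:
                return
            results.append(job.upper())  # stand-in for long-running work
        finally:
            jobs.task_done()


jobs = queue.Queue()
results = []
worker = threading.Thread(target=worker_loop, args=(jobs, results))
worker.start()

# The front end answers right away; the worker processes asynchronously.
statuses = [handle_request(jobs, name) for name in ("resize", "transcode")]
jobs.put(STOP)
worker.join()
```

Because the front end only enqueues, it stays responsive under load, and the worker side can be scaled independently by adding more consumers of the same queue.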

- CQRS Style: It separates read operations from write operations. It is useful where multiple users access the same data, especially when the read and write workloads are asymmetrical. It benefits from independently scaling read and write workloads, optimized data schemas, security, separation of concerns, and simpler queries. However, it also poses challenges such as complexity (especially if the design includes the Event Sourcing pattern), messaging, and eventual consistency.

- Event Driven Style: It involves event producers that generate a stream of events and event consumers that listen for the events. It is useful when multiple subsystems must process the same events, and for real-time processing with minimum time lag, complex event processing, and high volume and velocity of data. It benefits from decoupling producers and consumers, no point-to-point integrations, highly scalable and distributed subsystems, and independent views of the event stream. However, it also poses challenges such as guaranteed delivery and processing events in order or exactly once.
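
A minimal in-process sketch of the event-driven style follows. The `EventBus` and `IdempotentConsumer` classes and the event shape are hypothetical, and a production system would put a message broker between producers and consumers. The consumer deduplicates by event id, which is one common way to cope with the exactly-once challenge when the transport may deliver an event more than once.

```python
from collections import defaultdict


class EventBus:
    """Routes each published event to every handler subscribed to its type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, event):
        # The producer does not know who is listening.
        for handler in list(self._handlers.get(event_type, [])):
            handler(event)


class IdempotentConsumer:
    """Skips events whose id was already processed."""

    def __init__(self):
        self.seen_ids = set()
        self.processed = []

    def __call__(self, event):
        if event["id"] in self.seen_ids:
            return  # duplicate delivery, ignore
        self.seen_ids.add(event["id"])
        self.processed.append(event["data"])


bus = EventBus()
billing = IdempotentConsumer()
analytics = IdempotentConsumer()
bus.subscribe("order_placed", billing)
bus.subscribe("order_placed", analytics)

order = {"id": "evt-1", "data": "order 42"}
bus.publish("order_placed", order)
bus.publish("order_placed", order)  # simulated duplicate delivery
```

Both subsystems see the same event without any point-to-point integration, and each keeps its own independent view of what it has processed.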

- Big Data Style: It handles the ingestion, processing, and analysis of data that is too large or complex for traditional database systems. It is useful for batch processing of big data sources at rest, real-time processing of big data in motion, interactive exploration of big data, and predictive analytics and machine learning. It benefits from technology choices, performance through parallelism, and cloud infrastructure. However, it also poses challenges such as complexity, specialized skill sets, technology maturity, and security.

- Big Compute Style: It handles workloads that require many cores. It is useful for intensive operations such as simulation and number crunching. It benefits from high performance with embarrassingly parallel processing and specialized high-performance hardware. However, it also poses challenges such as provisioning thousands of cores in a timely manner and managing the volume of number crunching.
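
To make "embarrassingly parallel" concrete, the sketch below estimates pi with a Monte Carlo simulation split into independent tasks. The function names and task sizes are hypothetical. Each task uses its own seeded random generator and shares no state, so the tasks can run on any number of cores in any order; `estimate_pi_parallel` assumes a platform where `ProcessPoolExecutor` is available.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(task):
    """One independent task: count random points that land in the unit circle."""
    seed, samples = task
    rng = random.Random(seed)  # private generator, no shared state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(seeds, samples_per_task, map_fn=map):
    """Combine the per-task counts; map_fn decides where the tasks run."""
    tasks = [(seed, samples_per_task) for seed in seeds]
    total_hits = sum(map_fn(count_hits, tasks))
    return 4.0 * total_hits / (len(tasks) * samples_per_task)


def estimate_pi_parallel(seeds, samples_per_task, workers=None):
    """Same computation spread across worker processes.

    On platforms that start processes with "spawn", call this from under
    an `if __name__ == "__main__":` guard.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return estimate_pi(seeds, samples_per_task, map_fn=pool.map)


# Sequential run with the built-in map; the parallel version gives the
# same answer because each task is fully determined by its seed.
estimate = estimate_pi(range(4), 20_000)
```

The only coordination is the final sum, which is why this kind of workload scales almost linearly with core count.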

Design Principles:
- Design for self-healing: Retry failed operations, protect failing remote services, isolate critical resources, perform load leveling, fail over, compensate failed transactions, checkpoint long-running transactions, degrade gracefully, throttle clients, block bad actors, use leader election, test with fault injection, and embrace chaos engineering.

- Make all things redundant: The principles include considering business requirements, placing VMs behind a load balancer, replicating databases, enabling geo-replication, partitioning for availability, deploying to more than one region, synchronizing front and backend failover, using automatic failover but manual failback, and including redundancy for Traffic Manager.

- Minimize Coordination: The principles include using a scheduler, state store, supervisor, agent, and remote server, as well as embracing eventual consistency.

- Design to Scale Out: Partition data within a single database, design idempotent operations, use asynchronous parallel processing, use optimistic concurrency, consider MapReduce or other parallel, distributed algorithms, and use leader election for coordination.

- Partition Around Limits: Use partitioning to work around database, network, and compute limits. Partition different parts of the application, design the partition key to avoid hot spots, partition around CSP subscription and service limits, and partition at different levels.

- Design for Operations: Design an application so that the operations team has the tools they need. Make all things observable, instrument for monitoring, instrument for root cause analysis, use distributed tracing, standardize logs and metrics, automate management tasks, and treat configuration as code.

- Use Managed Services: Use managed services to reduce operational overhead.

- Use the Best Data Store for the Job: Don't use a relational database for everything. Consider the type of data and prefer availability over (strong) consistency.

- Design for Evolution: Enforce high cohesion and loose coupling, encapsulate domain knowledge, use asynchronous messaging, don't build domain knowledge into a gateway, expose open interfaces, and abstract infrastructure away from domain logic.

- Design for the Needs of the Business: Define business objectives and document service level agreements (SLA) and service level objectives (SLO).
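
The "retry failed operations" item under the self-healing principle is often implemented as retry with exponential backoff and jitter. The sketch below is one minimal way to do that; `TransientError`, `call_with_retry`, and the default delays are hypothetical names and values chosen for illustration, and real code would catch whatever exception types its client library raises for transient faults.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep):
    """Call operation(), retrying transient failures with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the failure to the caller
            # The delay cap doubles on each attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, cap))
```

Retries are only safe when the operation is idempotent, which ties this item to the "design idempotent operations" advice under scaling out. The `sleep` parameter exists so the waiting can be stubbed out in tests.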

DR (Disaster Recovery) in cloud computing.

- IaaS DR: Involves building redundancy across multiple zones and regions. A hybrid cloud or multiple public cloud vendors can also be leveraged.

- Recovering from DR in datacenters:
a. Classic Backup and Restore Method: Daily full backups and incremental backups are created and stored to a disk service provided by the cloud vendor. These backups are copied to secondary data centers and third-party vendors. In case of an issue, the last good backup can be restored, and incremental backups can be applied. It is the cheapest method, but there are no redundant servers running, and the RTO (Recovery Time Objective) is long.
b. Redundant Data Centers - Active-Passive Cold: In this model, the secondary data center is prepared to take over the duties from the primary data center if the primary is in a disaster state. When a disaster is declared, the team runs automated scripts that create database servers and restore the latest backups. It is cost-effective since the cold servers do not cost anything, but it is only suitable for high RTO cases.
c. Redundant Data Centers - Active-Passive Warm: This method runs the database server hot, meaning that it is always on and always in sync with the master data center. It is more expensive but has a low RPO (Recovery Point Objective).
d. Redundant Data Centers - Active(DB)-Active(web apps) Hot Disaster Recovery: In this model, all the compute resources are always used, and a complete failure of one data center may not cause any downtime. It uses a DB master-slave model, and it is the most expensive and redundant method.

- PaaS DR: Involves leveraging the vendor's responsibility for the application stack and infrastructure, while the customer is responsible for applications built on top of that. PaaS abstracts the underlying infrastructure and stack, including scaling the database, failover, and patching. The downside is that the customer is at the mercy of PaaS providers during a disaster, making it hard to control the underlying infrastructure. Private PaaS provides an alternative.

- SaaS DR: Involves software escrow, multiple SaaS vendors, and data extracts. Responsibility differs by model: the CSP (Cloud Service Provider) always provides the data centers, server-related hardware and peripherals, network infrastructure, storage devices, and their security. Depending on the model, the customer may be responsible for the application stack (programming language, app server, middleware, database, monitoring), the application itself (authentication, authorization, user interface, transactions, reports, dashboard), and the service customer functions (login, registration, and administration).

- Challenges of migrating legacy apps: Legacy systems are based on ACID, while cloud-based architecture requires partition tolerance. If a legacy app only runs in a single tenant, it is not advantageous in the cloud.

- Architecture principles: Understanding the problem we are trying to solve, business goals and drivers, and the users of the system, both internal and external, will uncover functional and non-functional requirements.

- Choosing the right service model considerations: Technical, financial, strategic, organizational, and risk.

- Data considerations: Physical characteristics, performance requirements, volatility, regulatory requirements, transaction boundaries, retention period, and multi or single tenants.

- SQL vs. NoSQL: SQL involves tables with rows and columns, rigid schemas, vertical scale-up, and ACID support. NoSQL involves flexible schemas, horizontal scale-out, BASE support, and is suitable for dynamic datasets.