Redis over EFS for Caching in a Moodle Cluster Solution

If you are on this page, you are probably working on scaling Moodle. The web server part is straightforward; the challenge is shared file storage, which can be solved with a shared file system such as EFS.

If you want more background, read about horizontal scaling for a Moodle cluster solution.

On AWS, EFS works well as shared storage for media data.

But if you also use EFS for the file-store cache (the required, default store), that is, if $CFG->cachedir points to a location on EFS, system response becomes slow.
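One way to check this on your own setup is to time small-file writes and reads in a candidate cache directory. The following is a minimal sketch, assuming the file cache is dominated by many small files; the function name, file count, and file size are illustrative choices, not Moodle settings.

```python
import os
import tempfile
import time


def probe_small_file_latency(directory, files=50, size=2048):
    """Return average seconds per write+read+delete cycle of small files."""
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(files):
        path = os.path.join(directory, f"probe_{i}.cache")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # force the write to reach the storage backend
        with open(path, "rb") as fh:
            fh.read()
        os.remove(path)
    return (time.perf_counter() - start) / files


# Demonstration on a local temporary directory only.
with tempfile.TemporaryDirectory() as local_dir:
    avg = probe_small_file_latency(local_dir)
    print(f"local disk: {avg * 1000:.2f} ms per file")
```

To compare, call the same function with a directory on the EFS mount (for example a hypothetical `/mnt/efs/moodledata/cache`) and with a directory on the local volume. On a network file system each of these operations typically costs at least one network round trip, which is the likely source of the slowdown.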

Challenge with Memcache as the Application Cache Store

There are a lot of posts recommending Memcache. The problem is that Memcache does not guarantee that cached data is kept and does not provide locking. For the application caches that are stored in $CFG->cachedir, both are basic requirements. Memcache is a good option for the session store.
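To make the locking requirement concrete, here is a minimal in-memory sketch of the semantics an application cache store needs: a lock can be taken only if it is free, and released only by its owner. `InMemoryLockStore` is a hypothetical toy class, not Moodle code and not a real cache client. Redis can express the same idea with its atomic `SET key token NX PX ttl` command.

```python
import threading
import time
import uuid


class InMemoryLockStore:
    """Toy stand-in for a cache store that supports locking."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}  # key -> (owner token, expiry timestamp)

    def acquire(self, key, ttl=5.0):
        """Return an owner token if the lock was free or expired, else None."""
        now = time.monotonic()
        with self._guard:
            holder = self._locks.get(key)
            if holder is not None and holder[1] > now:
                return None  # another owner holds a live lock
            token = uuid.uuid4().hex
            self._locks[key] = (token, now + ttl)
            return token

    def release(self, key, token):
        """Release the lock only if `token` still owns it."""
        with self._guard:
            holder = self._locks.get(key)
            if holder is not None and holder[0] == token:
                del self._locks[key]
                return True
            return False


store = InMemoryLockStore()
first = store.acquire("rebuild:coursecache")
second = store.acquire("rebuild:coursecache")
print(first is not None, second)  # True None
```

Without such a lock, two web servers can rebuild the same cache entry at the same time and overwrite each other's result.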

For reference: Moodle has three cache modes (application, session, and static request [localcache]).

Newer Moodle versions also offer MongoDB as an application cache store. Redis is more popular and also supports the session store, while the MongoDB store is application-only.

So the choice is between shared storage (EFS) and Redis.

I went with Redis because of the following test results.

Test-Config

Load test environment: 2 AWS t2.micro web servers, 1 medium DB server, 1 ALB, and one Windows server in another region used to generate the load.
Load test config: https://developerck.com/load-testing-on-moodle/
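The runs in this post use 20, 30, and 40 users with a 30-second ramp-up. As a rough sketch of what that means, assuming the load tool starts users evenly across the ramp-up window (as JMeter thread groups do), the start offsets can be computed like this. The function name is illustrative.

```python
def user_start_offsets(users, ramp_up_seconds):
    """Start time in seconds of each virtual user, spread evenly over the ramp-up."""
    step = ramp_up_seconds / users
    return [round(i * step, 3) for i in range(users)]


for users in (20, 30, 40):
    offsets = user_start_offsets(users, 30)
    print(users, "users: one new user every", offsets[1], "s")
```

So the 40-user run adds a new user twice as fast as the 20-user run, on top of having twice the total load.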

EFS as Cache Store

20 users, 30-second ramp-up, 1 iteration

30 users, 30-second ramp-up, 1 iteration

40 users, 30-second ramp-up, 1 iteration

Note: the system failed during this test; it was run only to find the upper limit.

Redis as Cache Store (AWS ElastiCache, micro server)

20 users, 30-second ramp-up, 1 iteration

30 users, 30-second ramp-up, 1 iteration

40 users, 30-second ramp-up, 1 iteration

Note: the system failed during this test; it was run only to find the upper limit.

Comparison

Disclaimer: The tests were repeated with the same configuration, changing only the cache store. In both cases server stats were monitored and each run started from a fresh state. Before applying load, the caches were generated by manually following the same steps. Despite these precautions, the cache store may not be the only reason for the different response-time patterns. Other factors may contribute, but it is one of them.

Comparing the charts above, there was a large improvement once I moved the application cache from EFS to Redis.

Users/ramp-up (s)/iterations    EFS                 Redis
20/30/1                         2000-10000 ms       1000-2200 ms
30/30/1                         mostly > 9000 ms    mostly < 6000 ms
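To put the 20/30/1 row in relative terms, here is a short calculation using only the range bounds reported above. The ratios are rough indicators, not measured speedups.

```python
# Response-time ranges in milliseconds, copied from the 20/30/1 row.
efs_ms = (2000, 10000)
redis_ms = (1000, 2200)


def speedup(slow, fast):
    """How many times faster `fast` is than `slow`."""
    return slow / fast


best_case = speedup(efs_ms[0], redis_ms[0])
worst_case = speedup(efs_ms[1], redis_ms[1])
print(f"fastest responses: {best_case:.1f}x faster on Redis")
print(f"slowest responses: {worst_case:.1f}x faster on Redis")
```

The gap is largest at the slow end of the range, which is where users notice it most.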

If you are using only one web server, a local EBS volume performs about 10 times better than Redis. Only when you have more than one application server and need a shared store does Redis perform much better than EFS as the cache store.

More about Redis: https://aws.amazon.com/redis/

AWS Well-Architected Framework: Key Points

General Design Principles:

  • Stop guessing your capacity needs
  • Test systems at production scale
  • Automate to make architectural experimentation easier
  • Allow for evolutionary architectures
  • Drive architectures using data
  • Improve through game days

The Well-Architected Framework is built around five pillars.

  1. Operational Excellence
  2. Security
  3. Reliability
  4. Performance Efficiency
  5. Cost Optimisation

Courtesy: https://wa.aws.amazon.com/wat.map.en.html

Operational Excellence

The Operational Excellence pillar includes the ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

Design Principles

  • Perform operations as code
  • Annotate documentation
  • Make frequent, small, reversible changes
  • Refine operations procedures frequently
  • Anticipate failure
  • Learn from all operational failures

Definition:

  • Prepare (AWS Config)
    • 1: How do you determine what your priorities are?
    • 2: How do you design your workload so that you can understand its state?
    • 3: How do you reduce defects, ease remediation, and improve flow into production?
    • 4: How do you mitigate deployment risks?
    • 5: How do you know that you are ready to support a workload?
  • Operate (CloudWatch)
    • 6: How do you understand the health of your workload?
    • 7: How do you understand the health of your operations?
    • 8: How do you manage workload and operations events?
  • Evolve (Amazon Elasticsearch Service)
    • 9: How do you evolve operations?

Security

The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

Design Principles

  • Implement a strong identity foundation
  • Enable traceability
  • Apply security at all layers [edge network, VPC, subnet, load balancer, every instance, operating system, and application]
  • Automate security best practices
  • Protect data in transit and at rest
  • Keep people away from data
  • Prepare for security events

Definition:

  • Identity and Access Management (IAM)
    • 1: How do you manage credentials and authentication?
    • 2: How do you control human access?
    • 3: How do you control programmatic access?
  • Detective Controls (AWS CloudTrail, AWS Config)
    • 4: How do you detect and investigate security events?
    • 5: How do you defend against emerging security threats?
  • Infrastructure Protection (Amazon VPC, WAF, CloudFront, ELB)
    • 6: How do you protect your networks?
    • 7: How do you protect your compute resources?
  • Data Protection (KMS, SSE)
    • 8: How do you classify your data?
    • 9: How do you protect your data at rest?
    • 10: How do you protect your data in transit?
  • Incident Response (CloudWatch Events, Lambda, CloudFormation to create an environment)
    • 11: How do you respond to an incident?

Reliability

The Reliability pillar includes the ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

Design Principles

  • Test recovery procedures
  • Automatically recover from failure
  • Scale horizontally to increase aggregate system availability
  • Stop guessing capacity
  • Manage change in automation

Definition:

  • Foundations [IAM, VPC]
    • 1: How do you manage service limits?
    • 2: How do you manage your network topology?
  • Change Management [AWS Config, CloudTrail, CloudWatch, Auto Scaling]
    • 3: How does your system adapt to changes in demand?
    • 4: How do you monitor your resources?
    • 5: How do you implement change?
  • Failure Management [CloudFormation, durable services: S3, Glacier, KMS]
    • 6: How do you back up data?
    • 7: How does your system withstand component failures?
    • 8: How do you test resilience?
    • 9: How do you plan for disaster recovery?

Performance Efficiency

The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.

Design Principles

  • Democratize advanced technologies
  • Go global in minutes
  • Use serverless architectures
  • Experiment more often
  • Mechanical sympathy

Definition:

  • Selection
    • 1: How do you select the best performing architecture?
    • 2: How do you select your compute solution?
    • 3: How do you select your storage solution?
    • 4: How do you select your database solution?
    • 5: How do you configure your networking solution?
  • Review
    • 6: How do you evolve your workload to take advantage of new releases?
  • Monitoring
    • 7: How do you monitor your resources to ensure they are performing as expected?
  • Tradeoff
    • 8: How do you use tradeoffs to improve performance?

Selection

Compute: Auto Scaling is key to ensuring that you have enough instances to meet demand and maintain responsiveness.

Storage: EBS, S3

Database: Amazon RDS provides a wide range of database features (such as PIOPS and read replicas) that allow you to optimize for your use case. Amazon DynamoDB provides single-digit millisecond latency at any scale.

Network: Amazon Route 53 provides latency-based routing. Amazon VPC endpoints and AWS Direct Connect can reduce network distance or jitter.

Review:

The AWS Blog and the What’s New section on the AWS website are resources for learning about newly launched features and services.

Monitoring:

Amazon CloudWatch provides metrics, alarms, and notifications that you can integrate with your existing monitoring solution, and that you can use with AWS Lambda to trigger actions.

Tradeoffs:

Amazon ElastiCache, Amazon CloudFront, and AWS Snowball are services that allow you to improve performance.

Cost Optimisation

The ability to run systems to deliver business value at the lowest price point.

Design Principles

  • Adopt a consumption model
  • Measure overall efficiency
  • Stop spending money on data center operations
  • Analyze and attribute expenditure
  • Use managed and application level services to reduce cost of ownership

Definition:

  • Expenditure Awareness
    • 1: How do you govern usage?
    • 2: How do you monitor usage and cost?
    • 3: How do you decommission resources?
  • Cost-Effective Resources
    • 4: How do you evaluate cost when you select services?
    • 5: How do you meet cost targets when you select resource type and size?
    • 6: How do you use pricing models to reduce cost?
    • 7: How do you plan for data transfer charges?
  • Matching supply and demand
    • 8: How do you match supply of resources with demand?
  • Optimizing Over Time
    • 9: How do you evaluate new services?

More about this topic: https://wa.aws.amazon.com/