AWS security best practices for rapid and scalable web application infrastructure

Adoption of any product depends on its user-friendliness and usefulness, which in turn depend on its speed, scalability and security. To make sure end customers (for B2B companies) can trust these qualities, and to become SOC 2 compliant, we started measuring the quality of our security and scalability against the guidelines defined by AWS and by running penetration tests on our AWS infrastructure. In this blog we talk about how we revised our infrastructure to follow the principles AWS suggests for tightening the security of a web application.

Our initial infrastructure

A bare minimum infrastructure was in place: private and public subnets in a VPC (private network), with EC2 instances for the frontend and backend along with RDS (a PostgreSQL database). We used nginx as a reverse proxy to route traffic internally to different ports. Our frontend (EC2 instance) was hosted in the public subnet so that anyone in the world could connect to it through nginx (nginx ran on the same EC2 instance), alongside one bastion server (jump host) for connecting to our private infrastructure (for development use). Our backend (another EC2 instance) was part of the private subnet along with RDS; both were connected to the internet via a NAT gateway and were not accessible from the outside world.

Up to this point everything was working well. However, we now wanted to test our infrastructure using SecurityHub to validate whether we were following the guidelines defined by AWS correctly, and what amendments we should make. Attaching our initial architecture diagram for a clear understanding.


Issues we faced when we enabled AWS SecurityHub

We can say with confidence that enabling SecurityHub to follow AWS security best practices for rapid and scalable infrastructure was one of the best decisions we made. We were intrigued by the level of detail AWS SecurityHub provides once it is enabled. It doesn't just list the issues: it groups them by severity and also suggests remediations. There were quite a few critical issues and a few with low or medium severity; we would like to note the main ones here for you.

Don’t create any EC2 instance in a public subnet (other than the bastion server), i.e. public IPs should be disabled for all other instances
Don’t open ports other than 80 (HTTP) and 443 (HTTPS) to everyone in security groups
SecurityHub even complains about allowing any port publicly in network ACLs
Encryption of data at rest and in transit

Our success rate was only around 40% when we first enabled SecurityHub, and alarm bells were ringing to update the infrastructure without wasting another second.
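To make the second issue in the list concrete, here is a minimal sketch of how such a check could be run locally. It assumes the security group is available as a dict shaped like one entry of the "SecurityGroups" list returned by the EC2 DescribeSecurityGroups API; the function name and the example group are hypothetical, and this is not how SecurityHub itself evaluates the control.

```python
# Ports that may stay open to the whole internet, per the issue list above.
ALLOWED_PUBLIC_PORTS = {80, 443}
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def world_open_ports(security_group):
    """Return sorted (first_port, last_port) ranges that are open to everyone
    and include at least one port other than 80 or 443."""
    findings = set()
    for perm in security_group.get("IpPermissions", []):
        cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & WORLD_CIDRS:
            continue  # rule is limited to specific addresses
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            findings.add((0, 65535))  # "all traffic" rule
        elif protocol in ("tcp", "udp"):
            first, last = perm["FromPort"], perm["ToPort"]
            if any(port not in ALLOWED_PUBLIC_PORTS for port in range(first, last + 1)):
                findings.add((first, last))
    return sorted(findings)


# Hypothetical security group used only for illustration.
example_group = {
    "GroupId": "sg-example",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 3000, "ToPort": 3000,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}

print(world_open_ports(example_group))  # [(22, 22)]
```

In this example only the SSH rule is reported: 443 is allowed publicly, and port 3000 is restricted to a private range.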

Our revised approach

Seeing the critical issues above raised some questions for us: if we apply these remediations, how do we keep our infrastructure working? We took each issue in turn:

Don’t create any EC2 instance in a public subnet (other than the bastion server), i.e. public IPs should be disabled for all other instances. It means:

If we shift the frontend (EC2 instance) into a private subnet, how can the world (end users) connect to it?
If we use a load balancer in the public subnet as part of this change, do we really need nginx?

Don’t open ports other than 80 (HTTP) and 443 (HTTPS) to everyone in security groups. We all know:

Port 80 is rarely used on its own these days because of SSL/TLS, and it’s recommended to use 443 instead, along with a reliable certificate.
Any other port used by the backend, on the same or a different instance, should not be opened to everyone either; instead, traffic should be routed from 443 to that port (say 3000, 5000, 8000 or 8080) through load balancer rules.
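The idea of routing traffic from 443 to private ports through load balancer rules can be sketched as a small simulation. Everything here is an assumption for illustration: the path patterns, priorities and port numbers are hypothetical, and the matching uses Python's `fnmatch` rather than the load balancer's own pattern engine. It only mirrors the general behaviour that rules are tried in priority order and a default action applies when nothing matches.

```python
from fnmatch import fnmatchcase

# Hypothetical rules for one HTTPS (443) listener:
# (priority, path pattern, private target port).
LISTENER_RULES = [
    (10, "/api/*", 8080),    # assumed backend API port
    (20, "/admin/*", 5000),  # assumed second backend service
]
DEFAULT_TARGET_PORT = 3000   # assumed frontend port


def target_port_for(path, rules=LISTENER_RULES, default=DEFAULT_TARGET_PORT):
    """Return the private port a request path would be forwarded to."""
    for _priority, pattern, port in sorted(rules):
        if fnmatchcase(path, pattern):
            return port
    return default  # no rule matched, fall through to the default action


print(target_port_for("/api/users"))  # 8080
print(target_port_for("/"))           # 3000
```

Only port 443 faces the internet in this picture; the backend ports are reachable solely from the load balancer inside the private network.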

SecurityHub even complains about allowing any port publicly in network ACLs. This was surprising as well as interesting to understand:

Network ACLs are a rule-based way to allow or deny specific traffic to subnets. SecurityHub asks us to deny all traffic to any port other than 80 (HTTP) or 443 (HTTPS), even for something like SSH or any other important backend (proxy server) port. If you need to enable anything at all, it should be managed with rules, which are evaluated by rule number: the lowest-numbered rule takes precedence over higher-numbered ones.
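The rule-number precedence described above can be illustrated with a short simulation. This is a sketch under stated assumptions: the rule numbers, CIDR blocks and the `evaluate` helper are hypothetical, and the IP addresses come from ranges reserved for documentation. It only models the behaviour that rules are checked from the lowest number upward, the first match decides, and unmatched traffic is denied.

```python
from ipaddress import ip_address, ip_network

# Hypothetical inbound rules: (rule number, source CIDR, first port, last port, action).
INBOUND_RULES = [
    (100, "0.0.0.0/0", 443, 443, "allow"),
    (110, "203.0.113.10/32", 22, 22, "allow"),  # stand-in for a known developer address
    (200, "0.0.0.0/0", 0, 65535, "deny"),
]


def evaluate(rules, source_ip, port):
    """Return the action of the lowest-numbered matching rule, else "deny"."""
    for _number, cidr, first, last, action in sorted(rules):
        if ip_address(source_ip) in ip_network(cidr) and first <= port <= last:
            return action
    return "deny"  # mirrors the implicit catch-all deny at the end of every ACL


print(evaluate(INBOUND_RULES, "198.51.100.7", 443))  # allow
print(evaluate(INBOUND_RULES, "198.51.100.7", 22))   # deny
print(evaluate(INBOUND_RULES, "203.0.113.10", 22))   # allow
```

Network ACLs are stateless, so a real configuration would also need matching outbound rules for return traffic; that part is left out of the sketch.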
Encryption of data at rest and in transit

We used AWS KMS (Key Management Service) for EBS (EC2 volume) encryption. Please note that we couldn’t enable encryption on an instance that was already running; in our case we had to take an AMI and then re-launch the instance after enabling encryption.
The same mechanism was used to encrypt data at rest in S3 and RDS.
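The AMI-and-relaunch step could look roughly like the sketch below. It is not the exact procedure we ran: the function name and image names are hypothetical, the `ec2` argument is assumed to behave like `boto3.client("ec2")`, and launch details such as subnet, security groups and key pair are omitted.

```python
def relaunch_encrypted(ec2, instance_id, kms_key_id, region, instance_type):
    """Image an instance, copy the image with encryption enabled,
    and launch a new instance from the encrypted copy."""
    # 1. Take an AMI of the existing (unencrypted) instance.
    image_id = ec2.create_image(
        InstanceId=instance_id, Name=f"{instance_id}-plain"
    )["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # 2. Copy the AMI, asking for the snapshots to be encrypted with the KMS key.
    encrypted_id = ec2.copy_image(
        Name=f"{instance_id}-encrypted",
        SourceImageId=image_id,
        SourceRegion=region,
        Encrypted=True,
        KmsKeyId=kms_key_id,
    )["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[encrypted_id])

    # 3. Launch a replacement instance from the encrypted AMI.
    reservation = ec2.run_instances(
        ImageId=encrypted_id, InstanceType=instance_type, MinCount=1, MaxCount=1
    )
    return reservation["Instances"][0]["InstanceId"]


# Usage (requires boto3 and AWS credentials; all identifiers are placeholders):
#   import boto3
#   new_id = relaunch_encrypted(boto3.client("ec2"), "i-...", "alias/my-key",
#                               "us-east-1", "t3.small")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.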

Here is our revised AWS architecture diagram for a better understanding of our revised approach.

Success percentage of revised infrastructure

So, we followed the approach below:

Having the bastion server and load balancer (connected to the internet gateway) in the public subnet. Remember that the bastion server is only for developer use, so only port 3389 (RDP) and port 22 (SSH) should be opened, and only for specific IP address(es). If developers don’t have static public address(es), we recommend using a VPN, which makes sure no frequent modifications are required to access the jump server.
The load balancer routes traffic to the frontend (via 443, HTTPS) without worrying about whether the applications run on the same or different instance(s).

The frontend communicates with the backend instance using its private IP address and port, as they are in the same private network.

When we use port 443 instead of 80, we need to set up a certificate; we were using Namecheap as our domain provider.

It was easy to add a DNS record in Namecheap pointing to the DNS name of the load balancer to make it work with HTTPS; adding the certificate, key and chain file was the important part.
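The bastion rule from the first point of this list (ports 22 and 3389, specific addresses only) can be expressed as data before it is applied. The sketch below builds a list in the `IpPermissions` shape accepted by the EC2 AuthorizeSecurityGroupIngress API; the helper name and the example address are hypothetical, and the refusal of a zero-length prefix is an extra guard added here, not something AWS enforces.

```python
from ipaddress import ip_network

BASTION_PORTS = (22, 3389)  # SSH and RDP, as listed above


def bastion_ingress(allowed_cidrs, ports=BASTION_PORTS):
    """Build an IpPermissions list that opens `ports` only to `allowed_cidrs`."""
    ranges = []
    for cidr in allowed_cidrs:
        network = ip_network(cidr)  # raises ValueError on a malformed CIDR
        if network.version != 4:
            raise ValueError(f"{cidr}: this sketch only handles IPv4 ranges")
        if network.prefixlen == 0:
            raise ValueError(f"{cidr} would open the bastion to everyone")
        ranges.append({"CidrIp": str(network)})
    return [
        {"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "IpRanges": list(ranges)}
        for port in ports
    ]


# Example with a documentation-range address standing in for a VPN exit IP.
permissions = bastion_ingress(["203.0.113.10/32"])
print([p["FromPort"] for p in permissions])  # [22, 3389]

# Applying it (requires boto3 and credentials; the group ID is a placeholder):
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-...", IpPermissions=permissions)
```

With a VPN in place, the list of allowed CIDRs stays a single stable entry, which is the reason for recommending one above.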

We were happy to see our success percentage rise from 40 to 83 (more than double), which was a satisfactory level. To reach a better percentage (say around 95-100), there were a few more “low” severity guidelines we needed to follow, like:

RDS was already in a private subnet; however, we were asked not to use the default database name, username or password.
Setting up CloudTrail to keep monitoring and recording logs across the account.
Setting up thresholds for CPU utilisation of EC2 instances and IOPS of the database to detect failures, if there are any.
Taking backups of the database in a timely manner.
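As one way to picture the threshold guideline, the sketch below builds keyword arguments for the CloudWatch PutMetricAlarm API. The alarm names, resource identifiers and threshold values are hypothetical and would need tuning for a real workload; the metric names (`CPUUtilization` in `AWS/EC2`, `WriteIOPS` in `AWS/RDS`) are standard CloudWatch metrics.

```python
def alarm_params(name, namespace, metric, dimension, threshold,
                 period_seconds=300, evaluation_periods=2):
    """Return keyword arguments for CloudWatch put_metric_alarm.

    The alarm fires when the average of `metric` stays above `threshold`
    for `evaluation_periods` consecutive periods.
    """
    dimension_name, dimension_value = dimension
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dimension_name, "Value": dimension_value}],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Placeholder identifiers and thresholds, for illustration only.
cpu_alarm = alarm_params("backend-high-cpu", "AWS/EC2", "CPUUtilization",
                         ("InstanceId", "i-0123456789abcdef0"), 80)
iops_alarm = alarm_params("db-high-write-iops", "AWS/RDS", "WriteIOPS",
                          ("DBInstanceIdentifier", "example-db"), 1000)

# Applying them (requires boto3 and credentials):
#   cloudwatch = boto3.client("cloudwatch")
#   cloudwatch.put_metric_alarm(**cpu_alarm)
#   cloudwatch.put_metric_alarm(**iops_alarm)
```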

Concluding Notes

We often wonder whether it is worth enabling yet another AWS service, as the associated cost bothers us. The cost of this service turned out to be a pleasant surprise. For our use case, the estimated monthly cost shown in the AWS management console was only $18.

We also calculated the cost for a medium-sized company: for example, with 2 regions and 10 accounts in each region, if AWS Security Hub performs 5,000 security checks per account/region/month and ingests 10,000 findings per account/region/month, the cost would be only around $100.

And thus we would say that, SOC 2 compliance or not, we will keep using SecurityHub for all our AWS applications and services, small or large, to consistently deliver secure infrastructure.
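The medium-sized company estimate can be reproduced with a few lines of arithmetic. The unit prices below are assumptions chosen to be consistent with the roughly $100 figure (a first-tier price per security check and a free monthly allowance of finding ingestions); they are not quoted from this article, so check the current Security Hub pricing page before relying on them.

```python
# Assumed unit prices in USD; verify against the current AWS pricing page.
PRICE_PER_CHECK = 0.0010             # per security check, first pricing tier
FREE_INGESTIONS = 10_000             # free finding ingestions per account/region/month
PRICE_PER_EXTRA_INGESTION = 0.00003  # per ingestion beyond the free allowance


def monthly_cost(regions, accounts_per_region, checks, ingestions):
    """Estimate the monthly cost for the given per-account/region usage."""
    pairs = regions * accounts_per_region
    check_cost = pairs * checks * PRICE_PER_CHECK
    billable_ingestions = max(0, ingestions - FREE_INGESTIONS)
    ingestion_cost = pairs * billable_ingestions * PRICE_PER_EXTRA_INGESTION
    return round(check_cost + ingestion_cost, 2)


# 2 regions x 10 accounts, 5,000 checks and 10,000 ingestions each.
print(monthly_cost(2, 10, 5000, 10_000))  # 100.0
```

Under these assumptions the whole bill comes from security checks, since 10,000 ingestions per account and region fall inside the free allowance.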
