Many people believe that good, cheap infrastructure is a myth that doesn’t work in real life. Autoscaling, infrastructure as code, Kubernetes, monitoring and logging: it all sounds very expensive. But what about a startup that can’t afford expensive infrastructure?
In this article, we will consider two examples of how to build meaningful infrastructure on a limited budget without limiting your business or making bad decisions.
When to Build Infrastructure for a Startup
Often, the first version of a product is ready, but there is nowhere to launch it. There is simply no infrastructure, or what exists is in poor shape. Unfortunately, this means the startup is wasting time while competitors keep building.
Infrastructure is the services, processes, and sites needed to run a project. In effect, it is everything that packages the product together with the computing power it runs on: servers, development and rollout processes, log monitoring services, and a high-level architectural vision.
Many things need to be taken care of before the public release, because you have to be ready for any outcome:
- if traffic comes,
- if you have enough users,
- if someone wants to buy your startup,
- if users don’t come and your startup turns out to be a dead product.
Several times I have seen a startup get sold after it had become so intertwined with its founders’ personal accounts that transferring all the access rights and accounts dragged on for months.
Another scenario: the product becomes so entangled with the company’s existing product portfolio that separating it takes months. In my experience, this meant several trips to the buyers’ offices to transfer knowledge and walk them through every detail of the system. It is expensive and time-consuming, so it is better to lay a solid, clean foundation from the very beginning.
In the worst case, you will reuse the infrastructure work to launch a new startup. In the best case, you will get a scalable managed platform that is easy to maintain and that neither limits your business nor causes problems.
Let’s say the hypothesis has been confirmed and has passed the validation stage. You turn it into a final product, iterate on it several times, reach the top of Product Hunt, and… the end.
The only server the application was running on crashed under the load. Then everything becomes a blur: a support ticket to the provider, endless waiting, a reboot, then users arrive again and everything breaks again. Very disappointing.
To avoid that, you need to think about scaling the application well before the public release. Alternatively, you can run load tests, determine the maximum load, and rent more servers than necessary. This option is more expensive because you pay for the extra capacity even while it sits idle.
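The load-testing option boils down to simple capacity arithmetic. The sketch below assumes you have measured, in your own load test, the peak requests per second one server sustains, and that you can estimate the expected peak traffic. The function name, the 30% headroom default, and the two-server minimum are illustrative assumptions, not benchmark results.

```python
import math


def servers_needed(expected_peak_rps: float,
                   per_server_rps: float,
                   headroom: float = 0.3,
                   min_servers: int = 2) -> int:
    """Estimate how many servers to rent for a given peak load.

    Hypothetical helper: per_server_rps is the maximum throughput one
    server sustained in your own load test; headroom is the extra
    fraction of capacity kept in reserve for unexpected spikes.
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    # Scale the expected peak by the safety margin, then round up,
    # since you cannot rent a fraction of a server.
    required = math.ceil(expected_peak_rps * (1 + headroom) / per_server_rps)
    # Never go below the fault-tolerance minimum.
    return max(required, min_servers)


if __name__ == "__main__":
    # Example: the load test showed 200 req/s per server; we expect a 1,500 req/s peak.
    print(servers_needed(expected_peak_rps=1500, per_server_rps=200))  # prints 10
```

Renting for the peak means paying for all ten of those servers around the clock, which is exactly the idle-capacity cost described above.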
A startup needs fast feedback loops, fast development and frequent releases.
Ideally, product and infrastructure development should run in parallel: automated build and deployment processes speed up delivery, which makes the testing phase much faster and lets the startup move faster overall.
Who Is Supposed to Create the First Version of the Platform?
I highly recommend finding a part-time DevOps engineer who can implement an initial infrastructure in just a few days. On the other hand, if you are a developer or have limited resources, you can do it yourself by following the recommendations in this article, but it will take much longer.
Whether you have more money or more time is for you to judge.
Consider the following situation: there is $20 in a pocket labeled “infrastructure.” What do we do?
Best Practices in Infrastructure Design
In the first part of this article, we talked about the most common mistakes; now let’s talk about how to avoid them. Requirements must be gathered before budgeting and allocating the $20.
Let’s look at what matters most.
1. Scalability
It doesn’t matter how much time you spend on hypothesis validation and development if the public release is a complete mess. Infrastructure must be designed with potential growth in mind, and Kubernetes and cloud providers make this easy.
Let’s take this example:
- 1,000 users arrived: good, let’s increase the number of virtual servers in the cloud;
- 10,000 arrived: perfect, we’ll increase it even more;
- the database starts to slow down (as the monitoring graphs show): okay, we should increase its size or type.
Always use common sense: sharding the database for a startup? No. Increasing the size of the virtual database server? Yes.
2. Infrastructure as Code
Luckily, in 2020, there is no reason to provision infrastructure manually. Manual changes involve a huge number of routine actions, which leads to errors. As a result, no one knows the current state of the infrastructure, how it works, or how it is supposed to work. Terraform solves these problems: you describe the infrastructure in code, and the tool brings the real resources in line with that description.
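The core idea behind infrastructure as code is declarative: you write down the desired state, the tool compares it with what actually exists, and it computes a plan of changes. The Python sketch below illustrates only that comparison step with plain dictionaries. It is not Terraform’s API; the resource names and attributes are made up for illustration.

```python
from typing import Dict, List, Tuple

# Illustrative model: a resource is identified by name and described by a dict of attributes.
State = Dict[str, Dict[str, object]]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return (action, resource_name) pairs that would turn `actual` into `desired`."""
    actions: List[Tuple[str, str]] = []
    for name, attrs in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != attrs:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            # Resources no longer described in code are scheduled for deletion.
            actions.append(("delete", name))
    return actions


if __name__ == "__main__":
    desired = {
        "web-server": {"size": "small", "count": 2},
        "database": {"size": "medium"},
    }
    actual = {
        "web-server": {"size": "small", "count": 1},  # someone changed it by hand
        "staging-vm": {"size": "small"},              # removed from the code
    }
    for action, name in plan(desired, actual):
        print(action, name)
```

Real Terraform keeps a state file and talks to provider APIs, but the idea is the same: `terraform plan` shows the create, update, and destroy actions before `terraform apply` changes anything, so the code, not someone’s memory, is the source of truth.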
3. Low Cost
It would be great if we could run the application and pay very little for the infrastructure. If we are an early-stage startup, we are not earning money yet, and the less the infrastructure costs, the more “tries” we have to make it work.
4. Focus on Benefits
Everything around the application and the basic infrastructure should be outsourced to SaaS as much as possible. Do you need monitoring? Datadog or New Relic. Nowhere to put logs? Logentries or LogDNA. Nothing to build the application with? TravisCI, CircleCI or GitHub Actions.
5. Separate Accounts and Credit Cards
Create a separate mailbox for registering services and a separate bank card for paying for them. This is helpful from every point of view, not least because handing everything over is much easier if the startup is ever sold.
Serverless on AWS: First Pennies, Then a Truckload of Money
Serverless architecture is the first solution that comes to mind for cheap startup infrastructure. In short, we agree that we don’t need any servers, don’t want to know anything about them, and think only in terms of functions.
The most logical choice is Amazon Web Services’ Lambda plus API Gateway. It works like this: we implement the business logic in Lambda functions and, on top of them, set up routing to those functions with API Gateway.
The huge advantage of this approach is its very low cost, and you pay only for actual resource usage. While there is no traffic and no users, we pay almost nothing; when the first traffic arrives, we start paying per request. Another benefit is scaling out of the box.
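To make the Lambda side concrete, here is a minimal Python handler of the kind API Gateway can route to with a Lambda proxy integration, which passes the HTTP request in as `event` and expects a dict with `statusCode`, `headers` and `body` back. The greeting logic and the `name` parameter are hypothetical; only the handler signature and the response shape follow the AWS conventions.

```python
import json


def lambda_handler(event, context):
    """Hypothetical business-logic function behind an API Gateway route.

    With a Lambda proxy integration, API Gateway passes query-string
    parameters in event["queryStringParameters"] (None when absent).
    """
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The body must be a string, so the payload is serialized to JSON.
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test with a hand-made event; no AWS account needed.
    fake_event = {"queryStringParameters": {"name": "startup"}}
    print(lambda_handler(fake_event, None))
```

Each invocation of a function like this is billed individually, which is why the bill stays near zero until traffic arrives.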
Infrastructure as code works well here too: you can describe all the resources in Terraform, including several environments for development and quick releases. Basic monitoring and basic logging also come out of the box, and all of this is very cheap while nobody is using the product yet.
The strongest advantage is the focus on value: everything works by itself, so you only need to develop business logic and don’t have to worry about anything else.
On the other hand, there are also disadvantages:
- Vendor lock-in (if we design a serverless system inside AWS, we are tied to this cloud);
- You pay extra for everything: traffic, logs, monitoring, execution time, resources;
- It will cost a truckload of money once a lot of users arrive;
- It will cost a truckload of money if we design the system poorly: functions that take more memory than necessary, run too long, etc.
If you are testing an idea and do not expect a huge number of requests and users, serverless on AWS is an ideal option. But if you do get many users, it will become expensive and very inconvenient, so be aware of that.
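The last two drawbacks follow from how Lambda is billed: per request, plus per GB-second of compute (allocated memory multiplied by execution time). The rough model below makes that visible. The default prices are assumptions in the ballpark of the long-standing published Lambda x86 rates; check the current AWS pricing page before relying on any number. API Gateway, data transfer, and logs are billed on top and are not included.

```python
def monthly_lambda_cost(requests_per_month: int,
                        avg_duration_ms: float,
                        memory_mb: int,
                        price_per_million_requests: float = 0.20,
                        price_per_gb_second: float = 0.0000166667) -> float:
    """Rough monthly compute cost of one Lambda function, in USD.

    Prices are assumptions; ignores the free tier, API Gateway, data
    transfer and CloudWatch logs, which are all billed separately.
    """
    request_cost = requests_per_month / 1_000_000 * price_per_million_requests
    # GB-seconds = invocations x duration (s) x allocated memory (GB).
    gb_seconds = requests_per_month * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    for requests in (10_000, 1_000_000, 100_000_000):
        # Same traffic, two designs: a lean function vs an over-provisioned, slow one.
        lean = monthly_lambda_cost(requests, avg_duration_ms=100, memory_mb=128)
        bloated = monthly_lambda_cost(requests, avg_duration_ms=1000, memory_mb=1024)
        print(f"{requests:>11,} req/month: lean ${lean:,.2f}, bloated ${bloated:,.2f}")
```

Under these assumed prices, both designs cost pennies at 10,000 requests a month, but the function with 8 times the memory and 10 times the run time costs roughly 40 times as much at any volume, which turns into real money at 100 million requests.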
Kubernetes in DigitalOcean: $20 per Month
My personal recommendation is to look at Kubernetes implementations so that you don’t have to redo the infrastructure every six months or a year. Kubernetes has become the de facto industry standard; it is actively developed and supported, and it makes sense to invest time and money in it.
For business, Kubernetes provides a platform in which scaling, high availability, deployment strategies, and a huge number of integrations are available out of the box.
DigitalOcean is one of the cheapest cloud providers, and a couple of months ago the company released its DigitalOcean Kubernetes service. Managed Kubernetes means we get all the benefits of the platform without configuring, managing, and maintaining Kubernetes itself.
Alternative managed Kubernetes offerings are AWS Elastic Kubernetes Service (about $120/month for an empty cluster) and Google Kubernetes Engine (from $50/month for the two smallest nodes). They are more expensive but also more production-ready, and they are worth considering as a migration target once the startup has money coming in.
The cheapest configuration is two virtual servers. A single server is possible, but it would not be fault-tolerant, so one is not an option.
DigitalOcean Kubernetes fits the infrastructure-as-code approach. Its resources are available in Terraform through the DigitalOcean provider, and they are very easy to configure: a ready-made Kubernetes cluster takes literally one resource and a few lines of code.
In terms of scalability, we can scale both vertically (increasing the size and type of a virtual server) and horizontally (automatically increasing the number of servers in the cluster). Horizontal scaling is available out of the box, and it works like this:
- Depending on the metric (CPU load, memory, network), Kubernetes determines that a virtual server needs to be added to the cluster;
- Kubernetes detects that a new server has appeared and moves some applications onto it;
- When the load is gone, Kubernetes gradually removes the extra servers from the cluster;
- Victory: we paid for the extra servers only for the couple of hours the load lasted.
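The loop above can be sketched as a single sizing decision. This is a deliberately simplified model, not the actual Kubernetes Cluster Autoscaler (which reacts to pods that cannot be scheduled and to underused nodes rather than to a raw CPU figure). The function name, the node capacity, the min/max bounds, and the 70% target utilization are assumed values for illustration.

```python
import math


def desired_node_count(total_requested_cpu: float,
                       cpu_per_node: float,
                       target_utilization: float = 0.7,
                       min_nodes: int = 2,
                       max_nodes: int = 5) -> int:
    """Hypothetical sizing rule for a cluster's node pool.

    total_requested_cpu: CPU cores requested by all running workloads.
    cpu_per_node: allocatable cores on one virtual server.
    target_utilization: keep nodes at most this busy, leaving room for spikes.
    """
    usable_per_node = cpu_per_node * target_utilization
    needed = math.ceil(total_requested_cpu / usable_per_node)
    # Clamp: never below the fault-tolerant minimum, never above the budget cap.
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    # A quiet day, a launch-day spike, and a quiet day again (1-core nodes).
    for load in (0.8, 3.0, 0.8):
        print(load, "cores requested ->", desired_node_count(load, cpu_per_node=1.0), "nodes")
```

The `max_nodes` cap is what keeps an autoscaled cluster within a small budget: the extra servers exist, and are billed, only while the spike lasts, and the pool shrinks back to the two-server minimum afterwards.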
For detailed monitoring and logging in the Kubernetes world, it is better to use the Prometheus Operator and the Elastic Stack. These services run inside the cluster and consume its resources. We are building infrastructure for $20, so we have no spare resources. Let’s see how to get out of this situation.
Of all the existing SaaS monitoring systems, Datadog was the only one I found with a free plan. With Datadog, we can collect metrics from our cluster, view graphs in a browser, and analyze why our application is performing poorly.
Additionally, I recommend using Sentry for initial analysis of application errors; it also has a free plan.
For logging, LogDNA has a free plan, but be aware that logs are deleted after 24 hours. This can be inconvenient because you usually want to know when a bug first appeared; still, for our requirements, LogDNA works just fine.
It is worth noting that all three of these products become quite expensive as soon as you exceed the limits of the free plan. But let’s be realistic: the resources of a small Kubernetes cluster and the developers’ time are not enough to run your own monitoring, log collection, and error analysis services. As your infrastructure grows, you will most likely decide to run your own solutions, but for now, while you need to move fast, SaaS solutions are a great option.
This will be more than enough to get you started.
But What about Cheap Spot Instances?
Yes, a great idea. But not for the first release.
Spot instances are virtual servers that can be up to 90% cheaper than regular ones on the most popular cloud providers. The catch is that these servers can be taken away from you at any time, or may not be available at all.
This means the application must be designed in a special way: you need to implement graceful shutdown (which is also necessary for scaling), add message queues to the architecture, and take care of many other small details. This complicates the product and shifts the focus away from the core business.
If the application already runs in Kubernetes, you can easily add an additional pool of spot servers to it. Once the infrastructure is large enough, this yields significant budget savings, but at the initial stages it only adds headaches.
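Graceful shutdown means that when the platform is about to take a server away, it signals the process first, and the process stops accepting new work and finishes what it already has. Kubernetes, for example, sends SIGTERM to the container and waits for a grace period (30 seconds by default) before killing it. The worker below is a minimal POSIX-oriented Python sketch of that pattern; the in-memory job queue and the `process` function are hypothetical placeholders for a real message queue client.

```python
import os
import queue
import signal
import threading
import time

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Do not exit immediately: just ask the main loop to wind down.
    shutdown_requested.set()


def process(job: str) -> None:
    # Placeholder for real work; a real worker would ack the message afterwards.
    time.sleep(0.1)
    print("done:", job)


def run_worker(jobs: "queue.Queue[str]") -> None:
    """Take jobs until SIGTERM arrives, then finish the current one and stop."""
    while not shutdown_requested.is_set():
        try:
            job = jobs.get(timeout=0.5)
        except queue.Empty:
            continue
        process(job)
        jobs.task_done()
    # Unstarted jobs stay in the queue; with a real message queue they
    # would be redelivered to another worker.
    print("shutting down cleanly,", jobs.qsize(), "jobs left for other workers")


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    work: "queue.Queue[str]" = queue.Queue()
    for i in range(5):
        work.put(f"job-{i}")
    # Simulate the platform reclaiming the server shortly after start-up.
    threading.Timer(0.25, os.kill, args=(os.getpid(), signal.SIGTERM)).start()
    run_worker(work)
```

This is why spot servers pair naturally with message queues: a job that was never started is not lost, it simply waits in the queue for a surviving worker.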
GCP Free Tier and AWS Free Tier
Almost all cloud providers offer a free trial period, which can be very helpful when you need to save money. In most cases, this is a fixed amount of credit, valid for one year, that can be spent on the provider’s services.
Let’s take a look at our options:
- Google Cloud Platform gives $300 for 12 months. A Kubernetes cluster costs about $50 per month, so the credit lasts about six months.
- AWS Free Tier lets you use certain services within quotas, also for 12 months.
Unfortunately, Amazon’s managed Kubernetes is not included in the AWS Free Tier, but it is quite possible to build the serverless architecture we discussed in the earlier section.
Providers hope that during the trial period you will enjoy the cloud so much that you start adding more services: caches, CDN, databases, and more. Be very careful here: with aggressive use, it is easy to get hooked, ending up with an already expensive bill and no quick way to move to a cheaper solution.
I should also mention Google Cloud for Startups and AWS for Startups. If your startup is already operating, with users, a legal entity, and growth you can show on charts, you will most likely be able to get funding from cloud providers.
The conditions vary, but in general they look like this: the provider covers the amount you need for infrastructure every month for up to three years. During this time, you will most likely get used to its internal services (aggressive marketing and a personal account manager will help with that), and after a while you will not even think about leaving.
In practice, you can run a scalable, highly available initial infrastructure with almost no money. Of course, $20 per month is the minimum that the very first, smallest version of the infrastructure costs, and as the startup grows, the number of servers, their cost, and the price of additional logging and monitoring services will increase.
We ended up with infrastructure that scales by default (nothing extra needs to be done for this), built on the most widely used technology stack on the market, and this solution will not need to be redone in the near future.
Last but not least, Kubernetes lets you easily migrate from the cheaper DigitalOcean Kubernetes to the more expensive Google Kubernetes Engine, or to the most expensive Amazon Elastic Kubernetes Service, once investment comes in. Later, you can also move to your own servers or even go cloud-agnostic (running the platform on several cloud providers at the same time).
The infrastructure is ready for development, does not restrict your business, and is very cheap. If your current startup idea does not take off, you can easily launch a new idea on the existing infrastructure.
I will be really happy if the techniques from this article help your startup or product get through hard times, or survive an unexpected influx of users. If that happens, please transfer me 1% of the company’s shares.
Amazon Web Services, CircleCI, LogDNA and New Relic are sponsors of The New Stack.
Feature image via Pixabay.