FCJ Venture Builder Group
The largest group of venture builders in Latin America, with over 130 startups.
The group approached NCI Innova with an unusual request: the development of a video conferencing platform.
Some startups within the group were using an international vendor to customize their virtual meeting rooms. As the number of meetings grew during the pandemic, the cost of these solutions became too high.
Challenge
Creating a Video Conferencing Platform with the following attributes:
- Benchmark: Quality comparable to Google Meet.
- Economy: Operating costs below those charged by international vendors.
- Capacity: Scalability to more than 300 concurrent meetings.
- Control: Meeting recordings stored on our own servers.
- Platforms: Mobile-friendly operation, with no app download required.
- Access: 99.9% availability.
Solution
Process Steps
Because the project was ambitious, we needed a plan to validate our hypotheses with the least possible investment. We divided the work into three main stages:
Step 1: Technical Feasibility
Can we create a product with the quality of Google Meet that also records meetings on the server?
Step 2: Economic Viability
Is the infrastructure cost for this project reasonable? Can we achieve a lower cost than other vendors on the market?
Step 3: Scalability
Can we handle 300 parallel meetings while maintaining quality and keeping operating costs under control?
Solution
Process Evolution
Follow our work on the project, step by step.
We started by using WebRTC technology for audio and video communication in a very simple prototype: a room was created at a specific URL, and two people accessing that URL could see and hear each other.
WebRTC technology is relatively recent, and its specification was only recommended by the W3C in 2021. It establishes a protocol for exchanging data, audio, and video over the web.
This technology can be used in a few different ways, the most common being:
- Peer to peer (P2P) – People in a meeting room communicate directly, without the need to transmit audio and video data through a server.
- Using a media server – All participants in the room send their audio and video to a server, which then distributes it to the other participants.
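To make the P2P mode concrete, here is a minimal sketch of a call setup in browser-side TypeScript, roughly in the spirit of our first prototype. The signaling WebSocket endpoint, room-URL scheme, and `initiator` flag are illustrative assumptions: WebRTC standardizes the media exchange but leaves signaling entirely up to the application.

```typescript
// Minimal P2P call setup (browser-side sketch). Assumes an HTML page with a
// <video id="remote"> element and a signaling server at a hypothetical URL.
const roomId = new URLSearchParams(location.search).get("room") ?? "demo";
const signaling = new WebSocket(`wss://example.com/signal?room=${roomId}`);
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// Capture the local camera and microphone and send the tracks to the peer.
const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
for (const track of stream.getTracks()) pc.addTrack(track, stream);

// Trickle ICE candidates to the other peer through the signaling channel.
pc.onicecandidate = (e) => {
  if (e.candidate) signaling.send(JSON.stringify({ candidate: e.candidate }));
};

// Show the remote peer's media once their tracks arrive.
pc.ontrack = (e) => {
  (document.getElementById("remote") as HTMLVideoElement).srcObject = e.streams[0];
};

// One peer initiates; a real app would negotiate roles via signaling.
if (new URLSearchParams(location.search).has("initiator")) {
  signaling.onopen = async () => {
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    signaling.send(JSON.stringify({ offer }));
  };
}

// Complete the offer/answer handshake and add remote ICE candidates.
signaling.onmessage = async (msg) => {
  const data = JSON.parse(msg.data);
  if (data.offer) {
    await pc.setRemoteDescription(data.offer);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    signaling.send(JSON.stringify({ answer }));
  } else if (data.answer) {
    await pc.setRemoteDescription(data.answer);
  } else if (data.candidate) {
    await pc.addIceCandidate(data.candidate);
  }
};
```

Note that in this topology audio and video flow directly between the browsers; the server only relays the small signaling messages, which is exactly why recording on the server is impossible.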
In our proof of concept, we started with P2P connections but couldn't progress with this method. Despite its lower operating cost (no media servers required), it doesn't allow server-side recording, since the media never passes through a server.
So we needed to move on to the media-server approach, using an open-source server. We found several alternatives: Kurento, Jitsi, and Janus, and implemented the first version of our room using Kurento.
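To illustrate why a media server makes server-side recording straightforward, here is a rough sketch using the kurento-client Node.js library. The server address, file URI, and the omitted SDP offer/answer exchange are assumptions for illustration, not our actual code.

```typescript
// Rough sketch of server-side recording with kurento-client.
// Addresses, URIs, and error handling are illustrative assumptions.
import * as kurento from "kurento-client";

async function createRecordedRoom(meetingId: string): Promise<void> {
  const client = await kurento.getSingleton("ws://localhost:8888/kurento");
  const pipeline = await client.create("MediaPipeline");

  // One WebRTC endpoint per participant (SDP offer/answer exchange omitted).
  const participant = await pipeline.create("WebRtcEndpoint");

  // The recorder writes the participant's media to a file on the server,
  // which is exactly what a pure P2P topology cannot do.
  const recorder = await pipeline.create("RecorderEndpoint", {
    uri: `file:///recordings/${meetingId}.webm`,
  });
  await participant.connect(recorder);
  await recorder.record();
}
```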
Two months into the project, we had a satisfactory result: a fully functional room, with meetings being recorded.
Technical feasibility? Check!
OK, we were able to create the technology, but is it worth it?
To answer this question, we needed to understand two main factors:
- Server Costs
- Traffic Costs
To understand these points, we needed to know how many meetings each server could handle in parallel, and to measure all data traffic in order to estimate the cloud costs.
For this purpose, we created a load-testing subproject that generated multiple meeting rooms and joined them with several open browsers, each transmitting pre-recorded video from a 720p webcam. We used Selenium as the technology for this load testing.
We deployed our servers on Amazon AWS and also created instances to act as “clients” for the meeting rooms, running our load tests.
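A simplified version of one such load-test client, using selenium-webdriver for Node.js, could look like the sketch below. The room URL pattern, video file path, and participant counts are hypothetical; Chrome's fake-device flags do the trick of streaming a recorded 720p file as if it were a live webcam.

```typescript
// Sketch of a load-test client: each call opens a headless Chrome that joins
// a room and streams a pre-recorded 720p file as if it were a live webcam.
import { Builder } from "selenium-webdriver";
import * as chrome from "selenium-webdriver/chrome";

async function joinRoom(roomUrl: string, minutes: number): Promise<void> {
  const options = new chrome.Options().addArguments(
    "--headless=new",
    "--use-fake-ui-for-media-stream",     // auto-accept the camera/mic prompt
    "--use-fake-device-for-media-stream", // replace real devices with fakes
    "--use-file-for-fake-video-capture=/data/sample-720p.y4m" // hypothetical path
  );
  const driver = await new Builder()
    .forBrowser("chrome")
    .setChromeOptions(options)
    .build();
  try {
    await driver.get(roomUrl);
    await driver.sleep(minutes * 60_000); // stay in the meeting under load
  } finally {
    await driver.quit();
  }
}

// Fill 20 rooms with 2 participants each (URLs are illustrative).
const rooms = Array.from({ length: 20 }, (_, i) => `https://meet.example.com/room-${i}`);
await Promise.all(rooms.flatMap((url) => [joinRoom(url, 10), joinRoom(url, 10)]));
```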
At this point, we encountered our first surprise: recorded rooms using Kurento as the media server were very resource-intensive. A server with 2 CPUs and 4 GB of RAM could only handle 4 parallel meetings.
To achieve our goal of 300 simultaneous meetings, we would need around 75 servers.
Back to square one, we had to replace our chosen media server. In two weeks, we studied Kurento's competitors, chose Janus as the alternative, and implemented all the necessary changes.
We ran our load test with Janus, and... bingo! On the same server, we could run 114 simultaneous meetings, meaning just three such servers would cover our target of 300.
Server Costs? Check.
This study showed that the project was financially viable, though there was still room for improvement.
We then explored various cloud providers to understand traffic costs, since traffic could be the single largest cost for a project of this scale.
We evaluated the following providers:
- Amazon AWS
- Azure
- Digital Ocean
- Oracle Cloud
- Some local providers.
Since the main requirement for this project to perform well is a fast internet connection (both upload and download), given the massive volume of data to be transmitted, we eliminated the local providers: they couldn't deliver a fast enough link.
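To get a sense of why traffic dominates the economics, a back-of-envelope estimate helps. Every figure below (per-stream bitrate, participants per meeting, usage hours) is a hypothetical assumption for illustration, not a project measurement.

```typescript
// Back-of-envelope traffic estimate. The 1.5 Mbps per 720p stream figure is
// a rough industry ballpark, not a value measured in the project.
const MBPS_PER_STREAM = 1.5;
const PARTICIPANTS_PER_MEETING = 4;
const CONCURRENT_MEETINGS = 300;
const HOURS_PER_MONTH = 8 * 22; // hypothetical usage: 8h/day, 22 business days

// With a media server, each participant uploads one stream and downloads the
// others, so server egress scales with participants * (participants - 1).
const egressStreams =
  CONCURRENT_MEETINGS * PARTICIPANTS_PER_MEETING * (PARTICIPANTS_PER_MEETING - 1);
const egressTbPerMonth =
  (egressStreams * MBPS_PER_STREAM * 3600 * HOURS_PER_MONTH) / 8 / 1e6;

console.log(`~${egressTbPerMonth.toFixed(0)} TB of egress per month`); // ~428 TB
```

At hundreds of terabytes per month, the per-GB traffic price, not the server price, decides the project's economics.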
After a thorough study of pricing and billing models, we decided to implement this project on Oracle Cloud, as it offers the lowest cost per GB of traffic and also provides 10 TB of free traffic every month.
However, to avoid lock-in with any specific cloud provider, we implemented our entire deployment with Kubernetes. This way, we can easily migrate to a cheaper cloud provider if prices change.
Economic Viability? Check.
The biggest challenge in this type of project is scalability. Scaling a solution like this is much harder than scaling regular web applications, for several reasons:
- People in the same meeting room need to be directed to the same server.
- We cannot perform a scale-down of a server if a meeting is still ongoing on that server.
- Standard load distribution strategies like Round Robin do not work for this type of solution.
To achieve scalability, we needed to create an innovative mechanism. We divided our application into microservices, and the key to scalability lies in the scheduler.
The scheduler is a service that creates meeting rooms on the appropriate server and directs all participants trying to access the meeting to the same server.
To keep this mechanism simple, we utilized a pattern used in Kubernetes’ implementation, known as the Leader-Elector.
How does it work? The scheduler always directs all meetings to a single media server: the leader server.
In this way, the scheduler can remain simple and doesn’t need to know about all servers that are up and running. Every time a new meeting needs to be created, it is assigned to the leader.
When someone wants to join a meeting that already exists, the scheduler checks on which server the meeting was scheduled and redirects the user accordingly.
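In rough terms, the scheduler's routing logic boils down to a few lines. The sketch below assumes a shared key-value store (Redis here) that the leader-elector keeps updated; the key names (`media:leader`, `room:<id>`) are hypothetical.

```typescript
// Sketch of the scheduler's routing logic. The Redis store and its key
// names are assumptions for illustration, not the actual implementation.
import Redis from "ioredis";

const store = new Redis();

// New meetings are always placed on the current leader media server.
async function createRoom(roomId: string): Promise<string> {
  const leaderIp = await store.get("media:leader"); // written by the leader-elector
  if (!leaderIp) throw new Error("no media server currently holds leadership");
  await store.set(`room:${roomId}`, leaderIp); // pin the room to that server
  return leaderIp;
}

// Participants joining an existing meeting are sent to the server where the
// room was scheduled, even if leadership has since moved to another server.
async function joinRoom(roomId: string): Promise<string> {
  const existing = await store.get(`room:${roomId}`);
  return existing ?? createRoom(roomId);
}
```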
But how do we determine who is the leader?
We coded a Docker image for the Leader-Elector: a Node.js job that competes for leadership. It runs as a sidecar of the media server and registers the media server's IP as the leader whenever it assumes leadership.
We established certain criteria to decide who becomes the leader:
- An instance can become a leader when it has at least 70% free CPU, and the current leader has not renewed its leadership.
- Leaders renew their leadership if their CPU usage is below 85%.
Our cloud autoscaling deploys a new media server, which then competes for leadership, whenever average CPU usage rises above 70%. This ensures that new meetings can always be scheduled instantly, because a fresh media server is already up and running when it is needed.
The mechanism also lets leadership rotate when necessary: a new leader takes all new meetings, and the old leader can be shut down once every meeting on it has finished and the extra server is no longer needed.
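Putting the criteria together, the sidecar's election loop can be sketched as below, reusing the same hypothetical Redis store as in the scheduler sketch. The lease length, key name, and CPU measurement are illustrative assumptions.

```typescript
// Sketch of the leader-elector sidecar. The Redis lease and CPU metric are
// illustrative assumptions matching the criteria described above.
import * as os from "node:os";
import Redis from "ioredis";

const store = new Redis();
const MY_IP = process.env.MEDIA_SERVER_IP!; // injected into the sidecar pod
const LEASE_SECONDS = 15; // leadership expires unless renewed

// Approximate CPU utilization as load average normalized by core count.
function cpuUsage(): number {
  return os.loadavg()[0] / os.cpus().length;
}

async function tick(): Promise<void> {
  const usage = cpuUsage();
  const leader = await store.get("media:leader");

  if (leader === MY_IP) {
    // Renew leadership only while CPU usage stays below 85%.
    if (usage < 0.85) await store.set("media:leader", MY_IP, "EX", LEASE_SECONDS);
  } else if (leader === null && usage <= 0.3) {
    // The old leader let its lease expire; claim leadership if we have at
    // least 70% free CPU. "NX" makes the claim atomic across candidates.
    await store.set("media:leader", MY_IP, "EX", LEASE_SECONDS, "NX");
  }
}

setInterval(tick, 5_000); // compete for / renew leadership every five seconds
```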
Scalability? Check!
Results
With this project, we were able to achieve our client's objectives.
As an ambitious project with a strong market demand, it eventually took on a life of its own. Today, Anymeet (www.anymeet.io) is a spin-off of NCI Innova.
Armando Júnior, CEO of Connsult
Technologies involved in the project
- TypeScript / Node.js
- Shell script
- WebRTC
- Coturn
- Janus WebRTC Server
- React
- Selenium
- Kubernetes
- Prometheus
- Cloud computing
- Amazon AWS
- Oracle Cloud
- WebSockets
- FFmpeg