
During the 18th TechDay of The Computing Conference, Qi Jun, CTO of Router
Software Company Limited from Nanjing, gave a presentation titled "How Can
Medium-sized and Small Enterprises Make Clever Use of Container Technology".
He shared the company's experiences, problems and lessons learnt while using
Alibaba Cloud container services and Docker. Because it focuses on the impact
of container technology on production and overall productivity, the
presentation is a useful reference for small and medium-sized enterprises.
The following is a summary of the points shared at the event.
Dilemmas

The first dilemma: the product line has too many sub-products, many of them
overlapping, so basic services such as e-mail and SMS are implemented again
and again. But at the development stage, these services cannot be split out.
The second dilemma is O&M. At present, the company has a total of 40 servers
and a large number of services, but only one full-time O&M engineer.
The third dilemma is the core issue: products and services are delivered
extremely inefficiently, which wastes a great deal of staff time.
The fourth dilemma is the inability to manage load peaks and valleys
effectively and allocate resources sensibly.
The fifth dilemma is the company's demanding requirements for reliability and
security. Every day, hundreds of media outlets rely on our products for their
publication and production. If we cannot improve business reliability, a
single problem can mean the loss of an issue of a newspaper or an episode of a
TV program in a region.
Old Architecture

The figure above illustrates the company's old architecture, in which every
row is identical. A group of clients visits a public IP address, behind which
there may be an ECS server. One or more applications may be deployed on that
ECS server, and these applications connect to Alibaba Cloud database, cache or
log services. Together these make up a minimum unit. During peak hours the
company might run up to 60 minimum units on nearly 80 ECS servers, which made
upgrades and maintenance a serious problem. We had to abandon this approach
because every code release required negotiating an upgrade time with the
client. Clients deploy their systems independently, so it is impossible to
agree on a single upgrade time for all of them.
Present Architecture

In the current architecture, users access the system over the internet
through the SLB (Server Load Balancer). The SLB can be thought of as a public
IP address through which requests reach the back-end servers. Behind it is a
VPC network that houses around 20 to 30 ECS servers, which make up the four
major clusters currently in use. After a container cluster is created, it
needs several container instances to run. When an access request arrives, the
container cluster forwards it to the right application according to the
request's domain name and port number. Once the request reaches the
application, the containers for the application's various services connect to
Alibaba Cloud databases or self-built databases.
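As an illustration of the forwarding step described above, here is a minimal Python sketch of a lookup keyed on the request's domain name and port. The domain names, ports, application names and the `route` function are all hypothetical; this is not the container service's actual routing implementation.

```python
from typing import Optional

# Hypothetical routing table: (request domain, port) -> application name.
ROUTES = {
    ("api.example.com", 443): "api-service",
    ("www.example.com", 443): "web-frontend",
    ("www.example.com", 8080): "admin-console",
}


def route(host_header: str, default_port: int = 80) -> Optional[str]:
    """Return the application for a 'host[:port]' string, or None if unknown."""
    host, _, port_text = host_header.partition(":")
    # Fall back to the default port when the header carries no port.
    port = int(port_text) if port_text else default_port
    # Domain names are case-insensitive, so normalize before the lookup.
    return ROUTES.get((host.lower(), port))
```

For example, `route("www.example.com:8080")` selects the hypothetical `admin-console` application, while an unknown domain yields `None`.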
Core Issues

Reliability, cost and agility are
three core issues. Agility refers to agile development and deployment; cost is a
shared concern of the boss and the CTO; and
reliability secures our survival.
Container Technology & Agile Development

Agile development with container technology has to deal with three problems.
First, development, testing and production environments must be uniform.
Differences in versions, applications and operating environments can cause a
variety of problems, usually unexpected ones. In addition, new employees have
to set up development, testing and runtime environments over and over again,
which lowers efficiency.
Our solution is to package all the basic environments as images in the image
registry. New employees only need to download an image and put their own code
into it for local debugging and development. Although the image itself is
read-only, it can be connected to data on external disks. In addition, code
developed and debugged this way behaves the same everywhere, because all the
images are consistent and independent of the host operating system.
Second, how can we keep developing applications that were built at different
stages? This is a pain point for the development teams of many small and
medium-sized enterprises: new employees are reluctant to maintain and fix
issues in old code.
For such problems, we can break the application down further: several parts of
a big application can be separated into small applications, which cuts
maintenance costs sharply. During continuous development these small
applications do not need to be rewritten, so you can focus on the necessary
tasks.
Third, code contamination and manual packaging errors urgently need a
solution. When a company has many small applications, packaging problems are
frequent.
Currently we use Git auto build to solve this problem, because it takes the
code directly from Git. When we push a branch, an image is created
automatically, so code contamination and manual packaging errors are unlikely
to happen.
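One way to picture the link between a pushed branch and the image built from it is a deterministic naming rule. The Python sketch below derives an image reference from a repository name, branch and commit hash. The naming scheme and the `image_reference` function are assumptions made for illustration, not the convention used by the auto-build service.

```python
import re


def image_reference(repo: str, branch: str, commit: str) -> str:
    """Build 'repo:tag' from a branch name and commit hash (hypothetical scheme)."""
    # Image tags may contain only letters, digits, '_', '.' and '-',
    # so every other character in the branch name becomes '-'.
    safe_branch = re.sub(r"[^A-Za-z0-9_.-]", "-", branch).strip("-.") or "unknown"
    # A short commit hash ties the image to the exact code it was built from.
    tag = f"{safe_branch}-{commit[:8]}"[:128]  # tags are limited to 128 characters
    return f"{repo}:{tag}"
```

Because the tag is computed from the branch and commit rather than typed by hand, two builds of the same commit always get the same name, which is the property that removes manual packaging mistakes.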
Container Technology & Business Cost

Next,
let's take a look at the relationship between container technology and business
costs. At present, there are several cost problems as follows:
- Updates require long downtime, which is costly and greatly impairs user
experience. Previously, we could only update at 3 o'clock in the morning, and
any problems had to be solved before 8 o'clock. This raised costs
considerably: a code update sometimes took the whole night, and a rollback
took even more effort. The blue-green release mentioned by the previous
presenter is a good solution to this issue.
- Updating multiple load-balanced servers simultaneously is hard, and the
cost of rolling back after an error is very high. Blue-green release can solve
this problem. In our company, we use Alibaba Cloud containers that can
tolerate faults during an update: until a container is fully updated, users
cannot access the content in the new image.
- Fine-grained, business-level elasticity is not available, and server-level
elasticity is far from enough. Our 40 servers probably reach a utilization
rate of 80% or above for only one third of their service time; for the
remaining two thirds, the resources sit idle and are wasted. With
container-level elasticity, we can start more containers for web services or
API services to balance load better and run more efficiently. The physical
servers can then provide reliable computing resources, and performance is far
better than we imagined, because under normal conditions no server stays fully
loaded all the time.
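To make container-level elasticity concrete, the following Python sketch computes how many containers a service would need so that its average utilization moves towards a target. The proportional rule, the 80% target and the limits are assumptions chosen for illustration; the talk does not describe the scaling policy actually used.

```python
import math


def desired_containers(current: int, utilization: float, target: float = 0.8,
                       minimum: int = 1, maximum: int = 20) -> int:
    """Scale the container count in proportion to observed load (hypothetical rule)."""
    if current < 1 or not 0 < target <= 1:
        raise ValueError("current must be >= 1 and target must be in (0, 1]")
    # If 4 containers run at 95% and the target is 80%, we need 4 * 0.95 / 0.8 = 4.75,
    # rounded up so the service never ends up under-provisioned.
    wanted = math.ceil(current * utilization / target)
    # Clamp to the allowed range so the service neither vanishes nor floods the cluster.
    return max(minimum, min(maximum, wanted))
```

With these assumed defaults, four containers at 95% utilization grow to five, and six containers at 20% utilization shrink to two, which frees server capacity for other services during quiet periods.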
Container Technology & Reliability

Now let's talk about container technology and reliability. One truth about
reliability is that full failover on the cloud may not solve a problem
instantly, but everyone needs it. On the cloud, many of us overlook hot
backup, thinking that hardware faults are unlikely there. The opposite is
true. Alibaba's container service can be configured through parameters so
that, when a cluster fails, the container can be moved to another cluster.
There is still no perfect solution for domain name or port configuration, and
manual adjustments are required, but it is great progress.
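The failover idea above can be sketched as choosing the first healthy cluster from an ordered preference list. The cluster names, the health map and the `pick_cluster` function in this Python sketch are hypothetical, and the sketch does not reflect the container service's real parameters.

```python
from typing import Dict, List, Optional


def pick_cluster(preferred: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first cluster in preference order that is reported healthy."""
    for name in preferred:
        # A cluster missing from the health map is treated as unhealthy.
        if healthy.get(name, False):
            return name
    return None  # no healthy cluster: manual intervention is needed
```

For example, with preferences `["cluster-a", "cluster-b"]` and `cluster-a` reported as failed, the container would be placed in `cluster-b`. Domain name and port settings would still have to be adjusted separately, as noted above.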
The next issue is data synchronization and sharing when load is spread across
multiple servers. In some old services, attachments cannot be separated out
because of tight coupling, or reads and writes are concentrated in one place.
Shared storage should be used in such cases. At present, we solve this problem
with the two solutions provided by the container service: the OSS data volume
and the NAS data volume. Both allow multiple containers to access the same
file data source and support real-time concentrated reads and writes.
The traps we once fell into

Now I want to summarize the lessons I have learnt over the years, in the hope
that they will be of some help to you:
First, containers are difficult to use without decoupling. This is very
evident: if a container holds a bunch of closely related applications, then
once the container goes wrong, all of those applications fail and the whole
business system may even collapse. For this reason alone, decoupling
applications is very important.
Second, microservices. Microservices are booming at present, but splitting an
architecture into microservices does not automatically make it better.
Microservices have their merits, but not every business should adopt them.
Only universal, repeated and reusable applications are suitable for splitting
into microservices.
Third, the bigger the project, the harder it is to use a container
architecture. The great majority of our projects are currently hosted in
container services, but not all of them. Some products are too big to be
placed in a container service. This depends on internal management as well as
on the product or business scenario and user requirements. The bigger the
project, the wider its scope, and the harder it is to impose a sweeping
approach such as placing the whole project into a container cluster.
Fourth, reliability. Do not count on Docker alone to solve every reliability
issue. Docker and container technology, in my opinion, are a kind of
architecture rather than a tool. Reliability depends on many elements, from
the network environment and the overall architecture to the quality of the
developers. These elements are beyond Docker's control.
Finally, container technology is only a kind of architecture. Running Docker
on a single service is an experiment. Only when it drives a whole cluster does
it truly show its power.
Common Scenarios

The first scenario is an efficient API cluster. Some of the company's APIs are
for external use and others are not, but sometimes the same API is used both
by the company's app, which accesses it over the public network, and by
internal services. When the API is accessed through an external domain name,
there are no problems. But when it is accessed from the intranet by domain
name, DNS resolution is required and public network traffic may even be
consumed, although the calling server may in fact be very close to the API
server on the intranet.
We solve this problem with the following model: when an internal server
initiates the call, it uses the intranet SLB (Alibaba's intranet SLB is free
of charge); when an external server initiates the call, it uses the internet
SLB (traffic charges only, if I recall correctly). The two container clusters
(container clusters are Alibaba's container services) connected to the API are
each configured accordingly. The advantage is that when the service is under
concentrated use, you can choose the path freely: on-cloud calls go through
the intranet SLB and external calls go through the internet SLB, giving the
most efficient and reliable access. Over the internet, the response time may
be around 10 milliseconds, while over the intranet it may be around 1 to 2
milliseconds.
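The choice between the two SLBs can be sketched in Python as a check on where the caller sits. The VPC address range, both endpoint URLs and the `api_endpoint` function are hypothetical values chosen for illustration, not the company's real configuration.

```python
import ipaddress

# Hypothetical addresses: replace with the real VPC range and SLB endpoints.
VPC_NETWORK = ipaddress.ip_network("172.16.0.0/12")
INTRANET_SLB = "http://172.16.0.10"
INTERNET_SLB = "https://api.example.com"


def api_endpoint(caller_ip: str) -> str:
    """Pick the SLB endpoint a caller should use, based on its IP address."""
    # Callers inside the VPC use the intranet SLB and avoid public traffic.
    if ipaddress.ip_address(caller_ip) in VPC_NETWORK:
        return INTRANET_SLB
    # Everyone else comes in through the internet-facing SLB.
    return INTERNET_SLB
```

A caller at `172.16.5.20` would be directed to the intranet SLB, while one at `203.0.113.7` would use the internet SLB.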

The second scenario is fast delivery of applications and services. Our company
focuses on SaaS services as well as applications and software. Most of the
work is development support and customer service, and developers also handle
some of the deployment. We solve the delivery problem with the following
model. After the product image is built, we submit it to the repository; O&M
personnel then open the container service, create a cluster, and create an
application from the orchestration template. After the application is created,
the service configuration has to be modified, because a container does not run
in the same way as the code we are familiar with and some configuration files
need changes. Our current solution, which is also the mainstream approach, is
to put all configuration in the container's environment variables. We then
modify the service configuration and the environment variables and restart the
service. After the restart, the container reads the environment variables as
its configuration and runs successfully.
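Reading configuration from environment variables at start-up might look like the following Python sketch. The variable names `DB_HOST`, `DB_PORT` and `DEBUG`, the default values and the `ServiceConfig` class are assumptions for illustration only.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceConfig:
    db_host: str
    db_port: int
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Build the service configuration from environment variables."""
    return ServiceConfig(
        db_host=env["DB_HOST"],  # required: fail fast if it is missing
        db_port=int(env.get("DB_PORT", "3306")),  # optional, with an assumed default
        debug=env.get("DEBUG", "false").lower() in ("1", "true", "yes"),
    )
```

Because the values are read when the process starts, changing an environment variable and restarting the container is enough to apply a new configuration; the image itself stays unchanged.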
Summary
Simply put: over the past year we went from pain and the verge of giving up to
the first light of dawn, with bumps along the way. We thank all our media
clients and Alibaba for their constant support.