Alibaba Cloud DAMO Academy’s Cloud Xiaomi conversational robot is built on deep machine learning, natural language understanding, and dialogue management technologies, and provides enterprises with multi-engine, multi-channel, and multi-modal conversational robot services. In 2017, the Cloud Xiaomi conversational robot entered public beta on the public cloud, and it has since expanded steadily into hybrid cloud scenarios. To balance efficiency and stability across public and hybrid cloud releases, we settled, after careful consideration, on a major-version iteration every 1-2 months.
After years of development, architectural upgrades and refactoring had become unavoidable to support business growth. To keep releases stable, every public cloud release requires developers to do two things:
1. Review how interface dependencies have changed relative to the versions running online, and determine the release order and rollout proportion of each application.
2. Rehearse the release order produced in step 1 so that backend services upgrade smoothly without customers noticing.
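The dependency review in step 1 essentially orders applications so that every provider of a changed interface is released before its consumers. A minimal sketch, assuming the dependencies have already been collected into a map from each application to the applications it calls; the application names are hypothetical, and the per-application rollout proportion is left out. It uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to group applications into waves whose members can be released in parallel:

```python
from graphlib import TopologicalSorter


def release_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group applications into release waves.

    depends_on maps each application to the applications whose interfaces it
    calls. A provider must be live before its consumers, so every application
    lands in a later wave than everything it depends on. Raises
    graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Every application whose providers have all been released already.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical dependency map for illustration only.
deps = {
    "business-gateway": {"dialog-manager", "console-api"},
    "dialog-manager": {"nlu-engine", "kb-service"},
    "console-api": {"kb-service"},
    "nlu-engine": set(),
    "kb-service": set(),
}
print(release_waves(deps))
# [['kb-service', 'nlu-engine'], ['console-api', 'dialog-manager'], ['business-gateway']]
```

A circular dependency surfaces as an error rather than a silently wrong order, which is usually the signal that an interface change has to be split across two releases.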
Sorting out and rehearsing these steps takes about 2-3 weeks of focused effort each time, yet it only guarantees that the exposed PaaS APIs are updated smoothly.
The console service, however, requires the frontend, API, and backend to stay on consistent versions for a seamless experience. As a result, we previously released during off-peak traffic hours to keep the release window short, yet some console modules still occasionally returned errors. We considered blue-green and gray releases early on, but expanding ordinary cloud products within Alibaba was no longer allowed, so we had no spare machines and no traffic governance at all.
With these issues in mind, Cloud Xiaomi migrated its business to Alibaba Cloud in September 2021.
“The most impressive thing at the time was this image; although I didn’t know exactly what the middleware team was doing, I remembered two keywords: Trinity and Dividend. I didn’t expect to actually enjoy this dividend at the end of 2021.”
Cloud Xiaomi uses the group’s internal HSF service framework, which had to be migrated to Alibaba Cloud while still interoperating with, and being governed together with, Alibaba’s internal business domains. Cloud Xiaomi’s public services are deployed in a public cloud VPC, while some of the data services they depend on are deployed internally. This requires RPC interoperability between internal and cloud services, a typical hybrid cloud scenario.
In summary, the core requirements were: a preference for open-source solutions, to make future business promotion easier; secure network communication; and low-cost business upgrades and transformation.
After many discussions and explorations, the solution was finalized.
Once interoperability and service registration and discovery were resolved, the focus shifted to service governance.
After the migration to Alibaba Cloud, several traffic control solutions were available, such as the group’s full-link solution and its unitization scheme.
Take RPC as an example:
Options 1, 2, 3, and 4 all had shortcomings. While we were researching and preparing self-built option 5, we learned of the Alibaba Cloud MSE Microservices Governance team’s “20-Minute Enterprise-Level Full-Link Gray Release Capability”. It matched our self-built design exactly, using the RPC framework’s routing strategy for traffic governance, and the team had already turned it into a product: the Microservice Governance Center of Microservice Engine (MSE).
As shown in the image above, each application sets up a baseline (base) environment and a gray environment. Apart from the traffic entrance, the business gateway, downstream business modules deploy a gray environment only as needed: a module that does not change in a given release does not need a gray deployment.
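The routing rule just described, where gray traffic stays on gray instances and falls back to the base environment for modules without a gray deployment, can be sketched as follows. This is a simplified model under stated assumptions, not MSE’s implementation: the `Instance` type, the `"base"`/`"gray"` tag values, and the service names are hypothetical, and load balancing is reduced to picking the first candidate. The point it illustrates is that the tag travels with the request, so passing through a base instance does not demote the rest of the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    app: str
    address: str
    tag: str  # hypothetical environment label: "base" or "gray"


def select_instances(instances: list[Instance], app: str, request_tag: str) -> list[Instance]:
    """Return the instances of `app` that a request carrying `request_tag` may call."""
    candidates = [i for i in instances if i.app == app]
    tagged = [i for i in candidates if i.tag == request_tag]
    if tagged:
        return tagged
    # The module did not change in this release and has no gray deployment,
    # so the request falls back to the base environment.
    return [i for i in candidates if i.tag == "base"]


def route_chain(instances: list[Instance], chain: list[str], request_tag: str) -> list[tuple[str, str]]:
    """Follow a call chain and report which environment served each hop."""
    hops = []
    for app in chain:
        # Simplification: take the first candidate instead of load balancing.
        # Raises IndexError if the app has no usable instance at all.
        chosen = select_instances(instances, app, request_tag)[0]
        hops.append((app, chosen.tag))
        # request_tag is deliberately not replaced by chosen.tag: the tag is
        # propagated with the request (for example as an RPC attachment).
    return hops


instances = [
    Instance("dialog-manager", "10.0.0.1", "base"),
    Instance("dialog-manager", "10.0.0.2", "gray"),
    Instance("nlu-engine", "10.0.0.3", "base"),  # unchanged: no gray deployment
    Instance("kb-service", "10.0.0.4", "base"),
    Instance("kb-service", "10.0.0.5", "gray"),
]
chain = ["dialog-manager", "nlu-engine", "kb-service"]
print(route_chain(instances, chain, "gray"))
# [('dialog-manager', 'gray'), ('nlu-engine', 'base'), ('kb-service', 'gray')]
```

If the tag were re-derived from the instance at each hop, a gray request would leak into base for every module after the first unchanged one, which is exactly what full-link gray release is meant to prevent.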
Applications require only minor configuration:
The traffic distribution module determines the granularity of traffic governance and the flexibility of management.
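Because the business gateway is where requests are tagged, its distribution rule sets both the granularity and the flexibility of governance. A hypothetical policy at tenant granularity might combine an allowlist with stable percentage bucketing; the tenant-level choice, the function name, and its parameters are illustrative assumptions, not the product’s actual configuration. A stable hash (SHA-256 rather than Python’s per-process salted `hash()`) keeps each tenant on the same version across requests, which matters for a console whose frontend, API, and backend must stay consistent.

```python
import hashlib


def assign_tag(tenant_id: str, gray_tenants: set[str], gray_percent: int) -> str:
    """Decide at the gateway whether a request enters the gray link or the base link."""
    if not 0 <= gray_percent <= 100:
        raise ValueError("gray_percent must be between 0 and 100")
    # Rule-based gray: explicitly selected tenants always see the new version.
    if tenant_id in gray_tenants:
        return "gray"
    # Percentage-based gray: map the tenant to a stable bucket in [0, 100).
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "gray" if bucket < gray_percent else "base"


print(assign_tag("tenant-a", {"tenant-a"}, 0))  # gray (allowlisted)
print(assign_tag("tenant-b", set(), 0))         # base (0% gray)
```

Raising `gray_percent` step by step widens the gray release; setting it to 100 moves all traffic to the new version at once, which is closer to a blue-green cutover.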
The conversational robot product needs both gray releases and blue-green releases, which are currently implemented through the following two schemes:
The first release after go-live went as follows: “The new code for every module has been launched. Publishing and functional regression together took about 2.5 hours, a significant improvement over previous releases, which dragged on into the early hours.” The MSE Microservices Governance full-link gray release solution met Cloud Xiaomi’s need for rapid iteration with cautious validation as business growth accelerated, and, using JavaAgent technology, helped Cloud Xiaomi quickly gain enterprise-level full-link gray release capabilities.
As the business grows, its traffic governance needs will keep evolving. Next, we will continue working with the microservices governance product team to expand the capabilities and use cases of this solution, for example gray governance for RocketMQ and SchedulerX.
After adopting MSE service governance, we found additional out-of-the-box governance capabilities that greatly improve development efficiency, including service query, service contracts, and service testing. Notably, cloud service testing gives users a private-network Postman in the cloud, so we can easily call our own services. We do not need to worry about the complexity of the cloud network topology or about service protocols, nor build our own testing tools: everything can be done from the console. It supports the Dubbo 3.0 framework and Triple, the mainstream protocol of Dubbo 3.0.
Ultimately, the Cloud Xiaomi conversational robot team implemented full-link gray release, resolving the long-standing release efficiency problem. Along the way, we migrated parts of the business to Alibaba Cloud, upgraded the service framework to Dubbo 3.0, and adopted MSE microservices governance, making many new choices and attempts. “There was originally no road in the world; as more people walked, it became a road.” Through repeated exploration and practice, our engineers can distill many best practices to share with more colleagues. I believe these best practices will shine like pearls in the sea, growing even brighter through practical application and the passage of time.