You cannot build modern software unless your organization uses modern processes and procedures. Modern applications require modern organizations.
It doesn’t matter how scalable your application is if your development organization isn’t structured to support it, or if your organization does not have the right culture to drive higher availability and greater scalability.
Structuring your teams around your scalability needs helps create a culture that supports scaling your application.
In Chapter 3, we discussed what a service was and how it could be utilized to help take the complexity of an application and divide it among many different development teams, each working on its own code base and supporting its own services. We discussed how to size services and how services should interact.
But we didn’t delve deeply into the specifics of what it meant for a team to “own” a service, and why this ownership is important. In this chapter, we will explain what is meant by service ownership, and what is necessary for a Single Team Owned Service Architecture to work.
Single Team Owned Service Architecture
What is Single Team Owned Service Architecture (STOSA)? STOSA is an important guiding principle for large organizations that have many development teams that own and manage services comprising one or more applications.
What does it mean to have a STOSA application and organization? To be STOSA, you must meet the following criteria:
· You must have an application that is constructed with a service-based architecture.
· There must be multiple development teams responsible for building and maintaining the application.
· Every service in your application must be assigned to a development team that owns that service. Ownership of each service should be well documented and readily available to everyone in the organization.
· No service should be assigned to more than one development team.
· Individual development teams may own more than one service.
· Teams are responsible for all aspects of managing the service, from service architecture and design to development, testing, deployment, monitoring, and incident resolution.
· Services have strong boundaries between them, including well-documented APIs.
· The service owns its own data. Data is part of the service. If a service needs to access data stored in a different service, it must use one of the well-documented APIs to access that data.
· Services maintain internal service-level agreements (SLAs) between them that are monitored, with violations reported to the owning team.
A STOSA-based application is an application for which all services follow the preceding rules. A STOSA-based organization is one in which all service teams follow the preceding rules and all applications are STOSA applications.
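The data-ownership rule can be illustrated with a minimal sketch. The service names (CustomerService, BillingService) and their methods are hypothetical, and an in-process method call stands in for what would normally be a network API. The point is only that the consuming service reaches the data through the owning service's documented API and never touches its data store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    customer_id: str
    email: str


class CustomerService:
    """Hypothetical service that owns all customer data."""

    def __init__(self) -> None:
        # Private store: only this service reads or writes it.
        self._customers: dict[str, Customer] = {}

    def register(self, customer_id: str, email: str) -> None:
        self._customers[customer_id] = Customer(customer_id, email)

    def get_customer(self, customer_id: str) -> Customer:
        """Documented API that other services call to read customer data."""
        return self._customers[customer_id]


class BillingService:
    """Hypothetical consumer, owned by a different team."""

    def __init__(self, customers: CustomerService) -> None:
        self._customers = customers

    def invoice_recipient(self, customer_id: str) -> str:
        # Goes through the owning service's API, never its data store.
        return self._customers.get_customer(customer_id).email


customer_service = CustomerService()
customer_service.register("c-1", "pat@example.com")
billing = BillingService(customer_service)
print(billing.invoice_recipient("c-1"))  # pat@example.com
```

Because the billing code depends only on get_customer, the owning team remains free to change how and where customer data is stored without coordinating with its consumers.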
In a STOSA-based organization, each team should be of reasonable size (typically between three and eight engineers). If a team is too small, it cannot manage a service effectively. If it’s too large, managing the team becomes cumbersome.
Figure 6-1 shows a typical STOSA-based organization managing a STOSA application.
Figure 6-1. STOSA-based organization with a STOSA application
In this diagram, the boxes labeled A through L represent each individual service within the application. The ovals represent development teams that own the enclosed services.
This application contains twelve services managed by five teams. You’ll notice that each service is managed by a single team, but several teams manage more than one service. Every service has an owner, and no service has more than one owner.
Clear ownership for every aspect of the application exists. For any part of the application, you can clearly determine who is responsible and who to contact for questions, issues, or changes.
Figure 6-2 shows an example application and organization that are not STOSA-based.
Figure 6-2. Non-STOSA-based organization
You’ll notice a couple of things. First, Service I does not have any owner. Second, Services C and D are each owned and maintained by more than one team.
There is no clear ownership. If you need something done in Service C or Service D, it’s not clear who is responsible. If one of those services has a problem, who responds? What happens if you need something done to Service I? Who do you contact? This lack of clear ownership and responsibility makes managing a complex application even more complicated.
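Ownership gaps like these can be detected mechanically if the service-to-team assignments are recorded somewhere. The following is a minimal sketch of such a check. The team names and the specific assignments are hypothetical, chosen only to reproduce the kind of problems described above (one service with no owner, two services with more than one).

```python
from collections import defaultdict


def find_ownership_violations(services, team_services):
    """Return (unowned, multiply_owned) for a service-to-team assignment.

    services: iterable of all service names in the application.
    team_services: dict mapping team name -> set of services it claims.
    """
    owners = defaultdict(set)
    for team, owned in team_services.items():
        for service in owned:
            owners[service].add(team)

    # Services that no team claims.
    unowned = sorted(s for s in services if s not in owners)
    # Services claimed by more than one team, with the competing teams.
    multiply_owned = {
        s: sorted(teams) for s, teams in sorted(owners.items()) if len(teams) > 1
    }
    return unowned, multiply_owned


# Hypothetical assignments for an application with services A through L.
services = list("ABCDEFGHIJKL")
team_services = {
    "team-1": {"A", "B", "C"},
    "team-2": {"C", "D", "E"},
    "team-3": {"D", "F", "G"},
    "team-4": {"H", "J"},
    "team-5": {"K", "L"},
}

unowned, multiply_owned = find_ownership_violations(services, team_services)
print(unowned)         # ['I']
print(multiply_owned)  # {'C': ['team-1', 'team-2'], 'D': ['team-2', 'team-3']}
```

An assignment is consistent with the ownership rules only when both results are empty.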
Advantages of a STOSA Application and Organization
As applications grow in size, they grow in complexity. A STOSA-based application can grow larger than a non-STOSA-based application and can be managed by a larger development organization. As such, it can scale much larger while still maintaining solid, documented, supportable interfaces.
A STOSA-based organization can handle larger and more complicated applications than a non-STOSA-based organization can. This is because STOSA shares the complexity of a system across multiple development teams effectively and efficiently, while maintaining clear ownership and lines of responsibility.
What Does It Mean to “Own” a Service?
In a STOSA organization, the team that owns a service is ultimately 100% responsible for all aspects of that service. That team might depend on other teams for assistance (such as an operations team for hardware), but ultimately the owning team is responsible for the service.
This includes the following responsibilities:
API design
The design, implementation, testing, and version management of all APIs, internal and external, that the service exposes.
Service development
The design, implementation, and testing of the service’s business logic and business responsibilities.
Data
The management of all data the service owns and maintains, its representation and schema, access patterns, and lifecycle.
Deployments
The process of determining when and if a service update is required, and the deployment of new software to the service, including verification and rollback of all service nodes and the availability of the service during the deployment.
Deployment windows
When it is safe and when it is not safe to deploy. This includes enforcing company- and product-wide blackouts as well as service-specific windows.
Production infrastructure changes
All production infrastructure changes needed by the service (such as load balancer settings and system tuning).
Environments
Managing the production environment, along with all development, staging, and pre-production deployment environments for the service.
Service SLAs
Negotiating, setting, and monitoring SLAs, along with the responsibility of keeping the service operating within those SLAs.
Monitoring
Ensuring that monitoring is set up and managed for all appropriate aspects of the service, including monitoring service SLAs, and also reviewing the monitoring on a regular and consistent basis.
Incident response
Ensuring that notifications are generated when the system begins to function out of specification. Providing on-call rotation and notification management, as necessary, to make sure someone from the team is available to handle incidents. Handling incidents within prescribed SLA boundaries for incident responsiveness.
Reporting
Internal reporting to other teams (consumers and dependencies) as well as management reporting on the operational health of the service.
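As one illustration of the deployment-window responsibility, the following sketch shows how a team might combine company-wide blackouts with a service-specific window. The function name, the blackout dates, and the window hours are all assumptions made for the example, and it uses naive datetimes for brevity; a real check would need to handle time zones.

```python
from datetime import datetime, time


def is_safe_to_deploy(now, blackouts, service_window):
    """Return True if `now` is outside every blackout and inside the service window.

    blackouts: list of (start, end) datetime pairs (company- or product-wide).
    service_window: (opens, closes) times of day when this service may deploy.
    """
    for start, end in blackouts:
        if start <= now < end:
            return False  # a blackout always wins over the service window
    opens, closes = service_window
    return opens <= now.time() < closes


# Hypothetical company-wide blackout and service-specific daily window.
blackouts = [(datetime(2024, 11, 28), datetime(2024, 12, 2))]
window = (time(9, 0), time(16, 0))

print(is_safe_to_deploy(datetime(2024, 11, 29, 10, 0), blackouts, window))  # False
print(is_safe_to_deploy(datetime(2024, 12, 3, 10, 0), blackouts, window))   # True
print(is_safe_to_deploy(datetime(2024, 12, 3, 20, 0), blackouts, window))   # False
```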
Often, some of these aspects are not handled directly by the owning team but are the responsibility of a shared infrastructure, tools, operations, or platform engineering team. Even in those cases in which aspects are handled by other core teams, however, it is ultimately the service owner’s responsibility to make sure the activities are handled to the level required to meet their SLAs and customer expectations.
The following items often are handled by shared teams on behalf of the owning team:
Servers/hardware
All hardware and infrastructure needed to run the service in production and in all supporting environments. This is often provided by an operations team, a cloud provider, or both.
Tooling
Various tooling required by the owning team is often centrally owned and managed. This can include deployment tools, compiling and code management tools, monitoring tools, on-call and incident response tools, and reporting tools.
Databases
The hardware and database applications that store the data used by the service are often managed by a central team. However, the data itself, the data schema, and the use of the data are always managed by the owning team.
Figure 6-3 shows a typical organization hierarchy for a STOSA-based organization. Essentially, all development teams that are service-owning teams are peers, organizationally. They are all supported uniformly by a series of supporting teams, including operations, tooling, databases, and other similar teams. All of these may or may not also sit on top of other core teams that have universal responsibility for the organization but not for individual services. These can include teams such as an architectural guidance team or a program management team.
Figure 6-3. STOSA-based organization hierarchy
Service-owning teams in a STOSA organization are the teams that are ultimately responsible for all aspects of the services they own. A service-owning team might depend on the core and support teams, but it is ultimately responsible for ensuring that all issues are dealt with and that the service is operating properly.
For example, let’s assume that a service fails because a deployment went bad due to a failure in the core deployment tool. The service failure is the responsibility of the service-owning team. That team may have issues or concerns with the tooling team that it needs to deal with, but ultimately the service-owning team is the one responsible for the failure. It cannot simply say “it was the tooling team’s fault.” Ultimately, even if that were true, it was the service that failed, and hence the service-owning team is responsible.
With strong ownership of results also comes strong ownership of decision making affecting your service. Typically, a service-owning team is given a set of requirements it needs to implement, but the details of how those requirements are implemented are the team’s responsibility. The team might have system-wide compliance requirements it needs to conform to (such as architecture guidelines or rules, tooling that must be used, language and hardware selection restrictions, or industry-specific regulatory requirements), but these ultimately are part of the service requirements given to the owning team.
Beyond these requirements, all design details and decisions are the responsibility of the owning team.
Ultimately, the owning team is making a commitment to achieve an expected set of results and maintain an appropriate set of SLAs.
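What monitoring such a commitment might look like can be sketched as follows. The Sla fields, the thresholds, the service and team names, and the nearest-rank method used for the 99th percentile are all assumptions for illustration; an actual SLA may be defined in entirely different terms. The sketch evaluates one measurement window and addresses each violation to the owning team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    service: str
    owning_team: str
    max_p99_latency_ms: float
    min_availability: float  # fraction of successful requests, 0.0 to 1.0


def check_sla(sla, latencies_ms, successes, total):
    """Return (owning_team, message) pairs for each violation in one window."""
    violations = []
    if total == 0 or not latencies_ms:
        return violations  # nothing was measured in this window

    ordered = sorted(latencies_ms)
    # Nearest-rank 99th percentile: rank = ceil(0.99 * n), in integer math.
    rank = (99 * len(ordered) + 99) // 100
    p99 = ordered[rank - 1]
    if p99 > sla.max_p99_latency_ms:
        violations.append(
            (sla.owning_team, f"{sla.service}: p99 latency {p99} ms exceeds SLA")
        )

    availability = successes / total
    if availability < sla.min_availability:
        violations.append(
            (sla.owning_team, f"{sla.service}: availability {availability:.3f} below SLA")
        )
    return violations


# Hypothetical SLA and measurements for a single window.
sla = Sla("orders", "team-orders", max_p99_latency_ms=500, min_availability=0.999)
latencies = [20] * 98 + [900, 950]
for team, message in check_sla(sla, latencies, successes=995, total=1000):
    print(team, "<-", message)
```

In this example both checks fail, so two reports are addressed to team-orders, reflecting that the owning team is the one accountable for bringing the service back within its SLAs.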
Using Core Teams and Services
Often in a strong STOSA-based organization, service teams may choose not to use the standard shared core and support capabilities. As an example, they may support their own database rather than using a database provided and supported by a centralized database team. Or they could decide to use their own cloud provider rather than the cloud provider supported by the operations team.
As long as a service team meets its specified requirements, it does not necessarily need to be forced to use these common infrastructure components. Of course, there are advantages for the service team in utilizing standard, shared capabilities. If the team chooses not to use these supported shared capabilities, it may in fact generate additional support headaches for itself. The key, though, is that this decision is the decision of the service team to make—and it has to live with the repercussions.
One advantage of this model is that it motivates the core teams to treat the service teams as real customers: customers that can go elsewhere if the core teams don’t provide the capabilities they require. This can provide strong motivation for a centralized team to provide higher-quality offerings to the service teams.
Your organization does not have to do this to support STOSA, and in fact your organization may put in service requirements that require the use of core infrastructure components. But in general, the greater the flexibility given to the service teams, the greater the ingenuity and ultimately the better the services that are produced.
As your organization grows and scales, the individual service teams will naturally tend to adopt the offerings of these standardized core platform teams. In fact, in a large, highly scaled organization, there may be little difference between service teams that are “forced” to use common platforms and those that “choose” to because it’s the only way for them to meet their specified requirements. The more you can make this a choice, whether real or perceived, the better buy-in you’ll have within your organization in general.
Summary
STOSA is an important model for determining how services should be owned and managed by individual development teams. It describes an organizational culture that makes building and maintaining services a scalable practice. In the remaining chapters in Part II, we’ll continue discussing service ownership as we focus on service interactions and the interfaces between services.