Advancing your company as a Software Engineer

Process Mar 7, 2022

Life as a software engineer can be frustrating. You know that this new library will make everybody's life easier. Nevertheless, you can't just introduce it: you need approval. Reflecting on my past experiences of 12 successful and 3 unsuccessful advancements, I share key learnings on how to drive technical and organizational advancements.

Would you like to get rid of Java and use Kotlin, TypeScript or Python? Would you like to replace the cumbersome Hibernate ORM with a more lightweight, less frustrating approach? Would you like to base your team's priorities on data and metrics?

As a software engineer, you do not fully control these decisions. However, they are not fully out of your hands either. Such decisions result from the input of many people, and as a software engineer you can influence them in a favorable direction.

For this post, I reflected on a dozen past experiences in which I participated in or drove technical or organizational advancements. During these advancements, I was a Senior Software Engineer. As such, I had some, but not full, control over languages, tech stack and processes. With the learnings from this article, you will be able to approach such advancements from the same or a similar position.

Reflecting on these twelve experiences, I looked for the common factors that made the advancements possible.

Support

Introducing a technology or advancing a process is not a one-man show. It doesn't matter whether an advancement is objectively beneficial: if you cannot convince your team members and stakeholders and address their concerns, then implementing that change will not be beneficial. In the following sections, I show different strategies to gain the required support of team members, stakeholders and the company.

Convincing team members

There can be many reasons why team members might not agree with your advancement. For one technology change, a team member told me that they were afraid their previous knowledge would become useless after the change. For a change to a more functional programming style, some team members were used to an imperative style and preferred it. For introducing an event-based mechanism, some team members had bad experiences with event-based mechanisms before.

To address such concerns, I found it helpful to take smaller steps. This makes the changes more approachable and keeps the option to revert open.

Instead of moving from Java to Kotlin, we used the library Vavr as a functional extension of the Java collections. Later, we used more and more functional data types like Try and Either from the same library. Eventually, a team decided to write a new service in Kotlin.

Instead of building a new service on Kubernetes right away, we first got it working on the old runtime. We avoided introducing anything that would block replication and chose solutions that would also work on Kubernetes. Later, we moved the service to Kubernetes.

Another strategy is to reduce scope.

Instead of introducing RxJava in various places across a microservice, we limited its usage to a single class. Over time, we gathered experience with it, team members started to see its value, and they adopted it in other places.

Instead of replacing all internal SDKs with OkHttp at once, we replaced a single SDK and monitored it in production. Later, we replaced the remaining SDKs.

Convincing stakeholders

Depending on the organizational setup, stakeholders outside the team may also need to be convinced.

I found three strategies effective for convincing stakeholders:

Gain broad support within the team
Separate controversial from non-controversial changes
If all else fails: go rogue

If you can get more team members to vouch for a technical advancement, an external stakeholder is more likely to allow it. For example, when we introduced Node.js in the team, all team members, including the Java developers, were convinced that it was the right tool for the job. When we introduced testcontainers to allow isolated component tests, the shared e2e test suite was widely considered a problem.

If you can separate the controversial changes from the non-controversial ones, the controversial change that needs stakeholder approval becomes smaller, entails lower risk and is more likely to be approved. The switch to Kubernetes entailed the risk that replication would not work with the way we built our service logic. While implementing the service on the old runtime, we kept an eye on its readiness for Kubernetes: we avoided local in-memory state, long-running transactions, application-internal locks and similar patterns. Ensuring this is non-controversial: no external stakeholder will ask you to please store variables in global application variables or to keep a transaction open across three class boundaries (well, if they do, run!). A buzzword like Kubernetes, however, is more likely to catch the attention of an external stakeholder and is hence more controversial. By doing the non-controversial work first, we avoided introducing blockers and then had a lower-risk, less controversial switch to Kubernetes.

The last strategy, going rogue, should be used rarely, since if used wrongly it can massively undermine trust. By going rogue, I mean consciously acting against expectations of what you should be working on, for a short time, to prove a thesis you could not convince people of otherwise. The very few times I went rogue, it was always for only a few days and never without the knowledge of at least one trusted person.

Note: With more freedom over how you organize your time in more senior positions, going rogue becomes less and less necessary. With fewer expectations about which specific tasks you spend your time on, investigating is simply part of your job and nobody would consider it going rogue.

If you go rogue, you must be correct. Ensuring this requires aligning your views with at least one other person beforehand. If you can't find support, you are likely wrong. In that case, don't go rogue; try to broaden your perspective instead. There is something you are not seeing that would change your view if you knew it.

Convincing a company

If you are striving for huge changes like moving from imperative to functional programming, from top-down to bottom-up management or from a service-oriented to a domain-driven architecture, this requires a continuous change of the company's culture.

Such changes do not happen in a sprint or in a quarter. Even a person in charge cannot mandate that everybody now writes functional instead of imperative code. It takes a continuous process of cultural change.

Three strategies help achieve such a change over a period of years:

Knowledge sharing: Distribute resources for people to become more knowledgeable about and familiar with the topic
Templates and best practices: Make it work somewhere and share recipes to adopt the approach elsewhere
Facilitate change: Act as a knowledge hub and offer support for adoption

For example, we aimed to increase frontend development efficiency through the company-wide adoption of micro frontends. To achieve this, we first got a micro frontend working within the team. After that, there was a template to adopt and a small optional library to use. The approach and the knowledge were shared in a presentation, and documentation for adopting it was written. When opportunities opened up in the different teams, some of them reached out and got support. After a year, we had eleven micro frontends across six teams in four tribes. The culture successfully changed from monolithic frontend development to micro frontends.

Gaining the support of the team is mandatory; gaining the support of external stakeholders is beneficial; changing the culture is necessary for huge changes. Another important factor I encountered was timing.

Timing

In eleven of the twelve advancements, some opportunity had opened up. For three of them, it was a new service being bootstrapped. This was the case for the introduction of Jdbi as a Hibernate alternative, the introduction of input validation with an OpenAPI spec and the introduction of Node.js within the team.

Note: This is one of the benefits of a microservice architecture. It continuously opens up opportunities for technical advancements.

For the introduction of Vavr, the opportunity was the director mentioning the library after a conversation with another team. For the introduction of RxJava, long wait times for users demanded a solution, and there was no parallelization in place up to that point. For the introduction of product metrics within the team, it was the director pushing for more OKR involvement of the teams. For the introduction of a zero-bug policy, it was a bug scrubbing week that eliminated all major bugs of the team. For the introduction of micro frontends, it was the engineering manager and director pushing for a general speedup in frontend development.

Timing and opportunity are crucial for technical and organizational advancements.

In the three major unsuccessful advancements I pushed for, either the timing was not good or the opportunity was not big enough. For the introduction of a search service, the intended type-ahead search functionality didn't justify the effort of setting up a new service in an environment where, at that time, setting up new services was rare and complex. For the introduction of a Kafka event bus to solve performance issues in the communication between two services, the change would have risked the timely delivery of the functionality. For the introduction of a document database, the loose structure of the data types was too small a problem to justify the effort of setting up additional backup and compliance strategies across the company.

Overall, it is essential to listen for signs of whether it is a good or a bad time to push for advancements. You don't want to repeatedly bring up a topic while customer complaints are piling up, a deadline looms or the value of the service itself has yet to be proven. On the other hand, you want to listen for opportunities that higher-ups are creating and for problems in customer value delivery that need a solution.

Context

I am a huge fan of pure functional programming languages. Nevertheless, I have never pushed for my team to write a service in Haskell. The reason for this is context.

There is no objective answer to which technologies are best. In different teams, different technologies can be the best choice.

In a team of mostly Java developers, like my first three teams, a step towards Kotlin is a logical advancement. In a team of mostly Angular frontend developers, like my fourth team, a step towards Node.js in the backend is a more reasonable advancement.

Additionally, the context you need to consider includes not only your team but also the existing code. When we replaced Hibernate, the options were Jooq and Jdbi. I initially vouched for Jooq because of its modularity. Nevertheless, in the context of a Dropwizard service, we went for the similarly powerful Jdbi because of its integration with Dropwizard.

As a software engineer trying to advance the tech stack, it is crucial to stay flexible about technologies. Don't fixate on solutions; focus on the problems that need solving.

Problem

Most importantly, and still often overlooked, an advancement must solve a problem. Each of the twelve advancements I investigated had an underlying concrete problem that was becoming apparent.

Many technologies solve an abstract problem.

Node.js solves the problem of Java's verbosity
Testcontainers solves the problem of running tests against a database in unit test suites
RxJava solves the problem of complicated multi-threading with explicit threads

However, if you want to advance a technology stack, there needs to be a concrete problem in your context at the current point in time. The benefits of solving a concrete problem are more apparent than the abstract benefits of solving an abstract problem.

Node.js solved the problem of writing tons of Java data classes in the context of a JSON-heavy API at a time when we wanted to bootstrap and iterate on a new service quickly
Testcontainers solved the problem of the bottleneck of a common integration test suite in the context of a Java microservice at a time when the number of engineers was doubling every year
RxJava solved the problem of complicated multi-threading in the context of a single-threaded Java microservice at a time when user load started to stress the service

Analyzing and understanding the underlying problem is crucial. Knowing which solutions address which kinds of problems makes it easier to identify the problem.

Conclusion

When introducing technical and organizational advancements, I have four pieces of advice:

Gather the support of the team, external stakeholders or the entire organization depending on the scale.
Seize opportunities that open up.
Consider the context of your team and existing code.
Solve a concrete apparent problem.

Appendix: List of advancements

Introducing Vavr

Problem: In an increasingly functional architecture, the Java Streams API is verbose and makes testing difficult.
Timing/Opportunity: Director mentions the library.
Context: Java microservice with Java engineers.
Support: Create a draft pull request that shows team members part of the production code rewritten with Vavr instead of the Java Streams API.

Introducing Jdbi

Problem: The mutation-based Hibernate ORM prevents a functional architecture.
Timing/Opportunity: New service is being bootstrapped.
Context: Java microservice with Java engineers.
Support: Compare available alternatives (Jooq and Jdbi) and develop and present first usage.

Introducing RxJava

Problem: Parallelization and asynchronicity with threads is complicated, error-prone and difficult to test.
Timing/Opportunity: Maturing service causes long waiting times for users.
Context: Single-threaded Java microservice with Java engineers.
Support: Introduce RxJava in a simple way in a single class and gradually use more RxJava features.

Introducing OkHttp

Problem: A dependency on a company-internal configured HTTP client causes problems with updating and version conflicts in the dependency tree.
Timing/Opportunity: Wide acknowledgment that the company-internal SDK is an anti-pattern.
Context: Java microservice with Java engineers with internal SDK usage.
Support: Discuss and gradually replace the different SDKs with OkHttp over a few days.

Introducing validation with OpenAPI

Problem: The published OpenAPI specification is lacking precision and gets out of sync with the input validation.
Timing/Opportunity: New service is being bootstrapped.
Context: Mandatory OpenAPI specs and new Node.js service with primarily frontend engineers.
Support: Discuss and set up a mandatory OpenAPI specification, then add the validation.

Introducing testcontainers

Problem: A long-running integration test suite becomes a bottleneck for independent releases.
Timing/Opportunity: None (or I forgot)
Context: Java microservice without integration tests, with Java engineers.
Support: Try and compare different libraries and get one running on a service.

Using Kubernetes for the first service in the company

Problem: A company-wide release train becomes the bottleneck for team velocity and prevents fast feedback loops in the development of a new product.
Timing/Opportunity: A platform team did the ground work for Kubernetes and the director is pushing for a first service in Kubernetes.
Context: Central Kubernetes knowledge in platform team and no prior experience in team of Java engineers
Support: Prevent blockers up front, then set up the Kubernetes deployment in a sprint of inter-team collaboration.

Introducing Node.js in the team

Problem: A new JSON-heavy product becomes infeasible to maintain in a Java application, reducing team velocity.
Timing/Opportunity: New service is being bootstrapped.
Context: Team of mostly frontend engineers and few Java engineers, bad team experience with existing Java service, JSON-heavy service.
Support: Evaluate two existing Express.js and Nest.js services, gather feedback and team support, get approval from external stakeholders

Introducing a micro frontend in the team

Problem: Developing a small frontend within a frontend monolith for a backend-heavy product risks reducing team velocity.
Timing/Opportunity: A basic user-facing UI is required for a backend-heavy application.
Context: Backend-heavy team with little to no frontend experience
Support: Evaluate existing options with another team, the frontend lead and an infrastructure engineer, talk with the product owner and team, get approval from external stakeholders

Introducing micro frontends in the company

Problem: Growing engineering head count and complexity of the frontend increases release times and decreases engineering velocity.
Timing/Opportunity: Engineering manager pushing for general solution and a new part in the frontend is being developed
Context: Different micro frontend technologies working in two other teams, weaknesses known
Support: Spike a web-component-based solution with the manager and director, implement the product frontend, address challenges, present to the engineering organization, write a setup guide, offer setup support

Introducing product metrics as the first team in the company

Problem: The team is able to deliver faster than qualitative user feedback can be gathered.
Timing/Opportunity: Director is pushing for more team involvement in OKRs, and access to tools can now be requested.
Context: Existing operational metrics in a team with no product metrics experience
Support: Implement product metrics and start investigating and refining them in sprint reviews in addition to the existing review method.

Pushing an engineering-wide RFC process

Problem: Growing engineering head count makes it tougher to coordinate and develop a common engineering culture.
Timing/Opportunity: Staff engineer recommends RFC process, frontend people complain they need to be in a meeting to stay informed and frontend lead pushes for the usage of first RFCs.
Context: Meeting-based exchange between teams and the engineering organization.
Support: Start writing first RFCs and share them with the organization, gathering and incorporating feedback.

Introducing a zero-bug policy in the team

Problem: A huge backlog of old bugs reduces team velocity.
Timing/Opportunity: In a company-wide bug fixing week all important bugs are eliminated.
Context: Scrum sprints with a backlog of hundreds of untouched bugs older than six months
Support: Mention Microsoft's zero-bug policy multiple times, then seize the opportunity.

Introducing a search service

Unsuccessful
Problem: A fully overhauled, memory-heavy type-ahead search risks the stability of the main application.
Timing/Opportunity: Missing organizational experience in setting up new services, advancement too big in comparison to the problem
Context: Monolithic application with few services around it

Introducing an event-based approach

Unsuccessful
Problem: A new high-traffic service heavily affects the availability of the main application.
Timing/Opportunity: Advancement too big in comparison to the problem.
Context: Local Kafka instances running semi-successfully

Introducing a document-based database

Unsuccessful
Problem: A new domain-specific type with many subtypes risks reducing team velocity in a schema-based database.
Timing/Opportunity: Advancement too big in comparison to the problem.
Context: Node.js microservice with mainly TypeScript engineers