Hunting Tech Debt via Org Charts

 3 years ago
source link: https://bellmar.medium.com/hunting-tech-debt-via-org-charts-92df0b253145

Knowing where to look for problems by figuring out who reports up to whom

Carson as Carnac the Magnificent
“Let’s see…. no backups, twenty convoluted stored procedures and old C++ code”

After a couple of years in government, I used to blow people’s minds with a fairly simple party trick. Before going into our first meeting with a new agency stakeholder, I would predict several key details of what their problems were going to be. All I needed was the name and title of the person we were meeting.

Nowadays, as I talk other people through their legacy modernization struggles, the first piece of advice I give them is to look at the org chart. On large, complex systems there are too many stones to flip over while looking for problems. But the types of problems organizations have are heavily influenced by their incentive structure, and the easiest way to figure out their incentive structure is to look at the org chart.

On particularly old systems, you may even end up doing an overview of how the org chart has changed over time, breaking things out into sublayers of problems as incentives have shifted. In my book Kill It With Fire, I go into a lot more detail about the nature of incentives: this isn’t about money; it’s about how people feel seen. A bonus is nice, but feeling secure in your job because the role you’re playing and the decisions you’re making are in line with what the organization prioritizes is much more powerful. Those are the incentives that actually shape behavior. That’s why my second question when evaluating systems is “How do people get promoted here?” The true operations of systems are determined by the day-to-day decisions of individuals on the ground. Those people are going to prioritize the work that helps them get ahead. The activities at the bottom of that list are where tech debt is going to accrue.

In Small Organizations Too

Although we’re more accustomed to legacy projects coming from large organizations, all large organizations start off as small organizations. If the organization is too small to have a formal structure (or, even worse, has given in to the delusions around “flat” organizations), you can still predict the set of problems by looking at who’s on top. What is the background of the CEO? If they have a free hour… are they pushing code or making sales calls?

The Scenarios

Engineering First Organizations

Like Google or Facebook, these are organizations where engineering trumps everything else. If you’re not in engineering, your pathway to promotion is limited.

What kinds of problems will you find? There are many paths to the center in both life and software engineering. When engineering runs the show, you tend to find multiple paths, errr… implementations of the same thing scattered about. They tend to fall out of sync with each other almost immediately. Systems get rebuilt until the gap in feature parity grows too large, then abstraction layers are piled on top. Engineers love metawork: the engineering work that is supposed to make other engineering work easier, faster, or more secure. Mid-career engineers aspire to deploy features; senior engineers aspire to deploy massive frameworks.

So with these organizations, you want to look for problems associated with unnecessary complexity. The good news is that the practices one must take on to survive complex systems are good practices regardless: mature on-call and incident practices, blameless postmortems, data contracts, etc.

The bad news is that beautiful tech is unrelated to profitable tech. Complexity should grow to support business growth, not the other way around. If the business isn’t making enough money to support the level of complexity engineering is running, then engineering can’t staff teams at levels that maintain a healthy work-life balance. Good practices around managing complexity break down, and institutional knowledge is lost as people burn out and quit. If this is the case, figure out which teams tend to lose the staffing battles, then assume the low-value engineering work is not getting done. Typically this means things like updating dependencies, responding to security reports, writing documentation and doing meaningful code reviews, but exactly which work gets devalued is part of the company culture. Look at what people get rewarded for doing, not at the marketing copy. Don’t assume that security companies prioritize resolving vulnerabilities any more than any other company does, because not all of them do.

Engineering Reports To Product

Engineering and Product have fundamentally different goals. When one reports up to the other, the parent group just ends up suffocating the subgroup. People often claim that since all their product people are super technical, they do not have this problem. That’s BS: the problems engineering experiences under product are about incentives, not technical correctness.

Engineering wins by producing beautiful and scalable systems; in pursuing them, it sometimes builds too much complexity too quickly. Product, on the other hand, wins by putting new stuff in front of customers as often as possible. In doing that, they sometimes… well…

What kinds of problems will you find? Cut corners start off small and compound over time. Ask yourself: when a prototype or MVP becomes an actual product, how do people know? The answer to that question will tell you a lot about what kinds of technical debt to look for. When building prototypes, engineers tend to skimp on the types of scaffolding that make long-term maintenance easier. It feels ridiculous to write thorough documentation for something that might not work, or that might get tossed if it doesn’t find a proper advocate among leadership. Engineers might forgo writing Terraform in favor of quickly SSHing into an instance and setting things up manually. They might struggle to figure out the correct way to do something new and settle for a temporary hack instead.

If the transition from proof-of-concept stage to product in development isn’t formal, or could conceivably leave someone out of the loop (like the infrastructure team that needs to help scale it, which happens A LOT), then most of these cut corners will stay cut. And while an application that was installed manually, has no documentation, and is hamstrung by unscalable hacks can still be successful, these types of cut corners tend to make it more difficult for new engineers to modify the system moving forward. The irony of the Product First team is that the race to ship as many features as often as possible typically slows feature development to a crawl over time.

Engineering Reports To Security

I’ve seen this structure a lot in government and once or twice in the private sector. Often it’s a dotted line rather than a solid one: Engineering does not formally report up to Security, but nothing gets done until Security signs off on it.

What kinds of problems will you find? Security teams tend to stop development work. They don’t mean to, but modern-day security has to walk a tightrope. You want to keep your software patched, but at the same time malicious code being injected into complex dependency trees is becoming more and more common, so maybe you actually want to pin your dependencies to specific approved versions. You want to make security scans a regular part of merging and deploying code, but static analysis can so overwhelm people with false positives that they might just ignore the warnings. Perfect security does not exist. Every security professional I’ve worked with has understood that (save for some government CISOs who were really lawyers). But being in charge means being accountable for failure, which naturally increases risk aversion.

Security people spend a huge amount of time preparing for problems that won’t happen (but totally could!). The idea that they might spend all their social capital on threats that never materialize and miss the one that does haunts security teams, so when given the power to block change… they frequently do, even when they understand how counterproductive that is. Better to block and be wrong (or extra sure) than do nothing and be compromised.

Technology built by organizations with security above engineering tends to be frozen in time. New features don’t get shipped, and sometimes upgrades don’t get shipped either. New solutions or approaches have to really prove themselves, often while the solutions they would replace creep close to end of life. A general suspicion of other companies may mean that the engineering team is trying to maintain components that could have been replaced with open source, PaaS, or SaaS solutions.

Flat Organizations

For a while, flat organizations were all the rage among engineers: egalitarian utopias where all ideas were evaluated on merit and there were no middle managers to gunk things up.

Fortunately for us, the realities of flat organizations are pushing them out of favor, and lively conversations about engineering leadership as a craft are taking over. Flat organizations are a nice idea, but they just don’t reflect how people behave. People need trust in order to collaborate. Once an organization grows beyond about 150 people, the ability of its members to trust each other based on direct, personal experience breaks down. Trust needs to fall back on something else, and it tends to fall back on hierarchies and process. Proponents of flat organizations assume that if they do not define hierarchies and process, those things do not exist. But they do… they just grow out of internal politics and big personalities.

What kinds of problems will you find? Conway’s Law runs amok in flat organizations. Because communication pathways grow up around personal relationships rather than chains of command, you often see problems in the protocols and data contracts that govern how specific subsystems interact. There may be duplicate components, or extra layers of complexity used to patch over an organizational silo. Data may go from one schema to another schema and back to the first. The internal logic of the architecture can be difficult to see or explain. As people leave, no one knows who the work should be transitioned to… or who decides who the work is transitioned to.

How bad this gets depends on how large the organization is and how flat it is. Organizations that are a collection of small teams can and do run high-quality technology, but the burden on individuals to communicate proactively is high and burns people out over time, especially software engineers. This shouldn’t come as a surprise, because this kind of people work is literally why managers exist. If you don’t hire managers, the need for that work doesn’t go away; it just gets handled with workarounds.

If you want to find problems in systems built by these organizations, the easiest way to do it is to ask yourself who isn’t talking to whom, then look at the places where the systems those siloed teams are building need to talk to one another.

Systems Out of Balance

If you’re clever, you probably spotted it on your own: in each example, the tech debt produced pulls the organization further away from the very thing it was trying to accomplish by taking on that debt:

  • Engineers prioritize making systems easier and more consistent, and end up with overly complex, difficult-to-operate systems.
  • Product managers prioritize shipping features and end up making the software development process slower.
  • Security prioritizes minimizing the risk of change and ends up maximizing the risk of out-of-date software.
  • Flat organizations prioritize open communication and end up with cliques and internal politics.

As the saying goes, “all things in moderation.” The organizations best at managing technical debt tend to be the ones that have a thoughtful process in place to adjudicate competing incentives. Everyone has to have a pathway to win. Engineering needs Product to be successful in order to safely grow its beautiful systems. Product needs Engineering to win in order to maintain agility in the market. Security needs people to listen to them, but also needs mere mortals to rein in their paranoia.

There’s a reason why the conventional org structure has these groups as peers rolling up to a CEO or a board. Good technology is built and maintained by balancing these perspectives. Bake in structures that give one group priority over the others, and the debt around that relationship will pile up.