This is a Patreon-supported essay. Drafts of all of these essays go out a week early to my $10 and up subscribers. Building secure applications is hard, and for organizations that have never done it before, it's often unclear where to even start. Worse, organizations that have some development experience often underestimate the work required to ship secure code. My goal with this series is to make the landscape more legible and give NGOs and other organizations an idea of where to start. This is part three of a four-part essay. You can find part one here, part two here, and part four here.
© 2021 Eleanor Saitta.
This is part three of my guide to secure software development for NGOs and other organizations. You can find parts one and two here and here, and part four here. In the first section, we looked at the lifecycle of software, the organization creating it, and the design process and how it impacts security. In the previous section, we looked at everything that comes between design and actually writing code. In this section, we'll talk about development and security testing.
Once you understand your language and environment and you have a well-analyzed architectural model for your system, you should know what the different modules you're going to write need to do. This is true regardless of whether you're doing all your architecture up front or if this is an initial minimum-viable-system. Either way you need to know what's required of the code you're writing in any given development push before you start, and you'll continue to need to know what's required of each piece of code as development proceeds.
Before you start writing code, it's worth taking the time to extract from the threat model and your standards documentation all of the security requirements for each module and put them in one place. Do this even if your description of what the module needs to do otherwise is informal. At a minimum, you'll already need to know the set of APIs you're interacting with and what the end result will be. However, it's impossible to know if you've accomplished your goals securely without knowing what your code shouldn't do. This is what security requirements come down to here — a list of all the things that should not happen. If you get this list together before you start work, you'll both save yourself writing code that doesn't meet those requirements and make testing faster.
This set of per-module requirements needs to be a living document over the life of the system, especially for agile systems that will see intense refactoring. It should always be possible for a maintenance coder or someone doing refactoring to find the relevant requirements document for the code they're reading. In an ideal world, they should be able to jump directly from the code to the document — you want this to be as easy as possible. You need to keep versions in sync, too, so they can find the version of the requirements that corresponds to the version of the code they're reading.
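One lightweight way to let a maintainer jump from code to its requirements is to tag functions with requirement identifiers. What follows is a minimal sketch, not a recommendation of a particular tool: the `upholds` decorator, the `UPLOAD-1` identifier scheme, the document version, and `save_upload` are all hypothetical names invented for illustration.

```python
"""Sketch: tagging code with the security requirements it must uphold."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str       # hypothetical identifier scheme
    doc_version: str  # version of the requirements document this came from
    must_not: str     # the thing that should not happen


# Hypothetical per-module requirements, extracted from the threat model.
REQUIREMENTS = {
    r.req_id: r
    for r in (
        Requirement("UPLOAD-1", "2.3", "store a file outside the upload directory"),
        Requirement("UPLOAD-2", "2.3", "accept a file from an unauthenticated session"),
    )
}


def upholds(*req_ids):
    """Attach requirement IDs to a function; fail early on unknown IDs."""
    missing = [r for r in req_ids if r not in REQUIREMENTS]
    if missing:
        raise KeyError(f"unknown security requirements: {missing}")

    def decorate(func):
        func.security_requirements = tuple(REQUIREMENTS[r] for r in req_ids)
        return func

    return decorate


@upholds("UPLOAD-1", "UPLOAD-2")
def save_upload(session, filename, data):
    ...  # implementation elided in this sketch


def describe(func):
    """List the requirements a maintainer should read before changing func."""
    return [
        f"{r.req_id} (doc v{r.doc_version}): must not {r.must_not}"
        for r in getattr(func, "security_requirements", ())
    ]
```

Calling `describe(save_upload)` lists both requirements along with the document version they came from. A stale or mistyped identifier fails as soon as the module is imported, which is one way to notice that code and requirements have drifted apart.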
If you don't keep complete security requirements current and available, it's likely that further development will introduce bugs as developers won't understand what the code needs to prevent from happening. As the role of modules changes or the threat model is updated over time, this needs to be pushed forward into the module requirements. When requirements are updated, code should be checked against the new requirements. When modules are merged or split, you must ensure their requirements are correctly split or merged as well, and again ensure that the resulting code follows those requirements. You need to drive these updates from the threat model (rather than working from existing requirements documents) because it may show new possible vulnerabilities between modules, the mitigation of which must be coordinated in multiple places.
Building security documentation like this can be time-consuming, and it will always be tempting to let it slide to meet development schedules. As with doing security design and threat modeling in the first place, however, the sooner you can catch bugs the cheaper they are to fix. This kind of discipline up front will prevent them from being introduced in the first place. Every development team will choose a different trade-off here, but care directed toward the pursuit of reliable development is rarely wasted. This is not to say all development formalities are worthwhile; it's entirely possible to have an overly bureaucratic development process that adds little to the outcome. However, outside heavily regulated industries (banking, healthcare, and avionics, for example), this is a rare failure mode. If you keep the development culture you want your team to have in mind while thinking about these processes, you'll be fine.
Not every system can be built, or even designed, in one go. In many cases, systems that are doing something novel or interesting will need to have a working prototype created first. This prototype may be abandoned completely, or it may be slowly shaped into a production system. Tools that come from a research background, like Tor, often take a path like this. It's easy, as a system becomes more and more useful, for it to slip from research into production. It's hard to justify either scrapping and rewriting a lot of code or going through relatively complex redesign processes for a tool which “works”. Similarly, a tool may have reached the point where its capabilities are useful in production, but further exploratory work is also still needed.
There's no easy answer as to when a slow process of upgrading code standards, doing post-hoc security reviews, and testing the interface and fixing specific pain points is more appropriate than a complete overhaul. Often the decision will be financially-driven. Moving from exploratory development to a production system can be very resource-intensive, and a slow upgrade may be more compatible with an organization's funding model and the needs of people using the system.
In many cases, the underlying architecture of the system and the user experience are more closely coupled in exploratory systems than they need to be. The more novel the system, the more likely this is to be true. One useful intermediate point can be to start over from scratch at the design layer and see how people might use systems that can provide similar security properties. This information can then be used to rework the user experience of the system while keeping much of the core architecture the same. In some cases, this may be accompanied by research forks of the system to let exploration continue. In other cases, especially if a system has been under development for many years, scrapping everything and starting again with the knowledge gained from the first version may be the only practical option.
Many of the most-reviled security tools currently in use were never intended for non-technical users, but the sunk-cost fallacy has kept them locked into an unproductive development track. If you're planning to build a novel system, it's worth planning up front for a redesign cycle at a minimum. Separately, you may need to schedule time to rewrite exploratory code in a manner more suitable for production deployment, but this engineering-driven rewrite cannot replace the redesign.
When people think about application security, they mostly think about security testing. As we're seeing, it's just one stage in a long process. From a security outcomes perspective, design is at least as important, if not more so. That said, doing testing well is still important and difficult.
There are two kinds of testing you need to do. First, you need to ensure that the system complies with the security requirements you created. Second, you need to make sure that you haven't created vulnerabilities at the platform, language, or protocol levels. In both cases, for systems undergoing ongoing development (including maintenance) you'd like as much of the testing to be automatically repeatable as possible. We'll talk about automation in detail in the next section.
If you built your threat model correctly, you'll be way ahead when it comes to testing. You'll already have a list of all of the security properties and rules of operation the system relies on, exactly where they're supposed to be enforced, and at least a basic enumeration of all of the ways they could be violated. While you will likely need further unit tests for functional correctness, this list of security properties and rules can drive much of your testing process. You'll also have, as defined by your security objectives, an understanding of how the system is supposed to react if an adversary attempts to implement an attack and mapping between the possible system violations and the desired reactions. Places where those reactions are positive (detect, alert, and log, rather than just prevent or thwart) create another obvious test set.
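To make the idea of testing a "positive" reaction concrete, here is a minimal sketch of a test that checks both that a violation is prevented and that it is logged. The rule (only the owner may read a document), the `read_document` function, and the `docstore.security` logger name are hypothetical, standing in for an entry in your own threat model.

```python
import logging
import unittest

log = logging.getLogger("docstore.security")  # hypothetical logger name

DOCUMENTS = {"doc-1": {"owner": "alice", "body": "field notes"}}


def read_document(user, doc_id):
    """Rule from the (hypothetical) threat model: only the owner may read."""
    doc = DOCUMENTS[doc_id]
    if doc["owner"] != user:
        # Desired reaction: prevent the read and leave an audit trail.
        log.warning("denied read of %s by %s", doc_id, user)
        raise PermissionError("not the owner")
    return doc["body"]


class OwnerOnlyReadTest(unittest.TestCase):
    def test_owner_can_read(self):
        self.assertEqual(read_document("alice", "doc-1"), "field notes")

    def test_violation_is_prevented_and_logged(self):
        # assertLogs fails the test if no matching log record is emitted.
        with self.assertLogs("docstore.security", level="WARNING") as captured:
            with self.assertRaises(PermissionError):
                read_document("mallory", "doc-1")
        self.assertIn("denied read of doc-1 by mallory", captured.output[0])


def run_tests():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OwnerOnlyReadTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The second test is the one that a purely functional test suite tends to leave out. If someone later removes the log call while keeping the permission check, the read is still blocked but the test fails, because the system no longer reacts the way the security objectives say it should.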
Sometimes users need to interact with the system in certain ways, known as ceremonies, in order for the system to be able to provide certain kinds of security guarantees. Your security design process should have identified and specified all of the interaction points and failure modes of these ceremonies. Functional correctness testing on all of the interface points supporting those ceremonies and on the logic behind them should be considered as core security tests. Likewise, parts of the interface will show the user if system security has failed or the system has been unable to meet a security guarantee. These should also be enumerated and tested. Separately, user task completion correctness testing should be done with diverse groups of users under real-world conditions. The connection between the system and the human the system is trying to protect is a key security component. This connection must be tested to demonstrate the system can operate securely. While this is well outside of the scope of traditional security testing, it should be considered essential for systems used in the real world. While you should have done this with prototypes during the initial design period, it's necessary to redo it with the real system. Real systems always have subtle differences from prototypes, and these differences can have a big impact on security outcomes.
So far, we've talked about testing security properties and rules of operation at all of their enforcement and potential violation points, both at the per-module and the system-wide levels, testing that the system correctly responds to each possible violation, and testing the interaction points, behavioral correctness, and legibility of all ceremony interface points, the latter both in code and by humans. At this point, you've done all of the testing the threat model can drive directly. However, you also have a list of all of the types of issues your environment and frameworks are vulnerable to that you can't guarantee aren't present, because you generated the list when you were choosing frameworks. You need to test all of these things too. Any time there's a known-possible bug, you should test all the possible ways it could exist. If you've implemented any protocol parsers by hand, you'll need to exhaustively test them to ensure they accept the protocol as specified, interpret it correctly in all cases, and don't accept anything else. Any time you're maintaining state in a protocol, you need to fully explore that state space, in similar ways. As a particular sub-case of this, places where things are named and then looked up by name, especially if multiple different systems or implementations do the naming and the looking up (such as with a filesystem) are a critical point of protocol correctness. Names and references must be interpreted in exactly the same way by all implementations. In many cases, it's best to design away these situations before they happen, but if you're already there, you need to test exhaustively.
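For a hand-written parser, exhaustive testing over a deliberately small input space can look like the sketch below. The message format (`<digit>:<payload>`, where the digit is the exact payload length) is invented for illustration, as are the function names. The set of valid messages is built directly from the specification rather than from the parser, so the check is that the parser accepts exactly that set, reads it correctly, and rejects everything else.

```python
from itertools import product


def parse_message(raw):
    """Hand-written parser for a hypothetical format: '<digit>:<payload>'.

    The digit gives the exact payload length. Returns the payload, or raises
    ValueError for anything that is not a well-formed message.
    """
    if len(raw) < 2 or raw[0] not in "0123456789" or raw[1] != ":":
        raise ValueError("malformed header")
    payload = raw[2:]
    if len(payload) != int(raw[0]):
        raise ValueError("length mismatch")
    return payload


ALPHABET = "012:a"  # small on purpose, so the whole space can be enumerated
MAX_LEN = 5


def valid_messages():
    """Build every valid message in the test space directly from the spec."""
    valid = {}
    for digit in (c for c in ALPHABET if c.isdigit()):
        for chars in product(ALPHABET, repeat=int(digit)):
            payload = "".join(chars)
            valid[f"{digit}:{payload}"] = payload
    return valid


def exhaustive_check():
    """Feed the parser every string up to MAX_LEN; return how many were tried."""
    expected = valid_messages()
    checked = 0
    for length in range(MAX_LEN + 1):
        for chars in product(ALPHABET, repeat=length):
            raw = "".join(chars)
            checked += 1
            try:
                result = parse_message(raw)
            except ValueError:
                assert raw not in expected, f"rejected valid input {raw!r}"
            else:
                assert expected.get(raw) == result, f"accepted or misread {raw!r}"
    return checked
```

Real protocols have input spaces far too large to enumerate completely, so in practice this approach is applied to bounded slices of the space (short messages, reduced alphabets, individual fields) and combined with other techniques.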
Any time you have an interaction with another system where there is a trust boundary, there are potential bugs. Some of these will be caught by your threat model, but some may not. In particular, if the systems on either side of the boundary have different implementations, it's possible that the abstractions they use for the meaning of things passed across the boundary are different. As all abstractions are lossy (they lose some semantic detail of the thing represented) it's possible that security-relevant properties will be disrupted in this transition. Figuring out how to test issues like this can be hard, and this is one of the reasons to formally specify the protocols used to cross trust boundaries.
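Name handling is a common place for two sides of a boundary to disagree about what a value means. The sketch below assumes a hypothetical gateway that compares names exactly and a hypothetical storage layer that folds case and normalizes Unicode, then looks for names the two interpret differently. Both components and the blocklist are invented for illustration.

```python
import unicodedata


def gateway_allows(name, blocked):
    """Side A (hypothetical gateway): compares names exactly as given."""
    return name not in blocked


def storage_key(name):
    """Side B (hypothetical store): normalizes Unicode and folds case."""
    return unicodedata.normalize("NFC", name).casefold()


def find_disagreements(candidates, blocked):
    """Return names the gateway allows but the store maps to a blocked entry."""
    blocked_keys = {storage_key(b) for b in blocked}
    return [
        name
        for name in candidates
        if gateway_allows(name, blocked) and storage_key(name) in blocked_keys
    ]


BLOCKED = {"admin.cfg", "caf\u00e9.txt"}
CANDIDATES = [
    "admin.cfg",        # blocked by both sides
    "ADMIN.CFG",        # differs only in case
    "cafe\u0301.txt",   # 'e' plus a combining accent instead of a single code point
    "notes.txt",        # harmless
]
```

Here `find_disagreements(CANDIDATES, BLOCKED)` returns the upper-case name and the decomposed Unicode name: each passes the gateway's check but resolves to a blocked entry in the store. A test like this is only as good as its list of candidates, which is part of why designing the mismatch away is usually preferable.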
Within the scope of a system, there are often actions required to maintain the correctness of the execution environment, such as allocating and freeing memory or triggering garbage collection. Many subtle vulnerabilities happen when errors in environment maintenance interact with errors in protocol implementation. In fact, these errors are possible even if both pieces of code are correct separately but make different assumptions about each other. Ideally, your environment will mostly maintain itself and you won't run into these kinds of issues, but this isn't universal. If your environment doesn't, fully exercising all combinations between two subsystems, themselves complex, may be difficult or impossible. This brings us to automated testing.
Because security bugs are complex, they are often reintroduced as code is maintained. The more quickly bugs are found and fixed, the cheaper they are. Both of these are good process-oriented reasons to automate security testing when possible. Furthermore, some security testing can only be done in an automated fashion. Fuzzing, for example, consists of large numbers of randomized interactions with the system in the hope of surfacing rare issues that may have complex requirements for being triggered.
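A toy example shows the basic shape of a fuzzer. The sketch below throws random byte strings at a deliberately buggy decoder for a made-up record format and reports any failure other than the documented rejection. Production fuzzers are far more sophisticated (coverage guidance, input mutation, crash triage), and the `decode_record` target and its format are hypothetical.

```python
import random


def decode_record(data):
    """Deliberately buggy decoder for a hypothetical record format.

    Layout: one length byte N, then N payload bytes, then a zero terminator.
    Malformed input is supposed to raise ValueError.
    """
    if not data:
        raise ValueError("empty record")
    length = data[0]
    payload = data[1:1 + length]
    if data[1 + length] != 0:  # bug: never checks that the terminator exists
        raise ValueError("missing terminator")
    return payload


def fuzz(target, iterations=1000, max_len=8, seed=0):
    """Call target with random inputs; collect unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so a failing run can be replayed
    findings = []
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            continue  # documented rejection, not a finding
        except Exception as exc:
            findings.append((data, type(exc).__name__))
    return findings
```

Running `fuzz(decode_record)` turns up inputs that raise `IndexError` instead of the documented `ValueError`, because the decoder reads the terminator without checking that the input is long enough. Because the seed is fixed, each finding can be replayed exactly and turned into a regression test.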
At a minimum, all of the tests specified by the threat model for functional correctness should be automated. Development teams, especially teams practicing continuous integration or similar development methods where there is a short period of time for pre-release security checks, may find it useful to have these automatically run whenever code is checked into source control. Developers should still run the tests on their own, but test orchestration means fewer chances for errors.
Automation of tests specific to the platforms, protocols, or languages being used may also be useful. There are a large number of tools on the market which do static analysis of source code in an attempt to find vulnerabilities. While these can be used in detailed code review, some of them are also useful as tools to be run automatically on checkin. Tools vary in how many false positives and negatives they generate and can take real work to integrate into a development lifecycle. Determining which tools are a good fit for your development environment may take some time. As with all testing, they're not useful if the results aren't reviewed and acted on.
There are also dynamic automated testing tools. These require the code to be running in a real environment. They can be equally useful, but unless you have orchestrated fully-automated deployment of your system and scripted its use, they're unlikely to be useful in this kind of lightweight, on-checkin mode. However, both these and fuzzers should be used as part of security reviews, as a complement to human testers. Fuzzers are often incredibly effective against large, complex systems (they found over 70% of the security bugs fixed in Windows 10), but they have their biases too. Just as there are bugs only a fuzzer will find, there are bugs humans are much better at identifying.
Some automated tools claim to not only find bugs, but to fix them, either in the development process (rare) or by guarding against exploitation after code is deployed. While it is critical to have the capability to filter the traffic hosted installations receive in production, among other things so you can deal with platform-level issues quickly before patches are available, you should never rely on these tools. While a vendor may claim that their web application firewall or similar tool will prevent all cross-site scripting attacks, in reality they're rarely an obstacle to a dedicated attacker. It's much better to spend your money on developer time, ensuring you've eliminated entire classes of bugs, than it is to buy black boxes that claim to magically fix them after the fact.
All code should be reviewed and tested before it's released. What this means depends on your development methodology, budget, and the level of assurance you're trying to provide. Many agile processes have code review as an integral part of their development practice, where a developer will sit down with someone else on the team and read through her changes with the other dev. This can help find and fix a lot of issues, including functional bugs, standards compliance, and even security issues. However, the flaw-finding mindset and the development mindset and their respective skills are different and it's not always easy to switch between the two.
For the same reason we have separate QA teams to bring a focused look at code and to catch what someone in the developer mindset may not see, we have separate security review teams. Among other things, a security review team should bring a deep knowledge of the security issues possible on your platform and an instinct for what vulnerabilities look like in general. Not only can they find bugs for you, but they can suggest places where similar bugs need to be checked for and often help diagnose the route by which the bugs came to exist in the codebase. Their process improvement recommendations can be invaluable for improving the maturity of a development organization. In many cases, when they suggest fixes for bugs, they'll be able to suggest framework-level fixes that can help you with entire classes of vulnerability.
If you have a traditional development methodology, a security review is generally scheduled after or in parallel with QA testing. This is late in the process. If you haven't been engaged with security designers and threat modeling throughout the process, the reviewers may find architectural issues which can be expensive and time-consuming to fix or, worse, requirements issues that may undermine the entire project. If you've done your early security work properly, you'll have it easier at review time. This is especially true if you've built out test suites from your threat model and your set of known-potential framework issues. That said, a late review still has the possibility of finding significant issues when they're hard to fix. Ideally, you'll have a security team working with you throughout the length of the development effort, reviewing code as soon as you hit a minimum viable working version of the system. As few small development organizations can afford this, scheduling an early-to-mid-point security review and then a final review is a good way to reduce the pain.
If you're working in an agile methodology, and especially if you're doing continuous integration, you don't have the time in your release schedule to run heavyweight gatekeeper reviews. This means it's difficult to avoid needing security talent on your development team. Cross-training developers with your security team so they can bring some of this knowledge in-house is probably the most effective way to handle this. All developers should receive security awareness and secure development practice training, but here we're talking about training specifically in security bug-hunting techniques. With this skillset inside your team and appropriate scheduling, you can ensure that your normal in-team code review cycles have time to concentrate just on security. Your security team can attend some reviews, spot check others, and occasionally do large-scale reviews of checkpoints in the code, all without slowing the release cycle.
Some teams choose to have security do what is known as a “black box” review, where the security team gets no more information than what an attacker would, in theory, have. The supposed goal in these situations is to “simulate an attack” and get an understanding of how bad the system's security is. These reviews, although they will often find critical bugs, are largely useless. They're based on inappropriate extensions of physical security testing practices to digital systems. In many cases, which bugs are or are not discovered in a black box review is down to the luck of the attacking team. While black box reviews provide a minimum level of gross status information, they don't have the sensitivity to be usefully predictive diagnostics, and the information limitations make them very poor value for testing time. Black box testing requires a great deal of skill, but it does not provide a proper overview of the state of the system.
Alternately, some reviews are conducted as pure code review, where the security team only sits down and reads the code. Depending on the system, this can be quite effective at catching many kinds of vulnerabilities — they're often far more obvious in code than they are in a working system. With full information about how the system works, the security team isn't relying on luck to find things to the same degree. Thorough security code review can be time-consuming and expensive, however, and when teams are working in tight time-boxes to review complex systems, it's guaranteed that they will miss issues.
A combination of these two methods, called “grey box” testing, generally results in the best outcomes. The combination of reading the code, examining the working system, and targeting it with selected automated testing tools provides the maximum understanding of how the system works and where it may fail. In grey box reviews, those issues found fastest through code review can be surfaced that way, and those found fastest through testing against the deployed system, likewise. This does require a security review team capable of both code review and manual and semi-automated testing of the live system, but any competent team should have this skillset. More on this in the next section. Grey-box testing still requires a significant degree of effort for large systems, but this is unavoidable. Technical security correctness is quite complex, and effective reviews always depend on sufficient time and staffing. The kinds of discipline and documentation that we've talked about throughout this series help, but even if you're doing everything right, testing is still time-consuming.
If you have cryptographic protocol (or worse, primitive) implementation code in-scope within the system you created, you'll need to work with security engineers with the specific skillset for reviewing cryptographic code. This is rare, and most security teams will need to subcontract out for this. Even if you have in-house application security engineers, it's unlikely that you'll have an in-house cryptographer unless you're running a very large organization — and even if you do have one, she will need someone else not otherwise involved in the development of the system to check her work.
After-action or lessons-learned sessions are an important component of the review process and should not be skipped. Understanding how every bug happened, especially if it was caught late in the development cycle or in production, is a critical step toward ensuring that the next similar bug is prevented. These reviews can also help your team improve the review and development process so less stress and fewer resources are required for the same impact. Lessons-learned sessions do require a certain degree of maturity on the part of the team. If you don't have solid development standards in place and can't track accurately who touched or reviewed the vulnerable code and what requirements documentation they were working from, review will be hard. While you should start doing lessons-learned reviews as soon as you can, there's no point in doing them before you can reconstruct your process or act on the results. If you find you're not ready for these reviews, you should treat this as a critical warning sign that you need to improve the maturity of your organization immediately.
In most cases it makes sense to bring in an external security review team at some point. Reviews from teams outside the core development organization are essential for providing a periodic review of the development team's understandings and assumptions about their own code. If an organization only has a few projects under development at once, it's unlikely it can maintain a security team capable both of broad-spectrum reviews and of staying at the top of their field, because the team won't have enough to do. In many cases, it's also useful to have external and publishable reports of the results of a security assessment. These let the user community understand the security status of the code you've released, as assessed by an independent authority. It's important that these reports include vulnerabilities that have since been fixed (and indeed, the reports should not be released until the bugs are fixed), because that's a critical part of allowing experts in the user community to make their evaluation.
Good security review work is expensive because competent engineers are rare. It's reasonable to expect to pay US$30,000 for a review of a small application — perhaps 100,000 lines of code, plus libraries. For that, you'll get two experienced engineers for two weeks. If you have a significantly larger system, if your code is doing complex or nontraditional things, if you have cryptographic implementation code in your system, or if you depend on an unusually large amount of external code, the cost may be significantly higher. Review budgets for very large, complex systems run into the millions of dollars. At the end of the day, this is simply an expense, like security design, field testing, or developer time. It must be built into the project plan for every project that would expose users to risk if it was compromised.
Good security teams specialize. While large general-purpose software consulting shops may have a few decent security engineers working for them, it's rare that they'll have the scope to deliver good reviews, and rarer still that you'll actually get their good engineers. They often have security teams not because the teams are core to their mission, but because it lets them offer clients wholesale contracts for end-to-end development and extract a better margin. Boutique consultancies that only deliver security reviews live and die by their reputation and their work and generally turn out more consistent and rigorous results. It's also worth avoiding security teams associated with product companies. In many cases, these teams are little more than glorified sales engineers, mostly showing up to push you into buying their automated security review tool and integrating it into your workflow. While tools can be useful, you don't want your security team biased by their commissions.
Most security firms use their research to advertise. The combination of conference talks, publicly-disclosed bugs, tools released for free, and books written demonstrates their talent more effectively than the glossiest brochure. It's also standard practice to deliver short resumes for the folks who will be doing the review as part of the process, giving you a chance to check that you're not paying for a senior team and getting a bunch of fresh juniors. Expect, however, to get mixed teams on assignments — this is how the industry trains new auditors. Hourly rates should reflect staff experience.
If you're planning on publishing the report for an audit, you need to negotiate this up front. It's common that you will pay for the privilege. Some organizations will not permit you to publish the full report, but will offer a shorter “letter of assessment” or similar. Consultancies spend a lot of money developing reporting templates, databases of bug mitigation advice, and other intellectual property, and they expect to be paid when this is published. Likewise, they're being asked to publicly stand behind the work they did. While they may be very confident in it, some of every security review is always down to luck and this represents risk for the consultancy.
If you have flexibility around scheduling, this can sometimes make a review cheaper as your work will be fit in around the edges of other projects. These days, however, the industry is operating at capacity and the chances of this being a significant discount are slim. Consultancies always prefer to work with existing clients; new clients are complex to manage and this eats into margins. If you find a consultancy you can work well with and you know you'll have ongoing review needs, this relationship is likely to become more valuable for both parties over time.
Assuming you're doing a grey box review, like you should be, you'll need to hand over a copy of the application source code plus a working test environment for all of the components to the security team. If you have hosted components, they'll need access to the machines hosting them so they can look at how they're configured. It's critical that this environment be as close as possible to the production environment, but it's also critical that it not share any resources or data with it. Security testing can be hard on hosting environments. Machines will crash, databases will be lost, and data stored in them may be exposed; plan accordingly. Never use production data in any test environment, especially a security test environment. The team will also want all of the documentation you've got. If you have a proper threat model, this should be very useful for guiding the team's testing efforts. If you can show them in the threat model what kinds of issues you've already tested for and what those test cases look like, it will make their work that much easier and more effective.
When you get a report back from your security team, you'll have a pile of bugs to fix. It's important to check to make sure that they're fixed correctly, especially for subtler bugs. Ideally you'll bring the same team that reviewed the code originally back for mitigation testing. They're already familiar with your system and they know the bugs they found, so this will be much faster than a new team trying to test the fixes. If you want to do this, talk about it up front so the team can plan and schedule for it.
It's likely you'll also come away with longer-term process recommendations about how to reshape your development process or standards over time. These take longer to implement and should be evaluated carefully, but they're worth taking seriously. A security consulting team sees many different development teams in the course of their work, and can share with you the best practices they've seen across those teams. You may initially feel their recommendations won't work for your particular combination of resources, politics, and culture, but if your security team is any good, they're only suggesting things they've seen work in similar situations. If you need it, many security teams can do process, standards, and policy reviews for you with an eye to helping you through the process of changing your security practices.
If you liked this essay, you can sponsor me writing more. I've started a Patreon where you can pledge to support each essay I write. I'm hoping to put out one or two a month, and if I can reach my goal of having a day of writing work funded for every essay, it will make it much easier for me to find the time. In my queue right now are the last part of this series, a piece on team practices to avoid email malware, more updates to my piece on real world use cases for high-risk users, and a multi-part series on deniability and security invariants that's been in the works for some time. I'd much rather do work that helps the community than concentrate on narrow commercial work that never sees the light of day, and you can help me do just that.