This is a Patreon-supported essay. Drafts of all of these essays go out a week early to my $10 and up subscribers. Building secure applications is hard, and for organizations that have never done it before, it's often unclear where to even start. Worse, organizations that have some development experience often underestimate the work required to ship secure code. My goal with this series is to make the landscape more legible and give NGOs and other organizations an idea of where to start. This is the final part of a four part essay. You can find the first three parts here, here, and here.
I have a Patreon, here, where you can subscribe to support my security and systems-focused writing. You sign up for a fixed amount per essay (with an optional monthly cap), and you'll be notified every time I publish something new. At higher support levels, you'll get early access, a chance to get in-depth answers to your questions, and even more general consulting time.
© 2021 Eleanor Saitta.
This is part four of my guide to secure software development for NGOs and other organizations. You can find parts one, two, and three here, here, and here. In the first section, we looked at the lifecycle of software, the organization creating it, and the design process and how it impacts security. In the second section, we looked at everything that comes between design and actually writing code. In the third section, we talked about development and security testing. In this final section, we'll talk about everything that happens after you've “finished” the development process.
The whole point of doing all of that security design work up front is that security doesn't live in the code, it lives in the effects that using a system to get work done has on people's lives. Given that, it's not enough to just test the security of the code. Field testing should occur throughout the life of the product, starting with early design mockups to make sure the concepts of the product are coherent. Once the interface is finalized, it's important to make sure that people can use it correctly and that they understand the security guarantees it gives them, what they need to do to maintain those guarantees, and that they can detect when the guarantees have failed. During development, doing this in the lab or in the team office is fine, but as the system nears completion, you need to do testing with the real implementation — with all of the quirks that may not have been present in a prototype — and in people's real environments. Ideally, you should work with groups of users who are in a similar cultural position and doing similar kinds of work to the target audience. However, if you're building a tool designed for high-risk users it's important not to put folks at risk with un- or partially-tested code.
Every system carries with it ways of working. It's important to give folks enough time to adapt to the system they're being handed, especially if it works differently than what they're used to. Early in the process it's fine to spend an hour looking at the system with someone, but later on you want to see multi-day tests where people go about their normal business interacting with the system. In some cases, you may want to simulate attacks on the system to see if users understand what's happening and react appropriately. If you're going to do this, make sure your users understand the attacks are simulated and don't represent real hostile acts by their adversaries.
If field trials don't show the system to be usable or show it not giving the security benefits you thought it would, it's back to the drawing board. Please don't release software that your target audience doesn't find useful. Whether it's a major revision or minor tweaks, it's worth it to make sure the system actually works. It's also important to remember that field tests have risks too, and that the people you're working with have rights to data they generate during the test. Treat them and their data appropriately. Finally, it's worth taking note of (and possibly even recording) everything you say to people in the course of field tests, because it may turn out that you're effectively training them in how to use the system. While our goal in many cases should be systems that work with little or no training, it's important, when calibrating security impact, to understand what degree of training (and existing technical background) is required to get certain results.
Congratulations! You've got a finished, tested application that has a positive impact on people's ability to get things done in the world. Now all you have to do is get it to them. Unfortunately, this is hard.
If you've built a web app or are shipping code for a mobile phone or otherwise through an app store, you've already got your answer. You need to take care of the keys that you use to sign builds (and should keep them offline and absolutely not in source control), but beyond that, most of these decisions have already been made for you. Similarly, if most of your users will be installing your code from a Linux distribution's repository, the mechanisms are already in place and you just need to do your part in that interaction. However, in many cases users will need to manually download and install your software. This may even be true in some cases on mobile, especially on Android if users aren't using the Google Play Store.
Fortunately, the basics of this are easy to get right. Your entire site should only be served over HTTPS (TLS), including the place where your installer is downloaded from. Your site needs to be on a host where some care has been taken for host security. You should sign the executable bundle directly so users can verify it has the correct signature on it. Unfortunately, in many cases this means PGP signatures and SHA-3 hashes that no average user will ever look at. However, not all users are average, and it's still important to provide the option for organizational security teams helping their users. If you're deploying to a platform that supports code-signing for installers directly, do that, and make sure all the individual executables you ship are also signed. Under no circumstances is it ever appropriate to have users pull down unsigned code over unencrypted connections and just run it. Likewise, cutting and pasting complex shell commands from unencrypted web pages is iffy at best.
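For the security teams who will actually check a published hash, the check itself is small. Here is a minimal sketch in Python, assuming the hash you publish is a hex-encoded SHA-3 (256-bit) digest. The function names are mine, for illustration only. Note that a matching hash only shows that the file matches what the download page says. It's the signature that ties the file to your keys.

```python
import hashlib
import hmac
from pathlib import Path


def digest_of_file(path: Path, algorithm: str = "sha3_256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks so large installers fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str, algorithm: str = "sha3_256") -> bool:
    """Compare the file's digest to the published one, ignoring case and surrounding whitespace."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(digest_of_file(path, algorithm), expected)
```

A caller would pass the path of the downloaded installer and the hex string copied from the download page, and refuse to install if the function returns False.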
Your download site should have historical signature data for all the code you've shipped in case someone needs to go back and check whether an old version of a tool they installed was actually compromised. If possible, you want to set up deterministic builds so that someone else with a copy of your application's source code can rebuild it and get something which is bit-wise identical to what you shipped. This lets them verify they're getting a system built from exactly the code you published. If you maintain a public source code repository, it may be worth setting up automation so every single commit is individually signed by its developer. This way, any malicious code found in the repository can be isolated to either an individual bad actor or a single compromised development machine.
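To make the idea of a deterministic build a little more concrete, here is a small sketch of just the packaging step. It assumes your release is a plain tar archive and strips out the things that usually differ between machines: file ordering, timestamps, and ownership. The function name is hypothetical, and a real reproducible build also has to pin compilers, dependencies, and build paths (the SOURCE_DATE_EPOCH convention exists for the timestamp part of that problem).

```python
import io
import tarfile


def build_deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar whose bytes depend only on names and contents."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):  # fixed ordering, independent of how the dict was built
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # no build-time timestamps
            info.mode = 0o644  # fixed permissions
            info.uid = info.gid = 0  # no builder-specific ownership
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Two people who run this over the same file contents should get byte-identical archives, which means they can compare hashes instead of trusting each other's build machines.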
Once you've shipped code to people, you'll find bugs or want to add new features and then you'll need to ship them more code. If you're in an app store, again, these choices have all been made for you. If you aren't, you need a way to automatically push new code out. Doing this in a way that doesn't introduce security vulnerabilities or allow you to be forced to compromise specific users is unsurprisingly quite hard. TUF, The Update Framework, solves a lot of the problems of standard code updates, such as users being induced to install stale updates or to roll back security patches. If you choose to use something else, you need to make sure you solve all of the problems it solves.
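To illustrate the two problems named above, here is a rough sketch of the client-side checks for stale and rolled-back updates. This is not TUF's API. The `UpdateMetadata` type and `check_update` function are hypothetical, and the sketch assumes the metadata's signature has already been verified elsewhere. A real update system has many more cases to handle, which is the argument for using TUF rather than writing your own.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UpdateMetadata:
    """Hypothetical signed metadata describing the newest available release."""
    version: int
    expires: datetime  # timezone-aware expiry set by the publisher


def check_update(installed_version: int, offered: UpdateMetadata, now: datetime) -> str:
    """Decide what to do with offered metadata whose signature was already verified."""
    if now >= offered.expires:
        # Old metadata replayed to hide newer releases from this user.
        return "reject: metadata expired"
    if offered.version < installed_version:
        # Someone is trying to move the user back to a version with known bugs.
        return "reject: rollback"
    if offered.version == installed_version:
        return "up to date"
    return "install"
```

The expiry check is what forces an attacker who can only replay old metadata to eventually get caught, and the version comparison is what stops a patched client from being moved backwards.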
Designing for coercion resistance is another interesting problem. What will your team do if a government agency that has jurisdiction over you demands that you send out code with a backdoor to a specific set of users via your update system? This is a real problem in many jurisdictions. Deterministic builds and open source help some, as they make these kinds of violations more obvious if someone happens to check that specific update. However, once malicious code has been installed, it's often impossible to reconstruct what happened and very few users will check every update.
Various attempts to fix this problem are underway. One set of experiments involves signature verification schemes that make it impossible to create a valid signature for code without the existence of the signature (and possibly the code signed) being visible to all users of the system. Another set consists of attempts at “gossiping” updates, or propagating them in a peer-to-peer manner. This tries to ensure it's impossible for the development team to target a specific user without the malicious code “leaking” to other users, one of whom might eventually notice something wrong. Watch this space.
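The core of the gossip idea can be sketched in a few lines. Assume each client remembers the hash of every update it was given, keyed by version, and can somehow exchange those records with peers. The function below is a hypothetical illustration of the comparison step only. How peers find each other and exchange records safely is the hard, unsolved part.

```python
def find_divergent_updates(
    mine: dict[str, str],
    peer_reports: list[dict[str, str]],
) -> dict[str, set[str]]:
    """Return, per version, the peer-reported hashes that differ from the one I received."""
    divergent: dict[str, set[str]] = {}
    for report in peer_reports:
        for version, peer_hash in report.items():
            my_hash = mine.get(version)
            if my_hash is not None and my_hash != peer_hash:
                divergent.setdefault(version, set()).add(peer_hash)
    return divergent
```

Any non-empty result means two users were given different code under the same version number, which is exactly the signal a targeted update would produce.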
Did you notice during your field trials that people couldn't pick up your new system cold and use it to its full extent? Good, you were paying attention. Almost all new systems require some kind of introduction, whether it's a walkthrough the first time the app is run or a half page of text on the website with some screenshots. Many security critical systems require significantly more training. This is especially true if they're complex or designed to be used under stress or in emergency situations where even simple systems may become confusing. Building systems that require as little training as possible should be a goal of the security design process for most systems, but that doesn't mean the goal is always met, or that “as little as possible” means no training at all. If a system is being used by targeted or high-risk users, they need a friendly environment to make their initial mistakes in, somewhere where nothing is yet at stake for real. They need this not just for operational reasons, but also because it's hard to learn when you're under stress.
If you have data on what kind of training is required to get people to correctly use your system, it's your responsibility as a development team to create appropriate training material. If you don't have the data, you need to gather it in a hurry. You don't have to deliver the trainings yourself, but working with trainers who come to you is in your best interests. If you do have a complex or emergency-purposed app, it will be useful for you to develop a full training scenario for it and run it enough times that you have feedback on how both the app and the training work. This can then be used both to improve the application and to give better guidance to trainers working with others.
Don't just have your team deliver trainings themselves — your developers almost certainly don't scale to training your entire potential user population, and training is a separate skillset. The training material for the system you've developed will need to be integrated into larger security or task training curriculums, so you should ask trainers how they talk about concepts and how they structure their training and make sure your material is compatible. In particular, pay attention to flippant use of scary scenarios in describing the security properties of your tool if it's intended for use in high-risk environments. In many cases, your trainees may have some level of PTSD or other trauma, and your documentation can make it harder for them to learn.
If you haven't documented your application and built training to help folks use it, you haven't built an application, you've just built a pile of code. Systems only matter when they're used.
Once you do have people using your systems, they're going to have things to say. Oh boy, are they going to have things to say. As a development team, though, you need to listen to the problems people have. If at all possible, you should have both support teams (possibly paid, depending on your business model) and community management teams available to work with users. The former do two things — they help people use your system successfully, and they figure out what's going wrong and let your development and design teams know. The latter help your users find and help each other, look at what they do with the system that you weren't expecting, and let you know the things users seem to have trouble (or success) with. Any complex tool will evolve a community of practice that helps propagate the working culture implied by how the tool operates, and you want to support that community. Community management is critical for driving adoption, and you do want your tool to be used, right?
If you're shipping your application to users in hostile regions or who are specifically targeted, feedback gets more complicated. High-risk, specifically-targeted users are often busy with whatever's making their lives complicated and don't have time to write detailed bug reports. Also, the mere fact that they're using your tools may be confidential or even possibly illegal in some contexts. Nevertheless, feedback from these users is especially useful for improving the security outcomes your tool yields. Users who see both the positive and negative security outcomes of your tool in their own lives may have much more holistic and critical feedback, and taking it into account can do a great deal to improve the efficacy of your tool. This is where community management and interactions can become quite high stakes, and where a broad network of security trainers can come to your aid. If people are training users in the use of your tool as part of a larger security curriculum, not only will those trainers have feedback on what's hard to teach or doesn't work in their specific context, but they'll also often have long-term contact with their users. This contact lets them be the bridge in the feedback chain even across hostile environments.
Often, the feedback you get from users won't be in the form you'd like. It might be nice if everyone opened tickets and provided detailed reproduction steps for every issue they found. However, unless the only people you're expecting to use your system are QA engineers, this will not happen. As a team, you need to meet your users where they are and figure out how to take in and track informal feedback. Responding to someone's desperate plea for help by pointing them to a bug database submission form and telling them you'll accept a working patch isn't just rude, it hurts your development process (and possibly your users).
Sooner or later, someone will find a bug. It may be a bug in your code, it may be a misconfiguration of a server you host, or it may be a vulnerability in a third-party library or tool you depend on. If you're lucky, someone notices it in the code and lets you know privately, you develop a fix, it goes out in your next release, and after it's out you can tell everyone what happened (transparency is important for trust). If you're unlucky, you wake up to pagers going off and screaming users because your services have been compromised and everything's down. If you're really unlucky, the vulnerability is found quietly by folks who don't like you, users get compromised, and, in the high-risk space, a lot of really horrible things happen to a lot of good people before you ever hear about it. This is one of the reasons you need to be open to feedback.
It's mandatory, in 2015, that every shipping app have a security contact that supports some kind of encrypted communication, and that that contact is monitored close to 24/7. If someone reports a vulnerability to you, it's customary that they'll give you enough time to patch and ship that patch to the people who use your system before announcing the bug publicly. However, this will probably only be 30 days at most, and for critical or actively-exploited vulnerabilities it may be less. While negotiation is reasonable, you have a responsibility to meet their timeframe. Taking more than 30 days to patch serious vulnerabilities is not acceptable, regardless of your situation.
When a report of a vulnerability comes in, whether directly to you or via the security mailing list of a system you depend on, the first step is to triage the problem. The more serious it is, the more resources you need to put on fixing it. At this stage, you may need to work mostly on the basis of what the report claims. The next stage is verification. The development team looks at the evidence at hand and at the source of their system and attempts to reproduce the vulnerability. If the team can't reproduce the issue, it may be that there's no actual bug. However, if a vulnerability is still possible (because this isn't a class of bugs you can guarantee doesn't affect you at the framework or environment level), it's often worth introducing extra hardening around the potential issue regardless. Making your system more secure than needed is better than a vulnerability you ignored because you couldn't reproduce it. If you have specific reports of users being compromised, it doesn't matter if you can't reproduce the bug exactly; you need to make sure it cannot happen.
Once you've reproduced the bug, you come up with a fix and test your system to make sure everything still works. Automated security and functional testing suites are a critical way to save time here. Next, you document the fix and ship the patches to your users via your update system. Once everything is over, you want to sit down with everyone involved and talk about how the process went. Incident response is a rare event for most smaller teams, so it's worth collecting lessons learned carefully. You'd also like to know how you came to ship that bug in the first place, so you can make sure you stop it from happening again. A shipped security bug may be a good trigger for bringing in an external security team for an additional review or for process change work.
If the call you get is about a compromise in hosted systems, things get more complex. Ideally, you'd like to both preserve the evidence of the compromise on those systems and also get your services up and running again as quickly as possible. It's important not to just wipe and re-image your servers. If you do, you'll have destroyed all of the evidence of the compromise and those re-imaged systems will just be owned again. The process of figuring out, from a running system or from a disk image, what happened to a compromised host is called forensics, and it's its own separate security specialty. Many security consultancies can do forensics work for you, sometimes on a retainer basis. Your local CERT (Computer Emergency Response Team) may also be able to help. Once you've figured out how the attackers got in, you can fix the vulnerability. You also need to make sure that you've completely removed all of their access. Depending on the scale of the intrusion, this may mean re-imaging your entire production (and possibly also test and development) environment(s) with fixed versions of your systems and restoring data from backups. This is one of many reasons that automated deployment and good backups are critical.
So far, all of this is pretty much the same as the traditional application security response process. However, if you're working with high-risk or specifically-targeted users there may be other considerations. In some cases, your application security team may need to coordinate with external emergency response teams and emergency funds that can work to e.g. help get compromised journalists at risk of arrest out of hostile countries or to safe places. Activist groups or NGOs you're working with may have their own internal security teams. While your security team is unlikely to be pulled into their operations, they may want more and different kinds of information from you than you'd be asked to provide under other situations. The degree of cooperation you give them should depend on your available resources and on whether you think you might be putting other users at more risk. These kinds of operational considerations may have some impact on disclosure and fix timeframes.
If you have high-risk users and you're engaging with a CERT or with law enforcement, tread carefully. You have a duty of care to your users to be certain that you're not unintentionally giving information about those users to people who will pass it to intelligence organizations or to their adversaries. In some cases, this may mean choosing to proactively wipe systems before they can be seized, an action which can have serious legal consequences in some jurisdictions. You should consult a lawyer about this at the very least before you bring a system where this might become an issue into production, and this should probably be a consideration during your security design process.
As with all parts of the security process, doing incident response well depends on you having a plan and a process. You should ensure that everyone who may be pulled into a response operation knows what the plan is and what their part in it is. In some cases, you may want to rehearse the communications and coordination scenarios that occur at the beginning of the incident response process, where time is most critical. It's important that you maintain operational security during incidents, so you're better off ensuring that your everyday communication systems are sufficiently secure that you can use them under these conditions.
Development teams themselves have security concerns, as we've touched on in a few places. Access to source code, signing keys, and embargoed security information about vulnerabilities being fixed must be controlled. A full discussion of operational security is out of scope for this piece, but all teams should implement basic security practices like hard disk encryption, off-site encrypted backups, and the use of encrypted communication channels for managing security-related information. Physical access to source control, build, software update, and any production hosting servers should also be controlled. Likewise, all signing keys should be kept offline on devices never connected to a network. If you have a large enough team, you may want two-person control of your keys so no single developer can compromise them. The exact operational security requirements for your team will depend on whether you host services or only provide software to users, on the kinds of users you're working with, and on the kinds of software you're providing them with.
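As a very small illustration of two-person control, here is a sketch of the policy check a release script might run before it asks anyone to bring a signing key online. The names are hypothetical. In a real setup each approval would be a cryptographic signature verified against that person's key, not just a name in a set.

```python
def release_is_authorized(
    approvers: set[str],
    keyholders: frozenset[str],
    required: int = 2,
) -> bool:
    """Return True only if enough distinct, recognized keyholders approved the release."""
    if required < 2:
        raise ValueError("two-person control needs at least two approvals")
    # Approvals from anyone who is not a recognized keyholder are ignored.
    return len(approvers & keyholders) >= required
```

Because approvals are held in a set, the same person approving twice still counts once.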
Software, even free software, is a commitment — free as in a puppy, as the saying goes. When you build a system and provide it to a group of users, you need to budget for maintenance and improvements over time. Bugs will be found, libraries will have vulnerabilities, and issues will be found where people can't use the tool the way you thought they would. All of these things need to be fixed. If you're providing a tool to a community, it's important to understand the time-frame on which you're going to be able to support its development and communicate that to them clearly. Suddenly abandoning a project with little warning can cause real harm to a community that's depending on it. If tools are open source, that helps a little bit, but only if a new development team can be formed to take it over. If your tools depend on hosted services, you need to be very explicit with users about the conditions under which those services are provided and about how much lead time you'll give them if you're going to shut down services either temporarily or permanently. Numerous real-world problems have been caused when high-risk users depending on tools operated by people in their community who should have known better suddenly found their systems nonfunctional.
Long-term maintenance has some specific security-related challenges. As developers move on, standards that were not well-documented will not necessarily be followed by maintenance coders. Documentation may fall out of date. New features are not always developed with an understanding of the user experience work and trade-offs that led to the existing feature set. All of these things can have security implications. Worse, as you upgrade libraries or new versions of operating systems come out, behaviors that you've designed your system to rely on may change. If you're lucky, this will break the system in obvious ways which can then be fixed, but subtle security issues may also appear. It's important to not only maintain standards, documentation, and existing tests over time, but to also pay attention to changes coming from new platform or dependency versions and think about how they might introduce security issues to your system. For especially long-lived code, new developments in attack techniques may mean that code which was previously assumed to be secure is suddenly understood to be vulnerable. Periodic re-audits may be necessary to catch this. Cryptographic primitives are another particular concern. As key lengths increase and algorithms are understood to be weaker, your system will need to adapt.
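One way to leave room for that adaptation is to record which algorithm produced every stored value, so old data can be found and migrated later. The sketch below assumes stored digests are tagged with their algorithm name and that the team maintains its own list of retired algorithms. The constants and function names are hypothetical, and the policy shown (retiring MD5 and SHA-1) is just an example.

```python
import hashlib
import hmac

# Assumption for illustration: a policy list the team reviews and updates over time.
DEPRECATED_ALGORITHMS = frozenset({"md5", "sha1"})
CURRENT_ALGORITHM = "sha256"


def tagged_digest(data: bytes, algorithm: str = CURRENT_ALGORITHM) -> str:
    """Return a digest prefixed with the algorithm that produced it, e.g. 'sha256:ab12...'."""
    return f"{algorithm}:{hashlib.new(algorithm, data).hexdigest()}"


def needs_migration(tagged: str) -> bool:
    """Report whether a stored value was made with an algorithm the policy has retired."""
    algorithm, _, _ = tagged.partition(":")
    return algorithm in DEPRECATED_ALGORITHMS


def verify(data: bytes, tagged: str) -> bool:
    """Check data against a tagged digest, refusing retired or unknown algorithms."""
    algorithm, _, expected = tagged.partition(":")
    if algorithm in DEPRECATED_ALGORITHMS or algorithm not in hashlib.algorithms_available:
        return False
    actual = hashlib.new(algorithm, data).hexdigest()
    return hmac.compare_digest(actual, expected.lower())
```

When an algorithm is retired, adding it to the policy list makes every old value fail closed and show up in a migration scan, without touching the storage format.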
System security is a hard problem. Developing secure software is much more than hiring a couple of folks to write code in a room for a few months and then dropping something on the Internet. Organizations that provide systems to high-risk users have an onus to do due diligence and ensure they're not putting their users at risk. Security is and will always be a process, however, and the most important thing is to start that process today and begin improving the systems you build. Hopefully this roadmap has made that process more legible for you.