Introduction
It has been estimated that Free and Open Source Software (FOSS) constitutes 70-90% of any given piece of modern software solutions. FOSS is an increasingly vital resource in nearly all industries, in the public and private sectors, among tech and non-tech companies alike. Therefore, ensuring the health and security of FOSS is critical to the future of nearly all industries in the modern economy.
In February of 2020, The Linux Foundation’s Core Infrastructure Initiative (CII), in partnership with the Laboratory for Innovation Science at Harvard (LISH), released the preliminary results of an ongoing study ‘Vulnerabilities in the Core,’ a Preliminary Report and Census II of Open Source Software.` This report represents the first steps towards understanding and addressing structural and security complexities in the modern-day supply chain where open source is pervasive, but not always understood.
The initial report from the Census II study identifies the most commonly used free and open source software (FOSS) components in production applications. It begins to examine the components’ open source communities, which can inform actions to sustain the long-term security and health of FOSS. The stated objectives were:
- Identify the most commonly used free and open source software components in production applications.
- Examine for potential vulnerabilities in these projects due to:
-
- Widespread use of outdated versions;
- Understaffed projects; and,
- Known security vulnerabilities, among others.
- Use this information to prioritize investments/resources to support the security and health of FOSS
What did the Linux Foundation and Harvard learn from the Census II study?
The study was the first of its kind to analyze the security risks of open source software used in production applications. It is in contrast to the earlier Census I study that primarily relied on Debian’s public repository package data and factors that would identify the profile of each package as a potential security risk.
In order to gain a better understanding of the commonality, distribution, and usage of open source software used within large organizations, the study used software composition analysis (SCA) metadata supplied by Snyk and Synopsys. SCA is the process of automating visibility into any software, and these tools are often used for risk management, security, and license compliance.
With this metadata, the study was able to create a baseline and unique identifiers for common packages and software components used by large organizations, which was then tied to a specific project. This baselining effort allowed the study to identify which packages and components were the most widely deployed.
The top-scoring, most widely deployed projects were the ones that came under additional scrutiny and became the prime focus of the preliminary study, which were examined for the total lines of code, total number of contributors, and frequency of commits during the 2018 calendar year.
Observations and analysis of these specific metrics led the study to come to certain preliminary conclusions. These were:
Software components need to be named in a standardized fashion for security strategies to be effective. The study determined that a lack of naming conventions used by packages and components across repositories was highly inconsistent. Thus any ongoing effort to create strategies for software security and transparency without industry participation would have limited effect and slow such efforts.
Developer accounts must be secured. The analysis of the software packages with the highest dependents found that the majority were hosted with individual (personal) developer accounts. Lax developer security practices have considerable implications for large organizations that use these software packages because they have fewer protections and less granularity of permissions that are associated with them. For example, organizational accounts frequently employ the use of multi-factor authentication (MFA), which individual developers might not, potentially exposing larger organizations to attack.
Project atrophy and contributor abandonment is a known issue with legacy open source software. The number of developer contributors who work on projects to ensure updates for feature improvements, security, and stability decreases over time as they prioritize other software development work in their professional lives or decide to leave the project for any number of reasons. Therefore, as time goes by, it is much more likely that these communities may face challenges without sufficient developers to act as maintainers.
Legacy open source is pervasive in commercial solutions. Many production applications are being deployed that incorporate legacy open source packages. This prevalence of legacy packages is an issue as they are often no longer supported or maintained by the developers, or they have known security vulnerabilities. They often lack updates for known security issues both in their codebase or in the codebase of dependencies they require to operate. Legacy packages present a vulnerability to the companies deploying them in their environments. In essence, it means they will need to know what open source packages they have used and where so that they can maintain and update these codebases over time.
What tools exist to understand better and mitigate potential problem areas in open source software development?
The Linux Foundation’s community and other open source projects initiatives offer important standards, tooling, and guidance that will help organizations and the overall open source community gain better insight into and directly address potential issues in their software supply chain.
Understand the vulnerability vectors of your software supply chain
Concurrent with the publication of the findings of the Census II study is the Open Source Supply Chain Security Whitepaper. This publication explores vulnerabilities in the open source software ecosystem through historical examples of weaknesses in known infrastructure components (such as lax developer security practices and end-user behavior, poorly secured dependency package repositories, package managers, and incomplete vulnerability databases) and provides a set of recommendations for organizations to navigate potential problem areas.
Focus on building security best practices into your open source projects
For open source software developers, the Linux Foundation develops and hosts the Core Infrastructure Initiative’s Best Practices. This initiative was one of the first outputs produced as a result of the Census I, completed in 2015. Since that time, over 3,000 open source software projects have engaged, started, or completed the process of obtaining a CII Best Practices Badge.
Projects that conform to CII best practices can display a badge on their GitHub page, or their web pages and other material. In contrast, consumers of the badge can quickly assess which FLOSS projects are following best practices and, as a result, are more likely to produce higher-quality and secure software. Additionally, a Badge API exists that allows developers and organizations to query the state of CII best practice score of a specific project, such as Silver, Gold, and Passing. This means any organization can do an API check in their workflow to check against the open source packages they’re using and see if that project’s community has obtained a badge.
More information on the CII Best Practices Badging program, including background and criteria, is available on GitHub. Project statistics and criteria statistics are available. The projects page shows participating projects and supports queries (such as a list of projects that have a passing badge).
Gain better insights into the community developing your open source software
We encourage organizations and projects to join the CHAOSS community, whose work on tooling and risk metrics was leveraged in the Census II study. CHAOSS is a Linux Foundation project which is focused on creating analytics and metrics to help define the health of the software community. The community is working on open source tools, including:
Augur, which is a python library and REST server that is used to mine metrics from git transactions, such as the number of committers and commits over a historical period.
Grimore Lab is an open development analytics platform that allows for automatic and incremental data gathering from almost any tool related to contributing to open source development, such as for source code management, issue tracking systems, forums, mailing lists, and others. It also provides a data visualization dashboard that allows for filtering by time range, project, repository, and contributor.
Equally as important, the CHAOSS community has brought academics and corporate practitioners of open source best practices to develop common CHAOSS Metrics that any organization can adopt and begin using. The Metrics project includes metrics focused on Risk (including Security), the Evolution of a project, and many more useful metrics leading open source organizations to monitor their open source software supply chain.
Gain better insight into the open source software being used in your organization
The second major initiative by the Linux Foundation is Automating Compliance Tooling (ACT), which was launched in 2019 and comprises five major projects, which as of today include:
FOSSology an open source license compliance software system and toolkit, allowing users to run license, copyright, and export control scans from the command line. A database and web UI are also provided to provide a compliance workflow. License, copyright, and export scanners are tools available to help with compliance activities.
OSS Review Toolkit (ORT) enables highly automated and customizable Open Source compliance checks the source code and dependencies of a project by scanning it, downloading its sources, reporting any errors and violations against user-defined rules, and by creating third-party attribution documentation. ORT is designed for the CI/CD world and supports a wide variety of package managers, including Gradle, Go modules, Maven, npm, and SBT.
QMSTR, Also known as “Quartermaster”, this tool creates an integrated open source toolchain that implements industry best practices of license compliance management. QMSTR integrates into the build systems to learn about the software products, their sources, and dependencies. Developers can run QMSTR locally to verify outcomes, review problems, and produce compliance reports. By integrating into DevOps CI/CD cycles, license compliance can become a quality metric for software development.
Software Package Data Exchange (SPDX) is an open standard for communicating software bill of material information (SBOM) that supports accurate identification of software components, explicit mapping of relationships between components, and the association of security and licensing information with each component.
Tern is an inspection tool to find the metadata of the packages installed in a container image. It provides a deeper understanding of a container’s bill of materials, so better decisions can be made about container-based infrastructure, integration, and deployment strategies.
The ACT projects are also working with initiatives within other open source communities to support accurate identification and sharing of software metadata in the ecosystem. Of particular note are:
Clearly Defined, which is part of the Open Source Initiative (of which the Linux Foundation is an Associate member), is a shared repository of licensing provenance information through a common database that helps organizations understand the risks associated with using open source software.
Software Heritage, which is being developed in collaboration with UNESCO, is committed to collect, index, preserve and make readily available the source code of the software that lies at the heart of our culture. As with the Internet Archive (“Wayback Machine), which seeks to preserve the history of the Internet and its content, Software Heritage is a historical preservation of open source software and packages that anyone can search and browse through.
Conclusion
The preliminary findings of the Census II study strongly indicate that open source projects require supporting toolsets, infrastructure, staffing, and proper governance to act as a stable and healthy upstream project for your organization.
The Census II study shows that even the most widely deployed open source software packages can have issues with security practices, developer engagement, contributor exodus, and code abandonment.
Addressing these challenges is where an organization like the Linux Foundation and other nonprofit organizations can offer significant assistance and support for the project community using a low overhead model. The Linux Foundation is uniquely suited to not just providing tools, but also the governance, fundraising, and support programs that critical open source projects require in order to maintain a stable, secure and reliable release model. These support programs include:
-
- Providing funding support through membership models or securing one-off contributions through crowdfunding, leaving the complexities of managing the legal entity, financial oversight, and regulatory filings to professionals that are highly experienced and dedicated to their administration.
- Providing base policies that offer a known framework for commercial organizations to collaborate, including an antitrust policy, trademark policy, templates for a code of conduct, and more.
- Providing entity management for maintaining the core administrative support infrastructure that enables communities to interact, including hiring leadership and community support personnel, in order to facilitate and guide projects on an ongoing basis.
- Supporting community events for face to face opportunities, as well as marketing and communications support to grow a project’s community audience and help people learn about the great things they contribute to.
- Eliminating the burden of managing software releases through hiring neutral release engineers that support the maintainers.
- Providing a free platform in the form of CommunityBridge to address common challenges with fundraising, mentoring, security vulnerability scanning, and managing automated Contributor License Agreements (“CLAs”).
- Providing training and professional certification support that enables building an ecosystem of skilled professionals in order to use, implement, and manage solutions based on a project’s technology.
- Providing support for license compliance, export control, and security by the routine scanning of project repositories in order to help the community identify license and security problems before an official release proliferates issues to downstream users.
In summary, the Linux Foundation supplies communities with a repeatable, proven governance model as well as value-added support programs to help communities maintain and scale. The ultimate goal is that our communities become healthy upstream projects that your organization can rely on as secure, and well-maintained upstream open source projects in your software supply chain.