unlike words carved in stone it can be deleted or get corrupted
to access information and a fundamental part of human heritage
preserves software source code for present and future generations
We collect and preserve software in source code form, because software embodies our technical and scientific knowledge and humanity cannot afford the risk of losing it.
Software is a precious part of our cultural heritage. We curate and make accessible all the software we collect, because only by sharing it can we guarantee its preservation in the very long term.
We harvest publicly available source code from many software projects and keep up with development happening there. As of today our archive already contains and keeps safe for you:
In-browser navigation of the content of the archive is available via the Software Heritage web application.
The web application lets you search for the software origins (repositories, source packages, etc.) we have already archived and see when we visited them, implementing a “wayback machine” for source code. Once you have identified an origin of interest, the web app lets you browse through it as you usually do with version control system browsing interfaces.
Programmatic access to the content of the archive is available via the Software Heritage API.
The API lets you navigate the archive as a graph of development-related objects, such as file contents, directories, commits, and releases. With the API, developers can look up individual objects by their IDs, retrieve their metadata, and jump from one to another by following links, e.g., from commits to the corresponding directories or parent commits, or from releases to released commits. The API also exposes crawling information, such as tracked software origins and the full list of visits performed on each of them. This makes it possible, for instance, to know when snapshots of a specific Git repository were taken and, for each of them, where each branch was pointing at the time.
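As a rough illustration of the visit and snapshot lookup described above, here is a minimal Python sketch. It is not an official client. The base URL, the endpoint paths (`/origin/<url>/visits/`, `/snapshot/<id>/`), and the JSON field names (`visit`, `snapshot`, `branches`, `target`, `target_type`) are assumptions about the public API layout and should be checked against the API documentation. All function names are hypothetical, and the sketch reads only the first page of results.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public API; verify against the API documentation.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def visits_url(origin_url: str) -> str:
    """Build the URL listing the visits of an origin (path layout assumed)."""
    quoted = urllib.parse.quote(origin_url, safe=":/")
    return f"{API_ROOT}/origin/{quoted}/visits/"


def snapshot_url(snapshot_id: str) -> str:
    """Build the URL describing one snapshot (path layout assumed)."""
    return f"{API_ROOT}/snapshot/{snapshot_id}/"


def fetch_json(url: str):
    """Perform a GET request and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def branch_targets(snapshot: dict) -> dict:
    """Map each branch name to a (target_type, target) pair.

    Assumes the snapshot has a "branches" mapping whose values are either
    None (dangling branch) or dicts with "target_type" and "target" keys.
    """
    result = {}
    for name, branch in snapshot.get("branches", {}).items():
        if branch is None:
            result[name] = (None, None)
        else:
            result[name] = (branch.get("target_type"), branch.get("target"))
    return result


def latest_branches(origin_url: str) -> dict:
    """Return the branch targets of the most recent visit that has a snapshot.

    Reads only the first page of visits; pagination is not handled here.
    """
    visits = fetch_json(visits_url(origin_url))
    for visit in sorted(visits, key=lambda v: v["visit"], reverse=True):
        if visit.get("snapshot"):
            return branch_targets(fetch_json(snapshot_url(visit["snapshot"])))
    return {}
```

Calling `latest_branches("https://github.com/python/cpython")` would, under these assumptions, return a dictionary from branch names to the objects they pointed at during the latest archived visit. The URL builders and `branch_targets` are kept free of network access so they can be checked offline.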
Software is so pervasive in our lives that its preservation concerns all of us. Our mission and the archive we are building will serve the needs of the many, from cultural institutions to scientists and industries.
Everyone can help us achieve these ambitious goals.
Software is an important part of human production. It is also a key enabler for salvaging our entire digital heritage.
We collect, preserve, and make accessible source code for the benefit of present and future generations.
Science relies more and more on software. To guarantee scientific reproducibility we need to preserve it.
Amassing source code at this scale will be challenging, but will also enable the next generation of software studies.
Software is present in all industrial processes and products.
The universal source code archive we are building will help industry with provenance tracking, long-term archival, and software bill of materials.