The VERIS Community Database (VCDB)

Information sharing is a complex and challenging undertaking. If done correctly, everyone involved benefits from the collective intelligence. If done poorly, it may mislead participants or create a learning opportunity for our adversaries. The Verizon RISK Team supports and participates in a variety of information sharing initiatives and research efforts. We continue to publish the Verizon Data Breach Investigations Report (DBIR) annually, now with an unprecedented number of new data-sharing partners, and we are committed to keeping the report publicly available and free to download. We regularly receive inquiries about our dataset and our ability to share more of it, but agreements with our partners and customers limit what data we can share in raw format.

Go straight to the data on GitHub

The Problem

While there are a handful of efforts to capture publicly disclosed security incidents, there is no unrestricted, comprehensive raw dataset of security incidents available for download that is sufficiently rich to support both community research and corporate decision-making. There are organizations that collect and, in some form, disseminate aggregated collections, but either these are not in a format that lends itself to the data manipulation and transformation required for research, or the underlying data are not freely and publicly available for use. This gap has long hampered researchers who study the problems surrounding security incidents, as well as risk managers who are starved for reliable data upon which to base their risk calculations.

Our Contributions to the Solution

To address this problem, we are pleased to announce the VERIS Community Database (VCDB), which aims to collect and disseminate information on all publicly disclosed data breaches. The data are coded into VERIS format, and we also provide the dataset in an interactive visualization available for public use. We encourage you to visit the site and interact with the data. The initial release had just over 1,200 incidents, primarily from 2012 and 2013. Data sources include the Department of Health and Human Services (HHS) incidents, the sites of the various Attorneys General that provide breach notification source documents, media reports, and press releases. We intend to continue to augment this dataset to capture as many incidents as possible so that others can benefit. Given the initial makeup of the data, care should be taken when basing decisions on it until it has become more comprehensive and representative. The data are currently biased towards the Health sector, since nearly half of the incidents came from the HHS publications. Subsequent updates have brought the dataset to over 3,600 coded incidents, with thousands of breaches still waiting to be entered.

We realize that while a graphical interface is useful to most of the potential consumers of this data, some would like to be able to get the details of each incident. For them, we are also releasing the same data in JSON format from our GitHub repository. There you will find an individual JSON file for each incident in the dataset, including the original URLs we used when coding the cases.
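As a rough illustration of working with the per-incident JSON files, the following Python sketch loads every file from a local copy of the data and tallies incidents by year. The function names are hypothetical, and the `timeline.incident.year` field is an assumption about how the VERIS-coded records are laid out, so check it against the files you actually download.

```python
import json
from collections import Counter
from pathlib import Path


def load_incidents(directory):
    """Return one dict per *.json file found directly inside `directory`."""
    incidents = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            incidents.append(json.load(handle))
    return incidents


def incidents_per_year(incidents):
    """Count incidents by year; records without a year are counted as 'unknown'.

    Assumption: the year lives at incident["timeline"]["incident"]["year"].
    """
    years = Counter()
    for incident in incidents:
        year = incident.get("timeline", {}).get("incident", {}).get("year")
        years[year if year is not None else "unknown"] += 1
    return years
```

To use it, pass `load_incidents` the folder in your local clone that holds the incident files, then hand the result to `incidents_per_year`. Loading each file as a plain dict keeps the sketch independent of any particular analysis library.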

What can you do with this data?

You can “ask” it questions: think of something you’d like to know and start looking into the data to answer it. Prove or disprove an assumption you have made in your own work. You can make direct comparisons between the findings in the DBIR and the public data to see how they differ. You can filter by industry and organization size and see how your organization stacks up against companies of the same size and industry. If you use VERIS in your workplace, you can make comparisons against your own data as well (which is a good reason to look at adopting VERIS). Eventually, this will become a rich, freely available data source for conducting this type of ad hoc research.
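The filtering described above could look something like the following Python sketch, which narrows a list of incident dicts by industry and organization size and then counts the top-level action categories in the result. It assumes each record has a `victim` object with an `industry` field holding a NAICS code and an `employee_count` field holding a size band, and an `action` object keyed by category. These field names and the sample values are assumptions made for illustration, and the helper names are hypothetical.

```python
from collections import Counter


def victim_of(incident):
    """Return the victim object, tolerating records that store it as a list."""
    victim = incident.get("victim", {})
    if isinstance(victim, list):
        victim = victim[0] if victim else {}
    return victim


def filter_incidents(incidents, naics_prefix=None, employee_count=None):
    """Keep incidents matching a NAICS prefix and/or an exact size band."""
    selected = []
    for incident in incidents:
        victim = victim_of(incident)
        industry = str(victim.get("industry", ""))
        if naics_prefix is not None and not industry.startswith(naics_prefix):
            continue
        if employee_count is not None and victim.get("employee_count") != employee_count:
            continue
        selected.append(incident)
    return selected


def action_counts(incidents):
    """Count how many incidents involve each top-level action category."""
    counts = Counter()
    for incident in incidents:
        counts.update(incident.get("action", {}).keys())
    return counts


# Made-up records for illustration only; these are not taken from the dataset.
sample = [
    {"victim": {"industry": "622110", "employee_count": "1001 to 10000"},
     "action": {"error": {}}},
    {"victim": {"industry": "621111", "employee_count": "11 to 100"},
     "action": {"physical": {}, "hacking": {}}},
    {"victim": {"industry": "522110", "employee_count": "1001 to 10000"},
     "action": {"hacking": {}}},
]

# NAICS codes beginning with "62" cover health care and social assistance.
health = filter_incidents(sample, naics_prefix="62")
print(len(health), dict(action_counts(health)))
```

Matching on a NAICS prefix rather than a full code lets the same function select a whole sector or a narrow sub-industry, depending on how many digits you supply.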

So try it out: launch the interactive visualization. This is a work in progress, and we are committed to making the resource investment to keep it updated regularly. Note the tabs across the top, which show more detailed views of their respective topics. We will be adding views as we develop this resource, and we’d love to hear from you, the users of this tool and data, about additional views you’d be interested in.

Future Work

We plan to eventually expand this dataset to capture more than just data breaches, which are defined as incidents that compromise the confidentiality attribute of the data in question. We are interested in looking at all types of security incidents and in continuing to disseminate our data publicly to the community. Ideally, we will be able to build a dataset that captures the incidents that impact any of the security attributes in VERIS. If you are interested in getting involved, let us know. We want to encourage community involvement in this project.
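The distinction between a data breach and a broader security incident can be expressed as a small check on each record. This Python sketch assumes an incident dict carries an `attribute` object keyed by the affected security attributes (for example `confidentiality`, `integrity`, `availability`). The function names are hypothetical, and treating the mere presence of a `confidentiality` key as a compromise is a simplification.

```python
def compromised_attributes(incident):
    """Return the set of security attributes recorded for an incident."""
    return set(incident.get("attribute", {}).keys())


def is_data_breach(incident):
    """Treat an incident as a data breach if confidentiality is affected.

    Simplification: the presence of the key is taken as a compromise.
    """
    return "confidentiality" in incident.get("attribute", {})
```

A dataset that covers all incident types would include records for which `is_data_breach` returns `False`, such as an outage that affects only availability.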