OPEN DATA HANDBOOK California Health and Human Services Agency

4. Disclosure

Disclosure Considerations

CHHS collects, manages, and disseminates a wide range of data. As departments classify data tables and catalog their publishable state data, they should be mindful of legal and policy restrictions on publication of certain kinds of data. The following are general disclosure guidelines for departments to consider as they begin to identify and review data tables.

The CHHS Data Subcommittee commissioned the development of Agency-wide guidelines to assist departments in assessing data for public release. The CHHS Data De-Identification Guidelines focus on de-identification of aggregate or summary data. Aggregate data means collective data that relates to a group or category of services or individuals. Aggregate data may be presented in table form as counts, percentages, rates, averages, or other statistical groupings. Refer to the CHHS Data De-Identification Guidelines for the specific procedures to be used by departments and offices.

Security, Privacy, Regulatory & Aggregate Data

The public release of some department data might violate laws, rules, or regulations. Some data may not be appropriate to release because doing so could compromise internal departmental processes, such as procurement. Other data may contain personally identifiable information. Finally, even if detailed data appear innocuous, it may be possible to combine them with other public information to reveal sensitive details (commonly known as the mosaic effect). Before disclosing potential personally identifiable information or other potentially sensitive information, departments and offices must make a ‘best effort’ to consider other publicly available data, in any medium and from any source, to determine whether some combination of existing data and the data intended for public release presents any risks or would make the publication inappropriate.

Common kinds of publicly available data that contain personal information include real estate records, individual licensing databases (MD, RN, contractors, lawyers, etc.), marriage records, news and other media reports, commercially available databases (data brokers, marketing), and court documents, among others. See the ‘Publicly Available Data’ section in the CHHS Data De-Identification Guidelines for more information.
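To make the mosaic effect concrete, the following is a minimal Python sketch of one way to check how many released records could be linked to exactly one record in a publicly available source by matching on shared attributes. It is an illustration only, not a procedure from the CHHS Data De-Identification Guidelines; the function name `count_unique_links`, the field names (`county`, `birth_year`, `sex`, `diagnosis`), and the sample records are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical record type: a mapping from field name to value.
Record = Mapping[str, object]


def count_unique_links(
    released: Iterable[Record],
    public: Iterable[Record],
    keys: Sequence[str],
) -> int:
    """Count released records whose values on the shared `keys` match
    exactly one record in the public source (a potential re-identification)."""
    # Count how many public records share each combination of key values.
    public_counts = Counter(tuple(r[k] for k in keys) for r in public)
    # A released record matching exactly one public record can be linked
    # to that record's identity; Counter returns 0 for unseen combinations.
    return sum(
        1 for r in released if public_counts[tuple(r[k] for k in keys)] == 1
    )


# Hypothetical data: a released table without names, and a public roster
# (for example, a licensing database) that shares county, birth year, and sex.
released = [
    {"county": "Alpine", "birth_year": 1961, "sex": "F", "diagnosis": "X"},
    {"county": "Fresno", "birth_year": 1985, "sex": "M", "diagnosis": "Y"},
]
public_roster = [
    {"name": "A", "county": "Alpine", "birth_year": 1961, "sex": "F"},
    {"name": "B", "county": "Fresno", "birth_year": 1985, "sex": "M"},
    {"name": "C", "county": "Fresno", "birth_year": 1985, "sex": "M"},
]

print(count_unique_links(released, public_roster, ["county", "birth_year", "sex"]))
# 1  (the Alpine record matches a single named person in the roster)
```

A nonzero count suggests the shared attributes may need to be coarsened (for example, age ranges instead of birth years) or suppressed before release. A zero count against one source does not show that the data are safe, because other public sources may still allow linkage.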

Even if there are no legal impediments to publishing the data, releasing it may have unintended or undesirable effects. For example, posting anonymized arrest records on a weekly basis might inadvertently reveal where police are concentrating enforcement efforts.

Thresholds

Various statutes and regulations, such as HIPAA and California’s health information privacy laws, impose exacting requirements for determining whether data have been sufficiently de-identified to protect individual privacy. For example, data on medical conditions by geographic location may be high-value, useful, and sought after; however, publishing them could identify individuals and their medical conditions.

Another example is the Family Educational Rights and Privacy Act of 1974 (FERPA). Under FERPA, the federal government has established data privacy guidelines to prevent individuals from being identified indirectly through aggregated data. Departments that handle student education data should be aware of guidelines that restrict publication of some data.

Even in the absence of specific legal prohibitions, government entities should beware of outlier conditions or rare events that could lead to identification of individuals. For example, reporting a single arrestee who is a minor of a certain age in a certain county might identify that individual, even if no other information is provided.

All data must be assessed for the risk of identifying individuals whose privacy is protected by law, whether federal or state. To assist departments and offices in this process, the CHHS Data Subcommittee commissioned the development of the CHHS Data De-Identification Guidelines. These Guidelines discuss methods for assessing the potential risk associated with data sets proposed for release, as well as statistical methods that can be used to mask data and protect individuals from being inappropriately identified in data tables. For example, if the number of individuals in a cell of a data table falls below a certain threshold, the value in that cell may be hidden. It is important to balance the desire to publish accurate, complete, and valuable tabulations against the need to guard against unwarranted invasions of personal privacy. Refer to the CHHS Data De-Identification Guidelines for the specific procedures departments and offices should use to assess data for public release.
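The cell-hiding example above (often called small-cell suppression) can be sketched in Python as follows. This is a minimal illustration of primary suppression only, not the procedure in the CHHS Data De-Identification Guidelines; the function name `suppress_small_cells`, the `"*"` marker, the sample counts, and the threshold of 11 in the usage line are illustrative assumptions. Departments should take the actual threshold and rules from the Guidelines.

```python
from typing import Dict, Union

# A published cell is either a count or the suppression marker.
Cell = Union[int, str]

SUPPRESSED = "*"  # hypothetical marker shown in place of a hidden value


def suppress_small_cells(table: Dict[str, int], threshold: int) -> Dict[str, Cell]:
    """Return a copy of a table of counts in which every count below
    `threshold` is replaced by the suppression marker."""
    result: Dict[str, Cell] = {}
    for category, count in table.items():
        # Primary suppression: hide counts small enough to risk
        # identifying the individuals they represent.
        result[category] = SUPPRESSED if count < threshold else count
    return result


# Hypothetical counts by county; 11 is an illustrative threshold only.
counts = {"County A": 245, "County B": 3, "County C": 58}
print(suppress_small_cells(counts, threshold=11))
# {'County A': 245, 'County B': '*', 'County C': 58}
```

Primary suppression alone can fail when totals are also published: if the table reports a total of 306, the hidden value for County B can be recovered as 306 - 245 - 58 = 3. In that case additional (complementary) suppression or another masking method may be needed.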

PRA Applicability

Under the California Public Records Act (PRA), government records are presumed to be open to the public unless they fall within one of a narrow set of specific exemptions, which address such concerns as invasion of personal privacy, impairment of contractual or collective bargaining negotiations, exposure of protected trade secrets, interference with law enforcement or judicial proceedings, and endangerment of life or safety, among others. Government entities should confer with their PRA officers about whether publishing a data table might cause the harms described in the PRA; if so, the table would not constitute “publishable state data” for the CHHS Open Data Portal.

Ownership Rights

In some circumstances, a CHHS department or office may not possess all the rights necessary to publish a specific data table. For example, if the data were collected or compiled by a third party, a contractual or intellectual property limitation may prevent them from being made public. Another example is a data table that incorporates part of a data table collected or compiled by a third party. In these cases, the appropriate permission must be secured from the sourcing entity, and additional disclaimers may be required. When vetting a data table through the approval process, departments and offices should ensure that their legal counsel is aware of any potential ownership issue and of whether the data were collected or compiled by a third party.

Other Considerations

Organizational Resistance: Concerns about deployment costs and the time required to implement an open data portal could create organizational resistance. In the experience of other states and several counties, no additional staff have been required to implement and maintain an open data portal. CHHS has chosen a vendor-based product that is expected to make deployment as easy as possible.

Inaccurate Data: Despite participating departments’ best efforts, some data may be inaccurate, and analyses may surface issues that the public was previously unaware of and that the press then covers. When concerns about inaccurate data are brought to the attention of a participating department, the department will look into the matter and make corrections as appropriate.