OPEN DATA HANDBOOK California Health and Human Services Agency

3. Guidelines

Publication Guidelines for CHHS Departments and Offices

Publishing data on the CHHS Open Data Portal involves a collaborative multi-step process (see Figure 3: Guidance Summary). In identifying publishable state data, State entities should include analyses from their executive and program staff, data coordinators, PRA officers, data stewards, IT, public information officers, security and privacy officers, and legal counsel.

CHHS departments and offices vary widely in terms of size, personnel, functions, responsibilities, mission, and data collected and maintained. As such, the identification and prioritization processes may vary across entities. These guidelines serve to provide assistance across a broad spectrum of State entities, with the stipulation that State entities look to their governing laws, rules, regulations, and policies in identifying and making available publishable state data.

Figure 3: Guidance Summary

1. Data Table Identification

Within a CHHS Department or Office, any number of individuals can and should identify data tables for which they consider themselves data stewards. In addition, subject matter experts and leaders within the Department or Office may identify data tables that could fulfill strategic needs if shared on the CHHS Open Data Portal. After identification, all suggested data tables should be assessed and prioritized.

2. Data Table Assessment/Prioritization

In creating a data catalog for the CHHS Open Data Portal, departments should assess the suggested data tables for value, quality, completeness, and appropriateness in accordance with the definition of publishable state data. High value data are those that can be used to increase the State entity’s accountability and responsiveness, improve public knowledge of the State entity and its operations, further its mission, create economic opportunity, or respond to a need or demand identified after public consultation.

Sections A and B below are neither exhaustive nor applicable to all State entities, but rather serve to provide a framework for identifying potential data tables for publication on the CHHS Open Data Portal. For each question in Section A, State entities must assess whether the data fall within the definition of publishable state data and respective disclosure considerations.

A. General Questions to Identify High Value, High Priority Publishable Data Tables

What “high value” data are currently publicly available?

Departments and offices may already publish a considerable amount of data online; however, the data may not be accessible in bulk or available through machine-readable mechanisms. Weekly, monthly, or quarterly reports that are frequently accessed by the public, and public-facing applications that allow visitors to search for records, are excellent starting points for review.

What underlying data populate aggregate information in published reports?

Published reports are often populated with data that are compiled or aggregated from internal systems. For example, a weekly public report may indicate that a department closed 25 projects in that week. The internal system, which holds the details of each project, may contain additional details that can be made public.

What data do State entity programs use for trending and statistical analysis?

As with published reports, trend and statistical analyses are often performed using data from various sources. Those sources can be reviewed for data that can be made public.

What data is the subject of frequent PRA requests? What data is the public or news media requesting?

There are multiple methods by which the public requests data from State entities. For example, some PRA requests may seek to obtain data tables or records which are to be provided in digital format. These requests (particularly repeated requests for the same data table) might be fulfilled by making the data table(s) available on the CHHS Open Data Portal.

What data are our different stakeholder groups interested in?

Consider engaging with the public for feedback. Options for obtaining public feedback may include, but are not limited to, leveraging existing channels for public engagement and community feedback. Connecting with citizens and developers can help ensure that data releases have the greatest possible impact. In addition, the CHHS Open Data Portal could provide a mechanism for constituents to request data not yet published.

What data are frequently accessed on the department or office’s website?

Analysis of website traffic and trends can identify frequently accessed data.

What data have not been previously published but meet the definition of “high value”?

These are publishable state data that can be used to increase the covered State entity’s accountability and responsiveness, improve public knowledge of the entity and its operations, further the mission of the entity, create economic opportunity, or respond to a need or demand identified after public consultation.

Do data further the core mission or strategic direction of the department or multiple government entities?

Publishing aggregated data (statistics, metrics, performance indicators) as well as source data can often help a department advance its strategic mission. In addition, the CHHS Open Data Portal will serve as a conduit for efficiently sharing information with other departments.

Do the data highlight State entity performance, or might publication of the data benefit the public by setting higher standards?

The department or office might be in the forefront of standards for government performance, where exposing the data might cause other State entities to raise their performance.

Does availability of the data align with federal initiatives or release of federal data?

There may be higher value in the department’s data if synergies exist with federal data efforts.

Do the data support decision making at the state, local, internal or other external entity level, or contain information that informs public policy?

Publishing such a data table publicly can be a powerful method for fostering productive civic engagement and policy debate.

Does availability of the data align with legal requirements for data publication?

There may be statutorily required reporting which can be satisfied by publishing data tables, without necessarily producing an additional extensive narrative report. If the data are collected and compiled by the department to fulfill statutory reporting requirements, then the department’s governing laws have already determined that the data are of high value for that department.

Would availability of the data improve department-to-department communication?

Certain government functions may involve multiple departments requiring access to similar data. Making the data available would support administrative simplification and efficiency.

Could availability of the data create specific economic opportunity?

In many cases, this will be unknown to the department in advance. Some of the greatest successes of the open data movement have involved government data being commercially appropriated in useful ways, such as weather data. To the extent the department can anticipate significant commercial use of the data, the department may wish to prioritize publication of such data more highly as it creates its schedule.

Could the data be used for the creation of novel and useful third-party applications, mobile applications, and services?

Software applications often leverage data from multiple sources to provide value to their customers. Making department data tables available can support the delivery of greater value (and impact) through those applications.

Are the data needed by the public after-hours?

Generally, when there is known and quantifiable demand outside normal business hours, such data tables should be ranked, where applicable, as high value.

Do the data have a direct impact on the public?

The data are likely of higher value if it is already apparent that there is a deep impact and interest by the public (e.g., public safety inspection results).

Are the data of timely interest?

Announcements of progress or success, or reactions to public criticism, can be strongly supported by publishing related data, should they exist.

B. Do the data tables represent discrete, usable information?

In identifying data tables, State entities may be concerned that users of the CHHS Open Data Portal will not understand their data or that the data, if distilled to their rawest form, might lose utility. There are no hard and fast rules about what level of detail is sufficiently granular to add value to a government data table. Whenever possible, State entities should resist the temptation to limit data tables to only those the department or office believes might be understood or useful. Entities should be wary of underestimating the users of the CHHS Open Data Portal, who may come from a variety of fields and specialties and may envision uses for the data not anticipated by the State entity. A better practice (as described in the section on Pre-Publication) is for State entities to ensure that the metadata associated with each data table are complete, including comprehensive overview documents describing the data, uniform data collection, data fields, and suggested research questions that maximize the usefulness of the data.

Prioritization

When creating a schedule for publication of a particular data table, departments and offices must make an assessment based upon a number of factors. State entities should use the general guidance below (in conjunction with the Data Prioritization Survey) to determine the priority for each data table. Prioritizing initial and ongoing publication will entail balancing high value data with the data tables’ level of readiness for publication. Each State entity shall create and provide schedules prioritizing data publication in accordance with the guidelines set forth herein. Prioritization shall be done in a timely manner, recognizing that it may take time for departments to prepare high quality data (data tables vary in complexity and, as such, can vary significantly in preparation time). Approvals for the prioritization plan and scheduling will come from the department/office executive leadership team.

In prioritizing data for release, therefore, departments and offices must account for time to: identify data, assess and validate the data (i.e., ensure consistency, timeliness, relevance, completeness, and accuracy of the data), ensure completeness of the metadata and data dictionary, prepare visualizations and talking points, and obtain all necessary approvals to publish the data (Figure 4). The CHHS Priority Scoring Template can help departments and offices prioritize open data for publishing.
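
As an illustration of balancing value against readiness, the following minimal sketch ranks candidate data tables by a weighted score. It does not reproduce the CHHS Priority Scoring Template or the Data Prioritization Survey; the 1-to-5 scales, the weights, and the table names are all assumptions chosen for the example.

```python
def rank_by_priority(candidates, value_weight=0.6, readiness_weight=0.4):
    """Sort candidate data tables by a weighted value/readiness score.

    The weights are hypothetical; a department would choose its own.
    """
    def score(item):
        return (value_weight * item["value"]
                + readiness_weight * item["readiness"])

    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates, each scored from 1 (low) to 5 (high).
candidates = [
    {"name": "internal_codes", "value": 1, "readiness": 4},
    {"name": "facility_list", "value": 3, "readiness": 4},
    {"name": "inspection_results", "value": 5, "readiness": 2},
]

for item in rank_by_priority(candidates):
    print(item["name"])
```

A high value table that needs substantial preparation can still rank first under these weights; lowering value_weight would favor tables that are ready sooner.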

Figure 4: Prioritization

3. Pre-Publication

Prior to publishing a data table on the CHHS Open Data Portal, a number of steps must be completed to ensure a high quality and usable product. Data tables will be formatted in a machine-readable format. CHHS Departments and Offices have chosen Comma Separated Values (CSV) as their standard format for publication. Accompanying the data table will be complete metadata and a data dictionary that provides descriptions and technical notes as necessary for every field in the data table. Departments and Offices are also encouraged to include with each data table one or more visualizations of the data (graphs and/or maps) as well as one or more potential research questions of interest to the Department/Office as a way to encourage public engagement and innovation related to strategic goals. Each data table, as a part of the approval process, will be reviewed for quality assurance, compliance with the CHHS Data De-Identification Guidelines, and consistency of the data over time.
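
One pre-publication check that lends itself to automation is confirming that the data dictionary describes every field in the CSV file. The sketch below is illustrative only: the function name, the sample data, and the representation of the data dictionary as a simple mapping from field name to description are assumptions, not a CHHS-prescribed structure.

```python
import csv
import io


def undocumented_fields(csv_text, data_dictionary):
    """Return CSV column names that have no data dictionary description."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [
        name for name in header
        if not data_dictionary.get(name, "").strip()
    ]


# Hypothetical sample data table and data dictionary, for illustration only.
sample_csv = "county,year,visits\nAlameda,2014,120\nYolo,2014,45\n"
sample_dictionary = {
    "county": "County where the facility is located.",
    "year": "Calendar year of the reported visits.",
}

print(undocumented_fields(sample_csv, sample_dictionary))  # ['visits']
```

An empty result suggests the data dictionary covers every field; it does not assess whether the descriptions themselves are adequate.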

4. Publication

The publication process initially involved the development of an open data portal website that fulfilled all of the requirements of the Department/Office. Considerations included branding, usability, design, and accessibility (e.g., Americans with Disabilities Act compliance). Each data table published on the portal requires appropriate categorization and tags (keywords) so that the data are easy to find through search. Furthermore, Departments and Offices may consider announcing the publication of each data table via social media, a press release, or another communication method.

Standardization

The way data consumers interact with and use the CHHS Open Data Portal is greatly influenced by the way the data are published. The CHHS Open Data Portal requires departments and offices to present the data in a machine-readable format (CSV) so that software tools, applications, and systems can process them. Beyond file format, standardization within the CHHS Open Data Portal also covers metadata, data dictionaries, file naming conventions, demographic categories, and navigational categories and tags. Wherever possible, standards and associated guidelines have been developed to ensure consistency and facilitate automation and reuse of the data.

Metadata

The portal will support a common and fully described core metadata scheme for each hosted data table and Application Programming Interface (API) within the data catalog. An API is the defined method by which one software component requests services or data from another. The metadata scheme will allow data publishers to classify selected contextual fields or elements within their data table and to adhere to common metadata attributes that have been identified portal-wide, enabling data consumers to build automated discovery mechanisms at a granular level. Using a common metadata taxonomy will allow CHHS Open Data to describe high-value data tables consistently and make them easier to discover.

Open Data adheres to core components of the Dublin Core standard for metadata (http://www.dublincore.org/documents/dces/). The ability to search and find information is enhanced by the adherence to metadata standards required with each data table. Metadata includes subject categories and keywords which provide for more precise searching and document management. Adoption of the Dublin Core, together with standards for CHHS Open Data, maximizes adaptability and interoperability.

The Dublin Core Metadata Initiative (DCMI) is a non-profit organization hosted at the National Library Board of Singapore. Its lists of elements, glossary, and frequently asked questions (FAQs) were last revised in 2005, but an effort to update its User Guide is under way on the wiki page http://wiki.dublincore.org/index.php/User_Guide. CHHS Open Data uses the current set of elements, which are required to accompany each data table.
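
To illustrate, the sketch below checks a draft metadata record against the fifteen elements of the Dublin Core Metadata Element Set. The record contents and the function name are hypothetical, and treating all fifteen elements as mandatory is an assumption made for the example; the portal's actual required fields may differ.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(record):
    """List Dublin Core elements that are absent or blank in a record."""
    return [
        element for element in DUBLIN_CORE_ELEMENTS
        if not str(record.get(element, "")).strip()
    ]


# Hypothetical, partially completed metadata record.
draft_record = {
    "title": "Example Facility Visits by County",
    "creator": "Example Department",
    "subject": "health; facilities",
    "description": "Annual counts of facility visits by county.",
    "format": "CSV",
    "language": "en",
}

for element in missing_elements(draft_record):
    print("missing:", element)
```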

Descriptive Information

CHHS Open Data serves as a portal to present machine-readable data, so that end-users may process, access, discover, extract and combine data elements to reveal new insights, observations, and utility regarding the data. In furtherance of CHHS’s commitment to high quality, CHHS Open Data requires departments to submit metadata and supplemental documentation with each data table (e.g., data dictionaries, overview documents, etc.). This ensures data are fully described to maximize the public’s understanding and interpretation of the data and facilitates interoperability.

Categories, Tags and Keywords

The CHHS Open Data Portal supports a model that allows data publishers to identify data tables as belonging to a broad category (e.g., health and human services, public safety, and education). Then, using a schema that includes both standardized and category-specific tags and keywords, the CHHS Open Data Portal helps data consumers to search and retrieve data tables readily and uniformly.

Data Standards

The CHHS Open Data Portal supports the following formats:

  • Tabular Data: Comma Separated Values (CSV), Microsoft Excel (XLS)
  • Geographic Data: Geospatial data are usually organized as a collection of features that define a layer. Layers can be overlaid on top of one another, allowing visualization of spatial relationships, spatial queries, and analysis. The Open Data Portal supports two data formats for geospatial information (tabular or shapefile). The appropriate format is dependent on the specific characteristics of the underlying geographic data:
    • Points: Tabular file format or shapefile. Tabular formatting of points requires either columns for latitude and longitude, or complete address information (house number, street, village/town/city, state, and ZIP code) that can be geocoded.
    • Lines: Shapefile.
    • Polygons: Shapefile.
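
The requirement for tabular point data can be sketched as a check on a file's column headers. The column names used here (latitude, longitude, house_number, street, city, state, zip_code) are assumptions for illustration; the handbook does not prescribe specific header names.

```python
import csv
import io

# Hypothetical column names; the handbook does not prescribe them.
COORDINATE_COLUMNS = {"latitude", "longitude"}
ADDRESS_COLUMNS = {"house_number", "street", "city", "state", "zip_code"}


def point_location_method(csv_text):
    """Report how a tabular file could locate points, if at all."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    columns = {name.strip().lower() for name in header}
    if COORDINATE_COLUMNS <= columns:
        return "coordinates"
    if ADDRESS_COLUMNS <= columns:
        return "address (needs geocoding)"
    return "none"


print(point_location_method("name,Latitude,Longitude\nClinic A,38.58,-121.49\n"))
print(point_location_method("name,street,city\nClinic B,Main St,Davis\n"))
```

The second sample has only a partial address, so it is reported as unusable for point mapping until the remaining address columns are supplied.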

A shapefile is actually a collection of several files with the same file name, but differing extensions. For the CHHS Open Data Portal, each shapefile should contain (at a minimum) the following files:

  • .shp: defines the geometry (shapes)
  • .dbf: defines the attribute table
  • .prj: projection, ensures the feature locations are accurately rendered on the map
  • .shx: shape indexing file, for efficient processing

Note: Shapefiles that use projections other than WGS 1984/Web Mercator will require conversion, which may result in a minimal loss of accuracy. In some cases this conversion can be handled by the Open Data Portal; in other cases it must be done by the participating department.
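
Before submission, the minimum shapefile components listed above can be verified with a short script. This is a sketch only: the function name and file names are hypothetical, and it checks for the presence of files, not their validity or projection.

```python
from pathlib import PurePath

# Minimum components required for each shapefile, per the list above.
REQUIRED_EXTENSIONS = {".shp", ".dbf", ".prj", ".shx"}


def missing_shapefile_parts(filenames, base_name):
    """Return required extensions not present for the given base name."""
    present = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem == base_name
    }
    return sorted(REQUIRED_EXTENSIONS - present)


# Hypothetical upload folder contents.
files = ["clinics.shp", "clinics.dbf", "clinics.shx", "readme.txt"]
print(missing_shapefile_parts(files, "clinics"))  # ['.prj']
```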

Other supported geospatial formats include Keyhole Markup Language (KML/KMZ).

Geocoding

The CHHS Open Data Portal supports geocoding services, which convert human-readable address information into map coordinates (i.e., latitude and longitude).

Updates to Published Data Tables

Data tables on the CHHS Open Data Portal must be kept up-to-date. Specific guidance regarding updates will be addressed in technical and working documents as they are developed. Four mechanisms are supported for refreshing a data table.

  • Replace: All existing records are removed and new records are inserted
  • Append: New data table records are inserted
  • Update: Existing records are modified
  • Delete: Existing records are removed
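
The four mechanisms can be sketched on a small in-memory table keyed by record identifier. This is an illustration of the semantics only, not the portal's interface: the function names, the use of a dictionary, and the choice to reject an Append that collides with an existing key are all assumptions.

```python
def replace(table, new_records):
    """Replace: remove all existing records, then insert the new ones."""
    return dict(new_records)


def append(table, new_records):
    """Append: insert new records; existing keys are rejected (assumption)."""
    overlap = table.keys() & new_records.keys()
    if overlap:
        raise KeyError(f"records already exist: {sorted(overlap)}")
    return {**table, **new_records}


def update(table, changes):
    """Update: modify fields of existing records only."""
    unknown = changes.keys() - table.keys()
    if unknown:
        raise KeyError(f"no such records: {sorted(unknown)}")
    return {key: {**row, **changes.get(key, {})} for key, row in table.items()}


def delete(table, keys):
    """Delete: remove existing records by key."""
    doomed = set(keys)
    return {key: row for key, row in table.items() if key not in doomed}


# Hypothetical records keyed by an integer identifier.
table = {1: {"county": "Alameda", "visits": 120}}
table = append(table, {2: {"county": "Yolo", "visits": 45}})
table = update(table, {2: {"visits": 50}})
table = delete(table, [1])
print(table)  # {2: {'county': 'Yolo', 'visits': 50}}
```

Each function returns a new table rather than modifying its input, which keeps the example easy to follow; a production refresh would typically operate on the hosted data table directly.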

Each department or office will be responsible for updates to their data tables based on their internal data governance model. Periodic internal review for the participating department or office is highly recommended. The posting frequency for updates is included in the metadata for each data table and indicates how often the data table will be refreshed (e.g., annually, monthly, daily).
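
A periodic internal review could compare each data table's last update date with the posting frequency recorded in its metadata. The sketch below assumes the frequency is stored as one of the example values above and uses approximate day counts; the function name and the thresholds are hypothetical.

```python
from datetime import date, timedelta

# Approximate refresh intervals; the exact day counts are assumptions.
MAX_AGE = {
    "daily": timedelta(days=1),
    "monthly": timedelta(days=31),
    "annually": timedelta(days=366),
}


def is_overdue(last_updated, posting_frequency, today):
    """Return True when a table is older than its stated frequency allows."""
    return today - last_updated > MAX_AGE[posting_frequency]


# Hypothetical review of a table last refreshed on 1 January 2024.
print(is_overdue(date(2024, 1, 1), "monthly", date(2024, 3, 1)))
print(is_overdue(date(2024, 1, 1), "annually", date(2024, 6, 1)))
```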

Narrative Content

While the concept of open data is best suited to tabular and geographic data tables, we anticipate that there may be a desire to access narrative types of content. Currently, if a department or office develops extensive narrative reports about published data, those reports should be accessed via the department’s website. The department or office may choose to provide a link to the associated published data table on the CHHS Open Data Portal (which departments and offices must keep current). If an opportunity arises to provide narrative content on the CHHS Open Data Portal, all due consideration will be given.