Knowledge and Information Management Issues
There must be a presumption that a public entitlement exists to free access to all data collected by a public service organization within the scope of this consultation except on defined grounds of data privacy or national security. It must not be acceptable to refuse requests solely for reasons attributable to poor data quality. This right to access must be balanced by responsibilities to make reasonable requests for data for legitimate purposes, and to make available any further datasets generated using data drawn from public service operations. We recognize that grey areas may need further consideration, e.g. trade secrets within publicly owned companies, data from banks under public sector control.
The selection of data to be released, whether newly created or older material, must be made from the standpoint of users, not the keepers of the data (subject to the safeguards on privacy and national security). Research into user requirements would help to identify meaningful data for publication, whilst linking the existing but underused information would improve access to available datasets at low cost.
Data quality is a critical issue to be considered by this and the other current consultation exercises. There must be guarantees on the anonymization of data where large databases are formed by amalgamating two or more sources. Quality is a critical element of data integrity, which is one of the pillars of information assurance, and in turn of cybersecurity.
The regulatory system put in place after this consultation must be strengthened and adequately resourced. The American Government should consider separating the responsibilities for data protection (DP) and the Freedom of Information Act (FOIA), as is already the case in many countries. Creating a second agency to oversee FOIA would allow the existing regulator to concentrate on delivering excellence in its governance of DP.
Whatever systems are put in place must operate across the USA. There must be a common charging regime and standard definitions of terminology.
The ownership of public service data by the private sector is problematic and needs reform. The reliance of some organizations (trading funds, privatized public services) on revenue generation from intellectual property rights (IPR) cuts across the presumption of free access to data generated by public service operations.
Government policies will introduce many new, often small public service providers to the marketplace. These providers will need guidance as well as regulatory control as they learn to collect and manage, use and share data and information on the services they provide.
From the outset, the Government must take into account the technological issues associated with open data, particularly the size and complexity of the datasets and the extent to which they can be “cleanly” defined and offered to the public.
The presumption of publication — which must govern both “pull” and “push” release of information — is an important principle. It would be a major step forward to embed this principle in the culture of public service through a combination of the measures listed.
An important element of establishing rights to data is for the various actors to understand what is available and then to be able to find relevant material easily. A proper entitlement to data needs to be underpinned by robust enforcement and redress arrangements when things go wrong or where expectations are not being met. But rights must be counterbalanced by responsibilities. It must be recognized that data could be used by a third party in a way which could damage the reputation of the original provider. The Open model should be the default, supplemented by a corresponding commitment to use data responsibly. These principles should be incorporated within any new arrangements. The provider of the dataset also needs to act responsibly: data definitions or technology platforms should not be changed without prior discussion with dataset users, so that potential consequences are recognized.
There is also a need for a strong and properly-resourced support program if giving a right to data is to be more than a paper commitment. The poor levels of information and digital literacy are a constraint on citizen participation and must be addressed. An important role for library and information services — not only public libraries but in all sectors — is as expert intermediaries giving people the information skills necessary to use open data resources effectively.
A framework of standards and regulations should be put in place that would:
a. promote and ensure excellence in information and knowledge management within public sector organizations, which is the necessary precursor to the effective collection, use and dissemination of data.
b. deliver greater integration and commonality between the structures supporting data protection, FOIA, environmental information regulations, re-use of public sector information, etc., to create a more understandable and accessible set of US-wide rights of access to and use of public information.
c. ensure the implementation of a new Directive setting out a common infrastructure for the collection of spatial information.
d. establish standards for data quality that will enable release of data without the risk of breaches of privacy or security.
At the political level, data protection and freedom of information responsibilities should be separated, as is already the case in many countries. A new body should be established focusing on Freedom of Information, encompassing the right to data. The new body would have a remit to improve the quality of public data and, following the precedent set in the provision of health information for the public, it should accredit providers rather than every dataset. It should exercise a scrutiny function over the fitness for purpose of public service information because this information is as critical to good public sector governance as finance or human resources.
The Government should recognize the present challenges of knowledge and information management. Increasingly public services will be delivered by charities, local social enterprises, advisors, and other types of organization. How will these new bodies be incorporated into the new agenda, and how will they learn about their responsibilities for data and be supported in meeting them? It will need a drive from the center to roll out the major awareness and training program required.
Public service organizations should appoint a board member with specific responsibility for knowledge and information management, including transparency issues. All public sector organizations should be able to demonstrate an effective Knowledge and Information Management strategy as an integral part of their business plans. The new standards should be embedded in commissioning, performance management and regulatory frameworks, and government procurement rules amended to reflect these requirements. The existing program should be revised to take account of the additional requirements from the open data initiative and extended to all public sector bodies. It should have enhanced powers to follow up improvements required and to refer forward cases of non-compliance for possible sanctions, and it must be adequately resourced for its responsibilities.
A similar requirement to publish an appropriate strategy should be placed on public service providers outside the public sector. Public service Heads of Service must be accountable for effectiveness of the strategies including the delivery of the Open Data agenda. Job descriptions and appraisal systems should reflect this accountability.
The release of large numbers of datasets raises implications for privacy. It is uncertain whether existing privacy measures and safeguards to protect personal data will provide for the adequate regulation of Open Data. Breaches of security, mostly due to human error or lack of data security training, are reported on a fairly regular basis.
We support the recommendation that privacy protection should be embedded in any transparency program. These reforms bring major challenges regarding the effective and ethical management of information resources by many small and often new organizations that have little or no knowledge of data protection or freedom of information requirements. Their lack of experience may lay them open to unscrupulous practices by users seeking their data in order to identify individuals by data mashing. These organizations must be underpinned by effective knowledge and information management, which demands the expertise and skills of information professionals.
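One way a provider might test for the re-identification risk described above, before releasing or combining datasets, is a k-anonymity style check: every combination of potentially identifying attributes should be shared by at least k records. The Python below is a minimal sketch under stated assumptions; the function name `small_groups`, the example field names and the threshold `k` are hypothetical and would need to be set by whatever standard is eventually agreed.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def small_groups(
    records: List[Dict[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> List[Tuple[str, ...]]:
    """Return attribute combinations shared by fewer than k records.

    Any combination returned here describes a group so small that its
    members might be singled out once datasets are linked.
    """
    counts = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return [combo for combo, n in counts.items() if n < k]


# Hypothetical merged records (illustrative values only).
merged = [
    {"area": "North", "age_band": "30-39"},
    {"area": "North", "age_band": "30-39"},
    {"area": "South", "age_band": "60-69"},
]

# With k = 2, the single South / 60-69 record is flagged.
flagged = small_groups(merged, ["area", "age_band"], k=2)
```

A check of this kind does not guarantee anonymity on its own, since it only covers the attributes the provider thinks to list, but it gives inexperienced organizations a concrete test to run before release.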
Too much public sector data is simply not fit for purpose and therefore can mislead and misinform with potentially disastrous results. All public service providers should be under an obligation to specify the datasets that they hold, together with a minimum set of metadata about the collection, compilation and validation of the data, and the frequency with which some or all elements of the dataset are revalidated or revised. It should be the norm for each public data set to have an introductory profile setting out key parameters of the data and where necessary the provider should be expected to give health warnings as to the limitations and shortcomings of its quality.
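As a minimal sketch of what such an introductory profile might look like in machine-readable form, the Python below models the metadata items listed above. The class name `DatasetProfile`, its field names and the completeness rule are illustrative assumptions, not part of any existing standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetProfile:
    """Hypothetical introductory profile for one published dataset."""

    title: str
    collection_method: str            # how the data was collected
    compilation_method: str           # how the data was compiled
    validation_method: str            # how the data was validated
    revalidation_interval_days: int   # how often elements are revalidated or revised
    health_warnings: List[str] = field(default_factory=list)  # known limitations

    def missing_fields(self) -> List[str]:
        """Return the names of required text fields that are empty."""
        required = [
            "title",
            "collection_method",
            "compilation_method",
            "validation_method",
        ]
        return [name for name in required if not getattr(self, name).strip()]

    def is_complete(self) -> bool:
        """Complete when every required field is filled and the interval is positive."""
        return not self.missing_fields() and self.revalidation_interval_days > 0


# Illustrative values only.
example = DatasetProfile(
    title="Example service waiting times",
    collection_method="Monthly returns from service providers",
    compilation_method="Aggregated by region",
    validation_method="Range checks and manual sampling",
    revalidation_interval_days=90,
    health_warnings=["Figures before the first full year are incomplete"],
)
```

Keeping health warnings as a free-text list reflects the expectation that providers state limitations in plain language rather than through a fixed code list.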
The publication of raw (“unpolished”) data offers benefits for innovators and researchers. Research informs data collection by indicating areas of concern and provides a route to valuable syntheses and sense making. These research activities in turn benefit public service through policy impact assessments and other outputs.
Academic research therefore needs to have access to public service data without incurring a charge. In return users of the raw datasets should undertake to release polished, linked data at a later date. A drive to improve quality and introduce a robust regulatory framework should start immediately, although it is clear that this will take some time to deliver.
It is important that public service providers select data for publication taking into account the potential relationships between datasets held by the sector as a whole. Data held by Agency A may have greatest value only when combined with data from Agency B, but this will be lost if Agency A only considers its own activities. Higher Education research should be monitored so that it informs this area.
External requests to access particular datasets will provide an order of priority for cleansing and otherwise improving the quality of an organization’s datasets. Structured research into public and business expectations and desires would further help to determine which data to publish. To obtain fullest benefit from public investment in open data, levels of digital literacy must be raised in the US. Information professionals are in a position to make an important contribution to achieving this objective.
The workings of government are already complex: as public services are increasingly provided by local organizations, they are likely to become more complex still. An early priority must be a comprehensive guide to government and public service delivery for the ordinary citizen, allowing him or her to make informed and meaningful requests for data, and to interpret that data meaningfully. This need not (and ought not to) be a priced printed publication; the most effective improvement would be to upgrade the search function to make it more context-sensitive and taxonomy-based.
This change would greatly increase the chance of finding relevant content through site search, reducing the cost to government of helping questioners to refine FOIA and open data requests and increasing the likelihood of value-added publication of mashed data. Use of the same taxonomy would make search consistent across the public sector portals.
The challenge of ensuring effective information management across the plethora of new public service providers that the Government wants as part of its agenda is immense. The program needs to be strengthened and better promoted but could already act as exemplar for similar programs in other parts of the public sector. Citizenship courses in schools and for those applying for American citizenship should include modules on government organization, sources of relevant data and information and a citizen’s right to data.
Government does not necessarily need to lead market making in data aggregation and publication. The emergence since around 2010 of user-generated websites based on the available public sector information demonstrates that organic development is likely once suitable data is released.
Our understanding is that we will focus on encouraging and stimulating innovation and creativity in the use of public sector data, especially in regard to core national reference datasets and those public sector information providers operating trading funds. Again, there may be lessons to be learned from knowledge transfer initiatives in higher education.
It is difficult and perhaps unnecessarily complex to define the key terms precisely. We believe that the situation should be kept under review so that appropriate action can be taken if the feedback from early adopters suggests that greater clarity is required.
There should be a presumption of openness unless there are overwhelming reasons for a dataset to remain closed. These reasons might be grounds of national security, or because the dataset contains personal data that cannot reliably be excluded or redacted. However, the mere presence of personal data in a dataset must not be the sole reason for refusing to make the data open: the quality of some data is poor, meaning that personal data may be entered wrongly or in unexpected fields, but this is an argument to improve data quality not to restrict openness.
A set of agreed criteria should be developed to establish tests for release. There must be particular consideration of the use of cost as a reason to withhold release so that sensible common limits are set, including an agreed definition of vexatious requests. But, we repeat, the presumption must be to release data through both “push” and “pull” unless there are overwhelming reasons against. The requisition process for datasets to be opened should include the opportunity for applicants to indicate their intention to create further value through the release of new open data resources incorporating the requested data.
It is inherent in a request to provide data that the requestor considers whatever costs are incurred by the data custodian to be reasonable and to represent value for money in the general public’s view. Any value for money test must be considered primarily in the view of the public, not of government.
The presumption must be that requested data will be published in the absence of overwhelming reasons of national security or data protection; that no exemption will be available on the grounds that extensive cleaning of poor quality data is required; that a commonly agreed definition will be used to define vexatious requests that need not be answered; and that, if a ceiling cost is imposed, it will be set sufficiently high that the vast majority of requests (perhaps 98–99%) will be met.
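The tests for release summarized above can be sketched as a simple decision rule. The Python below is illustrative only: the names `DataRequest`, `Decision` and `decide`, and the way each ground is reduced to a yes/no flag, are assumptions made for the sake of the example, and any real criteria would need the agreed definitions discussed in this response.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    RELEASE = "release"
    WITHHOLD = "withhold"


@dataclass
class DataRequest:
    national_security_risk: bool      # overwhelming national security ground
    data_protection_risk: bool        # personal data that cannot be reliably redacted
    vexatious: bool                   # meets the commonly agreed definition
    needs_extensive_cleaning: bool    # poor quality data requiring cleaning
    estimated_cost: float             # custodian's estimated cost of answering


def decide(request: DataRequest, cost_ceiling: Optional[float]) -> Decision:
    """Apply a presumption of release with a small set of exemptions."""
    if request.national_security_risk or request.data_protection_risk:
        return Decision.WITHHOLD
    if request.vexatious:
        return Decision.WITHHOLD
    # A ceiling applies only if one has been imposed.
    if cost_ceiling is not None and request.estimated_cost > cost_ceiling:
        return Decision.WITHHOLD
    # needs_extensive_cleaning is deliberately ignored: poor data
    # quality is not an acceptable ground for refusal.
    return Decision.RELEASE
```

The point of the sketch is the ordering: release is the fall-through outcome, and the need for cleaning never appears as a condition for withholding.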
Further work will be needed to establish the value chain for each type of data user. A fair price to be paid for a complete dataset by a publisher (integrator, or intermediary) who will add value and derive ongoing profit from public service data is clearly a different price from that to be paid by an individual member of the public or a not-for-profit group or academic researchers with an interest in a small segment of the available data.
The range must be inclusive rather than exclusive and must comprise all central and local government organizations unless there are obvious grounds for exclusion. Even where such grounds exist, as with the national security agencies, the government should issue an instruction to consider whether any data is suitable for release. The greatest problem is with public data created, gathered or held by arm’s length bodies and in the private sector on behalf of government. Some of these datasets are of critical importance to academic research, to business administration and to commercial R&D as well as in their original government contexts.
As we have already emphasized, there must be a presumption of openness for all datasets created in the course of public business unless reasons of data privacy or national security make a convincing case against. We envisage a number of approaches to encouraging or ensuring publication of data:
- We should continue to challenge departments and agencies to release data or to justify withholding.
- Reference to other initiatives should rapidly identify what data is being collected and where it is being stored: in this context it will be important to demonstrate proper disposal of records where databases are closed or frozen.
- A list should be regularly published of databases for which requests for release have been denied or where release has been delayed beyond the normal deadline. The service provider should be required to state the reason for the denial or delay in response. In the case of public services that are provided by private sector organizations, performance against these criteria should be an element of performance reviews.
- In all cases it should be possible to refer to an independent authority if the enquirer considers the response to be unsatisfactory.
Concluding Comments
There are some other issues that we wish to raise that don’t fit neatly into the formal questions posed in the consultation. The first is the general observation that the consultation paper focuses rather more on central government than other parts of the public sector. It will be important to ensure that the open data agenda is adopted across the whole public sector. Therefore, just as the other initiatives mentioned in this response will feed into the approach to open data, so the experience of other parts of the public sector may raise further issues and challenges. This is the start of an important program and much remains to be learned.
The consultation document also raises important questions about the ownership of various databases of public service information. It is unsatisfactory that ubiquitous and important national data is the intellectual property of privatized public services or government trading funds, and that this data represents an important revenue stream that they are naturally reluctant to forgo. As well as being a major outlay in the provision of public services, this situation remains a potential barrier to creation of data mashing applications of the type envisaged. The issue must now be resolved urgently within a national open data strategy.
It is also important to realize that data not specifically created in its own right does not enjoy database rights. Only data that is specifically created in its own right and for possible resale enjoys such protection. The implication of this is that much public sector data will not be covered by database rights and will not require a license (free or at a price) for others to use it.
The consultation focuses on two types of user — the entrepreneur and business wanting to add value to official data for a fair return and the citizen able to exercise choice more effectively and hold public service providers to account. It is worth identifying a third community — the research community.
Government data is an important source for much research, some of which will support assessment of policy impact and provide valuable new insights into policy areas. We have already mentioned the contribution that the higher education sector could make to the open data agenda through their experience in establishing open archives and through projects facilitating knowledge transfer between universities and businesses.
To this can be added an interest in the data itself suggesting that higher education should be seen as an important partner in developing an open data program for public sector data.
Jeff C. Palmer is a teacher, success coach, trainer, Certified Master of Web Copywriting and founder of https://Ebookschoice.com. Jeff is a prolific writer, Senior Research Associate and Infopreneur having written many eBooks, articles and special reports.