What is Public Data?
Public data is digital information collected by government agencies. This type of data can be accessed by citizens and is often used to identify areas of need, track progress on social goals, and support data-driven advocacy for positive change.
It’s important to remember that different countries have different legal definitions of what constitutes public data. Essentially, some countries have broad definitions that use public data and open data as synonyms, while others consider public data to be a subset of open data.
Techopedia Explains the Public Data Meaning
Agreeing on a single definition of public data at the start of a project helps avoid confusion and misaligned expectations about data accessibility.
For the sake of convenience, many international organizations use the NIST glossary entry for public information as a starting point for discussion.
“The term ‘public information’ means any information, regardless of form or format, that an agency discloses, disseminates, or makes available to the public.”
How to Use Public Data
If you want to access and use public data, the first step is to locate the data you need. Luckily, most government agencies and many public organizations have online portals specifically designed for sharing public data.
Once you locate the right datasets, your next step will be to read the dataset’s documentation and metadata to learn how the data is formatted. During this step, you should also check and make sure there aren’t any restrictions on the data’s use that will interfere with your objective.
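As a minimal sketch of this step, the Python snippet below reviews a dataset's metadata record before any data is downloaded. The record layout (`title`, `license`, `formats`, `requires_registration`), the `review_metadata` helper, and the set of licenses treated as permitting commercial reuse are all assumptions made for illustration. Real portals use their own field names and license identifiers, so both should be adapted to the portal's documentation.

```python
# Hypothetical metadata record; real portals use their own field names.
sample_metadata = {
    "title": "City Air Quality Readings",
    "license": "CC-BY-NC-4.0",
    "formats": ["CSV", "JSON"],
    "requires_registration": True,
}

# Assumed allow-list for this sketch; always confirm against the license text.
COMMERCIAL_OK = {"CC0-1.0", "CC-BY-4.0", "PDDL-1.0"}


def review_metadata(metadata, need_commercial_use=False, wanted_format="CSV"):
    """Return a list of human-readable issues found in a metadata record."""
    issues = []
    license_id = metadata.get("license")
    if not license_id:
        issues.append("No license listed; ask the publisher before reuse.")
    elif need_commercial_use and license_id not in COMMERCIAL_OK:
        issues.append(f"License {license_id} may not allow commercial reuse.")
    formats = {f.upper() for f in metadata.get("formats", [])}
    if wanted_format.upper() not in formats:
        issues.append(f"Format {wanted_format} is not offered.")
    if metadata.get("requires_registration"):
        issues.append("Registration or terms acceptance is required.")
    return issues


for issue in review_metadata(sample_metadata, need_commercial_use=True):
    print(issue)
```

An empty list means the sketch found nothing that conflicts with the stated objective. It does not replace reading the full terms of use.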
After you acquire the data, you will need to pick the right tool to analyze it. Depending on the data’s structure and format, you may have to use spreadsheets, statistical software like IBM SPSS, or programming languages like Python or R to explore the data and look for patterns.
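For readers who choose Python, a first exploration might look like the sketch below. It uses only the standard library, and the sample rows, column names (`region`, `year`, `cases`), and the `summarize_by_region` helper are invented for illustration. A real dataset would be read from the downloaded file instead of an inline string.

```python
import csv
import io
import statistics

# Made-up sample standing in for a downloaded CSV file.
SAMPLE_CSV = """region,year,cases
North,2022,120
North,2023,150
South,2022,80
South,2023,95
"""


def summarize_by_region(csv_text):
    """Group the 'cases' column by region and return the mean per region."""
    by_region = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # CSV values arrive as strings, so convert before doing arithmetic.
        by_region.setdefault(row["region"], []).append(int(row["cases"]))
    return {region: statistics.mean(values) for region, values in by_region.items()}


print(summarize_by_region(SAMPLE_CSV))
```

Grouping and averaging is only one possible starting point. The same pattern extends to counts, minimums, or year-over-year differences.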
Visualization tools like Tableau or Power BI can be used to help you and your stakeholders understand the patterns and trends you discovered during analysis. (During this step, you may need to consider the limitations of the public data and how they may affect your interpretation.)
Who Can Use Public Data?
The idea behind public data is that it’s accessible to everyone. While openness is the goal, some types of public data may have restrictions.
For example:
- You might need to register or agree to terms of use before you can access certain datasets.
- Data containing sensitive information may be released in an anonymized format to protect privacy.
- There may be fees for access to niche, high-value datasets.
- Free public data may have restrictions that limit or prohibit commercial reuse.
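To make the anonymization point above more concrete, here is a rough Python sketch of two common steps: dropping direct identifiers and coarsening values that could identify someone in combination. The records, field names, and helper functions are hypothetical, and the choice of which fields to drop or generalize is an assumption. Publishers typically apply stricter, formally reviewed methods before release.

```python
# Hypothetical records; names and fields are invented for illustration.
records = [
    {"name": "A. Example", "postcode": "90210", "age": 34, "diagnosis": "asthma"},
    {"name": "B. Sample", "postcode": "90213", "age": 67, "diagnosis": "diabetes"},
]

DIRECT_IDENTIFIERS = {"name"}  # assumed list of fields to drop outright


def age_band(age, width=10):
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record):
    """Drop direct identifiers and coarsen quasi-identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["age"] = age_band(cleaned["age"])
    # Keep only the first three postcode characters and mask the rest.
    cleaned["postcode"] = cleaned["postcode"][:3] + "**"
    return cleaned


for row in map(anonymize, records):
    print(row)
```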
Examples of Public Data Repositories
A public data repository is a centralized platform that stores, manages, and distributes public data. Most repositories link back to the original source and provide information about the data’s formatting and acceptable use policy.
Here are some examples of popular repositories where you can access public data:
- Data.gov: Provides access to a wide range of datasets from United States federal agencies. Topics include agriculture, education, health, and transportation.
- data.europa.eu: Offers access to open data published by European Union institutions and member states. Includes datasets related to economics, environment, and society.
- GitHub: While primarily a platform for code hosting, GitHub also hosts a number of public data repositories in fields such as machine learning (ML), natural language processing, and data science.
- Google Dataset Search: A search engine that helps users discover datasets hosted across the web.
Types of Public Data
Data on public data portals is typically categorized by the dataset’s source, its content, its usage restrictions, and the formats it can be downloaded in.
Categorization allows users to locate raw or processed datasets relevant to their research questions or projects.
Raw data is gathered directly from the source. It is often incomplete or unstructured and might contain errors or redundancies. In its raw form, data might not be suitable for analysis or decision-making because it could be too complex, voluminous, or difficult to understand.
Processed data, which may also be referred to as cooked data, is the result of refining raw data to make it usable. The processing might involve cleaning (removing errors or duplicates), transforming (converting data into a different format or structure), and organizing it in a way that makes analysis and decision-making easier.
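The three processing steps named above (cleaning, transforming, and organizing) can be sketched in a few lines of Python. The raw rows, the `station` and `reading` fields, and the `process` helper are invented for illustration, and the rules for what counts as an error or a duplicate are assumptions that would differ from dataset to dataset.

```python
# Invented raw rows showing typical problems: duplicates, blanks, mixed formats.
raw_rows = [
    {"station": " Central ", "reading": "12.5"},
    {"station": "Central", "reading": "12.5"},   # duplicate after trimming
    {"station": "Harbor", "reading": ""},        # missing value
    {"station": "harbor", "reading": "9,8"},     # decimal comma
    {"station": "Airport", "reading": "n/a"},    # unparseable value
]


def process(rows):
    """Clean, transform, and organize raw rows into sorted, typed records."""
    seen = set()
    processed = []
    for row in rows:
        station = row["station"].strip().title()          # cleaning
        text = row["reading"].strip().replace(",", ".")   # transforming
        try:
            reading = float(text)
        except ValueError:
            continue  # skip rows whose reading cannot be parsed
        key = (station, reading)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        processed.append({"station": station, "reading": reading})
    # Organizing: return records in a predictable order.
    return sorted(processed, key=lambda r: (r["station"], r["reading"]))


for record in process(raw_rows):
    print(record)
```

Silently skipping unparseable rows keeps the sketch short. In practice it is usually worth logging or counting what was dropped so the processed dataset can be traced back to the raw one.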
Public Data vs. Open Data, Private Data
Public data is information that is collected by government agencies and made available to the public.
Public datasets are generally accessible, but there may be restrictions regarding access requirements and data reuse. Examples of public data include government spending data, public health statistics, and meteorological data.
In some countries, open data is considered to be an umbrella term that includes both public data collected by governments and data that civic organizations and research institutions collect.
In this context, open data describes publicly available datasets that are free to use and don’t have restrictions. Open datasets can usually be accessed in machine-readable formats like comma-separated values (CSV) or JavaScript Object Notation (JSON) for easy analysis.
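As a small illustration of why machine-readable formats make analysis easier, the Python sketch below loads the same made-up records from CSV and from JSON using only the standard library. The sample data and the `load_csv` and `load_json` helpers are assumptions for illustration.

```python
import csv
import io
import json

# The same made-up records expressed in both formats.
CSV_TEXT = "city,population\nSpringfield,30000\nShelbyville,25000\n"
JSON_TEXT = (
    '[{"city": "Springfield", "population": 30000},'
    ' {"city": "Shelbyville", "population": 25000}]'
)


def load_csv(text):
    """Parse CSV text; every value arrives as a string, so convert numbers explicitly."""
    return [
        {"city": row["city"], "population": int(row["population"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def load_json(text):
    """Parse JSON text; numbers keep their type without extra conversion."""
    return json.loads(text)


# Both loaders should yield the same list of records.
print(load_csv(CSV_TEXT) == load_json(JSON_TEXT))
```

The main practical difference shown here is typing: CSV carries only text, while JSON distinguishes numbers from strings.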
Private data is information that is intended for internal use, business operations, or personal needs. Access to private data is restricted to authorized users.
Public Data Use Cases
Public data can be used for a wide range of purposes.
It can be used to help:
- Healthcare professionals track diseases and identify potential health risks.
- Finance professionals identify market trends and investment opportunities.
- Machine learning engineers (ML engineers) and data scientists train large language models (LLMs).
- Government officials make evidence-based policies.
- Non-profit employees track progress on social and environmental goals.
Public Data Pros and Cons
Public data offers numerous benefits, but it also has certain drawbacks. The potential benefits generally outweigh the drawbacks when proper data quality standards, responsible handling practices, and awareness of potential issues have been put in place.
Pros
- Promotes transparency and accountability
- Supports evidence-based policy decisions
- Can inspire creative solutions to challenges
- Fosters partnerships across different government and economic sectors
- Can be used to drive positive change in communities
Cons
- Data may contain errors, inconsistencies, or outdated information
- Data can be misinterpreted or misused
- Requires responsible handling
- Can have restrictions that limit access, use, or redistribution
- Can be resource-intensive to clean, process, and analyze
Challenges in Accessing Public Data
While public data with no restrictions usually has the highest potential for widespread use and positive impact, public data with restricted access can still have significant value.
Many countries have established legal procedures that allow citizens to request access to government records and other information held by government agencies or public authorities.
In the United States, for example, many federal records that can’t be accessed through a government portal can be requested under the Freedom of Information Act (FOIA).
The Bottom Line
Public data is a powerful tool for driving positive change across many aspects of society. Its use fosters collaboration and empowers individuals and organizations to use government data for social good.
It’s important to remember, however, that public data is not the same as open data in some countries.
The bottom line is that while open data is, by definition, publicly available, public data is not always open data because it may carry access or reuse restrictions. This distinction is important for understanding the rights and responsibilities associated with using public datasets.
References
- Public Information – Glossary | CSRC (Csrc.nist)
- IBM SPSS Statistics (Ibm)
- Business Intelligence and Analytics Software | Tableau (Tableau)
- Power BI – Data Visualization | Microsoft Power Platform (Microsoft)
- Data.gov Home – Data.gov (Data)
- The official portal for European data (Data.europa)
- GitHub – awesomedata/awesome-public-datasets: A topic-centric list of HQ open datasets. (Github)
- Dataset Search (Datasetsearch.research.google)
- What is raw data (source data or atomic data) and how does it work? | Definition from TechTarget (Techtarget)
- FOIA.gov – Freedom of Information Act: How to Make a FOIA Request (Foia)