Question

Why is it important for data scientists to seek transparency?

Answer
By Justin Stoltzfus | Last updated: August 5, 2019

Transparency is essentially important in data science projects and machine learning programs, partly because of the complexity and sophistication that drives them — because these programs are “learning” (generating probabilistic results) rather than following predetermined linear programming instructions, and because as a result, it can be hard to understand how the technology is reaching conclusions. The “black box” problem of machine learning algorithms that are not fully explainable to human decision-makers is a big one in this field.

With that in mind, being able to master explainable machine learning or “explainable AI” will likely be a main focus in how companies pursue talent acquisition for a data scientist. Already DARPA, the institution that brought us the internet, is funding a multimillion-dollar study in explainable AI, trying to promote the skills and resources needed to create machine learning and artificial intelligence technologies that are transparent to humans.

One way to think about it is that there is often a “literacy stage” of talent development and a “hyperliteracy stage.” For a data scientist, the traditional literacy stage would be knowledge of how to put together machine learning programs and how to build algorithms with languages like Python; how to construct neural networks and work with them. The hyperliteracy stage would be the ability to master explainable AI, to provide transparency in the use of machine learning algorithms and to preserve transparency as these programs work toward their goals and the goals of their handlers.

Another way to explain the importance of transparency in data science is that the data sets that are being used keep becoming more sophisticated, and therefore more potentially intrusive into people’s lives. Another major driver of explainable machine learning and data science is the European General Data Protection Regulation that was recently implemented to try to curb unethical use of personal data. Using the GDPR as a test case, experts can see how the need to explain data science projects fits into privacy and security concerns, as well as business ethics.

Share this Q&A

  • Facebook
  • LinkedIn
  • Twitter

Tags

Emerging Technology Machine Learning Data Science

Written by Justin Stoltzfus | Contributor, Reviewer

Profile Picture of Justin Stoltzfus

Justin Stoltzfus is a freelance writer for various Web and print publications. His work has appeared in online magazines including Preservation Online, a project of the National Historic Trust, and many other venues.

More Q&As from our experts

Related Terms

Related Articles

Term of the Day

Synthetic Data

Synthetic data is input that is generated mathematically from a statistical model. Synthetic data plays an important role in...
Read Full Term

Tech moves fast! Stay ahead of the curve with Techopedia!

Join nearly 200,000 subscribers who receive actionable tech insights from Techopedia.

Resources
Go back to top