Blockchain and Big Data are among the top emerging technologies tipped to revolutionize several industries, radically changing the way businesses and organizations are run. One might assume that these technologies are mutually exclusive, each forging its own path and applied independently of the other.
But that would be off the mark.
Blockchain, just like data science, is gradually transforming the way several industries operate. And while data science focuses on harnessing data for proper administration, blockchain ensures trust in data by maintaining a decentralized ledger.
The question is: is there a place where these two concepts intersect?
What will be achieved when these two technologies are concurrently applied?
Simply put, how can blockchain disrupt data science?
To answer these questions, it will be helpful to get a better understanding of Blockchain and Data Science separate from one another.
What is Blockchain?
Blockchain is basically a distributed ledger that records economic transactions in such a way that they cannot be manipulated. The technology came to prominence as a result of the interest in bitcoin and cryptocurrency in general, but it has since found relevance in recording not just cryptocurrency transactions but anything of value. Knowing the capabilities of this emerging technology, developers and tech enthusiasts have gone to work devising use case after use case for blockchain.
High Demand for Blockchain Developers
The demand for blockchain developers has swelled in the last few years, as has the number of projects working on different applications of blockchain. Reports from freelancing platforms like UpWork have repeatedly listed blockchain as the most in-demand skill. Similarly, professionals in other areas such as law are said to have a major advantage if they have blockchain skills, or at least an understanding of the technology.
What is Data Science?
Data science seeks to extract knowledge and insights from structured and unstructured data. This field encompasses statistics, data analysis, machine learning and other advanced methods used to understand and analyze actual processes using data.
Data is often described as the new oil in economic parlance, which is why leading businesses, including the famed GAFAs (Google, Amazon, Facebook, and Apple), control enormous amounts of data. Some common applications of data science are seen in internet search engines, digital advertisements, and recommender services. Data analysis, a key aspect of data science, has proven relevant in the healthcare industry to track patient treatment and equipment flow; in travel and gaming to improve consumer experience; and in energy management, as well as many other sectors.
High Demand for Data Scientists
There’s also a seemingly insatiable demand for data scientists who can provide more insights with data and help solve more problems. This is even more pronounced for big data, an advanced aspect of data science that deals with amounts of data too large to be handled by traditional data processing methods.
The relationship between Blockchain and Data Science
Unlike in areas like fintech, healthcare and supply chain, where blockchain is now very familiar, the technology has not been explored extensively in data science. To some, the relationship between the two concepts is unclear, if not non-existent.
For starters, both blockchain and data science deal with data: data science analyzes data for actionable insights, while blockchain records and validates data. Both make use of algorithms created to govern interactions with various data segments. A common theme, which you will soon notice, is this: “data science for prediction; blockchain for data integrity.”
Blockchain impact on data
Data science, just like any technological advancement, has its own challenges and limitations which, when addressed, will unleash its full capabilities. Some major challenges to data science include inaccessible data, privacy issues, and dirty data.
The control of dirty data (or erroneous information) is one area where blockchain technology can have a significant positive impact on the data science field. According to a 2017 survey of 16,000 data professionals, dirty data, such as duplicate or incorrect data, was identified as the biggest challenge to data science. Through decentralized consensus algorithms and cryptography, blockchain validates data, making it almost impossible to manipulate because of the huge amount of computing power that would be required.
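To make the computing-power argument concrete, here is a minimal sketch of a proof-of-work style check. It assumes a proof-of-work scheme (other consensus mechanisms, such as proof of stake, work differently), and the names `block_hash`, `mine`, `DIFFICULTY` and the sample record are illustrative, not taken from any particular blockchain.

```python
import hashlib

DIFFICULTY = 3  # assumed toy difficulty: required number of leading zero hex digits


def block_hash(data: str, nonce: int) -> str:
    """Hash the record together with a nonce using SHA-256."""
    return hashlib.sha256(f"{data}|{nonce}".encode("utf-8")).hexdigest()


def mine(data: str, difficulty: int = DIFFICULTY) -> int:
    """Try nonces in order until the hash starts with enough zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not block_hash(data, nonce).startswith(prefix):
        nonce += 1
    return nonce


record = "sensor-42 reading=17.5"  # hypothetical record
nonce = mine(record)

# Verifying the work takes a single hash; producing it took many attempts.
print("attempts needed:", nonce + 1)
print("valid:", block_hash(record, nonce).startswith("0" * DIFFICULTY))
```

Changing even one character of the record changes its hash, so the search for a nonce generally has to be repeated. On a real network the difficulty is far higher than in this toy, and every later block would have to be redone as well, which is where the cost of manipulation comes from.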
Again, through its decentralized system, blockchain technology helps ensure the security and privacy of data. Most data is stored on centralized servers that are often the target of cyber attackers; the many reports of hacks and security breaches go to show the extent of the threat. Blockchain, on the other hand, restores control of data to the individuals generating it, making it an uphill task for cybercriminals to access and manipulate data on a large scale.
How Can Blockchain Help Big Data?
If big data is the quantity, Maria Weinberger of Janexter says, blockchain is the quality. This follows from the understanding that blockchain is focused on validating data, while data science or big data involves making predictions from large amounts of data.
Blockchain has brought a whole new way of managing and working with data: no longer from a central perspective where all data must be brought together, but in a decentralized manner where data may be analyzed right at the edges, on individual devices. Blockchain also integrates with other advanced technologies, such as cloud solutions, artificial intelligence (AI) and the Internet of Things (IoT).
Furthermore, validated data generated via blockchain technology comes structured and complete, and, as mentioned earlier, it is immutable. Another important area where blockchain-generated data becomes a boost for big data is data integrity, since blockchain ascertains the origin of data through its linked chains.
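The idea of linked chains can be sketched in a few lines. This is a simplified illustration rather than how any specific platform stores blocks: each block keeps the hash of the block before it, so editing an earlier block breaks the link that follows. The field names (`index`, `data`, `prev_hash`) and the device records are assumptions made for the example.

```python
import hashlib
import json


def hash_block(block: dict) -> str:
    """Hash a block's contents in a stable (sorted-key) JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: dict) -> None:
    """Add a block that points at the hash of the previous block."""
    prev_hash = hash_block(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_chain_valid(chain: list) -> bool:
    """Check that every block still matches the hash stored by its successor."""
    for prev, curr in zip(chain, chain[1:]):
        if curr["prev_hash"] != hash_block(prev):
            return False
    return True


chain = []
append_block(chain, {"source": "device-A", "value": 10})
append_block(chain, {"source": "device-B", "value": 12})
append_block(chain, {"source": "device-A", "value": 11})

intact = is_chain_valid(chain)

# Tamper with a block in the middle: the next block's stored hash no longer matches.
chain[1]["data"]["value"] = 999
tampered_detected = not is_chain_valid(chain)

print("intact before edit:", intact)
print("tampering detected:", tampered_detected)
```

Note that this toy only detects edits to blocks that have a successor. In a real network the latest block is protected by consensus among nodes rather than by a later link.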
5 Blockchain Use Cases in Big Data
There are at least five specific ways blockchain data can help data scientists in general.
- Ensuring Trust (Data Integrity)
Data recorded on the blockchain is trustworthy because it must have gone through a verification process that ensures its quality. The blockchain also provides transparency, since activities and transactions that take place on the network can be traced.
Last year, Lenovo showcased this use case by using blockchain technology to detect fraudulent documents and forms. The PC giant used blockchain technology to validate physical documents that were encoded with digital signatures. The digital signatures are processed by computers, and the authenticity of the document is verified against a blockchain record.
In most cases, data integrity is ensured when details of the origin of a data block, and of the interactions concerning it, are stored on the blockchain and automatically verified (or validated) before the block can be acted upon.
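The verify-before-acting pattern described above can be sketched as follows. This is not Lenovo's implementation: a plain dictionary stands in for the blockchain record, a SHA-256 fingerprint stands in for a full digital signature, and the names `ledger`, `register` and `verify` are hypothetical.

```python
import hashlib

# Stand-in for an on-chain record: document id -> fingerprint.
# On a real blockchain this mapping would be written through a transaction.
ledger = {}


def fingerprint(content: bytes) -> str:
    """Compute a SHA-256 fingerprint of the document bytes."""
    return hashlib.sha256(content).hexdigest()


def register(doc_id: str, content: bytes) -> None:
    """Record the fingerprint of a document when it is issued."""
    ledger[doc_id] = fingerprint(content)


def verify(doc_id: str, content: bytes) -> bool:
    """Return True only if the document matches the recorded fingerprint."""
    return ledger.get(doc_id) == fingerprint(content)


original = b"Invoice 001: total 1,200 USD"
forged = b"Invoice 001: total 9,200 USD"

register("form-001", original)

print("original accepted:", verify("form-001", original))
print("forgery accepted:", verify("form-001", forged))
```

A production system would also sign the fingerprint with the issuer's private key, so that the record shows who issued the document as well as what it contained.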
- Preventing Malicious Activities
Because blockchain uses a consensus algorithm to verify transactions, it is nearly impossible for a single unit to pose a threat to the data network. A node (or unit) that begins to act abnormally can easily be identified and expunged from the network.
Because the network is so widely distributed, it is almost impossible for a single party to generate enough computational power to alter the validation criteria and allow unwanted data into the system. To alter the blockchain rules, a majority of nodes must be pooled together to create a consensus, which a single bad actor is very unlikely to achieve.
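As a rough illustration of majority consensus, the sketch below accepts a decision only when more than half of the nodes agree, and reports the nodes that voted against the majority. Real consensus protocols are considerably more involved; the function `reach_consensus` and the node names are made up for the example.

```python
from collections import Counter


def reach_consensus(votes: dict) -> tuple:
    """Return (decision, dissenting_nodes); decision is None without a strict majority."""
    tally = Counter(votes.values())
    decision, count = tally.most_common(1)[0]
    if count * 2 <= len(votes):
        # No option has more than half of the votes.
        return None, []
    dissenters = sorted(node for node, vote in votes.items() if vote != decision)
    return decision, dissenters


# Each node reports whether it considers a proposed record valid.
votes = {
    "node-1": True,
    "node-2": True,
    "node-3": False,  # a node behaving differently from the rest
    "node-4": True,
    "node-5": True,
}

decision, dissenters = reach_consensus(votes)
print("record accepted:", decision)
print("nodes to review:", dissenters)
```

A single dissenting node cannot change the outcome here, and it shows up immediately in the list of nodes to review.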
- Making Predictions (Predictive Analysis)
Blockchain data, just like other types of data, can be analyzed to reveal valuable insights into behaviors and trends, and as such can be used to predict future outcomes. What is more, blockchain provides structured data gathered from individuals or individual devices.
In predictive analysis, data scientists rely on large sets of data to determine with good accuracy the outcome of social events, such as customer preferences, customer lifetime value, dynamic prices, and churn rates as they relate to businesses. This is not limited to business insights, however: with the right data analysis almost any event can be predicted, whether it is social sentiment or investment markers.
And because of the distributed nature of blockchain and the huge computational power available through it, data scientists even in smaller organizations can undertake extensive predictive analysis tasks. These data scientists can use the computational power of several thousand computers connected on a blockchain network as a cloud-based service to analyze social outcomes on a scale that would not otherwise have been possible.
- Real-Time Data Analysis
As has been demonstrated in financial and payment systems, blockchain makes real-time cross-border transactions possible. Several banks and fintech innovators are now exploring blockchain because it affords fast (in fact, real-time) settlement of huge sums irrespective of geographic barriers.
In the same manner, organizations that require real-time analysis of data on a large scale can call on a blockchain-enabled system to achieve it. With blockchain, banks and other organizations can observe changes in data in real time, making it possible to make quick decisions, whether that is to block a suspicious transaction or to track abnormal activities.
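A very simple version of this kind of check can be sketched independently of any particular blockchain platform. The example assumes transactions arrive one at a time and flags an amount that is much larger than the running average of earlier, unflagged amounts. The function name, the `multiplier` and `warmup` parameters, and the sample amounts are all illustrative.

```python
def flag_suspicious(amounts, multiplier=5.0, warmup=3):
    """Return the positions of amounts far above the running average seen so far."""
    flagged = []
    total = 0.0
    count = 0
    for position, amount in enumerate(amounts):
        if count >= warmup and amount > multiplier * (total / count):
            # Flag the transaction and keep it out of the running average.
            flagged.append(position)
        else:
            total += amount
            count += 1
    return flagged


# Hypothetical stream of transaction amounts, in arrival order.
stream = [100, 120, 90, 110, 5000, 95, 105]
print("flagged positions:", flag_suspicious(stream))
```

Because each amount is examined as it arrives, a decision such as blocking the transaction can be taken before the next one is processed. A real system would use a far richer model than a running average.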
- Managing Data Sharing
In this regard, data obtained from data studies can be stored in a blockchain network. This way, project teams do not repeat data analysis already carried out by other teams, or wrongfully reuse data that has already been used. A blockchain platform can also help data scientists monetize their work, for instance by trading analysis outcomes stored on the platform.
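One way to picture how shared records prevent repeated analysis is a registry keyed by a fingerprint of the dataset. In this sketch a dictionary stands in for the shared blockchain record, and `dataset_fingerprint`, `get_or_run` and the sample rows are hypothetical names and data chosen for the example.

```python
import hashlib
import json

# Stand-in for results shared on a blockchain: (dataset fingerprint, analysis) -> result.
registry = {}


def dataset_fingerprint(rows: list) -> str:
    """Fingerprint a dataset from its stable (sorted-key) JSON form."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def get_or_run(rows: list, analysis_name: str, analyze) -> tuple:
    """Return (result, was_reused); run the analysis only if no record exists."""
    key = (dataset_fingerprint(rows), analysis_name)
    if key in registry:
        return registry[key], True
    result = analyze(rows)
    registry[key] = result
    return result, False


calls = []  # records how many times the analysis actually ran


def mean_value(rows: list) -> float:
    calls.append(1)
    return sum(row["value"] for row in rows) / len(rows)


rows = [{"id": 1, "value": 10.0}, {"id": 2, "value": 14.0}]
first, first_reused = get_or_run(rows, "mean_value", mean_value)
second, second_reused = get_or_run(rows, "mean_value", mean_value)

print("first run reused a stored result:", first_reused)
print("second run reused a stored result:", second_reused)
```

The second team gets the stored result without recomputing it, and the fingerprint shows that both teams worked from exactly the same data.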
Blockchain, as has been noted, is in its nascent stages, though it may not appear so because of the hype the technology has received in a short period. One would expect that as the technology matures and more innovation happens around it, more concrete use cases will be identified and explored, with data science being one area that will benefit.
That being said, a few concerns have been raised about its impact on data science, especially big data, which requires exceptionally large amounts of data to be handled. One concern is that applying blockchain in this regard will be very expensive, because data storage on a blockchain is expensive compared to traditional means. Blocks deal with relatively small amounts of data compared to the large volumes of data collected per second for big data and other data analysis tasks.
How blockchain evolves to address these concerns and goes on to disrupt the data science space will be particularly interesting because, as we have seen, the technology has huge potential to transform how we manage and use data.
From time to time, we invite industry thought leaders, academic experts and partners, to share their opinions and insights on current trends in blockchain to the Blockchain Pulse blog. While the opinions in these blog posts are their own, and do not necessarily reflect the views of IBM, this blog strives to welcome all points of view to the conversation.