Dell, EMC, Dell Technologies, Cisco,

Monday, December 5, 2016

Blockchains for Big Data

Big data arose in the early and mid 2000s to meet internet-scale computation needs: ZooKeeper at Yahoo, BigTable and MapReduce at Google, Cassandra at Facebook, and so on. Then came open source projects like the Hadoop Distributed File System (HDFS), Hadoop MapReduce, Cassandra, and more. By the late 2000s and early 2010s, startups like MongoDB, Cloudera, and DataStax had built businesses that turned the open source successes into enterprise-grade offerings.

Now, big data technology is quietly transforming every enterprise backend on the planet. For example, in many places “data warehouses” of relational databases are being replaced by “data lakes” running big data software. More than $100B annually is going towards big iron compute clusters, the software on top, and the services to keep it all running smoothly.

Big Data Challenges

But big data has its challenges, which include control, data authenticity, and monetization.

First, who controls the infrastructure when there are multiple actors involved? For example:

If you’re a multinational enterprise, how do you share data around the planet? If you have multiple copies, how do you know which one is the most up-to-date? How do you reconcile a different system administrator role at each regional office?

If you’re an industry consortium, how do you share control of the ecosystem infrastructure among the companies in your consortium? This is especially hard if those companies are competitors!

Why can’t there be data just “out there” as a single shared source of truth that no one on the planet owns or controls, per se? Rather, data would be a public utility like electricity or the internet itself.

Second, how well can you trust the data? For example:

If you generate the data yourself, how do you prove you were the originator? If you get data from others, how do you know it was truly them?

What about crashes and malicious behavior? Machines crash, glitches happen, bits flip. Zombie IoT toasters might be inputting garbage. So after all your fancy Spark calculations, is it still just garbage out?

Finally, how do you monetize the data? For example:

How do you transfer the rights to the data, or buy rights from others? There’s a long-standing dream of a universal data marketplace; how do you build it?

A New Tool For Big Data: Blockchain Technology

The recent surge in blockchain technology was sparked by Bitcoin. Technically, all blockchains are simply databases, but databases with “blue ocean” benefits: decentralized / shared control, immutability / audit trails, and native assets / exchanges. By modern database standards, traditional blockchains have terrible scalability and don’t even have query languages; nonetheless, the blue ocean benefits were enough to capture the imagination of the globe.

Better yet, a more recent technology, the BigchainDB blockchain database, combines the benefits of distributed databases (scale, queryability) and blockchains (decentralized control, immutability / audit trails, assets / exchanges). This new blockchain database technology has the scalability needed in big data environments because it builds on top of best-in-class distributed databases like MongoDB.

This unlocks the potential for highly interesting applications in big data: shared control of infrastructure, audit trails on data, and the possibility of a universal data exchange. Let’s explore each in detail.

Shared Control of Big Data Infrastructure

Summary. Being a blockchain database means that control of the database infrastructure is shared across the entities, whether within an enterprise, within a consortium, or across the planet.

How. A big data blockchain database like BigchainDB is decentralized, which means that its control can be shared

http://bravenewcoin.com/news/blockchains-for-big-data/
