Gigaom.com published a research paper last month highlighting the next generation of data integration technology for the changing cloud market. It listed the following technologies, which got started around 2012 and continue to evolve:
- Data replication (a bit older)
- Semantic mediation
- Data cleansing
- Mass data migration
Data replication continues to be widely used in production database environments, while semantic mediation of data remains very difficult in any environment. Data cleansing continues to be a painful, but often necessary, part of dealing with complex, heterogeneous datasets. And we're really just getting started on tera- and petabyte-scale data migration to the cloud.
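To make the data cleansing pain concrete, here is a minimal, illustrative sketch of what it often looks like in practice: normalizing records pulled from heterogeneous sources into one consistent shape. All field names and rules below are hypothetical examples, not any particular product's API.

```python
# A minimal sketch of data cleansing: normalizing records from
# heterogeneous sources. Field names and rules are hypothetical.

def clean_record(raw: dict) -> dict:
    """Normalize one customer record from a messy source."""
    cleaned = {}
    # Trim whitespace and unify case on text fields
    cleaned["name"] = raw.get("name", "").strip().title()
    # Accept several spellings of the same field from different sources
    email = raw.get("email") or raw.get("e-mail") or raw.get("Email") or ""
    cleaned["email"] = email.strip().lower()
    # Coerce inconsistent numeric formats ("1,200" vs 1200)
    amount = str(raw.get("amount", "0")).replace(",", "")
    cleaned["amount"] = float(amount) if amount else 0.0
    return cleaned

records = [
    {"name": "  alice SMITH ", "e-mail": "Alice@Example.COM", "amount": "1,200"},
    {"name": "bob jones", "Email": "bob@example.com", "amount": 300},
]
cleaned = [clean_record(r) for r in records]
```

Even this toy version shows why cleansing is painful: every source brings its own spellings, formats and edge cases, and the rules only multiply as datasets grow.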
What attracted my attention most in this paper is the section on the evolution of data integration for the cloud (I sometimes call it the "data cloud" that is coming over the horizon). In my opinion, the data cloud is going to encompass a few key concepts:
- Intelligent data services discovery
- Data virtualization
- Data orchestration
- Data identity
These four technologies, according to the author, are poised to cause the rise of cloud-based data services over the next few years. Interestingly enough, a recent conversation with a successful Silicon Valley entrepreneur taught me that a "data center" is called a "data" center, and not a "compute" center or "storage" center or "server" center, for a reason: it is supposed to be the epicenter of data for an entity (whether a business, government or other type of organization), and everything else must be built around it.

The genesis of data centers goes back to the invaluable military, government, financial and personal data centralization that needed to happen in the 70s and 80s to avoid scattered, unmanageable pieces of information. Early data centers were all built around mainframe computers (often a single mainframe machine constituted a data center, occupying significant space, power and resources). The Internet boom of the late 90s gave rise to data centers around the globe for the non-stop, 24×7 operations of web companies. Then, around the mid 2000s, came cloud computing…
Well, that doesn't quite resonate with the way cloud computing evolved, does it? It's computing; so where is the place of data in the cloud? And with all the fuss about security and privacy around data on cloud platforms, is cloud really about data?
While the origins of cloud platforms may very well lie in expanding compute capacity and cheap storage, the future of cloud will not. The future of cloud will be all about the transformation of traditional, non-elastic and inflexible data centers into data clouds. The Gigaom.com research paper predicts that the big cloud migration will begin around 2017, which is when data clouds in various shapes and forms will become mainstream and ubiquitous. The ICT world will start looking beyond the initial migration, and enterprises will take cloud platforms more seriously than ever, looking past the initial security and privacy concerns, skillset gaps and data migration challenges.
Hmmm… it's time to define what a data cloud really is! A data cloud can simply be thought of as a cloud platform that supports various forms of elastic, data-centric services, such as:
- Data ingestion: very high-speed, low-latency and load-sensitive
- Data processing: ultra scale-out and MapReduce
- Data storage: files, objects and databases
- Data warehousing: transformation, conversion and aggregation
- Data backups and archiving: resiliency, redundancy and compliance
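The "data processing" bullet above refers to the map-reduce pattern that scale-out data clouds are built on. Here is a toy, single-process sketch of that pattern (a word count); real systems such as Hadoop run the same two phases distributed across many machines, but the shape is the same.

```python
from collections import defaultdict
from itertools import chain

# Toy, single-process sketch of map-reduce. Real data clouds
# distribute these two phases across a cluster; this shows the shape.

def map_phase(record: str):
    """Emit (key, value) pairs -- here, (word, 1) for a word count."""
    for word in record.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    """Group by key and aggregate the values."""
    groups = defaultdict(int)
    for key, value in pairs:
        groups[key] += value
    return dict(groups)

records = ["data clouds", "big data", "data services"]
counts = reduce_phase(chain.from_iterable(map_phase(r) for r in records))
```

The appeal of the pattern for a data cloud is that both phases parallelize naturally: map tasks run independently per record, and reduce tasks run independently per key.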
It turns out that migrating some of these decades-old, very complex data services onto newer platforms is really not all that simple for an enterprise or a business. It is not only a technological challenge, but also an adoption, usability and migration challenge. Hence, the simple argument of reduced CAPEX and OPEX that cloud proponents frequently make is not sufficient. When it comes to data clouds, or migrating data services to the cloud, the decision is much more complex than CAPEX/OPEX calculations and financial or cost-reduction justifications. It often touches key parts of a business's data lifecycle and may require new ways of thinking about data, and about how to turn data into meaningful business value and growth under the new paradigm of data clouds.
Hence, adopting data clouds and migrating data into cloud services is almost always a multi-step, multi-phase and multi-year process for most businesses. In some respects, public cloud providers may not be fully ready with the core technologies required to trigger the big cloud migration. To some degree, private cloud vendors are just getting started with those technologies that would eventually inspire it. And there is everyone else in between. By the way, seeing a data bellwether like Oracle jump on the cloud bandwagon with specific "data cloud" and "enterprise cloud" slogans is very encouraging, and good news for the wider cloud industry (Oracle OpenWorld 2014: go there and listen to Larry).
Nonetheless, many enterprises have their data stuck in legacy databases and systems that will require upgrading and modernizing before it can move to data clouds. Many of them have silos and scatters of data that will require consolidation first. We've seen the rise of big data over the past couple of years, and this year we're seeing an explosion of connected devices and the Internet of Things. We know there is an enormous amount of data out there, and getting meaningful analytics and intelligence out of it requires very powerful data services that will only be possible on super-powerful data clouds. Big data systems like Hadoop or Spark may be only partial answers, not the full answer set that most businesses need for their data treasure trove.
In summary, we’re fast approaching the need for “data clouds”!