An Enterprise Data Fabric (EDF) is a distributed, operational data platform that sits between application infrastructures (such as J2EE or the .NET Framework) and back-end data sources. It offers data storage (caching), multiple APIs for data access, reliable data distribution, and real-time data analysis. All of these features are designed with scalability and performance in mind.
An EDF is relevant in today's information architectures because traditional infrastructure tools such as databases, data warehouses, and enterprise messaging systems cannot handle the real-time needs of today's applications. Most of today's architectures suffer from:
1. It’s about operational data management: Unlike a data warehousing system, where terabytes (or petabytes) of data are consolidated from multiple databases for offline analysis, the EDF is a real-time data store optimized for the operational data subsets that real-time applications need. This can be thought of as the “right now” data: the data accessed by many processes and applications. The EDF is a layer of abstraction in the middle tier that co-locates frequently used data with the application and works with back-end databases behind the scenes.
2. Distributed persistence via distributed caching: An EDF stores data using main-memory distributed caching, which makes it many times faster than a traditional disk-based DBMS. It harnesses the memory and disk of many clustered machines to co-locate data with the consuming applications and to provide high data access rates and scalability. Highly concurrent main-memory data structures are used to avoid lock contention. Different policies can be applied to different data subsets in different locations, so that data management adapts to the application rather than the other way around, and users are isolated from the characteristics of the underlying technology. Persistence becomes an attribute of all parts of the system rather than being concentrated in the database. Neither high availability nor consistency of data is compromised: a configurable policy dictates the number of redundant in-memory copies to maintain, and failure detection models built into the distribution system ensure data correctness. The in-memory data layer can be backed by a disk persistence layer that is configured to receive data synchronously or asynchronously, depending on the usage scenario.
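The redundancy and persistence policies described in point 2 can be illustrated with a minimal single-process sketch. It is not the API of any particular EDF product: the names `Node`, `Region`, `redundant_copies`, and `synchronous_disk` are hypothetical, the "disk" is a plain dictionary standing in for a persistence layer, and failure detection is reduced to an `alive` flag.

```python
import hashlib
import queue
import threading


class Node:
    """Hypothetical cluster member holding part of the data in main memory."""

    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.alive = True  # stand-in for a real failure detector


class Region:
    """A data subset with its own redundancy and persistence policy."""

    def __init__(self, nodes, redundant_copies=1, synchronous_disk=True):
        self.nodes = nodes
        # One primary copy plus the configured number of redundant copies.
        self.copies = min(1 + redundant_copies, len(nodes))
        self.synchronous_disk = synchronous_disk
        self.disk = {}  # stand-in for a disk persistence layer
        self._pending = queue.Queue()
        if not synchronous_disk:
            threading.Thread(target=self._drain, daemon=True).start()

    def _owners(self, key):
        # Hash the key to pick a primary node, then use its successors as replicas.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.copies)]

    def _drain(self):
        # Background writer used only in asynchronous persistence mode.
        while True:
            key, value = self._pending.get()
            self.disk[key] = value
            self._pending.task_done()

    def put(self, key, value):
        for node in self._owners(key):
            if node.alive:
                node.memory[key] = value
        if self.synchronous_disk:
            self.disk[key] = value            # caller waits for the disk write
        else:
            self._pending.put((key, value))   # disk write happens later

    def get(self, key):
        # Serve from the first surviving in-memory copy, else recover from disk.
        for node in self._owners(key):
            if node.alive and key in node.memory:
                return node.memory[key]
        return self.disk.get(key)

    def flush(self):
        """Block until all queued asynchronous disk writes are applied."""
        self._pending.join()
```

In this sketch, a region created with `redundant_copies=1` keeps serving a key from memory after the node holding its primary copy is marked as failed, and falls back to the disk copy only when every in-memory owner is gone.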
3. Key database semantics are retained: Much like a database management system, distributed data in an EDF can be managed with transactional integrity, queried, and recovered from disk. This is unlike simple distributed caching solutions, which cache serialized objects and simple key-value pairs in hash maps that can be replicated across cluster nodes. An EDF also supports multiple data models across multiple popular languages: data can be managed as objects, XML documents, or relational tables, and accessed via programmatic APIs (such as Java, C++, or C#) or query languages such as OQL, XPath, and SQL. In a DBMS, all updates are persisted and transactional in nature, that is, ACID (atomicity, consistency, isolation, durability). An EDF relaxes these constraints, allowing applications to control when, and for what kind of data, full ACID characteristics are needed.
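As a rough illustration of point 3, the sketch below lets an application choose, per update, between a relaxed single-key write and an all-or-nothing transaction, and also supports an ad hoc query. The class `FabricRegion` and its methods are hypothetical names, not a real product API, and only atomicity is modeled; isolation and durability are left out.

```python
import threading
from contextlib import contextmanager


class FabricRegion:
    """Hypothetical in-memory region offering relaxed and transactional writes."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        # Relaxed path: a single-key update that is visible immediately.
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    @contextmanager
    def transaction(self):
        # Strict path: writes are staged and applied together on success.
        # If the body raises, the code after `yield` never runs, so the
        # staged writes are discarded and the region is left unchanged.
        staged = {}
        yield staged
        with self._lock:
            self._data.update(staged)

    def query(self, predicate):
        # Ad hoc query over the values currently held in the region.
        with self._lock:
            return {k: v for k, v in self._data.items() if predicate(v)}


accounts = FabricRegion()
accounts.put("alice", 100)   # no transaction needed for a simple update
accounts.put("bob", 50)

# A transfer must be atomic, so the application opts in to a transaction.
with accounts.transaction() as tx:
    tx["alice"] = accounts.get("alice") - 30
    tx["bob"] = accounts.get("bob") + 30

rich = accounts.query(lambda balance: balance >= 80)
```

The design point is that the cost of transactional handling is paid only where the application asks for it, while ordinary updates take the cheaper path.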
4. Active data management: Data in an EDF is a dynamic entity that changes rapidly and is updated by many processes in a distributed environment. In addition to the request-reply paradigm (as in databases), an EDF therefore supports an event-driven model in which applications are notified when events of interest are generated in the fabric. This model is accommodated through a combination of ad hoc querying (request-reply) and continuous querying (event-driven). In the continuous query model, applications register queries that represent complex patterns of interest. Unlike a database system, where queries are executed against resident data, in an EDF the data (or events) are continuously evaluated by a query engine that is aware of the interest expressed by hundreds of distributed client processes.
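The difference between ad hoc and continuous querying in point 4 can be sketched as follows. This is a simplified, single-process illustration with hypothetical names (`ContinuousQueryEngine`, `register`, `query`); a real fabric would evaluate registered queries across distributed clients and would typically use a query language rather than Python predicates.

```python
class ContinuousQueryEngine:
    """Hypothetical engine that evaluates each update against registered queries."""

    def __init__(self):
        self._data = {}
        self._queries = []  # (predicate, listener) pairs registered by clients

    def register(self, predicate, listener):
        # Continuous query: the interest is stored once, then every
        # subsequent update is checked against it.
        self._queries.append((predicate, listener))

    def put(self, key, value):
        self._data[key] = value
        for predicate, listener in self._queries:
            if predicate(key, value):
                listener(key, value)  # event-driven notification

    def query(self, predicate):
        # Ad hoc query: request-reply over the data resident right now.
        return [(k, v) for k, v in self._data.items() if predicate(k, v)]


engine = ContinuousQueryEngine()
alerts = []

# A client registers interest in large trades before any data arrives.
engine.register(lambda key, trade: trade["qty"] >= 1000,
                lambda key, trade: alerts.append(key))

engine.put("trade:1", {"symbol": "ABC", "qty": 200})
engine.put("trade:2", {"symbol": "ABC", "qty": 5000})

# An ad hoc query sees only what is stored at the moment it is issued.
small = engine.query(lambda key, trade: trade["qty"] < 1000)
```

Here the ad hoc query pulls matching data on request, while the registered query has matching updates pushed to it as they occur.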
5. Messaging-like semantics for data distribution: When managing data across distributed applications, developers expect reliable, guaranteed publish-subscribe semantics, much like those offered by messaging systems on the market. An EDF layers these messaging-like data distribution features on top of what looks to a developer like a database from a data access and storage standpoint. The system knows which subscribers are active and provides different levels of message delivery guarantees to them. With traditional messaging, applications have to deal with piecemeal messages, message construction, the inclusion of contextual information in messages, and data consistency across publishers and subscribers. An EDF enables a more intuitive approach: applications simply work with a data model (object or SQL) and subscribe to portions of it. When data publishers update business objects or relationships, subscribers are notified of the changes to the underlying distributed data fabric and can choose to access the relevant data from the fabric immediately.
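One way to picture point 5 is a store in which an ordinary update doubles as the published message. The sketch below is an assumption-laden illustration, not a real product API: `Fabric`, `Subscription`, and the `durable` flag are hypothetical, a "portion of the data model" is approximated by a key prefix, and the two delivery levels shown (drop while disconnected, or hold and replay) stand in for whatever guarantees an actual fabric offers.

```python
from collections import deque


class Subscription:
    """Hypothetical subscriber interested in keys under one prefix."""

    def __init__(self, prefix, durable):
        self.prefix = prefix
        self.durable = durable
        self.connected = True
        self.backlog = deque()  # events held while a durable subscriber is away
        self.received = []

    def deliver(self, event):
        if self.connected:
            self.received.append(event)
        elif self.durable:
            self.backlog.append(event)
        # Non-durable and disconnected: the event is dropped.

    def reconnect(self):
        self.connected = True
        while self.backlog:
            self.received.append(self.backlog.popleft())


class Fabric:
    """Hypothetical data store that notifies subscribers on every update."""

    def __init__(self):
        self.data = {}
        self.subscriptions = []

    def subscribe(self, prefix, durable=False):
        subscription = Subscription(prefix, durable)
        self.subscriptions.append(subscription)
        return subscription

    def put(self, key, value):
        # The publisher just updates data; no message is constructed by hand.
        old = self.data.get(key)
        self.data[key] = value
        for subscription in self.subscriptions:
            if key.startswith(subscription.prefix):
                subscription.deliver((key, old, value))


fabric = Fabric()
best_effort = fabric.subscribe("trades/")
guaranteed = fabric.subscribe("trades/", durable=True)

best_effort.connected = False
guaranteed.connected = False
fabric.put("trades/1", {"qty": 10})   # both subscribers are away
fabric.put("quotes/1", {"bid": 99})   # outside the subscribed portion

best_effort.reconnect()
guaranteed.reconnect()
```

After reconnecting, only the durable subscriber has the change to `trades/1`, and either subscriber can read the current value directly from `fabric.data` instead of reconstructing it from messages.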