change the model
change the world

there is a lot more to cloudmode than good looks

It's all about the model

Data modeling is arguably the most boring discipline in the world of computer science. Code is far more exciting. Graphical representations and UI are downright sexy. Yet the data model is the very foundation of all computing. From the conceptual to the logical to the physical, models determine how data is written, stored, read, retrieved and used. The data model you use has a significant effect on cost, performance, security and usability.

Since programmers deal in objects while data resides in tables related by common fields (keys), there is usually a mismatch, the so-called object-relational "impedance mismatch" (there are YouTube crash courses on the subject). The mismatch increases over time. The simple fact of life is that data relationships change: enterprises evolve, and therefore code is constantly changing. This is why solving the object-relational impedance problem is often called the Vietnam of Computer Science (see Liam McLennan's talk on ORMs). The bottom line is that code comes and goes, but data is forever.
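To make the mismatch concrete, here is a tiny, hypothetical Python sketch (the class and table names are our own illustrations, not anyone's production schema): a single object with a nested collection has no direct relational equivalent, so it must be flattened into multiple tables joined by a key, with mapping code sitting in between.

```python
# Hypothetical illustration of the object-relational mismatch:
# one in-memory object fans out into several flat tables.

from dataclasses import dataclass, field

@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # nested collection

order = Order(1, "Acme", ["widget", "gadget"])

# The relational side has no nested lists, so one object becomes rows in
# two tables joined by a key -- and the mapping code in between must
# change every time either side changes.
orders_table = [(order.order_id, order.customer)]
order_items_table = [(order.order_id, item) for item in order.items]
```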

CloudMoDe begins with a revolution in the data model. In computer software, a data model is a schema used to organize data. Building a data model involves applying formal descriptions to physical data, including the specification of data tables and the relationships between them. These elements and structures facilitate the queries used to retrieve or summarize data. The main aim of a data model is to support the development of information systems by providing the definition and format of data. The design and implementation of data models can have a significant impact on getting data in, integrating data, and getting data out.

CloudMoDe is a platform based on a unified semantic binary model of data, in which the semantics of the model are expressed as binary relationships (and collections of binary relationships) between the attributes of the model. This elegant structure forms the conceptual foundation and determines, at the logical level, the manner in which data can be stored, organized and manipulated in a data storage system. Its infrastructure is also the model by which data is implemented in the physical data model, and it thus defines the operations that can be performed on the data. This structure offers a simple, natural, implementation-independent, flexible, and non-redundant specification of information, and it makes it possible to implement a capability-based security system. We began by inventing a unified semantic data model robust enough to model all kinds of data. This has a significant effect on cost, performance, security and maintenance.
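To give a feel for what "everything is a binary relationship" means in practice, here is a minimal, runnable sketch of a triple-style fact store. The function names and entity identifiers are illustrative assumptions, not CloudMoDe's actual API.

```python
# A minimal sketch of a semantic binary model: every piece of data is a
# binary relationship (subject, relation, object). One uniform structure
# holds data of any shape, with no per-application schema.

facts = set()

def relate(subject, relation, obj):
    """Record one binary relationship."""
    facts.add((subject, relation, obj))

def query(subject=None, relation=None, obj=None):
    """Return every fact matching the fields that are not None."""
    return [f for f in facts
            if (subject is None or f[0] == subject)
            and (relation is None or f[1] == relation)
            and (obj is None or f[2] == obj)]

relate("person:ada", "name", "Ada Lovelace")
relate("person:ada", "wrote", "note:G")
relate("note:G", "topic", "analytical engine")

print(query(subject="person:ada"))   # everything known about Ada
```

Because every fact has the same shape, the same storage and query machinery serves every application; there is no separate physical layout to map onto.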

THE COST OF BRITTLENESS

There are multiple components to the cost of designing and implementing a data model: requirements gathering (understanding what data is going to be modeled: the conceptual schema), requirements analysis (understanding how data will be stored and accessed: the logical schema), software implementation (implementation of the data storage mechanism, the physical schema, along with the design and implementation of the transformation model that translates data between the physical and logical schema), and software integration testing (testing and verification of the data model, physical and logical, before and after it is integrated into its target application(s)). Hidden costs associated with the development of interfaces used to share data between systems account for 25-70% of the cost of current systems (West & Fowler). Other hidden costs include 'brittleness', where the data model breaks when attempts are made to extend it into other application areas. Brittleness can be measured concretely as missed opportunity cost: inadequate data models 'hide' data and relationships from users because the model is inherently unable to reliably capture and store the relevant data. Brittleness is also evident when data cannot be shared between customers and suppliers due to non-standard formats and models.

A unified semantic model of the physical and logical schema would result in significant cost savings in software development. Requirements gathering and analysis are reduced to identifying where attributes should be placed in the model, instead of creating a new structure of attributes and relationships for each application. The implementation phase (data storage, logical-physical mapping and translation) becomes a simple coding exercise, placing the attributes in the appropriate location in the model, as sketched below. Testing and verification of the data model prior to integration testing is also greatly simplified, requiring only the verification of data using existing (and proven) test software. The problem of brittleness is eliminated, since the underlying model is robust and easily extensible. Hidden data relationships do not exist, and data is easily shared between customers and suppliers, since the model of the data is shared by default. In short, a unified semantic model hardens the system by eliminating brittleness, and this has a tremendous impact on development costs.
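Continuing the illustrative fact-store sketch from above (and again assuming nothing about CloudMoDe's real interfaces), here is why extension stops being a migration problem: adding a new attribute is just adding new relationships, with no schema change and no impact on existing data or queries.

```python
# Sketch: extending the model is just adding new facts -- no ALTER TABLE,
# no migration scripts, no change to existing storage or query code.

facts = {
    ("person:ada", "name", "Ada Lovelace"),
}

# Requirement change: people now need an email attribute. In a relational
# schema this is a migration; here it is one more relationship.
facts.add(("person:ada", "email", "ada@example.org"))

# Existing code that reads names is untouched:
names = [o for (s, r, o) in facts if r == "name"]
assert names == ["Ada Lovelace"]
```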

THE INVISIBLE COST OF MAINTENANCE

Data model designs and implementations evolve and grow as new information becomes available in a particular domain; very few data models remain static. This evolutionary process involves the same steps as the development of the original model (requirements gathering, requirements analysis, software implementation, software test, integration test), only now they must be done in the context of an existing model. Most often these tasks are undertaken by engineers who were not involved in the original design and implementation. If the data model is not completely documented (and the documentation maintained), the personnel responsible for evolving it may not be aware of assumptions implicit in its design and implementation. Often these assumptions are uncovered only during the software and integration testing phases (or worse, once the applications have been deployed), requiring the data model design and implementation process to be repeated. When a data model is moved to a new technology platform (to take advantage of cost/performance advances), the model may need to be changed to exploit the new technology, perhaps even requiring a complete re-implementation. These maintenance costs can easily exceed the cost of the original data model design and development by a factor of 2 (E. Burt Swanson).

A unified semantic model of the physical and logical schema would significantly reduce maintenance costs. Maintenance could be reduced to identifying where attributes should be placed in the model. In fact, all of the cost-saving benefits described under THE COST OF BRITTLENESS are even more relevant in the domain of software maintenance. For example, since the data model is semantic in nature, the documentation requirements of a specific implementation can be significantly reduced, minimizing costs both up front and downstream. Once developers are comfortable with the standardized semantics of the unified model, its attributes and relationships, specific implementations become much easier to interpret, understand, maintain and extend.

PERFORMANCE, PERFORMANCE, PERFORMANCE!

The deployment costs associated with a particular data model are heavily influenced by the amount of translation/transformation between the physical and logical schema. These costs are a function of the complexity of the model (the number of attributes, and the number of relationships between them). For example, if we have a data model with 10 attributes and 10 relationships, the addition of a single attribute and 10 relationships (a relationship between the new attribute and each of the existing attributes) will double the amount of computing resources required to transform/translate between the logical and physical models. Traditional data storage systems optimize for storage space and for high-performance access to the physical schema. This comes at a cost when the physical schema must be translated/transformed into the logical schema. This transformational process (from physical to logical when reading the data, and from logical to physical when writing it) is a difficult problem, because the physical schema rarely maps to the logical schema in any straightforward manner. This transformation model has been referred to as the "Vietnam of Computer Science", requiring significant engineering resources to accomplish effectively and efficiently. An example is the complex task of transforming a relational table structure (the physical model) into an object model designed to model the real world (the logical schema) (Neward, Ted. "The Vietnam of Computer Science". Interoperability Happens, 2006-06-26. Retrieved 2011-01-08).
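The arithmetic behind that example, under the text's stated assumption that translation cost is proportional to the number of relationships that must be mapped:

```python
# Back-of-the-envelope check of the scaling claim above, assuming
# translation cost grows linearly with the relationship count.

relationships_before = 10            # 10 attributes, 10 relationships
relationships_after = 10 + 10        # one new attribute, related to each
                                     # of the 10 existing attributes

print(relationships_after / relationships_before)   # 2.0 -- cost doubles
```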

A unified semantic model of the physical and logical schema significantly improves the performance of the data model in terms of speed. Performance savings are realized first in the cost of the hardware and software systems required to provide reasonable response times when the application stores and accesses data. Hidden performance costs, the time users spend waiting for the system to respond to requests to store and retrieve data, are also reduced. Semantic modeling of the context of the data reduces the distance to be traversed when moving from one modeling space to another (logical to physical, physical to logical). Typically, a logical schema must create a specific, domain-specific model of the context, which is then transformed into the physical data model. By incorporating the context directly into both the semantic and physical data models, this transformational step is eliminated, and the result is an embodiment robust enough to model all contexts (without specialized domain or application modeling). This also eliminates redundancy between the models, a natural by-product of keeping them separate.

A unified semantic model of the physical and logical schema would offer significant performance benefits over traditional data models by 1) keeping data replication to an absolute minimum, thereby reducing storage requirements, 2) eliminating the translation/transformation step between logical/physical and physical/logical data, and 3) minimizing the hidden costs associated with users waiting for data, and the associated time-to-market risks.

SECURITY & THE ELUSIVE SNIPE

Advances in security are taking place all the time. Being able to implement them, however, is an altogether different matter, especially when you have complex data models connected to MVC (model-view-controller) and dependent (hard-wired) UIs whose logical and physical schema are already set in stone, so to speak. Traditional security for files is based on what is called an access control list or "ACL", a list of permissions attached to a file. An ACL specifies which users are granted access to files, and what operations each user is allowed to perform on a file. The file is accessed by a filename, which is a 'forgeable' reference, typically in plain text (for example, a path name like '/Users/franksmith/Documents/mykitty.png'). This pathname does not specify any access rights for the file. Instead, those access rights are maintained in a separate structure (an inherently insecure one, which in all operating systems can be attacked or exploited in a myriad of ways), and the operating system must validate every attempt to access the file. If the operating system in charge of the ACL is compromised in any way, the files controlled by the OS and ACL are inherently compromised. The semantics of ACLs have been proven insecure in many situations, e.g., the confused deputy problem, where a computer program is innocently fooled by some other party into misusing its authority.

Capability-based security achieves improved file security not by attaching a set of privileges to the file, but by specifying the access rights of the capability itself and associating that capability with the file. Think of a capability like a car key. It works in a specific car (designates a particular object or file), and anyone holding the key can perform certain actions (open the car door, start the car). Just as a valet key grants a subset of the master key's actions (it can't open the trunk, console or glove box), capabilities can designate the same object or file but authorize different sets of actions. And just like a car key, a capability can be delegated (handing the key to another trusted individual) and copied. A car key is generally not forgeable: I can't go someplace and have a key made for my neighbor's car, and if I find that someone has a key to my car, I can have the car re-keyed. Capabilities are easier to share than keys, as they are just data. They are also more difficult to bypass than ACL and password schemes: you must hold a valid capability to access the file. And capabilities can be revoked when they are compromised, by creating a container (a containing object) that holds the referenced object; deleting the container makes the referenced object unreachable, because the container's key becomes invalid throughout the entire system. A unified semantic model of the physical and logical schema makes it possible to apply capability-based security, and new security schemes can be added retroactively without compromising the system.
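The car-key analogy maps onto code quite directly. Below is a minimal, hypothetical Python sketch, assuming nothing about CloudMoDe's implementation: a capability is an unforgeable token designating an object plus a set of permitted actions, it can be attenuated like a valet key, and revocation works by deleting a container through which all access flows.

```python
# A minimal capability sketch following the car-key analogy.
# Names and the revocation-by-container scheme are illustrative only.

import secrets

class Capability:
    """An unforgeable token designating an object plus permitted actions."""
    def __init__(self, target, actions):
        self.target = target
        self.actions = frozenset(actions)
        self.token = secrets.token_hex(16)   # unguessable, unlike a pathname

    def attenuate(self, actions):
        """A 'valet key': same object, fewer rights."""
        return Capability(self.target, self.actions & set(actions))

class Container:
    """Revocation: all access flows through a container that can be deleted."""
    def __init__(self, obj):
        self._obj = obj
        self.alive = True

    def resolve(self):
        if not self.alive:
            raise PermissionError("capability revoked")
        return self._obj

car = Container({"make": "DeLorean"})
master = Capability(car, {"open", "start", "trunk"})
valet = master.attenuate({"open", "start"})   # delegated, reduced rights

assert "trunk" not in valet.actions
car.alive = False                             # revoke every key at once
# valet.target.resolve() would now raise PermissionError
```

Deleting the container invalidates every outstanding capability at once, which is the "re-keying" behavior described above.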

WHAT DID WE LEARN

The data model is really important; it determines how far you can go. Programmers deal with objects that need data, and data is usually stored in tables. Mapping the relationship between objects and data can get really complex. A model robust enough to handle this complexity and scale has a lot of value. CloudMoDe solves this with a unified semantic data model that can model all kinds of data relationships and does not break as the relationships between data change. It therefore has, in and of itself, a significant effect on cost, performance, security and maintenance. It does this by:

  • keeping data replication to an absolute minimum, thereby reducing storage requirements;
  • eliminating the translation/transformation step between logical/physical and physical/logical data;
  • minimizing the hidden costs associated with users waiting for data, and the associated time-to-market risks; and
  • enabling new security schemes to be added retroactively without compromising the system.

Computer applications are all about data; a revolutionary new way of organizing, storing, accessing, searching, presenting, moving and safeguarding that data is therefore going to impact many domains.

Conclusion

Computers are all about data: acquiring it, storing it, accessing it, searching it, presenting it, moving it around and keeping it safe. A revolutionary new way of organizing that data, such as that of the CloudMoDe Operating System, means a fundamentally different way of doing all of those tasks, with a significant effect on the cost, performance, security and usability of databases. The less obvious but more profound side effect is the ability to extend the data model out to the user interface model, which defines the objects that a user can view, access and manipulate through the user interface. This dramatically reduces software development time. The implications are far too many to go into here, but suffice it to say that a unified semantic binary model is going to change everything. Understanding this is the foundation of cloudmo.de
