The Ghost Protocol

yet another internet protocol

The micro-services used to build CloudMo.De, and the appliances that communicate with CloudMo.De services, use a custom protocol called the GHOST protocol. It is an internet protocol, in that it is built on top of the Internet protocols (specifically TCP/IP), and is by design NOT a web protocol such as HTTP, SMTP, WebSockets, SAML, OAuth etc. In the vernacular of the OSI network model, the GHOST protocol is an application layer protocol, i.e. it sits at the layer closest to the end user. This means that both the application layer and the end user interact directly with the software application. Thus the GHOST protocol, while it remains to some degree as invisible as a ghost, must be directly manipulable by human beings. In this article I will explain why we built yet another internet protocol.

What is a protocol?

To understand why GHOST (Yet Another Communications Protocol) was built, we must first understand what a protocol is, what it’s used for and how it’s used. Wikipedia defines a communications protocol as:

Communicating systems use well-defined formats (protocol) for exchanging messages. Each message has an exact meaning intended to elicit a response from a range of possible responses pre-determined for that particular situation. Thus, a protocol must define the syntax, semantics, and synchronization of communication. -- Wikipedia

Communications is “the imparting or exchanging of information” between systems, which in CloudMo.De are the appliances and micro-services that support these appliances. So a protocol in this context is the “well-defined format” used for exchanging messages.

Any single message exchanged between systems using a specified protocol then has an exact meaning. The message must elicit a specific response based on its context, so it must provide sufficient context for the recipient to respond in the expected manner.

Since the message has an exact meaning, it follows that the message has semantics (meaning, or interpretation of meaning, here a uniform way to model relationships), syntax (a set of rules whereby something is put together or an analysis of this) and information used for synchronization of the message. Synchronization is important in concurrent, distributed systems, specifically because message responses must be paired with the messages that are being responded to.
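To make the synchronization point concrete, here is a minimal sketch of pairing replies with the messages they answer by carrying an identifier in every message. The names Message, Reply and Correlator are hypothetical, chosen for illustration only; this is not the GHOST wire format.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: int    # synchronization: lets a reply be paired with its request
    verb: str      # semantics: what the sender wants done
    payload: dict  # syntax: the agreed shape of the data


@dataclass(frozen=True)
class Reply:
    in_reply_to: int  # the msg_id of the message being answered
    result: object


class Correlator:
    """Tracks outstanding messages so each reply can be matched to one."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    def send(self, verb, payload):
        msg = Message(next(self._ids), verb, payload)
        self._pending[msg.msg_id] = msg
        return msg

    def receive(self, reply):
        # Raises KeyError if the reply matches no outstanding message.
        return self._pending.pop(reply.in_reply_to)


correlator = Correlator()
first = correlator.send("ping", {})
second = correlator.send("echo", {"text": "hi"})

# Replies may arrive out of order; the identifier still pairs them correctly.
matched = correlator.receive(Reply(in_reply_to=second.msg_id, result="hi"))
```

In a concurrent system many messages can be in flight at once, which is why the identifier, rather than arrival order, does the matching.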

What is a protocol used for?

A protocol is used to exchange messages between systems and sub-systems. Different protocols, however, are used for different purposes. SOAP (Simple Object Access Protocol) is used for exchanging structured information between web services. SMTP (Simple Mail Transfer Protocol) is used to negotiate the exchange and transmission of electronic mail messages. HTTP (Hypertext Transfer Protocol) is a request-response protocol used for transmitting data between web browsers (the client) and web servers (the server) in a client-server computing model. SMS (Short Message Service) is an example of a protocol that allows fixed line or mobile phone devices to exchange short text messages.

A protocol is used to exchange messages between systems, where such messages can (but need not) result in the transmission of structured data in the form of status, files, or simple messages.

How is a protocol used?

According to the Shannon-Weaver model of communication (1949), communication necessarily embodies an information source, a message, a transmitter, a signal, a channel, noise, a receiver, an information destination and a whole lot of electrical engineering details like errors, rates, capacity, etc. We will focus on the source (the sender), the message, and the receiver, allowing the other six layers of the OSI protocol stack to handle the other details.

In computing then, a protocol is used to send messages from a sender to a receiver, with the sender expecting a response of some type from the receiver. This model is the foundation of modern communication and information theory. Protocols are at the very foundation of all computing and they are the foundation upon which the internet has been built. There are literally thousands of protocols that are in daily use in our communications systems, each one tailored for a specific task in digital communication.

The case for standard protocols

According to Statista there were 1.2 million apps in Apple’s App Store in 2014. Although there are thousands of protocols, many of these apps share a common set of them. In fact the great majority of iOS apps use a very small set of protocols, known as the Internet protocol suite.

In addition there are a number of architectural approaches (specifically REST, for Representational State Transfer, and WS*, the family of Web Services specifications) that are used for different applications. REST is typically used when exposing a public API, whereas WS* (built on top of SOAP) offers a significantly greater level of functionality, for example transaction control with two-phase commit and reliable messaging.

REST is the most popular. It is built on top of HTTP, and the great majority of mobile apps use some version of a REST API. Because it is built on top of HTTP, a REST API is limited to HTTP verbs (GET, PUT, POST, DELETE) and typically assumes that results are cacheable (copies are retained by the client to avoid multiple requests for identical content). REST APIs are ideally stateless, although few actually meet this ideal. Statelessness means that a security token needs to be transmitted with each request, or that cookies are used to maintain state, and both are particularly vulnerable to man-in-the-middle attacks.

The most important attribute of a REST API is that it solves the locality problem inherent in any server built on top of a hierarchical system, specifically the challenge of locating a resource that can move within the hierarchy. REST APIs provide a consistent namespace for resources, where the server can locate and deliver a named resource, independent of the state of the system and the context of the request. In this light, REST APIs can be viewed as fundamentally a namespace solution, developed as a defense against that quirk of the hierarchical model that allows named resources to move around in the system.

WS* protocols are built on top of SOAP, which is an extensible XML based protocol with automatic error recovery. SOAP’s major disadvantage is the complexity and the performance penalty associated with transmitting and parsing the large datasets implicit in XML based protocols. While WS* offers a more expressive set of functionality, the performance burdens, and therefore the issues of scalability, are significant. SOAP based protocols also require a set of tools that are not well developed for all platforms.

The argument for using standard protocols is based on their availability (language, platform), that they are potentially more reliable than something newer (time in the field), and that they already have a broad audience (developer learning curve). These arguments must be weighed against the assumptions they make about the application using them (cache or no-cache, security) and how the limitations each presents (performance, expressiveness, complexity) can be overcome.

Of particular note is that neither REST nor WS* is, technically speaking, a protocol. Both are architectural communication solutions that leverage existing protocols like HTTP or even raw TCP sockets, and therefore each is constrained by its underlying protocols.

Another family of protocols is the RPC (Remote Procedure Call) family, with members like JSON-RPC, XML-RPC, MsgPack-RPC, ProtoBuf-RPC, etc. These protocols are designed to allow a client program to request a named service from a remote server. They can be either synchronous or asynchronous, and are particularly powerful in concurrent programming systems. In general, an RPC based approach offers a tighter integration with the application’s programming language than REST or WS*, as the programmer need not be aware of ‘exiting’ to the network to communicate with remote services.
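As an illustration of the RPC style, the sketch below builds and answers a request in the JSON-RPC 2.0 message format, where a request names a method, carries parameters, and includes an id that the response echoes back. The procedure table and the "add" procedure are assumptions made up for this example, and the transport is omitted: the "server" is just a local function.

```python
import json

# Hypothetical table of procedures a server chooses to expose.
PROCEDURES = {"add": lambda a, b: a + b}


def make_request(method, params, request_id):
    """Encode a JSON-RPC 2.0 request naming a remote procedure."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def handle(raw_request):
    """Decode a request, call the named procedure, and encode the response."""
    request = json.loads(raw_request)
    procedure = PROCEDURES.get(request["method"])
    if procedure is None:
        # -32601 is the JSON-RPC 2.0 error code for "Method not found".
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "error": error, "id": request["id"]})
    result = procedure(*request["params"])
    return json.dumps({"jsonrpc": "2.0", "result": result, "id": request["id"]})


response = json.loads(handle(make_request("add", [2, 3], 1)))
missing = json.loads(handle(make_request("subtract", [2, 3], 2)))
```

From the caller's side the exchange looks like an ordinary function call with a name and arguments, which is the tighter language integration described above.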

A new model

Now that we understand what a protocol is, what a protocol is for and how protocols are used, as well as the features and benefits of standard protocols and architectures, we can begin to develop the requirements for a communications protocol for use with the CloudMoDe model. As we have seen in previous articles, CloudMoDe offers a new organizational model, based on containers, that enables both developers and end-users to focus on answering questions about how things are related. This basic syntax, or way of organizing things, is robust enough to model almost anything. So the first requirement of a protocol is that it be able to express the syntax of the model. The unified semantics of the CloudMode data model enable programmers to focus on the relationships within the data, so the protocol needs to support the management of these relationships and the efficiencies that emerge from these unified semantics. This feature alone will increase the usability of the protocol, while simultaneously reducing the time it takes to learn both the model and the protocol.

We have also described how this consistent syntax and these unified semantics contribute to a new level of addressability. The protocol must therefore enable programmers to address storage, navigation of the modeled relationships, and search from any given point within their application. Programmers can’t be expected to switch from a storage level protocol to a search protocol in the middle of an application workflow.

Finally, the CloudMoDe data model delivers an entirely new level of accessibility with its concept of Structural Control, providing a very fine level of control over how a particular component or container is not just accessed, but used. This accessibility is not based on files or cacheable chunks, but on streams of data. Therefore, our protocol must directly support these structural controls centered around streams of data, and automatically disallow caching of any sort.

Protocol as Domain Specific Language

“In a sense, protocols are to communication what algorithms are to computation…a communication protocol allows one to specify or understand data communications without depending on detailed knowledge of a particular vendor’s network hardware” Comer 1995.

One of our goals in developing CloudMoDe has been to deliver a domain specific language for its data model. Such a domain specific language (or DSL) would provide programmers with the ability to understand and manipulate the models directly, without having to map the model to a less expressive network protocol. An RPC style protocol, where the semantics of the protocol map to the semantics of the CloudMode data model, would allow us to deliver this domain specific language directly in the programming language used for application development.

The benefits for the programmer are profound. A custom DSL, implemented as a protocol, allows direct manipulation of the relationships and data structures of the model, in the native programming language of the application. There is no need for the programmer to map actions to a subset of verbs (as in REST) that have no obvious mapping to the action being taken in the program. Additionally, the programmer need not endure the complexity and performance degradation associated with encoding/decoding of XML based SOAP, not to mention the expense and complexity of SOAP tools. An RPC based DSL requires only learning the semantics of the CloudMode model, not understanding how to map the model to a potentially constrained and unexpressive network protocol.

Protocol as Arbiter/Enabler of Control

CloudMode introduces the idea of a contextual, Structural Control mechanism. This mechanism associates a UsePolicy with a relationship or resource. The UsePolicy defines a set of instructions for monitoring and enforcing the use of the relationship and of the resources identified by the relationship. With this new model of control, control emerges from communication in a context (the protocol). This Structural Control concept is fundamentally different and separate from security, where security is defined as “protection from destructive forces, and the unwanted actions of unauthorized users”. With Structural Controls the need for security is reduced or, at the very least, radically transformed.

Such a Structural Control mechanism is based on communication in a context. The communications protocol is the channel which is responsible for maintaining the integrity of the context, whether it be per message or across messages. This implies that the channel should be stateful, and in fact the channel (connection) can be responsible for maintaining the state of the identity of the sender, where the sender is both the identity of the application/appliance (controlled by the Catalog in the CloudMode model) and the identity of the User (controlled by the uniqueness quantification method used in authorization).
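The idea of a stateful channel that carries the sender's identity can be sketched as follows. This is only an illustration under stated assumptions: the Channel and Identity classes, their method names and the example values are hypothetical, and no real authentication or networking is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    appliance: str  # which application/appliance is connected
    user: str       # which user is acting through it


class Channel:
    """A connection that remembers who is on the other end."""

    def __init__(self):
        self._identity = None

    def authenticate(self, appliance, user):
        # Established once per connection rather than once per message.
        self._identity = Identity(appliance, user)

    def send(self, verb, target):
        if self._identity is None:
            raise PermissionError("channel has no established identity")
        # Every message inherits the identity held by the channel.
        return {"verb": verb, "target": target, "sender": self._identity}


channel = Channel()
channel.authenticate("thermostat-01", "alice")
message = channel.send("stream", "report")
```

Because the identity lives in the connection state, individual messages do not need to carry a token of their own in this sketch.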

The Structural Control model of CloudMode includes monitoring and enforcement, an explicit feature that is automatically included in every message that is transmitted over the channel. This monitoring and control feature is employed to guarantee uniqueness of the transaction defined in the message. The implementation of this monitoring and enforcement feature is greatly simplified with a stateful, persistent connection.

Finally, the Structural Control model delivers streams rather than files, and requires real-time messaging capabilities during the streaming process, and therefore a full duplex (bi-directional) persistent connection is required.

Protocol Performance

The proliferation of the REST architecture has created a certain bias against stateful connections, or alternatively, a bias for stateless protocols. The argument for this ‘stateless’ communication is based on a need for scalability, specifically horizontal scalability: the ability to increase server capacity in an ad-hoc manner by adding more servers, with little or no impact on the service being offered or on its clients. Vertical scalability, by contrast, would increase capacity by moving to a larger CPU with more memory and increased network throughput. This vertical approach could impact services while one server is being taken down and its replacement is placed into service.

The argument is that a stateless message is entirely independent and can be directed/routed to any number of servers. The assumptions implicit in this argument are 1) that each stateless request can create a new connection, and 2) that there is indeed no state associated with the request. The first reality is that, due to performance considerations, REST connections are almost always persistent. The second reality is that, while the REST API request may provide all the context the server needs to satisfy a particular request (alternatively, a session identifier may be carried in a cookie), for anything but the most general non-user-specific service some kind of identification token must be included in the request in order to 1) verify that the user is authenticated, 2) verify that the user is authorized to access the resource being requested, and 3) gather the user-specific data that is needed to identify or compute the resource.

A decade ago, when the REST architecture was formulated, persistent connections were expensive. Today, commodity hardware with inexpensive 10 Gb Ethernet adapters can support upwards of 1 million connections per server. In fact these connections are so cheap that web browsers will open multiple simultaneous connections to a server during page loads. So one of the historical reasons for avoiding persistent connections has been eliminated. Meanwhile the biggest selling point of REST (statelessness) requires potentially expensive re-authentication of each request, and the tokens or cookies sent with every request are exposed to man-in-the-middle style attack vectors.

REST also carries a significant overhead in terms of payload size: every request and response must include a full HTTP request or response header, transmitted as plain text. Finally REST, because it is implemented on top of HTTP (essentially a file copying/sharing protocol), does not natively support streaming, nor the ability to implement interactive controls of the stream.

A review of the Netflix micro-services architecture shows that while the micro-services themselves use a stateless REST API, there is a client facing API gateway that “runs client-specific adapter code that provides each client with an API that’s best suited to its requirements”. The Netflix client facing API gateway is a specific response to the overhead issues associated with clients having to use non-performant REST APIs that transmitted redundant or unused data to clients.
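A rough way to see the header overhead is to count bytes. The sketch below compares a small but plausible HTTP/1.1 request with a compact binary frame. The frame layout (a one-byte verb code, a four-byte request id and a four-byte target id) is purely hypothetical and is not the GHOST framing; the host name and token are made-up example values.

```python
import struct

# A minimal HTTP/1.1 request; real requests usually carry more headers.
http_request = (
    "GET /thumbnails/42 HTTP/1.1\r\n"
    "Host: api.example.com\r\n"
    "Accept: image/png\r\n"
    "Authorization: Bearer 0123456789abcdef\r\n"
    "\r\n"
).encode("ascii")

# Hypothetical binary frame in network byte order:
# 1-byte verb code, 4-byte request id, 4-byte target id.
frame = struct.pack("!BII", 3, 1, 42)

http_bytes = len(http_request)
frame_bytes = len(frame)  # 1 + 4 + 4 = 9 bytes

print(f"HTTP request: {http_bytes} bytes, binary frame: {frame_bytes} bytes")
```

The comparison is only indicative, since a real binary protocol would also need length prefixes and payload encoding, but it shows why repeating a text header on every request adds up.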

Communication Synchronization

From our original definition of protocol, the third primary task of a protocol is communication synchronization, or more simply, a protocol must match each response with the message it responds to. In the case of a basic, synchronous request-response client-server application, there is a single, synchronously delivered response for every request. That is how HTTP is implemented. This does not allow for a request that can result in multiple asynchronous responses, say, a query that requests 100 thumbnail images from a server. The first images may be streamed immediately, but others may be delayed for reasons of server capacity or client network bandwidth. In the case of HTTP, 100 thumbnails means 100 requests and 100 responses, or 200 messages. A protocol that supports asynchronous responses would cut that almost in half, to 1 request and 100 responses (101 messages), a significant reduction in both bandwidth use and server capacity utilization.
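The thumbnail example can be simulated in a few lines. In this sketch a single request produces 100 responses that complete in arbitrary order, each tagged with the request id so the client can match it. The function names and the in-memory queue standing in for the network are assumptions made for illustration.

```python
import asyncio
import random


async def serve_query(request_id, count, outbox):
    """Answer one request with `count` responses, ready in arbitrary order."""

    async def produce(index):
        # Simulate thumbnails becoming ready at different times.
        await asyncio.sleep(random.uniform(0, 0.01))
        await outbox.put((request_id, index))

    await asyncio.gather(*(produce(i) for i in range(count)))


async def main(count=100):
    outbox = asyncio.Queue()  # stands in for the persistent connection
    server = asyncio.create_task(serve_query(7, count, outbox))
    received = []
    for _ in range(count):
        request_id, index = await outbox.get()
        assert request_id == 7  # every response is tagged with its request
        received.append(index)
    await server
    return received


received = asyncio.run(main())

messages_request_response = 100 + 100    # one request per thumbnail
messages_streamed = 1 + len(received)    # one request, many responses
```

The client here knows how many responses to expect because it asked for a fixed number; a real protocol would also need some way to signal the end of an open-ended stream.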

Every Behavior has a Target

The CloudMoDe model of semantic computing does not naturally map to the linguistic and semantic conventions that have evolved to serve the Web: HTTP, REST and SOAP (and its popular stepchild AJAX). Early in the last decade, RPC based protocols lost popularity due to the expense of persistent connections (see above) and the unreliability of networks, particularly 3G mobile networks. We’ve come a long way in terms of lowering network and connection costs, as well as improving the reliability of the underlying network itself, and RPC based protocols are becoming more popular (witness JSON-RPC, MsgPack-RPC, etc). The most popular networking library in use on the iOS platform is AFNetworking, and a JSON-RPC client is available for it.

The general form of an RPC protocol identifies a procedure to be called on a recipient. In the vernacular of CloudMode, “every behavior has a target”, where the behavior is the procedure/function name, and the target is the recipient of the call. In most programming languages, particularly object oriented languages, the recipient (object) receives a method (function) call/message. In CloudMode, we wanted to make it more like natural language, where the verb (the behavior) precedes the object (the target), as in “make user” or “fill subject”. This approach also allows the specification of verbs (behaviors) that more naturally express the behavior that is to be performed. Verbs like “make”, “fill”, “stream” and “destroy” replace PUT (or is that POST?), POST (or is that PUT?), GET and DELETE. Our goal is to use the protocol to remove as much ambiguity as possible from the language, reducing errors and radically reducing the learning curve. If the protocol directly expresses the actions to be performed on the model, there is no learning curve for the protocol, only for the model.

In addition, the GHOST protocol and its concept of a low-overhead, persistent and stateful connection provide the perfect mechanism for automatically managing one of the most onerous tasks any application developer has to face: authentication and identity management, the dreaded login screen. It’s no secret, for example, that implementing OAuth in an app takes twice as much code as a custom login implementation. CloudMode’s new password free, secure login approach, called ProfileTray, can be installed in any client, in any language, in less than ten lines of code. The persistent GHOST connection automatically handles errors, connection and reconnection responsibilities, letting the programmer focus on programming.

Finally, the GHOST protocol is the delivery platform for CloudMode’s Structural Control mechanism, with built-in streaming and the clients that can manage, monitor and enforce its advanced system of Structural Controls.
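The “every behavior has a target” form can be sketched as a small dispatcher in which a command is a verb followed by a target. Everything here is hypothetical: the Store class, the behavior registry and the perform function are invented for illustration, and what the verbs actually do in CloudMode is not specified in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """A stand-in for the model that behaviors act upon."""
    items: dict = field(default_factory=dict)


BEHAVIORS = {}


def behavior(verb):
    """Register a function as the implementation of a verb."""
    def register(fn):
        BEHAVIORS[verb] = fn
        return fn
    return register


@behavior("make")
def make(store, target, **attrs):
    store.items[target] = dict(attrs)
    return store.items[target]


@behavior("fill")
def fill(store, target, **attrs):
    store.items[target].update(attrs)
    return store.items[target]


@behavior("destroy")
def destroy(store, target, **attrs):
    return store.items.pop(target)


def perform(store, command, **attrs):
    # The verb (behavior) comes first, then the target, as in "make user".
    verb, target = command.split(maxsplit=1)
    return BEHAVIORS[verb](store, target, **attrs)


store = Store()
perform(store, "make user", name="Ada")
perform(store, "fill user", email="ada@example.com")
snapshot = dict(store.items["user"])
perform(store, "destroy user")
```

Each verb names exactly one action, so there is no need to decide which generic HTTP verb an operation should be mapped onto.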


A note on the origins of the term ‘GHOST’ as a name for the protocol. The GHOST protocol in fact predates said film, which was a really funny coincidence. We used the term to mean a thankless job that no one wants, and therefore one where you operate as if on your own. When Michael, our CTO, decided to write a protocol, I, as CEO, reluctantly agreed, because writing a protocol is one of those things in life that is important but thankless. If you win, no one knows. If you lose, no one cares. Yet it is critical, and once you take the job you cannot fail, because everything depends on it, but there is no credit. It is the work of a ghost. It is ghost work. That is where the name comes from. Then the movie came out and the name was cool, so it stuck.
-- Dhryl Anton
