Even though I wouldn’t say that it’s got Redis shaking in its boots, the news of microservice caching from Pivotal is still interesting, all the same.
And if I were Redis, I’d definitely take notice.
Even though one could say it was more of an abstract presentation, I watched and enjoyed the InfoQ talk about microservice patterns by Mike Amundsen. Basically, he introduced patterns that could help design microservices and transform them into hypermedia services. Even though talk of hypermedia services isn’t new (especially with the increased presence of HATEOAS-driven REST APIs), there hasn’t been enough discussion of the design patterns for the servers and clients that implement them. In particular, I liked his idea for a Representor pattern, which addressed a problem that I hadn’t even thought about. If you’re curious about the next evolution of communication within distributed architectures, then I would definitely recommend checking it out.
However, along with design patterns, another aspect of hypermedia services hasn’t been discussed in detail. In particular, there’s a general lack of proposed designs for code development and for database schemas (with the latter being the storage for the services’ vocabularies mentioned by Amundsen). Hmmm…what could we use to address this particular problem…Wait a minute! I know. What about MDD? In fact, I think that somebody has already written an article about this very subject of hypermedia!
…Sorry. I couldn’t help myself.
Having low expectations due to being old and jaded, I downloaded the Slack alternative known as Mattermost and then proceeded to install it onto Red Hat using their instructions. (Slack would seem like an obvious choice, but it’s a less viable candidate for the enterprise for a number of possible reasons: contracts with competing vendors, auditing, corporate secrets, etc.) First and foremost, I appreciated the readability of their straightforward description of the prerequisite steps, even before installing the actual product. (Since my teenage days of being a part-time administrator are long gone, I need somebody to hold my hand.) The installation of the product itself was also fairly easy, and I was able to get it working quickly thereafter. If I hadn’t been busy with a few other things at the time, it might have taken me only a few minutes to prepare everything. So, kudos to Mattermost for that! Plus, just like Slack, I was able to communicate with the server through a number of available clients.
But I wasn’t here for just an in-house chat server, though that’s pretty cool. Instead, as I mentioned in my last post, I was here to investigate the possibility of deploying useful bots for my team. In the same way that Atomist’s bots help software companies manage their projects, I’d like to create my own bots that can help monitor and control the custom distributed architecture that exists inside our walls. For more interactive bots, Mattermost offers two types of webhooks (incoming and outgoing) as well as slash commands. Since I wanted to keep it simple, I went with a slash command. Now, you have to pay close attention here, since they have explicit instructions on how to implement the callback correctly. (Do not stray from the instructions, or you will face certain doom.) However, if you follow them exactly, the slash command will work without fail. I created a simple bot that returned Markdown text for a table, which displayed the status of several daemons running on our servers. Again, I was impressed!
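To give a concrete sense of what that callback might look like, here’s a minimal sketch in Python. The `status_of` helper and the daemon names are my own inventions, and you should consult Mattermost’s documentation for the authoritative payload fields; this only illustrates the general shape of returning a Markdown table from a slash command.

```python
import json

def status_of(daemon):
    # Hypothetical stand-in for a real health check (e.g., probing the process).
    return "running"

def build_slash_response(daemons):
    # Build a Markdown table of daemon statuses; Mattermost renders the
    # "text" field of the JSON response as Markdown in the channel.
    rows = ["| Daemon | Status |", "| --- | --- |"]
    rows += ["| {} | {} |".format(d, status_of(d)) for d in daemons]
    return json.dumps({
        "response_type": "in_channel",  # visible to the whole channel
        "text": "\n".join(rows),
    })
```

A tiny HTTP server (or any web framework) would simply return this string with a JSON content type when Mattermost invokes the slash command’s callback URL.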
So, now, what’s next? Well, long ago, I wrote an InfoQ article about creating web apps driven by an MDD architecture. In this case, I don’t see why the same couldn’t be employed with a set of bots. With a little bit of planning, I can envision simple yet effective tools available through such a network. Constructed within an MDD framework, their functionality could be adjusted with just a few modifications to metadata. So, much like a Silicon Valley project manager can get the status of a software build while texting on his phone with a colleague, it’s now possible for me to chat and run a few commands against my distributed architecture. (Finally, I won’t have to whip out the laptop and dial into the VPN every time something happens.) And all while still fitting into corporate compliance! Well, there’s still a lot of work to be done, but it’s looking much more feasible now than it did a few weeks ago. And I still have to sell it to the brass…but I’ve got my fingers crossed.
Since I’m in the process of porting our MDD framework to Red Hat Linux, I was curious about the various new tools available to me. (It also feels good to get dirty with another environment again, especially an old friend like Linux. It’s refreshing.) I found some interesting new packages, including OpenShift (which could be useful if we decide to consolidate applications onto only a few servers). However, personally, the most interesting package was something that I’ve been seeking for a while. Behold Mattermost!
A couple months ago, I had complained that bots weren’t generally useful. However, I had found one scenario that would be useful to me, if only a given platform would support it. Basically, I’d like an interactive bot that could give me status information about my distributed system (currently running processes on multiple machines, database row contents, etc.), all through a secure messaging client on my phone. And apparently, Mattermost is just that platform! According to the site, I can install the software onto RHEL, run my own messaging service (with native mobile clients), and then integrate bots into that running instance of the service. And at a fairly modest price… Now I just need to get the brass to sign off on it, in terms of both budget and network integration. It’s a long shot, but at least it’s now a distinct possibility.
Well, as I said before, I’d get back to metadata-driven design…and here we are!
So, as I was perusing InfoQ one day a few weeks ago, I stumbled upon an interesting video by Vinicius Carvalho of Pivotal. Basically, within the video, Vinicius (okay, I’ll admit it – it’s a cool name that I wish I had) addresses an issue familiar to anyone who creates web services: how does one evolve a payload’s schema without breaking the clients of users who referenced the old schema? For example, if we were returning a payload with a property/tag called ‘Price’ and we wanted a new version of the schema to replace that tag with ‘PubPrice’, how could we do that without requiring every user to change their client/consumer app? These kinds of presentations are my favorites, since they address real-world problems.
So, in his presentation, Vinicius goes about demonstrating how one can create a solution to such a dilemma. Since he works for Pivotal, he uses the Spring platform to present a scenario where a web service has an original schema that needs to be altered in its eventual evolution. (Granted, he’s probably a fan of the Spring framework, which means that he’s a fan of event-driven frameworks…but we won’t hold that against him. I’m kidding, I’m kidding…take it easy, Spring zealots.) For the first few minutes, Vinicius focuses on format, which is important when discussing web servers that return payloads. (And, yes, I agree with him: JSON is bad, mmmkay.) Even though I’ve never used it, since we exchange verbose formats (i.e., XML/JSON) with my stakeholders, he does make a compelling case for the Avro protocol; it does appear to be an impressive format, one that can be very powerful in capable hands.
However, the most interesting part is when Vinicius begins to talk about the actual mechanism at the focus of this presentation: schema servers and their registries of schema versions. Basically, using features within Avro, a schema server allows the creator of a web service to register their original schema and any subsequent versions of it. When a new version of the payload’s schema is conceived, the new schema can be submitted to the schema server in order to test whether it breaks (i.e., is not backward-compatible with) any older versions of the schema. Plus, accompanied by markup, the new schema can indicate any tags that will replace tags existing in the previous version. So, when an older client submits data to the updated web service, the web service can use the schema server as a translation device. Nifty!
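To make the registry idea concrete, here’s a toy sketch in Python. This is not Avro (whose real compatibility rules are far richer); it’s a deliberately simplified model where a new schema version is accepted only if every field of the previous version survives, either directly or via a declared rename, and where old-client payloads can be translated forward through the rename chain.

```python
class SchemaRegistry:
    """Toy schema registry: versions are sets of field names plus renames."""

    def __init__(self):
        self.versions = []  # list of (fields, renames) tuples

    def register(self, fields, renames=None):
        # Reject a new version that drops an old field without renaming it,
        # since that would break clients built against the older schema.
        renames = renames or {}  # e.g., {"Price": "PubPrice"}
        if self.versions:
            prev_fields, _ = self.versions[-1]
            for field in prev_fields:
                if field not in fields and field not in renames:
                    raise ValueError("breaks older clients: " + field)
        self.versions.append((set(fields), renames))

    def translate(self, payload):
        # Walk the rename chain so an old-client payload matches the
        # latest registered schema.
        for _, renames in self.versions:
            payload = {renames.get(k, k): v for k, v in payload.items()}
        return payload
```

With this toy, registering `{"EAN", "PubPrice"}` with `renames={"Price": "PubPrice"}` succeeds, and an older payload containing `Price` is translated to use `PubPrice` before it reaches the updated service.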
I love this solution, and I want to commend Vinicius and his colleagues for sharing a solution to a common problem. However, what if we wanted to evolve this solution for more complex scenarios, where the changes to the payload are more involved? For example, what if we wanted to split one tag into several others? Or what if we wanted to replace an entire composite with another one? In this case, simple markup wouldn’t be sufficient to indicate how the schema server could transform the data from one form to another. In this scenario, you would need a tool that could help you define such transformations systematically, and you would need the right methodology in order to build it. You might know where I’m going with this one… Yes, I think this is where the application of MDD could produce the guts of the schema server and make it even more powerful!
Granted, if we stayed with Avro, it would be difficult to create such a translation service; the functionality built into Avro would likely be difficult to extend (if that’s even possible). However, if we use XML (which, in my line of work, we are most apt to use) as our API payload’s vehicle, we could use something that Vinicius mentions in his talk: XSLT. Even though I can’t say that I’ve never cursed while using it, it can be a helpful tool in certain cases…and in the case of creating an MDD server for schema translation, it fits perfectly! Using MDD, we could create a schema server that generates the appropriate XSLT and then performs complex conversions from one XML schema to another. I have a few ideas on how to make such a thing work…but that’s for another time.
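As a rough illustration of the metadata-to-XSLT idea, here’s a sketch that generates a stylesheet from a tag-rename mapping. The mapping format and function names are assumptions of mine; a real MDD schema server would pull these mappings from its metadata tables and would handle far more than simple renames. The generated stylesheet pairs an identity template (which copies everything untouched) with one template per renamed tag.

```python
# Identity transform: copies every attribute and node verbatim unless a
# more specific template below overrides it.
XSLT_HEADER = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n'
    '  <xsl:template match="@*|node()">\n'
    '    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>\n'
    '  </xsl:template>\n'
)

def generate_rename_xslt(tag_map):
    # One template per renamed tag, e.g. {"Price": "PubPrice"}; the old
    # element's children and attributes are carried into the new element.
    rules = []
    for old, new in sorted(tag_map.items()):
        rules.append(
            '  <xsl:template match="{0}">\n'
            '    <{1}><xsl:apply-templates select="@*|node()"/></{1}>\n'
            '  </xsl:template>\n'.format(old, new)
        )
    return XSLT_HEADER + "".join(rules) + "</xsl:stylesheet>"
```

Feeding the result to any XSLT processor (e.g., `xsltproc` or a JAXP transformer) would then convert payloads from the old schema to the new one.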
In the last post, we finally finished gathering all of the metadata ingredients required for our recipe, and we were planning our cake’s final layer: the permissions algorithm. Of course, we could simply code the algorithm from the flowchart displayed in Part 1, using the metadata in a few modules to enforce our security paradigm. And then we could call it a day. Sure, we could finish there…but that would be an antithetical conclusion to all of the work done up to this point. After all, a major argument for MDD is to reduce the need for recompilation and/or redeployment of software; we’ve already used it to create a unique ORM framework that is leveraged by such a design. So, instead of cementing the rules of our permissions algorithm into our code base, could we make them more dynamic as well?
As it turns out, yes, we can. In a previous InfoQ article, I described how one could create a business rules engine driven by a simple pidgin, a black box with a lexicon whose terms are determined by the very same metadata that drives the main infrastructure of the architecture. In addition to business rules, we can take advantage of the same approach when we want to implement the permissions algorithm, and for the sake of clarity, we will call this black box a permissions engine. We can then compose an enterprise DSL that incorporates both the Attributes and the contextual data describing our records. Since it should be comprehensible to a literate business user, we should design it with the priority of being easily reviewed and maintained by various stakeholders. As an example, let’s handle the scenario depicted in the flowchart from Part 1 by creating and showcasing a snippet using such a DSL:
In this sample (and much like the scenario described in the InfoQ article), our engine expects two records to be supplied in order to successfully execute the permissions algorithm. There is the Current instance of a record and its Attributes on the system, and there is an Incoming instance of the record with new data (which has been submitted by IncomingUser). While the overall system has the responsibility of loading and then passing along these relevant records, it becomes the duty of the permissions engine to understand and then enforce the rules of this presented sample. In this case, our permissions engine will use both records and any potential contextual data to determine whether or not to save the incoming “Price” Attribute.
Even though it is possible that the “Price” Attribute deserves special consideration, it’s more than likely that we will want to treat most or all of these Attributes in the same manner when processing a single record. In that case, we could use this sample as our default algorithm and associate it with many (if not all) of our Attributes. By substituting “Price” with a placeholder that will eventually be replaced by our target Attribute name, we could reuse the same DSL snippet above and eliminate the pointless redundancy of multiple copies. However, if there are indeed special cases, we still have the option to create separate algorithms. All of these algorithms can then be stored as documents within a database optimized for retrieval (like MongoDB) and pulled later by the permissions engine during its initialization. With just a bit of organization, we are now able to create a truly customized permission system for each data point that is submitted to the system.
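To illustrate the placeholder mechanism, consider this toy sketch in Python. The rule grammar, the `%ATTR%` token, and the field names are entirely my own inventions for illustration (not the actual DSL from the article); the point is only that one stored rule can be specialized per Attribute at load time and then interpreted by the engine.

```python
# One default rule stored once; %ATTR% is swapped for the target
# Attribute when the permissions engine initializes.
DEFAULT_RULE = (
    "IF Current.%ATTR%.LockedBy IS SET "
    "AND Current.%ATTR%.LockedBy != IncomingUser "
    "THEN REJECT ELSE ACCEPT"
)

def load_rule(attribute, template=DEFAULT_RULE):
    # Specialize the shared template for a particular Attribute.
    return template.replace("%ATTR%", attribute)

def evaluate(rule, current, incoming_user):
    # Minimal interpreter for the single lock-check rule above: reject
    # the incoming value when someone else holds the lock.
    attr = rule.split("Current.")[1].split(".")[0]
    locked_by = current.get(attr, {}).get("LockedBy")
    if locked_by is not None and locked_by != incoming_user:
        return "REJECT"
    return "ACCEPT"
```

A real engine would parse the DSL into an AST rather than string-splitting, but even this toy shows how a single stored document can govern every Attribute without duplicating rules.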
Our permissions engine will then determine the applicability of each data point and whether it has the clearance to be persisted into our database. On top of this important functionality, we can reuse this subsystem in a number of ways. It’s not outside the realm of possibility that certain qualified stakeholders might want a preliminary report on the levels of clearance for their data before submission, so that they can get a preview of its fate. (For those who are fans of the movie “Minority Report”, the term precog certainly comes to mind.) In some cases, such intel could prevent wasted attempts at saving certain payloads (which can be a time-consuming process of multiple validation and correction phases) and could give those stakeholders the chance to address whatever potential obstacles might block their way. For example, Bob Smith might have the price locked on a few records, and Sue Doe might need Bob to lift the lock temporarily in order to submit a few emergency updates. In that case, we could create a user-facing service that simply invokes this subsystem in order to generate and return such a report to the calling stakeholder. That way, Sue does not need to learn about this discrepancy minutes later, when she receives the final status of her submitted record batch; she can quickly get this information beforehand, saving valuable time.
At the start of this series, it was my intent to present the feasibility of creating a permissions subsystem using MDD. Hopefully, at this point, I have at least shown that. With some ingenuity, we could likely repurpose this subsystem for yet other uses. Better still, we can clone this subsystem and adopt it as a solution for other similar dilemmas. It would fulfill my youthful dream of applying such a method to a filesystem, and I still think that such a strategy is possible. However, I will leave the onus of such a chore to a younger version of myself with more spare time…or, better yet, to posterity.
You can find the first post in this series here.
Since we are discussing the specifications that could drive such a subsystem of an architecture, we need to first identify a methodology that provides optimal benefits when building it. We need a design method that both provides a great deal of flexibility and incorporates a granular approach to dealing with data. In such a case, I would turn to metadata-driven design, on which I have expounded in the past.
So, what is metadata-driven design (i.e., MDD)? For the sake of brevity, MDD can be thought of as an increment to domain-driven design, where metadata provides the blueprint for the storage, the data structures, and the functionality inherent to an enterprise-scale application. Through the addition of more rows of metadata, stakeholders can extend the scope and functionality of the application with little to no additional software development. However, if any actual software development is required to enhance the platform due to some unforeseen complexity, it should not present that much difficulty; this increment should also be able to use additional sets of metadata, existing as extra layers on top of the original set(s). These additional layers can be thought of as dimensions, and much like a communications protocol, the set of these layers can be thought of as a stack.
So, let’s showcase an artifact from the InfoQ article that started it all, which provides an introduction to MDD:
In this image, we have an example displaying a set of metadata that describes a logical group of data points (i.e., Attributes); we can address them as a group, especially since they reside on the same table. Using this metadata, we could generate static data structures, but by using “flexible” structures instead, we gain the benefit of not having to recompile or redeploy any of our server code. So, we will create flexible data structures that act more like a series of nested containers, using the metadata to determine the hierarchy of these containers.
For example, these attributes could be collected in a hash table of “[GroupName] > [Attributes]”, where the key “GroupName” is a string and the value “Attributes” is another hash table; the “Attributes” hash table could contain the actual pairing of each Attribute to its value. This particular metadata helps to construct the base layer of our stack. On top of that initial dimension (and presented in the bottom of the image), we have a new set of metadata that describes permissions of users in relation to each Attribute; by doing so, we have created the preliminary parameters for our permissions schema. However, as stated earlier, we need more information in order to have a more complete perspective.
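As a minimal sketch of this nested-container idea, here’s one possible shape in Python. The metadata rows and group/Attribute names are illustrative; the point is that adding a new Attribute is just a new metadata row, with no code changes or redeployment.

```python
metadata_rows = [
    # (GroupName, AttributeName) -- illustrative metadata rows
    ("Pricing", "Price"),
    ("Pricing", "Currency"),
    ("Identifiers", "EAN"),
]

def build_containers(rows):
    # Nest a hash table of Attributes under each GroupName key,
    # mirroring the "[GroupName] > [Attributes]" structure described above.
    containers = {}
    for group, attribute in rows:
        containers.setdefault(group, {})[attribute] = None
    return containers

def set_value(containers, group, attribute, value):
    # Only Attributes declared in the metadata may be populated.
    if attribute not in containers.get(group, {}):
        raise KeyError("unknown Attribute: " + attribute)
    containers[group][attribute] = value
```

Adding a row like `("Pricing", "Discount")` to the metadata would immediately make a `Discount` slot available, with the servers untouched.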
So far, we have a way of packaging the actual data, and we have a way of storing user permissions in relation to the data points. Now, in order to finally create our permissions subsystem, we need a set of information that describes the state and history of each data point. Let’s assume that whenever we persist data to our system, we also write records to an auditing repository that describe the events that have just occurred. (After all, it’s recommended to have such a recorded history on hand; it’s our protective shield in the face of the dragon known as SOX.) For example, if the stakeholder Bob Smith changed a price from $3.99 to $2.99, we would log just that, along with any other edits from years prior. When we need to know who made the last edit for this product’s price, we could scour this huge table, with many rows listing such details from months and years ago…but for the advantage of performance and general reliability, we should designate a place to put particularly vital information, like data about the most recent edit. So, we will create yet another dimension to the metadata and add it to our stack.
Much like the initial dimension that described the structures of the actual data, this dimension will describe a structure that represents the state and history of each instance of an Attribute (i.e., per record), and this information will also be persisted to a table. We will call this information contextual data:
Now that we have a definition of contextual metadata, we can start to employ it when we persist main records, writing context records in parallel. Take the following set of contextual data as an example instance, which follows the definition in our metadata:
This contextual data lists a number of important properties regarding the “Price” Attribute on the record with EAN 1234567890. It tells us that Bob Smith locked this record’s price on 5/10/2016, and it tells us that Bob Smith also edited the price on the same day. Using some simple queries, one could find this data within our immense auditing repository. However, by updating this simple table while simultaneously depositing records into the huge vault of our auditing repository, we can improve performance, since such pertinent information can be quickly returned via simple queries. Now that we have the final, requisite ingredient for our recipe, we can finally take the steps to create our desired subsystem, which we will do in Part 3.
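The parallel write described above could look something like this sketch (the structures and field names are my assumptions, not the article’s actual schema): every edit appends a full event to the audit history, while a small contextual-data table is upserted with just the “most recent” facts for fast lookup.

```python
import datetime

audit_log = []       # append-only history; scoured only for deep forensics
context_table = {}   # (EAN, Attribute) -> latest contextual record

def record_edit(ean, attribute, user, old_value, new_value):
    # Write the full audit event...
    event = {
        "EAN": ean, "Attribute": attribute, "User": user,
        "Old": old_value, "New": new_value,
        "When": datetime.datetime.utcnow().isoformat(),
    }
    audit_log.append(event)
    # ...and, in parallel, refresh the small table that answers
    # "who edited this last?" without scanning the whole history.
    context_table[(ean, attribute)] = {
        "LastEditedBy": user,
        "LastEditedOn": event["When"],
    }
```

With this split, the question “who last changed the price on EAN 1234567890?” becomes a single keyed lookup instead of a scan over years of audit rows.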