While reading Slashdot news the other day (I know, I know…all the cool kids read Hacker News), I stumbled upon some news about RethinkDB that didn’t seem all that relevant to me at first. However, upon reading some of RethinkDB’s documentation, I realized that such a database might be another useful tool in an MDD-constructed engine for XML schema evolution. From time to time, I still think about an engine capable of enabling complex translations between different versions of an XML schema; essentially, this engine would generate and then use XSLT files based on the metadata inside certain database tables. If we put this metadata inside tables of a RethinkDB database, RethinkDB’s changefeeds (which push a notification to subscribed clients whenever a table’s data changes) would be a neat way to trigger the rebuild of these XSLT files!
So, after my last post, I got curious: is there any software out there that performs XML schema evolution, even if it’s proprietary? Oddly, after searching for a few minutes, the answer “no” seemed to be coming back from the web. Now, Oracle and IBM do offer a service to update your current XML documents according to a new schema…but only if it doesn’t invalidate the old schema. Basically, their “evolution” functionality allows you to further refine your schema’s rules, like changing the maximum/minimum of a tag’s occurrence or adding a new required tag. That’s hardly any sort of evolution; it doesn’t even provide the ability to automatically rename tags/properties, as Avro does! So, the claims of Oracle and IBM might be more marketing than engineering.
But I guess that marketing and buzzwords are all too normal in software…After all, whoever coined the term string interpolation definitely took some severe liberties, since it’s sure a long way off from real interpolation. In any case, there seems to be an opening for a niche market here, one which could be somewhat lucrative. However, these days, all the big bets of towering chips are on the table of machine learning, big data, and AI. In the eyes of the major league, anything that deals with XML (i.e., old-school data processing) should go play the slot machines.
Good for me…I don’t mind being stuck alone in a dark corner! Reminds me of playing Street Fighter 2 by myself in the back of a pizza parlor and having a blast…In any case, I was looking for tools that could help build an engine for XML schema evolution. Interestingly, I found an open source project by Dmitry Pekar that can convert both ways between XML and Avro. That could help by extending the functionality already in Avro…but besides the simple renaming of tags/properties, it doesn’t satisfy my proposed requirements. (Plus, your distributed architecture would have to ultimately use Avro, which would be a refactoring headache in some instances.) I haven’t found anything else yet, which makes me suspect that my handcrafted MDD approach might be the only viable option.
Well, I said before that I’d get back to metadata-driven design…and here we are!
So, as I was perusing InfoQ one day a few weeks ago, I stumbled upon an interesting video by Vinicius Carvalho of Pivotal. Basically, within the video, Vinicius (okay, I’ll admit it – it’s a cool name that I wish I had) addresses an issue familiar to anyone who creates web services: how does one evolve a payload’s schema without breaking the clients of users who referenced the old schema? For example, if we were returning a payload with a property/tag called ‘Price’ and if we wanted a new version of the schema to replace that tag with ‘PubPrice’, how could we do that without requiring every user to change their client/consumer app? These kinds of presentations are my favorites, since they address real-world problems.
So, in his presentation, Vinicius goes about demonstrating how one can create a solution to such a dilemma. Since he does work for Pivotal, he uses the Spring platform to present a scenario where a web service has an original schema that needs to be altered in its eventual evolution. (Granted, he’s probably a fan of the Spring framework, which means that he’s a fan of event-driven frameworks…but we won’t hold that against him. I’m kidding, I’m kidding…take it easy, Spring zealots.) For the first few minutes, Vinicius focuses on format, which is important when discussing web services that return payloads. (And, yes, I agree with him: JSON is bad, mmmkay.) Even though I’ve never used it, since my stakeholders require verbose formats (i.e., XML/JSON), he does make a compelling case for Avro; it does appear to be an impressive format, one that can be very powerful in capable hands.
However, the most interesting part is when Vinicius begins to talk about the actual mechanism for solving the problem at the heart of this presentation: schema servers and their registries of schema versions. Basically, using features within Avro, a schema server allows the creator of a web service to register their original schema and any subsequent versions of it. When a new version of the payload’s schema is conceived, the new schema can be submitted to the schema server in order to test whether it breaks (i.e., is not backward-compatible with) any older versions of the schema. Plus, when accompanied by the appropriate markup (aliases, in Avro’s terms), the new schema can indicate which of its tags replace tags from the previous version. So, when an older client submits data to the updated web service, the web service can use the schema server as a translation device. Nifty!
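To make that a bit more concrete, here is a rough sketch of the two jobs described above: checking whether a new schema can still read old data, and translating an old payload by following aliases. Everything in it is my own assumption for illustration. The schemas are plain Python dictionaries loosely shaped like Avro records, and the names unreadable_fields and translate are hypothetical; none of this is the real Avro library or the schema server from the talk.

```python
# Hypothetical, simplified schemas: plain dicts loosely modeled on Avro records.
OLD_SCHEMA = {"name": "Book", "fields": [
    {"name": "Title"},
    {"name": "Price"},
]}

NEW_SCHEMA = {"name": "Book", "fields": [
    {"name": "Title"},
    {"name": "PubPrice", "aliases": ["Price"]},  # renamed; alias points at the old name
    {"name": "Currency", "default": "USD"},      # new field, so it needs a default
]}


def unreadable_fields(writer, reader):
    """List reader fields that data written under `writer` could not populate."""
    written = {field["name"] for field in writer["fields"]}
    missing = []
    for field in reader["fields"]:
        accepted = {field["name"], *field.get("aliases", [])}
        if not (accepted & written) and "default" not in field:
            missing.append(field["name"])
    return missing


def translate(record, reader):
    """Reshape a record written under an older schema into the reader's shape."""
    result = {}
    for field in reader["fields"]:
        for name in [field["name"], *field.get("aliases", [])]:
            if name in record:
                result[field["name"]] = record[name]
                break
        else:
            # No matching name or alias in the old record: use the default.
            if "default" not in field:
                raise KeyError(f"cannot populate {field['name']!r}")
            result[field["name"]] = field["default"]
    return result


if __name__ == "__main__":
    print(unreadable_fields(OLD_SCHEMA, NEW_SCHEMA))
    print(translate({"Title": "Dune", "Price": 9.99}, NEW_SCHEMA))
```

An empty list from unreadable_fields means the new schema can read everything the old one wrote; a real registry performs far more checks (types, nesting, unions) than this toy does.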
I love this solution, and I want to commend Vinicius and his colleagues for sharing a solution to a common problem. However, what if we wanted to evolve this solution for more complex scenarios, where the changes to the payload are more involved? For example, what if we wanted to split one tag into several others? Or what if we wanted to replace an entire composite with another one? In this case, simple markup wouldn’t be sufficient to indicate how the schema server could transform the data from one form to another. In this scenario, you would need to create a tool that could help you define such transformations systematically, and you would need the right methodology in order to build it. You might know where I’m going with this one…Yes, I think that this is where the application of MDD could produce the guts of the schema server and make it even more powerful!
Granted, if we stayed with Avro, it would be difficult to create such a translation service; the functionality built into Avro is likely difficult to extend (if even possible). However, if we use XML (which, in my line of work, we are most apt to use) as our API payload’s vehicle, we could use something that Vinicius mentions in his talk: XSLT. Even though I can’t say that I never cursed when using it, it can be a helpful tool in certain cases…and in the case of creating an MDD server for schema translation, it fits perfectly! Using MDD, we could create a schema server that could generate the appropriate XSLT and then perform complex conversions from one XML schema to another. I have a few ideas on how to make such a thing work…but that’s for another time.
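Just to give a taste of what I mean by generating XSLT from metadata, here is a minimal sketch in Python. All of it is assumed for illustration: the MAPPINGS rows stand in for whatever a real mapping table would hold, the Name/FirstName/LastName tags and the build_stylesheet function are hypothetical, and only two kinds of change (a rename and a two-way split) are handled. It uses the standard library to build the stylesheet and does not apply it; running the result would require an XSLT processor of your choice.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical metadata rows, as they might be read from a mapping table.
MAPPINGS = [
    {"kind": "rename", "source": "Price", "targets": ["PubPrice"]},
    {"kind": "split", "source": "Name",
     "targets": ["FirstName", "LastName"], "separator": " "},
]


def xsl(tag):
    """Qualify a tag name with the XSLT namespace."""
    return f"{{{XSL_NS}}}{tag}"


def build_stylesheet(mappings):
    """Generate an XSLT 1.0 stylesheet (as text) from the metadata rows."""
    ET.register_namespace("xsl", XSL_NS)
    root = ET.Element(xsl("stylesheet"), {"version": "1.0"})

    # Identity template: anything without a mapping is copied through unchanged.
    identity = ET.SubElement(root, xsl("template"), {"match": "@*|node()"})
    copy = ET.SubElement(identity, xsl("copy"))
    ET.SubElement(copy, xsl("apply-templates"), {"select": "@*|node()"})

    for row in mappings:
        template = ET.SubElement(root, xsl("template"), {"match": row["source"]})
        if row["kind"] == "rename":
            # Emit the new tag and carry over the old tag's contents.
            renamed = ET.SubElement(template, row["targets"][0])
            ET.SubElement(renamed, xsl("apply-templates"), {"select": "@*|node()"})
        elif row["kind"] == "split":
            # Split the text at the first occurrence of the separator.
            first, second = row["targets"]
            sep = row["separator"]
            ET.SubElement(ET.SubElement(template, first), xsl("value-of"),
                          {"select": f"substring-before(., '{sep}')"})
            ET.SubElement(ET.SubElement(template, second), xsl("value-of"),
                          {"select": f"substring-after(., '{sep}')"})
        else:
            raise ValueError(f"unknown mapping kind: {row['kind']!r}")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_stylesheet(MAPPINGS))
```

The point of the design is that a new kind of transformation means a new row type and a new branch in the generator, while the stylesheets themselves never get edited by hand.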