
Galvanize This: Hanging Out with Redis

So, I attended a Redis workshop a few days ago, at the New York “campus” (which is a buzzy, DB-esque industry word that I loathe) of Galvanize. Even though I usually don’t go for these meetup/workspace types of places, I’ll admit that this one was rather pleasant. Unlike other places, it did a good job of finding that fine line between casual and professional. For example, no beanbags anywhere. Because as much as I love beanbags myself (i.e., I have two at home), we can’t look at your screen together unless I get on my knees or I crawl onto the beanbag with you…which might be uncomfortable in many ways for both of us. Plus, the space had reliable WiFi for most of the time I was there, unlike some other places.

Even though I’ve already been dealing with Redis at work for a little while now and am generally impressed with its performance, it never hurts to try and learn something from the masters. (Unfortunately, AntiRez himself did not leave Sicily and fly over to teach us.) I was curious how they were going to showcase the tech and if we were going to just sit there and watch, when they instructed us to download Docker. As it turned out, we were going to learn the lesson via containers with Jupyter Notebooks, which I had never heard of before. And since I yearn for the era of interactive documentation, I couldn’t have been happier. (On a side note, I only recently learned about Katacoda, which I love just as much, if not more.)

Even though the second half of the day was your familiar sales pitch for Redis Enterprise and Redis Cloud (which did seem to be an appealing purchase), the first half of the day was when they taught about the product itself. For the most part, it wasn’t anything new to me, aside from the occasional bit of trivia. (Lua is the scripting language built into Redis? Huh. It’s come a long way since being just the scripting language for WoW skins.) I did learn a few tidbits about the data structures (like the existence of HyperLogLog), but since I use Spring Caching in our microservices, we don’t pay that much attention to them.

Instead, it was interesting to learn about the features of Redis that we weren’t even leveraging yet at work. For example, you can write your own extensions to Redis using C. Which I’d be tempted to do just because, since I miss writing in C…Also, it was interesting to learn about the various modules that were already available for Redis, with functionality ranging from machine learning to bloom filters. And that’s when I recognized a pattern seen before. Much like every other tech company, there seems to be the desire to get into whatever is hot, to survive as a company by being more horizontal. However, I would implore Redis to be careful and to never neglect its core mission. I, for one, don’t really need machine learning, but I’d like Redis to work with Spring (i.e., Pivotal) to further develop the Spring Data Redis layer and make it configurable, so I can easily direct reads to slave nodes. I need that, not machine learning. So, even though I didn’t learn a great deal about Redis by attending, I got to see the general direction of the company. In that way, I’d say that the trip to Galvanize was worth it.

That, and spears of fresh fruit.

You just can’t argue with fresh fruit spears, where the fruit is cut into various geometrical shapes. It just plucks the right strings of geeky hearts.


Resist Bad Data, Part I: The Horrid Pain of Incomplete XHTML Entities and Encodings

The avant-garde of the software world may have migrated to greener pastures, munching on more lush hardware and grazing on more dynamic software. However, if you work on software in a more “established” industry like retail or publishing (or, as in my case, the intersection of them), then you’re accustomed to the entrenched practices of institutions. A scant few are worse than others, refusing to give up their fax machines in exchange for scanners and PDFs. However, most of these ancients do move, although at a slower pace. In that sense, it will be a long while before XML is discarded as the standard format for data exchange. Take ONIX, for example.

And since I support the consumption of this XML standard, I must anticipate the various issues that might be encountered with it. For those of you who don’t deal with this type of madness, the rest of this post probably means nothing to you. For those of you who do, however…you are my brothers and sisters, my comrades in the trenches. You have my full empathy. And due to our shared bond, I am compelled to help you.

Of course, when you receive an XML file, you want to validate its structure (and, in some cases, its content). “But who would send improperly formatted data, especially if you have a business relationship? Surely they would have validated it before releasing it onto the world?” Oh, how I wish that were true. On the plus side, most providers of XML data do get the basics down. For example, they have opening and closing tags, and they know how to spell the name of their own company in the comments. On the negative side, they may not understand the XML standard completely, and since they don’t run an XML package to validate their own files, the content (i.e., the inner text within the tags) can cause the whole file to be invalid in the eyes of an XML parser. I’m sure that you know what I’m talking about, my comrades.

Take for example the following ONIX XML:

<TitleText>&#9996; I Don’t Know How to Create a XML File Properly &#9996; I Should Just Color Books with My Fellow Ni&#x000F1;os for 3 A&#x000F1;os &#9996; - Help Me Color Ni&#x000F1;os - &#Xae Ni&#x000F1;os! - Ni&#x000os!</TitleText>
<Subtitle>Yay&#9996Yay All Play and No Work for Me &gt; &sum; Just Play D&D and D & D &#99 with My &#9996 Boys - Moy Fun &#8364; (Spanish Edition) &#x000F1; &#</Subtitle>

There are a few incomplete encodings here (like “&#Xae” and “&#9996” and “&#”) that will cause the file to fail validation. (And, no, “&sum;” would not fail here, since it’s valid in the eyes of the ONIX DTD.) And since I don’t like manually combing through a 600 MB file and fixing each grotesque instance, we should create an automated solution, using something dangerously powerful. Yes…I am talking about regular expressions. Of course, this issue isn’t exactly a new one, since developers have been talking about it again and again for a while. However, most of the solutions don’t address all the issues at once, like those presented above.
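To see why a parser chokes on these, here is a minimal well-formedness check sketched with Python's standard library. The one-element fragments and the `is_well_formed` helper are made up for illustration, and since this check loads no DTD, named entities such as “&sum;” are deliberately left out of it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical one-element fragments cut down from the sample above.
samples = [
    "<t>Ni&#x000F1;os &#9996;</t>",  # complete numeric references
    "<t>Yay&#9996Yay</t>",           # digits with no closing semicolon
    "<t>&#Xae Ni</t>",               # "&#" followed by letters
    "<t>Edition &#</t>",             # dangling "&#"
]

for fragment in samples:
    print(is_well_formed(fragment), fragment)
```

Only the first fragment should come back as well formed; the other three are the kinds of breakage that need cleaning up before the file can be parsed.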

So, after spending the good part of a day desperately trying to remember the idiosyncrasies of regular expressions (capture groups, etc.), I came up with a more encompassing solution:

$line =~ s/(&#?x?[A-Za-z0-9]+;)|&#\d*/$1/g;

If applied via Perl to the sample XML mentioned above, it results in the following:

<TitleText>&#9996; I Don’t Know How to Create a XML File Properly &#9996; I Should Just Color Books with My Fellow Ni&#x000F1;os for 3 A&#x000F1;os &#9996; - Help Me Color Ni&#x000F1;os - Xae Ni&#x000F1;os! - Nix000os!</TitleText>
<Subtitle>YayYay All Play and No Work for Me &gt; &sum; Just Play D&D and D & D  with My  Boys - Moy Fun &#8364; (Spanish Edition) &#x000F1; </Subtitle>

And voilà! The incomplete encodings are all gone, and the rest of your data has not been mauled or decimated. Well, not terribly, anyway. Plus, it’s pretty darn fast. (Unless of course you’re running Perl on Windows. Then you might as well take a long lunch and a nice nap before it’s finished.) Now, in my case, I wanted to remove only the incomplete numeric encodings via the second alternative (i.e., &#\d*), and I wanted to keep the remnants of the hex encodings (like “&#x000”) and alphabet encodings (like “&#Xae”) for further analysis, which is why only their “&#” prefix gets stripped. So, you may want to modify the expression if you want to handle the latter two in a different way. Also, it should be noted that it does not handle incomplete HTML entities. For example, if the provider gives you something like “&gt” where it’s missing the semicolon, this expression will not help you. The same goes for bare ampersands (like the one in “D&D”), which it leaves untouched even though an XML parser will reject them. Also, if the provider gives you an incorrect value for an encoding (like “&#x000;”), it definitely won’t help you. However, you could modify it or use it as a template for an expression that targets those problems specifically.
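For anyone who would rather not reach for Perl, the same substitution can be sketched in Python. Treat it as an approximation under a couple of assumptions: the name `strip_incomplete_refs` is invented here, and the lambda stands in for Perl's empty `$1` whenever the second alternative is the one that matches.

```python
import re

# Same pattern as the Perl one-liner. Group 1 captures anything that looks
# like a complete reference; the second alternative matches a dangling "&#"
# plus any digits that follow it.
PATTERN = re.compile(r"(&#?x?[A-Za-z0-9]+;)|&#\d*")


def strip_incomplete_refs(line: str) -> str:
    """Keep complete references; delete dangling '&#' prefixes and their digits."""
    # group(1) is None when the second alternative matched, so the match is
    # replaced with an empty string in that case.
    return PATTERN.sub(lambda m: m.group(1) or "", line)


if __name__ == "__main__":
    print(strip_incomplete_refs("Yay&#9996Yay &gt; &sum; &#8364; - Ni&#x000os!"))
```

For a large file, this would be applied line by line while streaming, so the whole 600 MB never has to sit in memory at once.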

In future posts, I’ll talk about other options for this kind of situation, by making use of either C# or Java. Hopefully, this post will save you the hours that I had to spend. And if you have any useful advice for how to address such bad data, I’d love to hear it. Since data providers will always issue bad data, we’ll always need more tools at our disposal. Unfortunately, though, I am forced to ignore any tips on arson or demolition, since violence is never the answer.

At least, that’s what I’ve been told.

.NET and XML: Mortal Enemies

In the past, I’ve had my issues when dealing with the compatibility between C# and XML, so much so that I’ve been inspired to write prose. At this point, I feel that it’s my duty for posterity to document the various instances where Microsoft drops the proverbial XML ball:

  1. As stated before, DTD validation of XML files (with the C# XmlReader) is broken. And when I say broken, I mean that it performs as well as a thirty-year-old VCR pulled from the ocean and then placed in a vat of hydrochloric acid. Now plug that VCR in. You’re going to get the same results.
  2. If you’re hoping to create an XSD from available XML or classes with the Visual Studio Tool (i.e., XSDT), good luck. For one, it doesn’t allow you to specify any complex rules. Also, if you’re generating an XSD schema from classes, you should know that it doesn’t work with Nullable types or any Dictionary properties, since they couldn’t budget it into their schedule.
  3. “Hey, I have a container that’s set to null. So I’ll tell C# not to mention that tag at all upon serialization, since it’s just wasted space and since I might not want that tag visible in certain cases.” It is possible…but under specific circumstances. If you plan on serializing your class as is and if you plan on not using any C# attributes on your properties, maybe. Otherwise? Nope. C# says “No soup for you!”
  4. If you want to serialize an array of strings, you get a free bonus upon serialization: all of the tags will be prefixed with a namespace (i.e., the namespace for array serialization). If you don’t want that ugly prefix, just create a wrapper container for a string array. Now do this for every literal type you can think of.
  5. Let’s say that you have an XML doc with a string that is an HTTP link. You now want to parse that XML and read that value. Of course, if that link contains a query string, all of those ampersands are already converted into instances of “&amp;”, in order to be compatible with XML standards. “But I want my data to be read incorrectly!” Well, you’re just in luck, friend! Because when you use XmlDocument.Load() to deserialize that XML, that query string will contain “amp;” and not “&amp;”, essentially turning the link into garbage. Huzzah!
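As a point of comparison for item 5, this is what a conforming parser is expected to hand back for an escaped query string. The sketch uses Python's standard library rather than C#, and the URL and element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the ampersand in the query string is escaped as
# "&amp;", which is what the XML standard requires inside element text.
doc = "<item><link>http://example.com/search?q=onix&amp;page=2</link></item>"

# The parser should unescape "&amp;" back into a single "&".
link = ET.fromstring(doc).findtext("link")
print(link)  # http://example.com/search?q=onix&page=2
```

Anything else coming out of the reader (a stray “amp;”, for instance) means the link no longer matches what the provider sent.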

These are only a few instances, but there are more. (Don’t even get me started on deserializing payloads from an ASP.NET web service.) If anyone out there has their own experiences to add, I’d love to hear them.

Quick Tangent: An Open Standard for Indoor Navigation

“Since when did this blog become solely about indoor navigation? I thought that this thing was supposed to be about metadata?” Well…I can’t argue with you. I need to get back into that at some point.

In any case, after sampling different platforms for indoor navigation, I’ve come to notice something: there is no open source standard for indoor navigation yet. Of course, there is much discussion about indoor navigation within the gaming industry, especially within the community for Google’s Project Tango…but there isn’t as much talk when it comes to API standards. Considering that open source standards seem to be falling from the sky in the last few years (for cloud computing, for automobiles, etc.), it seems fitting that one of the bigger players (like Indoor Atlas or Estimote) should take this opportunity to lead the way with an open standard. As IoT invades our lives, indoor navigation will probably become more prevalent, and more development standards would be beneficial to the industry. (Of course, these Northern Europeans are probably too busy slugging it out, since that’s part of the travails of being a startup company.) I suppose another unmentioned company could also take this lead, but I’ve found that many are not as developer-friendly as Indoor Atlas and Estimote; most require any interested parties to fill out an application before even allowing access to their documentation.

Now, it wouldn’t have to be an all-encompassing standard, but it should probably take into account each of the strengths in the current set of available platforms. In addition to the practice of emulating Apple’s Location Manager (which they all seem to do in their iOS SDKs), an open standard could include interface methods for functionality like:

  • The ability to overlay the indoor navigation map over an actual world map (which is offered by Indoor Atlas).
  • The ability to generate a map of the indoor space dynamically (which is offered by Estimote).
  • The ability to raise a signal (email, text, etc.) when someone enters the navigable area (which is offered by both Indoor Atlas and Estimote).
  • The ability to generate an account with the vendor’s services programmatically (which I could see as being helpful to developers who want to incorporate these services into their own products).
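To make the list above a bit more concrete, here is a rough sketch of what such an interface might look like, written in Python for brevity. Every name in it (`IndoorNavigationProvider`, `GeoAnchor`, the method names, the toy `InMemoryProvider`) is hypothetical; none of it comes from the Indoor Atlas or Estimote SDKs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable


@dataclass
class GeoAnchor:
    """Ties an indoor map to a real-world coordinate (hypothetical type)."""
    latitude: float
    longitude: float
    bearing_degrees: float


class IndoorNavigationProvider(ABC):
    """Hypothetical vendor-neutral interface; no existing SDK uses these names."""

    @abstractmethod
    def overlay_on_world_map(self, map_id: str, anchor: GeoAnchor) -> None:
        """Align an indoor map with an actual world map."""

    @abstractmethod
    def generate_map(self, venue_id: str) -> str:
        """Build a map of the indoor space dynamically and return its id."""

    @abstractmethod
    def on_enter(self, map_id: str, callback: Callable[[str], None]) -> None:
        """Register a signal (email, text, etc.) raised when someone enters."""

    @abstractmethod
    def create_account(self, email: str) -> str:
        """Create a vendor account programmatically and return an API key."""


class InMemoryProvider(IndoorNavigationProvider):
    """Toy implementation that exists only to exercise the interface."""

    def __init__(self) -> None:
        self.anchors = {}
        self.listeners = {}

    def overlay_on_world_map(self, map_id, anchor):
        self.anchors[map_id] = anchor

    def generate_map(self, venue_id):
        return f"map-{venue_id}"

    def on_enter(self, map_id, callback):
        self.listeners.setdefault(map_id, []).append(callback)

    def create_account(self, email):
        return f"key-for-{email}"

    def simulate_entry(self, map_id, visitor):
        # Stand-in for the vendor detecting a visitor in the navigable area.
        for callback in self.listeners.get(map_id, []):
            callback(visitor)


provider = InMemoryProvider()
map_id = provider.generate_map("store-42")
seen = []
provider.on_enter(map_id, seen.append)
provider.simulate_entry(map_id, "visitor-1")
print(map_id, seen)
```

The point of the abstract base class is that an app written against it could swap vendors without touching its own navigation code, which is the whole appeal of a shared standard.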

Maybe I’ll create a sample and upload it to GitHub in the near future, as a more verbose example.

On a side note to this new standard, we can leave out some of those confusing and misspelled messages that are generated by Apple’s Location Manager. Who comes up with these things?

Further Experiments with Bluetooth

So, while I’m still waiting for the Android port to Raspberry Pi, I did a little more homework, making some notes about calculations that would eventually be needed for my project. More importantly, I started to reconsider the whole idea of having a platform-dependent solution to the sensor for the Haunted House game. I mean, there has to be something out there, right? Something cross-platform like JavaScript…hey, wait, Google has the Bluetooth Chrome API! That’s perfect!

Excited yet again, I started to look over the documentation, and I started to play with the samples. So far, so good! Using the API, I could detect my smartphone when the Bluetooth was enabled…and I didn’t even need to pair it! Now I just needed to read the RSSI value on the Device class, and using the calculations, I could create a proper sensor capable of estimating the distance to nearby Bluetooth devices. “Wow, this is going to be so great!” But then I noticed that there was no value being populated for that property. Hmmm…oh well…maybe I wasn’t doing something right. So, I poked around on the Chromium site, and it revealed to me…oh no…

So, the developers of the API are debating how to properly set the RSSI value for the signal, since there are nuances when it comes to the detection of the device. They haven’t come to a consensus yet…so, for now, the property remains unpopulated. Which brings me back to square one yet again.
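For what it’s worth, the distance estimate itself is the easy part once an RSSI value is available. The sketch below assumes the common log-distance path-loss model, which is my stand-in and not necessarily the calculation from my notes; the function name and the default calibration values (-59 dBm at one meter, an exponent of 2 for open space) are likewise assumptions that would need tuning per device and per room.

```python
def estimate_distance_m(rssi_dbm: float,
                        tx_power_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Estimate distance in meters from an RSSI reading.

    tx_power_dbm is the RSSI expected at one meter (assumed calibration value);
    path_loss_exponent models the environment (about 2 in open space, higher indoors).
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))


if __name__ == "__main__":
    for reading in (-59.0, -69.0, -79.0):
        print(reading, round(estimate_distance_m(reading), 2))
```

Real readings bounce around quite a bit, so a sensor would want to average several samples before trusting any single estimate.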

I’m getting used to the disappointment…

…but the darkest hour is just before the dawn! And that was confirmed when I found Estimote while preparing to jump off a cliff. My manic ride is at an apex once again, and I’m looking forward to performing a few experiments in the future.