-
YAML formats are not lossless
Recently I was using YAML files to package up content to send across the wire. A part of this included a checksum and signature of the contents to ensure that it wasn’t modified, and that it came from a trusted source.
I ran into an issue where some of the content wouldn’t sign properly on the receiving side. A lot of digging later turned up that parsing YAML content in Ruby with YAML::load will strip the whitespace from otherwise-empty lines, depending on what type of quotes you use when writing the file. WHAT?
    require 'yaml'

    File.open('strip.yaml','w') do |f|
      f << {:content => "\ncontent\n \nmore content\n"}.to_yaml
    end

    data = YAML::load IO.read('strip.yaml')
    puts data
    # this will produce: {:content=>"\ncontent\n\nmore content\n"}

As you can see, the space on the whitespace-only line was stripped out. Doing a signature check on the client is disastrous if the trusted source signed the content while it still had that whitespace.
However, doing a little tinkering along with the help of VI’s syntax highlighting I found out that using single quotes around the content we are writing to file gives us completely different behavior:
    require 'yaml'

    File.open('strip.yaml','w') do |f|
      # notice how the content string uses single quotes now
      f << {:content => '\ncontent\n \nmore content\n'}.to_yaml
    end

    data = YAML::load IO.read('strip.yaml')
    puts data
    # this will produce: {:content=>"\\ncontent\\n \\nmore content\\n"}

Notice here how the spaces are retained, but we also don’t have true new line characters either. So there are really two things at play. One, the type of quotes you use makes a difference (even if the content isn’t using the #{} operators). Two, YAML::load doesn’t always respect your data, especially lines with nothing but white space.
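One way to check for this kind of loss, and to sidestep it, is sketched below. This is only a sketch under assumptions: the helper names (yaml_round_trip_intact?, pack_payload, unpack_payload) are made up for illustration, and whether the first check passes depends on the YAML engine bundled with your Ruby version. The workaround Base64-encodes the content before it goes into the YAML document, so the YAML layer never sees the whitespace-only lines.

```ruby
require 'yaml'
require 'digest'

# Hypothetical helper: does `content` survive a YAML dump/load byte-for-byte?
# The answer depends on the YAML engine that ships with your Ruby.
def yaml_round_trip_intact?(content)
  loaded = YAML.load({ 'content' => content }.to_yaml)
  Digest::SHA256.hexdigest(loaded['content']) == Digest::SHA256.hexdigest(content)
end

# Hypothetical workaround: Base64-encode the payload before dumping it.
# The 'm0' directive produces Base64 without any line breaks.
def pack_payload(content)
  { 'content' => [content].pack('m0') }.to_yaml
end

# Reverse of pack_payload. The decoded string comes back with binary
# encoding, so compare bytes (or digests) rather than relying on encodings.
def unpack_payload(yaml_text)
  YAML.load(yaml_text)['content'].unpack1('m0')
end
```

With something like this, the sender would sign the original content, ship the output of pack_payload, and the receiver would verify the signature against the bytes returned by unpack_payload.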
-
Ruby 1.9.1 modifies request headers
I just ran across something in Ruby 1.9.1 that definitely violates the least-surprise principle (at least to me). It is in Net::HTTP#get.
    def get(path, initheader = {}, dest = nil, &block) # :yield: +body_segment+
      res = nil
      if HAVE_ZLIB
        unless initheader.keys.any?{|k| k.downcase == "accept-encoding"}
          initheader["accept-encoding"] = "gzip;q=1.0,deflate;q=0.6,identity;q=0.3"
          @compression = true
        end
      end
      request(Get.new(path, initheader)) {|r|
        if r.key?("content-encoding") and @compression
          @compression = nil # Clear it till next set.
          the_body = r.read_body dest, &block
          case r["content-encoding"]
          when "gzip"
            r.body= Zlib::GzipReader.new(StringIO.new(the_body)).read
            r.delete("content-encoding")
          when "deflate"
            r.body= Zlib::Inflate.inflate(the_body);
            r.delete("content-encoding")
          when "identity"
            ; # nothing needed
          else
            ; # Don't do anything dramatic, unless we need to later
          end
        else
          r.read_body dest, &block
        end
        res = r
      }
      unless @newimpl
        res.value
        return res, res.body
      end
      res
    end

At the top, you can see that it sets “accept-encoding” in the headers if it isn’t already set. I didn’t expect this, so when my data source packed the response, the feed broke: I was trying to parse a DOM from it and it failed. Hard-setting the accept-encoding header to UTF-8 fixed the issue, since the method only adds its own value when the header is missing.
That assignment also modifies the header parameter passed in from the outside scope, so any redirects will have this problem too. I don’t think it should do this, but I haven’t found any open tickets. Has anyone else run into this?
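A defensive way around both problems is to hand Net::HTTP#get a copy of your headers that already carries an explicit accept-encoding value. The sketch below assumes a helper named explicit_headers, which is hypothetical and not part of Net::HTTP, and it assumes “identity” (no compression) is what you want to ask the server for.

```ruby
# Hypothetical helper, not part of Net::HTTP: returns a copy of `headers`
# that always carries an accept-encoding entry, so the caller's hash is
# never touched and the library has no reason to add its own value.
def explicit_headers(headers = {}, encoding = "identity")
  copy = headers.dup
  # Same case-insensitive check that the library code above performs.
  unless copy.keys.any? { |k| k.downcase == "accept-encoding" }
    copy["Accept-Encoding"] = encoding
  end
  copy
end
```

A call would then look something like http.get(path, explicit_headers(my_headers)), and my_headers stays exactly as you built it.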
-
Thick Pipe Example: Ruby Code Storage
In my previous post about laying thick pipes, I talked about the importance of writing code that doesn’t do just one thing, but enables more possibilities of other pieces of code to do a bunch of stuff based on input.
Laying a thick pipe is usually different than over-engineering in that it rarely requires an object oriented approach to do it. An example of over-engineering a project would be making a Person class, then an EducationPerson, then a Teacher and a Student. It’s obvious that Students and Teachers would have some common base, maybe even Person - but more than likely (if it’s a software system) it would be “User,” which implies some actual benefit and functionality. You certainly wouldn’t create a LivingCreature class in this case.
So the point of laying a thick pipe isn’t to increase the complexity of the code, but to keep the code terse (short) while providing opportunities for other pieces of code to work at a good level of abstraction.
A recent example in my daily work was being able to know what code to execute from a record stored in a database. In my case, this Rails app could have easily stored a “code_type” column that was nothing more than a string used as a key for code destination. Maybe something like “user_foo_creator” or “shoot_badguys_with_gun” or something.
What the code would probably do is take the value from the database, do an if-else or a switch on the key, and then grab the object necessary to execute.
The problem with this is the only thick pipe is the storage mechanism - the string in the database. The minute I wanted to add a new execute point, I’d have to change the dispatching code. To make that approach even less feasible, I couldn’t know - at any point in time - what modules were in my application as it could be changed by the user during run time. The dispatching mechanism would need to know all cases, and I couldn’t do that.
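For contrast, the key-based dispatcher described above might look like the sketch below. The module names and the dispatch method are hypothetical, made up from the example keys; the point is only that every new task means another branch in the case statement.

```ruby
# Hypothetical task modules, named after the example keys above.
module UserFooCreator
  def self.execute_me(params)
    "created foo for #{params[:name]}"
  end
end

module ShootBadguysWithGun
  def self.execute_me(params)
    "shot #{params[:count]} bad guys"
  end
end

# The "thin pipe": the dispatcher has to know every key in advance,
# so adding a task always means editing this method.
def dispatch(code_type, params)
  case code_type
  when "user_foo_creator"       then UserFooCreator.execute_me(params)
  when "shoot_badguys_with_gun" then ShootBadguysWithGun.execute_me(params)
  else
    raise ArgumentError, "unknown code_type: #{code_type.inspect}"
  end
end
```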
So what I stored was the fully qualified (namespaced) module name for the execution of a background task. Something like “CodeBucket::MyTask” or “CodeBucket::FooBarCreator::CreateFoo”. In the spirit of a “thick pipe,” I would need a very small piece of code that could take a string from the database, get the correct object, and call my execute method on it. It ended up looking something like this:
    ###########################################
    # get_module
    # Given a string, like "CodeBucket::SomeNamespace::SomeModule",
    # get that "SomeModule" so you can do something with it.
    ###########################################
    def get_module(database_string)
      namespaces = database_string.split('::')
      cursor = Object::const_get(namespaces.shift)
      namespaces.each do |item|
        cursor = cursor.const_get item
      end
      cursor
    end

A Rails controller action can find the correct task by id and run it in a one liner now (code_path is the column where the string is stored):
    class ExecutionController < ApplicationController
      def execute_task_in_foreground
        get_module(ExecutionTask.find(params[:id]).code_path).execute_me params
      end
    end

Notice how I passed in params to the #execute_me method. This is another thick pipe. Now ExecutionController#execute_task_in_foreground can run any piece of code with any number of parameters.
If I wanted to, I could allow myself to store strings like “CodeBucket::MyClass#my_method” and create another interface similar to get_module that understood the “#” notation and actually executed the method specified. In my current example, the controller expects that #execute_me exists on the module that was retrieved. This makes sense as the controller provides the context keeping get_module generic. However, if I found myself writing lines like that all over the place with different methods being called on the modules grabbed, I’d probably want to store my method in the database column and write a generic facility to execute it.
I’ll leave that as an exercise for the reader.
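If you want a starting point, here is one possible shape for it. This is a sketch only: resolve_constant and call_code_path are hypothetical names, and the CodeBucket::MyClass module exists purely so the example can run. It uses public_send so that only public methods can be reached from a stored string.

```ruby
# Hypothetical sample module so the sketch can run on its own.
module CodeBucket
  module MyClass
    def self.my_method(params)
      "ran task #{params[:id]}"
    end
  end
end

# Walk "A::B::C" one constant at a time, starting from Object.
def resolve_constant(path)
  path.split('::').inject(Object) { |scope, name| scope.const_get(name) }
end

# Split "CodeBucket::MyClass#my_method" into the constant path and the
# method name, then call that method with whatever arguments were given.
def call_code_path(code_path, *args)
  constant_path, method_name = code_path.split('#', 2)
  if method_name.nil? || method_name.empty?
    raise ArgumentError, "no method given in #{code_path.inspect}"
  end
  resolve_constant(constant_path).public_send(method_name, *args)
end
```

Keep in mind that strings like these decide what code runs, so the column should only ever be written by something you trust.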
Go lay thick pipes!
-
Lay Thick Pipes
One thing that has become increasingly important to me over the years when writing software is architecture and design. At first, there was no “architecture” in my code in any traditional sense. It was not laid out in any deliberate manner and once I got it working - it stayed.
Of course, it’s obvious here that the project quickly becomes a disorganized mess that is almost impossible to fix, test, or extend without introducing a million bugs along the way.
One thing I see time-and-time again is the lack of “thick pipes” to help combat these sorts of complexities. What is a thick pipe? To understand what a thick pipe is, we need to understand what a “thin” pipe is.
Here is a JavaScript example of an API that searches for Foos. It will create a request and get back a JSON object that contains the Foos from the server.
    var findFoos = function( bar, baz, callback ) {
      new Ajax.Request('/foo/search', {
        method: 'get',
        parameters: { search_bar: bar, search_baz: baz },
        onComplete: function(transport) {
          var myFoos = transport.responseJSON;
          callback(myFoos);
        }
      });
    };

Now, for the sake of simplicity, this is great. This is the “less is more” or “less is less” mentality. This API allows you to search for foos by their bars and bazzes. Awesome.
But what if in the next revision (there is another release, right?), the Foo has a weeble property? What changes? The API does, the code in the API does, the server code probably does too, along with every existing call to findFoos in the code base. Using this mentality, your API interface would look like this:
    function findFoos( bar, baz, weeble, callback );

But now, what if you don’t necessarily care about the bazzes for a specific part of your app, only the bars and weebles? Uh-oh, now your API calls start looking like this:
    findFoos( 'the_bar', null, 'the_weeble', callback );

What does null mean? What do the other parameters mean? If a new developer comes into the project and code like this is all over the place, they’ll spend a lot of time searching instead of reading. Even worse: every time the “Foo” object changes you will have to go through this mess and change all of the code it touches.
How to lay a thick pipe
All of this could have been avoided from the beginning. Instead of philosophically asking the question “how can I search for Foos on their bar and baz”, you could have asked “How do I search for any object on any number of parameters?” An answer to the first question does something, an answer to the second question enables you to do a lot of things. This is a thick pipe.
    var search = function( objectName, parms, callback ) {
      new Ajax.Request('/search', {
        method: 'get',
        // parms is a plain object, so copy it and add the object type
        // with Prototype's Object.clone and Object.extend
        parameters: Object.extend(Object.clone(parms), { objectType: objectName }),
        onComplete: function(transport) {
          callback(transport.responseJSON);
        }
      });
    };

Notice how if you add another object to the server database you won’t have to change the API (or create a newObjectSearch() method). Notice how, regardless of what parameters you use or need to search by, the API doesn’t need any more parameters as you’ve passed them in as a hash. Look what this does for code readability in the case where we don’t care about bazzes.
    search('Foo',{bar:'the_bar',weeble:'the_weeble'}, callback );

    // we can also search for other objects with this
    search('Report',{id:5},reportCallback);
    search('Comment',{author:'skottie'},commentCallback);

Once we’ve added these other objects and tables to the server-side, the client API enriches itself out of the art of the possible, without any extra code.
Through this example, it’s obvious that a “thick pipe” is a lot like abstraction. However, with abstraction alone you can still have 3-4 layers with a lot of very “thin” pipes. If one thing changes in your code and you find yourself changing a handful of interrelated pieces, your pipe is probably not thick enough.


