Gitbits.

I’ve been using git for well over a year, but I hardly know how to do anything beyond the basic pull, modify, commit, push, with the occasional merge.  Every time I think I need to do something else, I find that it’s possible, once you’ve figured out how to do it.

For example, say you want to get a particular version of a file:

git checkout <branch or commit> -- <path to file>

Next task: rebasing something that I’ve already committed.

Cross Browser Incompatibilities Suck!

Ran into a bug that only showed up on Firefox.  My dynamic select fields weren’t updating properly.  Fortunately, StackOverflow users ran into this problem years ago.  Further searches in the YUI forums showed that the Y.Node.setContent() call I was using sets innerHTML internally, which triggered the Firefox bug.

Original

Y.one("#id").setContent(node.get('childNodes'));

Fix

Y.one("#id").replace(node);

I already wasted a few hours debugging this before looking online. Lesson should be to search SO before debugging. If I had not found the solution online, it would have been days of debugging. This is why I never jumped into web development earlier.

Pseudo-functional testing with YUI3 Test

After working with Python, I’ve become a big fan of Test Driven Development.  I’m not a stickler for it - meaning I probably should write more tests than I currently do - but it’s clearly saved me a lot of time from debugging regressions, and it’s practically a necessity since there’s no compiler to check for errors for you.

Python, and particularly Django, has a great testing framework, which really gave me no excuses not to write tests.  However, on the web client side, I was not nearly as disciplined.  I’d read about Selenium, but it seemed complicated to set up, so I put it off.

This past week, after having written a fair amount of front end Javascript, I started looking for a testing framework.  I started out with very little knowledge of web app testing, and almost no concept of what a test framework should look like. My app operates like a giant state machine.  I store state data in the DOM nodes (more specifically, in the YUI Node objects that represent DOM nodes).  Button handler functions then do a bunch of stuff, draw stuff on the screen, and update the state machine data.  I imagined the testing I was looking for was similar to functional testing where I call a function with certain parameters and check the result - but in this case, I’d have to set up the page to a certain begin state, click a button and have the button handler run, and then check the state data at the end.

Even better, if I automated the steps to get to the begin state, I would have some extra testing.  This would be an ugly mutant hybrid of the functional testing and system level testing, and I wasn’t really sure what the best way to go about it was. 

Getting started with Selenium is easy.

I looked around, and two tools that seemed popular were YUI Test for functional testing (since I’m already using YUI) and Selenium for system level testing.  I figured I’d try using Selenium first, since it’s designed to script browser actions so I can get to my begin state.

This wasn’t a bad idea, since Selenium WebDriver setup for Python was mindblowingly easy.  Just run pip install selenium and you’re ready to go.  I quickly wrote a test program following the example documents, importing the Selenium driver, and giving it a series of instructions.  Running the test program would launch Firefox, and I’d watch the browser navigate automagically to the begin state.  I just as quickly learned the limitation of this approach: while I could easily access the DOM elements, the data I stored within YUI nodes was inaccessible to Selenium, so I could not verify the state.

Using YUI Test to script browser operations.

Next attempt was to use YUI Test.  It took me a bit of thinking to get started this way.  The examples in the documentation were somewhat abstract, and I wasn’t sure how to implement mimicking user input as a test case.

The concept is actually extremely simple.  I create a new Javascript function in which I instantiate a YUI Y.Test.Case. The test code simply makes a series of Javascript calls to set the values for various forms, and calls Y.one("#id").simulate("click") to click on buttons.  After that, I can check the node data, using the YUI Test assertion functions.

If I include this test function in my HTML, it runs the test script.  For normal functionality, I simply don’t include the test function.

YUI Events are also much simpler than the docs imply.

At least, that’s the concept.  There are two major caveats, which I solved using YUI custom events.  The first caveat is that my button clicks trigger Ajax requests.  Only after the Ajax request returns do I set my result data.  Therefore the test case must wait for the Ajax request to succeed before checking for the result data.  In fact, I issue some Ajax requests on page load, showing a “loading…” icon until the requests have all succeeded.  This means I can’t even start my test until I know the first few Ajax requests have succeeded.

The second major caveat is that my test function runs inside a different YUI instance than my main app code.  The two YUI instances do not share node data.

YUI().use('node', function(Y) {
    // main app
    Y.one("#id").setData("example","foo");
});
YUI().use('node', 'test', function(Y) {
    // test script
    var example = Y.one("#id").getData("example"); // this returns null
});

Crap, it’s the same problem as with Selenium!

The solution to both of these problems is using YUI events.  First, I can instrument my main app code with events that indicate when Ajax results are returned, and send the events from the main YUI instance to the test YUI instance.

Second, I can include the main YUI instance as a parameter in the event.  When the test instance receives the event, it can pull out the main YUI instance, and use it to access the node data.  The sheer amount of event documentation made this look like a sizeable amount of effort, but really there’s just 3 small things to do.

  1. Declare the event(s) in the app function so that they are globally broadcast.
  2. Fire the event in the app function.
  3. Handle the event on the Y.Global object in the test function.

The solution, with the addition of the asynchronous YUI Test functionality, looks something like this:

YUI().use('node', function (Y) {
    // broadcast: 2 makes the event visible to other YUI instances via Y.Global
    Y.publish('test-start', { broadcast: 2 });

    // After initial Ajax load
    Y.one("#id").setData("example", "foo");
    Y.fire('test-start', { Y: Y });
});
YUI({useBrowserConsole: true}).use('test', 'node', 'node-event-simulate', function (Y) {
    var appY;
    var testCase = new Y.Test.Case({
        name: "Example Test",
        testBody: function () {
            Y.Global.on('test-start', function (e) {  // When we get this event, the Ajax load is done
                appY = e.Y;
                this.resume(function () {
                    var example = appY.one("#id").getData("example"); // We can get the data now!
                    Y.Assert.areEqual("foo", example); // expected value first, then actual
                });
                // DOM events like button clicks can be simulated fine through our
                // instance of Y.  Don't need to use appY for this.
                Y.one("#button").simulate("click");
            }, this);
            this.wait();
        }
    });

    // Register and run the test case
    Y.Test.Runner.add(testCase);
    Y.Test.Runner.run();
});

The test events do nothing when the test script is not included.  The main problem I have is that sending the Y instance with the “test-start” event destroys any potential security you might have for hiding your data within the YUI instance.  Last thing to do is to make sure that event isn’t fired on production code.

Javascript DOM building performance.

I was a little curious about the best way to create DOM subtrees in Javascript.  I found one useful Stack Overflow question which suggests that using jQuery to build a node has about 4x the overhead of just using a DOM function.

Since I’m using YUI3, I ran my own experiment comparing A) YUI3 templates against B) building a tree one node at a time.

I thought B might be faster since HTML parsing is kept to a minimum.  I was wrong; A was faster by almost a factor of 4:

A: 970 ms

B: 3860 ms

The numbers may not be accurate.  This was timed using the Chrome profiler, and the numbers vary from run to run.  The performance difference varies from 2.5x-4x, but A is consistently faster.

While my test is not an apples-to-apples comparison with the aforementioned SO question, it’s interesting to compare some of the results.  The SO question test created 100,000 DOM div nodes, with performance ranging from 3500ms (jQuery 1.2) to 440ms (jQuery 1.4+, and actually, 340ms on my machine).  

My test with YUI3’s best case perf was 970ms for about 110,000 DOM tr/td/div/text nodes, which is about 1/3 the performance - but my test also includes template parsing and operations to build the nodes into a tree.  Seems to be not too bad.

Wasting my time while debugging on Google Chrome.

Wasted about an hour while debugging, based on two lame issues.

function outer() {
    var test = "hello world!";
    var test2 = "something else";  // never used inside closureTest
    function closureTest() {
        console.log(test);
    }
    closureTest();
}
outer();

This isn’t my initial code, but it demonstrates my lameness.  Firstly, I expected this to display “hello world!” in my Javascript console.  It didn’t show up.  Why?  There’s a bunch of buttons at the bottom of the console which say “All”, “Error”, “Warnings”, “Logs”.  These are filters.  I had accidentally clicked “Error” at some point, so the console was only showing error messages, not logs.  DOH!

Secondly, I expected test2 to show up as a closure variable inside the closureTest function, but it didn’t show up in the debugger.  Weird.  Turns out there’s a Javascript compiler optimization that leaves out unused variables.

http://code.google.com/p/chromium/issues/detail?id=110573

Twitter Bootstrap modal widths.

This blog has been pretty empty since I hadn’t run into any interesting issues until this past week.  I got stuck for days being frustrated at Bootstrap’s modal dialogs.  They actually worked great for a while - they’re simple and they just work.  Once you need to configure them a bit, you start realizing the limitations.

The modals are designed to be small.  Most likely, they won’t go off the edge of the screen when you resize your window.  Which is great.  But I needed a wider modal.  The first hack was to add some CSS to my page to override the modal css.

    .modal { width: 960px; margin-left: -480px; }

By setting margin-left to -0.5 * width, the modal will remain centered with the new width.  However, this hard coded method works poorly when the user adjusts the window size.  If the window size is ever less than your width, you likely can’t reach the close button.

Next thing I tried was to adjust the width using jQuery’s CSS manipulation functions.

    $('#modalid').css({width:$(window).innerWidth()-10, marginLeft:-($(window).innerWidth()-10)/2});

This worked a bit better.  The modal would be sized to the window, but only at the time this function is called.  My first mistake was to call it when I initialized the modal.  The problem is that initialization happens when you load the page.  The user might then resize the browser.  When you actually launch the modal at a later point, the width may no longer be correct.  The fix was to make sure I set the size right before showing the modal (the .modal('show') call) rather than at modal initialization.

Still this method creates a modal that’s the right size, but if the user resizes the browser window, the modal can still end up off screen.  My first thought was “Oh no, now I’ll have to create an event handler”.  But luckily, my brain wasn’t completely dead.  What if we use percentage widths?

   $('#modalid').css({width:"90%", marginLeft:"-45%"}); 

What do you know. That was all that was required.  Now the modals always fit on screen, and resize along with the browser.

Google App Engine Key Only queries on Django-nonrel

Traced through the Django-nonrel code to figure out how to do an App Engine keys-only query, which can be more efficient, especially when you only want something like a count.

queryset = Model.objects.only('id')

Like any other queryset, you can append a filter to it, etc. This works well with my previous post about getting a limited count.

Django-nonrel count() with limit on Google App Engine

Google App Engine has a Query.count(limit=<limit>) method whose running time depends on the number of entities counted.  So the more entities in your database, the longer this takes to return.

You can short-circuit the count by including a limit, so even if there’s a large number out there, the call will return within a somewhat-manageable time frame.

The problem is that Django’s Queryset.count() method doesn’t allow a limit parameter.  Luckily there’s a hackaround:

queryset.query.high_mark = limit
count = queryset.count()
queryset.query.high_mark = None

With the default value of queryset.query.high_mark = None, it will run until it returns the full count, or potentially times out due to the large number of results.
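The hackaround above can be wrapped in a small helper so that high_mark is always restored, even if count() raises.  This is only a sketch: limited_count is a hypothetical name, it restores whatever high_mark was set before rather than always None, and FakeQuerySet is a stand-in I made up so the example runs without Django or App Engine.

```python
from types import SimpleNamespace


def limited_count(queryset, limit):
    """Count at most `limit` rows, then restore the previous high_mark."""
    previous = queryset.query.high_mark
    queryset.query.high_mark = limit
    try:
        return queryset.count()
    finally:
        # Restore even if count() raises, so later uses of the queryset
        # are not silently capped.
        queryset.query.high_mark = previous


class FakeQuerySet:
    """Hypothetical stand-in for a Django queryset (not a real Django class)."""

    def __init__(self, total):
        self.total = total
        self.query = SimpleNamespace(high_mark=None)

    def count(self):
        mark = self.query.high_mark
        return self.total if mark is None else min(self.total, mark)


qs = FakeQuerySet(total=5000)
print(limited_count(qs, 1000))
print(qs.query.high_mark)
```

With a real queryset the call would look the same: limited_count(Model.objects.only('id'), 1000).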

Using MapReduce with Django-nonrel on App Engine.

A while ago, I had read that the best way to clean up App Engine’s datastore was to use the MapReduce API.  For one, you delete datastore entities in parallel.  Secondly, the datastore will return a maximum of 1000 entities per query.  If you did it serially, you would have to loop through queries, potentially taking longer than the maximum execution time allowed for processing a single App Engine HTTP request.

Having changed my schema a bit, my Django app started failing when it loaded old data from my datastore.  I decided to try out MapReduce to clean up some of the illegal old objects from my datastore.  I discovered that the MapReduce API doesn’t work well with Django models.  It turns out the InputReader classes provided with the API use App Engine’s Python db API.  Fortunately, source is included, so I could write my own InputReader to map Django models instead of db models.

I left the API fetching entities, without converting them back to Django models.  This suited me well, since I was looking to fetch entities that wouldn’t properly convert to my new Django models anyways.  Here’s the code for the InputReader classes.  I’ve tested it with App Engine SDK 1.6.2 (with the MapReduce bundle).

import djangoappengine.main

from django.db.models.sql.query import Query

from mapreduce.input_readers import AbstractDatastoreInputReader
from mapreduce import util
from google.appengine.datastore import datastore_query

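# Named differently from the entity reader below so the second class
# definition does not replace this one.
class DjangoKeyInputReader(AbstractDatastoreInputReader):
  """An input reader that takes a Django model ('app.models.Model') and yields Keys for that model"""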

  def _iter_key_range(self, k_range):
    query = Query(util.for_name(self._entity_kind)).get_compiler(using="default").build_query()
    raw_entity_kind = query.db_table

    query = k_range.make_ascending_datastore_query(
        raw_entity_kind, keys_only=True)
    for key in query.Run(
        config=datastore_query.QueryOptions(batch_size=self._batch_size)):
      yield key, key 

class DjangoEntityInputReader(AbstractDatastoreInputReader):
  """An input reader that takes a Django model ('app.models.Model') and yields entities for that model"""

  def _iter_key_range(self, k_range):
    query = Query(util.for_name(self._entity_kind)).get_compiler(using="default").build_query()
    raw_entity_kind = query.db_table

    query = k_range.make_ascending_datastore_query(
        raw_entity_kind)
    for entity in query.Run(
        config=datastore_query.QueryOptions(batch_size=self._batch_size)):
      yield entity.key(), entity 

The most time-consuming part of the project was trying to figure out the MapReduce API documentation. There’s a few versions that come up when I do a google search. It turns out, it’s all the same package, but the documentation comes from various dates.

The Mapper API is an older version that just covers the mapping portion of the pipeline.  This is what I used, and it still works.  It has an easy getting started guide.  The documentation, however, is outdated, yet it still contains details about the mapping portion that are missing in the newer documentation.  Ignore the old download which is still sitting around; use the latest MapReduce bundle, which includes the old Mapper API.

The latest documentation covers the full MapReduce pipeline.  However, it just glosses over the entire pipeline at a high level, and isn’t very useful for actual implementation.

What you want to download is the latest MapReduce bundle from the App Engine SDK download page.

Reinventing the wheel - how I rebuilt django-smart-selects.

I needed to build a dynamic form that allowed users to select a country, state/province, and city.  I wanted the state/province field to only show valid states/provinces for the given country, and the city field to only show cities for the given state/province.

I certainly googled around, thinking many, many people must have done this before, but sadly I couldn’t find anything.  So I went and built my own.  It’s a version 0.1, but it works ok for my needs.  There’s only a few small tweaks I need to make to get it ship shape, though there may be some .  You can find it here.

https://github.com/dragonx/django-hier-ajax-form

Only after I was done did I find that there was, indeed, an existing package.  And it’s been around for years.

https://github.com/digi604/django-smart-selects

I’m clearly not very good at googling.

Python decorators, docstrings and Sphinx

I’ve been writing a fair bit of code lately, and decided to take a break and do some documentation.  I decided to use Sphinx to autogenerate HTML documentation from python docstrings.

It works pretty well, except I noticed there were key methods that did not show up in the auto-generated documentation.  Comparing them against the methods that appeared properly quickly revealed that somehow methods that had a decorator did not show up.

A search on StackOverflow gave a good description of the problem.  Essentially, Sphinx uses introspection to check function.__name__ and function.__doc__ to pull out the docstring.  A function decorator creates a new function from your code, and the new function has its own __name__ and __doc__.  So in your decorator, you need to ensure that you copy the original function’s __name__ and __doc__ to the function created by your decorator.  functools.wraps() simplifies this.
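As a minimal illustration of that point (the decorator and function names here are made up for the example), compare a decorator that returns a bare wrapper with one that uses functools.wraps():

```python
import functools


def plain_decorator(func):
    # The wrapper keeps its own __name__ ("wrapper") and has no docstring.
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def wrapping_decorator(func):
    # functools.wraps copies __name__, __doc__ and friends from func.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@plain_decorator
def lost():
    """This docstring is hidden from introspection."""


@wrapping_decorator
def kept():
    """This docstring survives decoration."""


print(lost.__name__, lost.__doc__)
print(kept.__name__, kept.__doc__)
```

Anything that reads __name__ and __doc__ through introspection sees only the wrapper’s own metadata in the first case and the original function’s in the second.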

My solution wasn’t quite so simple.  I had used a class decorator instead of a function decorator.  Simply copying __name__ and __doc__ over didn’t work, since the class decorator returns a class instance instead of a function, and Sphinx was looking for functions to document.  It looks like the solution to this is to wrap the class decorator inside a function decorator.  I didn’t strictly need to use a class decorator, so I rewrote it as a function decorator and things went along ok.
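For reference, here is a rough sketch of what wrapping a class-based decorator inside a function decorator could look like.  I haven’t verified this against Sphinx; CallCounter and counted are hypothetical names, not the decorator from my project.  The point is only that the decorated object ends up being a plain function carrying the original __name__ and __doc__.

```python
import functools


class CallCounter:
    """Hypothetical class-based decorator that counts calls."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)


def counted(func):
    """Function decorator that delegates to CallCounter internally."""
    counter = CallCounter(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return counter(*args, **kwargs)

    # Keep a handle on the class instance for anyone who needs its state.
    wrapper.counter = counter
    return wrapper


@counted
def greet(name):
    """Return a greeting for name."""
    return "hello " + name


print(greet("world"), greet.counter.calls)
print(type(greet).__name__, greet.__name__, greet.__doc__)
```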

I’m glad I had a nice suite of testcases to verify that rewriting the decorator didn’t break anything.  Starting to become a big fan of Test Driven Development.

Running Django-nonrel in a shell on App Engine.

I’m not sure why setting PYTHONPATH doesn’t work on my system.  Instead I have to:

import sys
sys.path.append('')
import djangoappengine.main

This sets up the shell so that it works like python manage.py shell.  At this point I can import my project’s modules, including models for DB queries.  This also works for getting Django set up inside App Engine’s remote_api_shell.py.

Format a JSON file in vim

Having a Linux dev environment makes things easy.

:%!python -m json.tool

(Source: blog.realnitro.be)
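If you want the same formatting from inside a script instead of vim, something like the following gives roughly equivalent output.  It’s a small sketch: pretty_json is a made-up helper name, and the 4-space indent is chosen to match what json.tool prints by default.

```python
import json


def pretty_json(text, indent=4):
    """Parse a JSON string and re-serialize it with indentation."""
    return json.dumps(json.loads(text), indent=indent)


raw = '{"name": "example", "tags": ["a", "b"]}'
print(pretty_json(raw))
```

Like the vim filter, this fails with an error on invalid JSON rather than guessing at a fix.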

Django-nonrel and dev_appserver.py

When I first had Django-nonrel up and running, starting the local test server using either “python dev_appserver.py <project>” or “python manage.py runserver” (from inside the project folder) would work equivalently.

At some point, the behavior changed.  The two commandlines would both launch the test server, but the data sets that appeared were different.  Clearly they were using separate datastores, but I never really dug in to see what the difference was.  I just switched to using “python manage.py runserver”, because the dumpdata and loaddata commands are pretty handy.

Today I was attempting to wipe the datastore, and ran into some roadblocks.  The commandline “python manage.py runserver --clear_datastore” didn’t work.  The dev_appserver.py version “python dev_appserver.py --clear_datastore <project>” worked fine, but it only clears the dev_appserver.py datastore.

Turns out, using manage.py to launch the server uses the datastore in <project>/.gaedata/datastore, while using dev_appserver.py puts the datastore in /tmp/dev_appserver.datastore.  While I haven’t gotten manage.py to work with --clear_datastore, you can use dev_appserver.py to clear your datastore by specifying the datastore path:

python dev_appserver.py --datastore_path=<project>/.gaedata/datastore --clear_datastore <project>

Benefit is now I can launch my projects using dev_appserver.py again. Oh, you can also just delete the datastore file.

ssh escape sequence

~

The ssh escape character is ~, after a newline.  ~B sends a break, ~^Z suspends the session, ~? gives a help screen.  I keep forgetting it so it’s best to note it here.

(Source: lonesysadmin.net)