XML, Violence, Nokogiri and Xpath

I love Xpath.

It makes XML easy to use and easy to query.  Gone are the days of parsing things with a SaxParser unless you’re really hard up for control of your text.

 

Also, I love the Ruby Nokogiri Gem.  

XML is like violence – if it doesn’t solve your problems, you are not using enough of it.

– Nokogiri docs

But I do have to say that there is a lack of good examples and documentation for anything particularly advanced.  I found a working solution to my issue, but thought I’d paste here what I wanted to do versus what I ended up doing.

Given the following XML, 

 

https://gist.github.com/1577136

 

I’d like to grab two elements that include “Cover” in the tag, and then operate on each of them.

Nokogiri’s use of Xpath easily allows the first query expression like so: price_xml = doc_xml.xpath('Container/Set/*[contains(name(), "Cover")]')

I’ve selected all the elements (using *) in Set, and then used an Xpath function, contains, to specify that “Cover” must be in the name.  This returns a Nodeset containing two Nokogiri XML Nodes.

 

What I wanted to do was then select one of these elements based on a pattern in the tagname, using my favorite tool, Xpath.

But I just couldn’t get Nokogiri to give it to me, and several solutions ended up selecting way more than the 1 element I wanted (because the nodes in the Nodeset still contain relationships with their parents).

https://gist.github.com/1579343

 

I’m cross posting this on StackOverflow as a question, just in case any Nokogiri Xpath enthusiasts want to recommend a solution that doesn’t resort to find().

 

 

Rails Rake tips

longviewcart
rake tasks

Here’s a tip:
If some of these tasks are actually “private” tasks that only get called by other Rake tasks,
leave off the “desc” line in the task definition.  And voilà, your rake -T list will get a lot cleaner.
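A small sketch of the idea, written as a plain Ruby script so it can be run on its own. The task names are made up. In a real Rakefile you would drop the require, the extend and the metadata flag, and rake -T would do the listing for you.

```ruby
require "rake"

# In a plain Ruby script, desc text is only recorded when this flag is set.
Rake::TaskManager.record_task_metadata = true
extend Rake::DSL

# Public task: it has a desc, so it shows up in the task list.
desc "Deploy the app"
task :deploy => [:compile_assets, :upload]

# "Private" helper tasks: no desc, so they stay out of the list,
# but they can still be run and used as prerequisites.
task :compile_assets do
  puts "compiling assets"
end

task :upload do
  puts "uploading"
end

# Roughly what rake -T shows: only the tasks that carry a description.
listed = Rake::Task.tasks.select(&:comment).map(&:name)
puts listed.inspect
```

The helper tasks are hidden, not locked down: anyone can still invoke them by name.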

 

Thanks to Erik Debill for his nice post on Rake and some Advanced Tips for using it.

Judge Not lest Ye Be Judged – Gigabit Google Challenge

UPDATE:  So the BIG news I had was about Gramercy Private Equity’s prize offer.  See below the fold for details.

Google is rolling out its experimental Gigabit Broadband Fiber networks across the country in lab-like experiments.  One of the early locations is Kansas City, where Think Big Partners, a local Incubator, has sponsored a Business Plan competition worth $100,000: The Gigabit Challenge.

I’m honored to be a part of the judging panel with some very esteemed folks from  both local KC Enterprises like the Kauffman Foundation, and National players like Silicon Valley Bank.


childless white men at work – hipmunk.com

Every now and then I look at meaningless numbers posted from Alexa.com about Travel companies and user growth etc.

But I always love the descriptions provided:

 

http://www.alexa.com/siteinfo/hipmunk.com

 

Statistics Summary for hipmunk.com

Hipmunk.com has a three-month global Alexa traffic rank of 15,128, and approximately 62% of visits to this site consist of only one pageview (i.e., are bounces). Visitors to the site view an average of 1.2 unique pages per day. The site is relatively popular among users in the cities of San Francisco (where it is ranked #1,325) and Seattle-Tacoma (#3,117). Compared with internet averages, Hipmunk.com appeals more to Caucasians; its audience also tends to consist of childless, highly educated men earning over $60,000 who browse from work.

 

 

 


 

Startup Bubble Peaking or Popping

With all the talk about the Startup Bubble, one thing that is commonly lost is that we are MEASURING the wrong thing.  If we’re talking about the Startup Financing bubble, then by all means, it’s likely to peak, or even pop, and soon, for sure.  But today’s startups don’t require the same amount of capital to get going that the 1st DotCom Bubble’s startups did.  It’s easy to bootstrap with low cost Cloud Computing, Software frameworks that let developers get apps or sites up & running in 1-3 months, and a new lean startup philosophy that focuses on Customer Development and Iterating (pivots) over large scale tech investment.  These days, you never hear the phrase “If you build it they will come”.

This graph is from a Wall Street Journal post from this time last year, in 2010.  Of course it can’t/couldn’t continue.  I’ve read at least 10 posts in the last 2 weeks about the investment market peaking for Startups, and at a minimum that the Valuations are beginning to stabilize or even decrease.

Now I’m glossing over an important point, which is:

Nevertheless, the global economic slowdown, which already has begun according to America’s recession arbiters, will hurt sales at companies both large and small.

There is a lot to be said for this theory, but when your labor costs are 1 developer, 1 business development/sales guy, and 1 designer (most of whom are under 25 and don’t have families) then it doesn’t take a large amount of Revenue to keep the growth going strong.

It’s true folks, there is less money being sloshed around today than there was a year ago, and that’s going to have an impact on your next Round.  Bloggers typically end this kind of post with “Better go get that money now!”

I’m with Fred Wilson.  Pay less attention to the money issues, and go build your damn product.

Google Flight Search User Review

Caveats
Travel Data
As Co-Founder and former CTO/Designer of Everbread and its Haystack Flight Shopping Engine, I’m mildly qualified to speak and pontificate about Air Travel Technology.  There is a lot more that I don’t know than I do, but I guess I know enough to understand some things.
Usability
On Usability, UI, UX and other forms of the Field’s name, I’m a novice. I know what I like, and I can observe what tends to work well and intuitively, but I’m no Jason Putorti.
Travel Business
Anything I have to say on this topic is extremely likely to be wrong, incorrect, fallacious, idiotic, and misguided. But I won’t let that stop me.

Still, I wanted to take a crack at a couple of topics in this post, which is a reaction to Google’s New Flight Search Tool, released Tuesday.

Travel Data

The first ingredient in producing scheduled flights with a ticketable fare is pure computational magic. I’m not giving away any secrets in this section. Most of what I can say is well known at least “inside” the Travel Tech sector.  ITA’s primary Data Service product QPX is built in LISP and is pretty fast at computing the set of possible fares that can apply to the possible flight itineraries, and then validating the complex dynamic rule set needed to allow a fare to be shopped.  (No point in showing a fare that can’t be ticketed now is there??)  

For those of you who aren’t familiar with just how difficult this is to do correctly and completely, here’s the simple version: there’s a slideshow produced by Carl de Marcken from ITA Software that is often referred to.  I’ve added a second source in case it ever gets pulled from the ITA site.

Anyway, the 2nd Ingredient needed to Display Ticketable Flights with Fares is Seat Availability Data.  QPX is completely dependent upon ITA’s DACS system, which requires Airlines to participate in a sensitive data sharing relationship with ITA.  Only a few of the world’s airlines do, and most of these are US Domestic and are sharing data only for US Domestic Routes (Continental is one of ITA’s Chief partners in this area).

As a result, QPX is currently reputed to work best with US Domestic Itineraries, and certainly given ITA’s customer Base, it will have an up-to-date, fresh Cache that is US Domestic Flight Centric.

The 3rd Ingredient in this Data Shuffle is The Results Cache. It’s the industry’s dirtiest little secret. All of the GDSs use a results cache to manage load.  Why shouldn’t they?  If you just asked for Fares from LAX-BER 10 seconds ago, the likelihood that the answer has changed in those 10 seconds is very, very low.  Unfortunately, most of the Fare Shopping systems have much higher loads than they are designed for, and most queries produce little actionable revenue (a lot more people Shop for Fares than actually book them), so there is a lot of weird black magic in deciding whether to compute an answer or serve an old answer.  As for speed, this is where caches beat pure computation, and this is what you’re seeing on the Google Flight Search system.  It may be a Cache that is being constantly refreshed (if someone is seeing data change on the results page after the Query has been completed, please point it out; that would indicate that the page is getting relatively fresh updates from the Cache).
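To make the compute-or-serve decision concrete, here is a toy sketch of a results cache with a fixed time-to-live. Everything in it is hypothetical: the ResultsCache class, the 60 second TTL and the fake fare strings are illustrations, not how any GDS or Google actually does it, and the real black magic is surely more involved than a single timeout.

```ruby
# Hypothetical results cache: all names and numbers here are illustrative.
class ResultsCache
  Entry = Struct.new(:value, :stored_at)

  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Serve the old answer while it is younger than the TTL; otherwise run
  # the block (the expensive fare computation) and remember the result.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry.value if entry && (now - entry.stored_at) < @ttl

    value = yield
    @entries[key] = Entry.new(value, now)
    value
  end
end

# A fake clock so the example does not have to sleep.
fake_time = 0.0
computations = 0
cache = ResultsCache.new(ttl_seconds: 60, clock: -> { fake_time })
compute_fares = -> { computations += 1; "fares-v#{computations}" }

first = cache.fetch("LAX-BER") { compute_fares.call }   # computed
fake_time = 10.0
second = cache.fetch("LAX-BER") { compute_fares.call }  # served from cache
fake_time = 120.0
third = cache.fetch("LAX-BER") { compute_fares.call }   # stale, recomputed

puts [first, second, third].inspect
```

The query repeated 10 seconds later never touches the expensive computation, which is the whole point of the trick.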

Baking a Cake

There are a lot more factors that I’m not prepared to babble on about, but private fares and point-of-sale issues will also have an impact on the quality of the fares being shown, most of which is driven by the seat availability cache. There are many ways to bake a cake and different spices you can add to make it tastier. However, most people are happy with good cake; it doesn’t have to be heavenly to eat it.  After all, it’s still cake!  That’s my way of saying that it’s certainly not obvious that Google’s Flight Search product will produce cheaper or more convenient fares/flights than Travelocity, Orbitz, Expedia, or Kayak for that matter, who are using a mix of Data providers, not just ITA.  Furthermore, the system is not going to perform at its best for international and non-US Domestic flights until ITA addresses that in its core product offering (QPX).

Still, for a first effort, it’s an amazing solution (it really is FAST) that produces a wide range of results and will likely satisfy the airfare shopping needs of a majority of customers who may not shop beyond the search and click. (More on this later)

Should Everything be easy enough for a Caveman?

Do you really want just “anybody” contributing to your crowd-sourced FACT based repository of Knowledge?  I know it’s a bit out of fashion these days, but when I was growing up, it was okay to differentiate between folks who were a little bit smarter than other folks.  We had “Gifted & Talented” programs in public schools (horror!) and we played soccer matches where ONLY THE WINNERS got Medals.  Such programs are now referred to as Elitist, which is true, they seek to identify the Elite, which in my day was something you wanted to be.  I remember winning a speech at the Rotary Club on the topic: “Expect the Best, Be the Best”.  When I won, I felt Elite, and it was a good feeling.  Anyway, that’s another story.

 

 

I recently saw a piece about Wikipedia and Jimmy Wales.  The piece was about how Wikipedia was struggling to retain users, and Jimmy was musing on how to make contributing easier.  The part that struck me was this quote from the piece:

 

Over the years, Wikipedia has often been criticized for having a very convoluted and technically complex way of editing articles that doesn’t just involve learning the arcane markup language the site uses, but also navigating the politics of editing on the site. For beginners, this is a very high barrier of entry that some earlier projects were supposed to fix 

What????  So wait, I know I’m technical, but look, it’s GOOD that some things are difficult.  If Heart Surgery was easy, then just Anybody with no real discipline, no significant intelligence, or worse, no commitment to quality and effort could become a Heart Surgeon.  Do you want your Heart to be operated on by the same quality of person who is mostly qualified to be a ditch digger?  If you do, that’s good for you.  But I prefer my Martinis to be made by someone who knows how to make one, and I prefer my community sourced Encyclopedia of Facts to be written by people who can at least spend 15 minutes to learn: ==Section headings==, ''italicize text'', or '''bold the text'''.

I’m saying that a standard that requires contributors to understand that entries need citations, or that “opinions” are only allowed when referencing a controversy, and only then by citing an established source, is not too high a requirement. It’s just enough.  How is this any harder than expecting a high school student to learn the 5 paragraph, 3 topic essay format?  If they can’t do that, I’d prefer they don’t contribute to Wikipedia.

Making Wikipedia as easy to post to as Facebook is, that’s a recipe for WikiSpringer.