1. Apr 26th, 2006

    SOA integration with Flickr and del.icio.us

    This article discusses the underlying architecture and lessons learned from our SOA initiative to integrate our SaaS partners with our content publishing system.

    Introduction

    Over the past year we came to realize a growing need to provide our content consumers with a hyper-media experience that will enrich their user experience and provide us with growing traction in the industry.

    As such, we embarked on an SOA strategy to integrate our content publishing system (here) with an image lifecycle and provisioning service (Flickr) and a hyperlink management and intelligence service (del.icio.us). This article explores the underlying integration architecture and some of the lessons learned, and concludes with a summary of the ROI.

    An Image is Worth a Thousand Words

    We begin with our image lifecycle and provisioning service. After conducting extensive research in this emerging field, we have decided to partner with Flickr Systems. A leader in the Gartner magic quadrant, Flickr proved to meet all our requirements for publishing, discovery and lifecycle management of images. The vendor also provides a best-of-class free repository of readily available stock imagery (here). In addition, we were impressed with their ability to scale both vertically and horizontally, dealing with images of various sizes and quality.

    We were a bit concerned about their ability to address the ever growing enterprise storage needs and their long-term viability, all important factors when selecting an SaaS partner. However, the recent acquisition by Yahoo Inc has convinced us that they will remain in business for a long time, and we consider the partnership to be low risk and successful from the get-go.

    In spite of that, we were not able to secure a satisfactory service level agreement that would meet our requirements for 24×7 accessibility. As such, we decided to offload image serving capabilities from the vendor and provide replicated storage using our own content repository. We planned for a 3-month turnaround, but the task proved easier than we originally anticipated, and we managed to complete it on time and under budget.

    During the content publishing phase we acquire universally unique identifiers (URLs) for each image item. The vendor has agreed to provide these free of charge. We then feed these identifiers to our content publishing service using the UploadImage wizard. Once entered into the system, our content publishing service makes a synchronous Web service call to the image management service and acquires an identical copy of the image, which is then stored locally in our content repository.
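
    As a rough illustration of this replication step, here is a minimal Python sketch. It assumes a plain HTTP GET is enough to acquire the copy. The function names, the SHA-1 file naming scheme and the repository directory are hypothetical placeholders for this example, not part of the UploadImage wizard or of any Flickr API.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_over_http(url: str) -> bytes:
    """Synchronously fetch the image bytes from the image service."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def replicate_image(
    url: str,
    repository: Path,
    fetch: Callable[[str], bytes] = fetch_over_http,
) -> Path:
    """Copy the image at `url` into the local repository and return its path."""
    data = fetch(url)
    # Keep the original file extension; fall back to .jpg (an assumption).
    suffix = Path(urlparse(url).path).suffix or ".jpg"
    # Derive a stable local file name from the image's identifier (its URL).
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + suffix
    repository.mkdir(parents=True, exist_ok=True)
    target = repository / name
    target.write_bytes(data)
    return target
```

    Passing the fetch function as a parameter keeps the sketch usable without a network connection, for example when testing.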

    In the next step, a content author would identify the image item in the content repository and with the simplicity of drag & drop assign it a location within the content item (the post). Underneath, the content repository establishes a reference between the content item and image item and persists it as metadata. Once published, the content item and image item are accessible to content consumers 24×7×365.
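
    The reference described above could be persisted along these lines. This is only a sketch: it uses Python's built-in sqlite3 module so that it runs on its own, and the table and column names are hypothetical rather than the actual schema of our content repository.

```python
import sqlite3

# Hypothetical schema: content items, image items, and the reference
# between them, including the location of the image within the post.
SCHEMA = """
CREATE TABLE content_items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE image_items (
    id INTEGER PRIMARY KEY,
    source_url TEXT NOT NULL,
    local_path TEXT NOT NULL
);
CREATE TABLE content_images (
    content_id INTEGER NOT NULL REFERENCES content_items(id),
    image_id INTEGER NOT NULL REFERENCES image_items(id),
    position INTEGER NOT NULL,
    PRIMARY KEY (content_id, image_id)
);
"""


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Open a repository database and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def attach_image(conn: sqlite3.Connection, content_id: int,
                 image_id: int, position: int) -> None:
    """Persist the reference between a content item and an image item."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO content_images (content_id, image_id, position) "
            "VALUES (?, ?, ?)",
            (content_id, image_id, position),
        )


def images_for(conn: sqlite3.Connection, content_id: int) -> list:
    """Return the local paths of a content item's images, in post order."""
    rows = conn.execute(
        "SELECT i.local_path FROM content_images ci "
        "JOIN image_items i ON i.id = ci.image_id "
        "WHERE ci.content_id = ? ORDER BY ci.position",
        (content_id,),
    )
    return [row[0] for row in rows]
```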

    Towards a Better Dashboard

    Integration with the hyperlink management and intelligence service proved to be trickier. As before, our vendor selection criteria focused on breadth of features, scalability, customer success stories and long-term business viability. We have chosen to partner with DLCS Industries (aka del.icio.us). We were extremely impressed by their strong presentation during our last executive retreat and the traction they have been gaining in the industry.

    The most interesting aspect of this service is the real-time dashboard which provides visibility into the most recent hyperlink acquisition activity, with drill-down capabilities. We have decided to make the dashboard accessible to all our content consumers.

    Initially, we provisioned the dashboard as a remote service accessible over a synchronous protocol and integrated into our content portal (example). As a result content consumers were able to access the dashboard directly from our content service without redirection. The compelling business reason was to allow us to retain a single brand identity over both content items and the dashboard, while using best-of-breed disparate services.

    Unfortunately, we underestimated the lack of protocol and format support in the industry. We were not able to use this approach to integrate the dashboard into our asynchronous event stream (RSS). It turns out that content consumer user-agents do not support the JavaScript extensible markup format, and the dashboard data was lost in the transformation phase. As a result, we set out to look for a different strategy.
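
    To illustrate the failure, the sketch below mimics what a user-agent that removes script elements does to a content item. It assumes the dashboard was embedded with a script tag. The ScriptStripper class and strip_scripts function are hypothetical names, and real user-agents apply their own sanitizing rules.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> elements and their content.

    Comments and doctype declarations are also dropped, for brevity.
    """

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.parts = []
        self._in_script = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script += 1
        elif not self._in_script:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closing script is ignored.
        if tag != "script" and not self._in_script:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = max(0, self._in_script - 1)
        elif not self._in_script:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._in_script:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self._in_script:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if not self._in_script:
            self.parts.append(f"&#{name};")


def strip_scripts(html: str) -> str:
    """Return `html` as a script-stripping user-agent would render it."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

    Whatever the script would have written into the page never reaches the content consumer, which matches the data loss we observed.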

    It so happens that DLCS incorporates their own integration message bus. Instead of pulling intelligence data from the service on-demand, we went with a loosely coupled architecture. We provisioned the intelligence service to create a daily report of recent hyperlink acquisition activity, and feed it to our content publishing service directly. We have conducted several field tests to ensure the content provided in the report is identical to the real-time dashboard. We have also conducted a focus group and concluded that providing these reports once every 24 hours is acceptable to our user base.

    Once a day the DLCS service will query its own repository and retrieve all recent acquisition records, create a report and issue a synchronous WS-Blog request to our content publishing service. Our content publishing service was modified to automate the process of creating a new content item, persisting it in the content repository and making it available through the content server. The entire transaction completes in under 10ms, giving us confidence it can scale to larger volumes.
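
    The report step can be sketched as follows, assuming each acquisition record carries a URL, a title and a list of tags. The Link type, the render_daily_report function and the "links for <date>" title format are hypothetical, and the actual WS-Blog request is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from html import escape


@dataclass(frozen=True)
class Link:
    """One hyperlink acquisition record (hypothetical shape)."""
    url: str
    title: str
    tags: tuple[str, ...] = ()


def render_daily_report(links: list[Link], day: date) -> tuple[str, str]:
    """Build the title and HTML body of the daily report content item."""
    title = f"links for {day.isoformat()}"
    items = []
    for link in links:
        tags = f" (tags: {escape(' '.join(link.tags))})" if link.tags else ""
        items.append(
            f'<li><a href="{escape(link.url, quote=True)}">'
            f"{escape(link.title)}</a>{tags}</li>"
        )
    body = "<ul>\n" + "\n".join(items) + "\n</ul>"
    return title, body
```

    Escaping the URL and title keeps the report valid HTML even when a record contains characters such as & or <.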

    Since the reports are published alongside all other content items, they are available as part of the event stream (RSS). The content server will also broadcast an asynchronous notification event using WS-Pingomatic, and will make the content accessible through the major content directory and discovery services (Technorati, et al).
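
    For illustration, the notification could look roughly like the sketch below. It assumes the notification service accepts an XML-RPC weblogUpdates.ping call with the site name and URL as its two parameters; the endpoint constant and the function names are assumptions made for this example.

```python
import xmlrpc.client

# Assumed endpoint of the notification service; adjust to your provider.
PING_ENDPOINT = "http://rpc.pingomatic.com/"


def build_ping_request(site_name: str, site_url: str) -> str:
    """Return the XML-RPC payload announcing that the site has new content."""
    return xmlrpc.client.dumps(
        (site_name, site_url), methodname="weblogUpdates.ping"
    )


def send_ping(site_name: str, site_url: str, endpoint: str = PING_ENDPOINT):
    """Send the notification and return the service's response."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(site_name, site_url)
```

    Building the payload separately from sending it makes it possible to inspect the request without contacting the service.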

    Our Technology Stack

    Although worthy of a separate article, I would like to point out some of the technologies used in the project. We picked a messaging bus that uses the industry standard WS-HTTP and the emerging WS-Blog protocol. The use of open schema formats allows us to easily plug in RSS 2.0 as a layer on top of WS-HTTP without pushing updates to the clients.

    For our content repository we have elected not to go with RDF and instead use the more mature and widely supported Semantic SQL. This proved useful when we transitioned from self-built MySQL to RPMs with only an hour of downtime. We use HTML as the preferred content format with extensions for images and links. Our templating system is homegrown but heavily utilizes CSS.

    Since our servers are located off-site (outsourced) we have standardized on an end-to-end security infrastructure using SSL and perform authentication and authorization using htpasswd Enterprise Edition.

    Conclusion

    Nothing speaks better than numbers. Although the integration project was no small feat and required some external expertise, the ROI has been positive from day one. By utilizing our SaaS vendors we were able to use best-of-breed solutions, improve our efficiencies and maximize our bottom line. Since the public rollout of the project one year ago, the number of trusted registered content consumers (as measured by FeedBurner) has increased tenfold from 10 to over 100 and is still growing strong.

    I hope this article has been useful to you in understanding how we harnessed the power of SOA and our SaaS partners network to integrate disparate content management services under a cohesive platform providing a seamless user experience resulting in a tenfold increase in market penetration.

    Resources

    1. Apr 27th, 2006

      Eran

      I’d be interested to learn how you handled the (rather impressive) increase in consumers. Were your systems able to withstand the new load and were any changes made to the system in order to improve scalability?

    2. Apr 27th, 2006

      Assaf

      That’s a topic for a whole new post.

      But the short answer is, we did capacity planning upfront and designed the content server so it can scale massively by deploying an array of redundant, load-balanced, self-healing, on-demand mod_php scripts.

      We then deployed it over an established, wide-scale, fault tolerant network which we outsourced from TCP Inc. We were very pleased with the results of our interoperability testing. Those people have some good IP, they’re innovators in their field, and they charge a reasonable price.

      One thing we did have to tackle when traffic went up is image size. It appears that large images create a substantial load on the framework and its sub-components. Through our network of value-added partners and consultants we were able to establish that images come in varying sizes, and provided our staff with on-site three-day training to disseminate best practices throughout our team.

      And as always, our numbers speak better than words.

    3. Apr 27th, 2006

      links for 2006-04-28

      [...] Labnotes » Blog Archive » SOA integration with Flickr and del.icio.us (tags: web2.0 flickr delicious) [...]

    4. Apr 27th, 2006

      Hugh Winkler

      Wow, this WS-HTTP thing is so enterprisey!

    5. Apr 27th, 2006

      Michael

      This post is a joke right? Tell me it is, please.

      “Through our network of value-added partners and consultants we were able to establish that images come in varying sizes”

      No way! Who are these consultants, I think I need to give them some of my money!

    6. Apr 27th, 2006

      Lee Provoost

      haha good one! it perfectly addresses the issue of the plethora of WS-* standards and other crap that makes this whole SOA thing overbloated :-)

    7. Apr 28th, 2006

      Intégration SOA avec Google reader et SimplePie at Aurélien Pelletier’s Weblog

      [...] Reading this experience report opened my eyes: SOA integration with Flickr and del.icio.us. Just as Monsieur Jourdain speaks prose, I am doing SOA without knowing it with this blog!! [...]

    8. Apr 28th, 2006

      Aurélien

      I met and solved the same problem concerning the different image sizes. But I’ve met another issue and could not solve it.

      How do you deal with the fact that images can be in black and white but also in colors from 16 to 16 millions?
      How do you scale up to so many colors?

      I’m impressed with your numbers.
      If you could put me in touch with your consultants, I would like to know what they can do me for. I’m sure they can solve this issue. My consultants are too cheap and they can’t fix this.

    9. Apr 28th, 2006

      Assaf

      “How do you deal with the fact that images can be in black and white but also in colors from 16 to 16 millions?
      How do you scale up to so many colors?”

      In a heterogeneous environment you wouldn’t run into this problem. We employ a grid of interconnected systems and cluster the work-load based on the capabilities of each node. For black & white imaging we use dedicated interconnected Classic Mac servers, while high depth color imagery is off-loaded to a farm of servers running 64-bit CPUs.

      Our consultants come highly recommended. To quote their Web site, they are “the leading provider of solutions for the interpersonal enterprise, bridging the gap between user generated content and the asynchronous Web platform, thereby solving the user’s tagging problem.”

      Their management team I’m told has over 40 years of combined experience making value statement propositions and formulating block diagrams.

    10. May 5th, 2006

      sockdrawer » Blog Archive » joke for a Friday afternoon….

      [...] SOA integration with Flickr and del.icio.us Some of this is too close to the bone….excellent! Found via Scott [...]

    11. May 15th, 2006

      Andrew

      I read this article in the hope of understanding how to leverage SOA, and found it utterly useless. I understand the need for numbers, but your article provides too much data to be able to put it together, especially for people with no technical background. Also, I do not see any synergies mentioned: I am sure as I used the “Find…” functionality of my Web Browser.

    12. Jul 6th, 2006

      Incremental Operations » Inane patents and other things

      [...] I also have great admiration for whoever wrote this piece “SOA integration with Flickr and del.icio.us“.  Read it if you haven’t already.  Digest it.  Memorize every word.  This will answer all your questions on SOA, ROI, Web 2.0 and SaaS.  Really. [...]

    13. Dec 23rd, 2006

      Labnotes » New Year ReSolutions

      [...] Architect better [...]
