This article discusses the underlying architecture and lessons learned from our SOA initiative to integrate our SaaS partners with our content publishing system.
Introduction
Over the past year we came to realize a growing need to provide our content consumers with a hypermedia experience that would enrich their use of our content and give us growing traction in the industry.
As such, we embarked on an SOA strategy to integrate our content publishing system (here) with an image lifecycle and provisioning service (Flickr) and a hyperlink management and intelligence service (del.icio.us). This article explores the underlying integration architecture, shares some of the lessons learned, and concludes with a summary ROI.
An Image is Worth a Thousand Words
We begin with our image lifecycle and provisioning service. After conducting extensive research in this emerging field, we decided to partner with Flickr Systems. A leader in the Gartner magic quadrant, Flickr met all our requirements for publishing, discovery and lifecycle management of images. The vendor also provides a best-in-class free repository of readily available stock imagery (here). In addition, we were impressed with their ability to scale both vertically and horizontally, dealing with images of various sizes and quality.
We were a bit concerned about their ability to address ever-growing enterprise storage needs and about their long-term viability, both important factors when selecting a SaaS partner. However, the recent acquisition by Yahoo Inc has convinced us that they will remain in business for a long time, and we consider the partnership to be low risk and successful from the get-go.
In spite of that, we were not able to secure a satisfactory service level agreement that would meet our requirements for 24×7 accessibility. As such, we decided to offload image serving capabilities from the vendor and provide replicated storage using our own content repository. We planned for a 3-month turnaround, but the task proved easier than we originally anticipated, and we managed to complete it on time and under budget.
During the content publishing phase we acquire universally unique identifiers (URLs) for each image item. The vendor has agreed to provide these free of charge. We then feed these identifiers to our content publishing service using the UploadImage wizard. Once an identifier is entered into the system, our content publishing service makes a synchronous Web service call to the image management service and acquires an identical copy of the image, which is then stored locally in our content repository.
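The replication step might look roughly like the following Python sketch. The names `replicate_image` and `fetch_over_http`, the digest-based file naming, and the use of a plain directory as the content repository are assumptions made for illustration; they do not describe the actual UploadImage wizard.

```python
import hashlib
from pathlib import Path, PurePosixPath
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_over_http(url: str) -> bytes:
    """Default fetcher (assumed): one blocking HTTP GET for the image bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def replicate_image(
    url: str,
    repository: Path,
    fetch: Callable[[str], bytes] = fetch_over_http,
) -> Path:
    """Copy the image at `url` into `repository` and return the local path."""
    data = fetch(url)
    # Keep the original extension if the URL has one; fall back to ".bin".
    suffix = PurePosixPath(urlparse(url).path).suffix or ".bin"
    # Name the copy after a digest of its URL, so uploading the same
    # identifier twice overwrites one file instead of creating duplicates.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + suffix
    repository.mkdir(parents=True, exist_ok=True)
    target = repository / name
    target.write_bytes(data)
    return target
```

Passing the fetcher as a parameter keeps the synchronous call replaceable, which is convenient when the vendor is unreachable or when testing without network access.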
In the next step, a content author identifies the image item in the content repository and, with the simplicity of drag & drop, assigns it a location within the content item (the post). Underneath, the content repository establishes a reference between the content item and the image item and persists it as metadata. Once published, the content item and image item are accessible to content consumers 24×7×365.
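One way to picture the reference between a content item and an image item is the sketch below. The `ContentItem` and `ImageRef` classes, the paragraph-index notion of "location", and the JSON serialization are all hypothetical choices for the example, not a description of how the content repository actually stores its metadata.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ImageRef:
    image_id: str  # key of the image item in the content repository
    position: int  # index of the paragraph the image is dropped after


@dataclass
class ContentItem:
    title: str
    paragraphs: list[str]
    images: list[ImageRef] = field(default_factory=list)

    def place_image(self, image_id: str, position: int) -> None:
        """Record where the author dropped the image within the post."""
        if not 0 <= position < len(self.paragraphs):
            raise IndexError("position is outside the post")
        self.images.append(ImageRef(image_id, position))

    def metadata(self) -> str:
        """Serialize the references so they can be persisted with the post."""
        payload = {
            "title": self.title,
            "images": [asdict(ref) for ref in self.images],
        }
        return json.dumps(payload, sort_keys=True)
```

Storing only the image key and its position, rather than the image itself, means the same image item can be referenced from several posts.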
Towards a Better Dashboard
Integration with the hyperlink management and intelligence service proved to be trickier. As before, our vendor selection criteria focused on breadth of features, scalability, customer success stories and long-term business viability. We chose to partner with DLCS Industries (aka del.icio.us). We were extremely impressed by their strong presentation during our last executive retreat and by the traction they have been gaining in the industry.
The most interesting aspect of this service is the real-time dashboard which provides visibility into the most recent hyperlink acquisition activity, with drill-down capabilities. We have decided to make the dashboard accessible to all our content consumers.
Initially, we provisioned the dashboard as a remote service accessible over a synchronous protocol and integrated into our content portal (example). As a result, content consumers were able to access the dashboard directly from our content service without redirection. The compelling business reason was to allow us to retain a single brand identity over both content items and the dashboard, while using best-of-breed disparate services.
Unfortunately, we underestimated the lack of protocol and format support in the industry. We were not able to use this approach to integrate the dashboard into our asynchronous event stream (RSS). It turns out that content consumer user-agents do not support the JavaScript extensible markup format, and the dashboard data was lost in the transformation phase. As a result, we set out to find a different strategy.
It so happens that DLCS incorporates their own integration message bus. Instead of pulling intelligence data from the service on demand, we went with a loosely coupled architecture. We provisioned the intelligence service to create a daily report of recent hyperlink acquisition activity and feed it to our content publishing service directly. We conducted several field tests to ensure the content provided in the report is identical to the real-time dashboard. We also conducted a focus group and concluded that providing these reports once every 24 hours is acceptable to our user base.
Once a day the DLCS service queries its own repository, retrieves all recent acquisition records, creates a report and issues a synchronous WS-Blog request to our content publishing service. Our content publishing service was modified to automate the process of creating a new content item, persisting it in the content repository and making it available through the content server. The entire transaction completes in under 10ms, giving us confidence it can scale to larger volumes.
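The report-building half of this daily job can be sketched as below. This is only an illustration under stated assumptions: the `Link` record, the `build_daily_report` function, the 24-hour window and the HTML list layout are hypothetical, and the WS-Blog request itself is left out because its wire format is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from html import escape


@dataclass
class Link:
    url: str
    title: str
    acquired_at: datetime


def build_daily_report(links: list[Link], now: datetime) -> dict[str, str]:
    """Turn the last 24 hours of acquired links into a new content item."""
    cutoff = now - timedelta(hours=24)
    # Keep only links acquired inside the window, newest first.
    recent = sorted(
        (link for link in links if cutoff <= link.acquired_at <= now),
        key=lambda link: link.acquired_at,
        reverse=True,
    )
    # Escape URLs and titles so the report is safe to embed as HTML.
    rows = "\n".join(
        f'<li><a href="{escape(link.url, quote=True)}">{escape(link.title)}</a></li>'
        for link in recent
    )
    return {
        "title": f"Links for {now:%Y-%m-%d}",
        "body": f"<ul>\n{rows}\n</ul>",
    }
```

The returned dictionary stands in for the content item that the publishing service would then persist and serve like any other post.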
Since the reports are published alongside all other content items, they are available as part of the event stream (RSS). The content server also broadcasts an asynchronous notification event using WS-Pingomatic, and makes the content accessible through the major content directory and discovery services (Technorati et al.).
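To illustrate how a report ends up in the event stream, here is a minimal sketch that renders content items as an RSS 2.0 document using Python's standard library. The `build_rss` function and the dictionary keys `title`, `link` and `published` are assumptions for the example; the notification event is not shown.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title: str, channel_link: str, items: list[dict]) -> str:
    """Render content items as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # pubDate uses the RFC 822 style date format.
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "Example blog",
    "https://example.com/",
    [
        {
            "title": "Links for 2006-01-02",
            "link": "https://example.com/links-2006-01-02",
            "published": datetime(2006, 1, 2, tzinfo=timezone.utc),
        }
    ],
)
```

Because the daily report is just another item in the list, no special handling is needed for it to reach feed readers.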
Our Technology Stack
Although it is worthy of a separate article, I would like to point out some of the technologies used in the project. We picked a messaging bus that uses the industry standard WS-HTTP and the emerging WS-Blog protocol. The use of open schema formats allows us to easily plug in RSS 2.0 as a layer on top of WS-HTTP without pushing updates to the clients.
For our content repository we have elected not to go with RDF and instead use the more mature and widely supported Semantic SQL. This proved useful when we transitioned from self-built MySQL to RPMs with only an hour of downtime. We use HTML as the preferred content format with extensions for images and links. Our templating system is homegrown but heavily utilizes CSS.
Since our servers are located off-site (outsourced), we have standardized on an end-to-end security infrastructure using SSL, and we perform authentication and authorization using htpasswd Enterprise Edition.
Conclusion
Nothing speaks better than numbers. Although the integration project was no small feat and required some external expertise, the ROI has been positive from day one. By utilizing our SaaS vendors we were able to use best-of-breed solutions, improve our efficiencies and maximize our bottom line. Since the public rollout of the project one year ago, the number of trusted registered content consumers (as measured by FeedBurner) has increased tenfold, from 10 to over 100, and is still growing strong.
I hope this article has been useful to you in understanding how we harnessed the power of SOA and our SaaS partner network to integrate disparate content management services under a cohesive platform, providing a seamless user experience and resulting in a tenfold increase in market penetration.
Resources