Cocoon: Building XML Applications
Carsten Ziegeler, Matthew Langham
Cocoon: Building XML Applications is the guide to the Apache Cocoon project. The book contains the much-needed documentation on the Cocoon project, but it does not limit itself to just being a developer's handbook. The book motivates the use of XML and XML software (in particular open source software). It contains everything a beginner needs to get going with Cocoon as well as the detailed information a developer needs to develop new and exciting components to extend the XML publishing framework. Although each chapter builds upon the previous ones, the book is designed so that the chapters can also be read as individual guides to the topics they discuss. Varied "hands-on" examples are used to make the underlying concepts and technologies absolutely clear to anyone starting out with Cocoon. Chapters that detail the authors' experience in building Internet applications are used to embed Cocoon into the "real world" and complete the picture.
Also see: Chapter 6: A User’s Look at the Cocoon Architecture (864KB, .PDF)
Copyright New Riders. Used with permission.
Chapter 11: Designing Cocoon Applications
The previous chapters discussed how Cocoon provides a complete XML platform
for building applications. We looked at how solutions developed with Cocoon
can meet the challenges facing today's modern application architectures.
We also presented some examples for small applications and built a personalized
news portal using Cocoon concepts and technologies.
Cocoon is not a platform specifically aimed at only one application area, such
as a portal. Cocoon can be used to build a variety of applications and solutions.
Because we have been using Cocoon as a base for the paid work we do, we have
built web sites and portals and have also used Cocoon to build front ends for
databases, XML processing systems, and integration systems for different hosting
environments, such as those used for Application Service Providing (ASP).
It is our experience that learning to use Cocoon to build these types of applications
takes time, because the philosophy behind the solution is different from the
way Internet applications are commonly built, using scripting languages such
as Active Server Pages and JSP or dedicated software solutions built using
servlets or other components.
We have included this chapter to provide additional background information
and tips that we hope will help you if you want to develop a more advanced Cocoon
application, such as an Internet portal. A lot of this information will not
be completely new if you have read through the book. However, we have often
heard people say, "There is so much in Cocoon. What do I actually need
if I want to build a certain type of application?" The aim of this chapter
is to provide this information in a different context so that you can then go
back to where we originally explained it for the full details.
Before getting into the different types of applications you can build with
Cocoon, we will start with some general points that are important when designing
any type of software solution. Although this might seem to be a long list of
things to think about, remember that you will probably only need to look at
individual points when you start building real applications, such as a new Internet
portal for a major client. That being said, it is always a good idea to start
with a concept of what you will do.
The Application Concept
Few people can cook exotic meals without a recipe. The recipe gives you an
idea of what the result will look like, what ingredients you need, and how you
should prepare the dish. Using a recipe as a concept for your meal is common
sense.
When you build an application with Cocoon, a concept that includes the points
discussed in the following sections helps you plan your solution and prevents
you from making some of the more common mistakes. The following sections define
the system functionality, the application architecture, and aspects such as
performance and presentation design.
While thinking about these points, you can also try to work out which of the
described Cocoon technologies will be important for what you want to do and
whether you perhaps need to write additional components. We will also provide
some guidance for the times when, even after you've done all this, things
still don't work as you expected. We will start with probably the most
common question asked of any application: "What's it supposed to
do?"
General Functionality
The first step is to define the functionality of the system you will build.
Most systems built with Cocoon publish data in some way. In addition, there
might be functions that allow the user to interact with the application in some
form. Depending on the type of application, it might be necessary to define
areas of information that are then combined into the complete application.
As an example, imagine that you are building a web site application for an
imaginary company that produces Rewinders (don't ask us what these are;
it's imaginary). Obviously you need functions that allow general
information about your firm to be published. However, you have several different
areas of information you want to publish, so you need to structure the
application. Here are a few areas you might want to define:
General information about Rewinders
News about the company (Rewinder Inc.)
Industry news
Products offered
Jobs
Information for employees only
If you check out some company web sites, you will see that most have this
sort of structure. Each area in your web site will have subareas that provide
more detailed information. An area such as "Products" will contain all
the different types of Rewinders that are offered. The section called
"Employees Only" will provide special information about upcoming
Rewinders. This information should be available only to someone who has logged
on to the system.
After you have designed the application's structure, it is time to think
about any interactive components you might need. Perhaps you will need an
application form in the "Jobs" area or a feedback form in the
"Products" area. In addition, you will need some form of login page
for the "Employees Only" area. You also want to know when someone
looks at the new "Cool Blue Rewinder," so you specify that you want an
email to be sent when that document is viewed.
Depending on the type of application you are building, you might need only
publication functions. If your solution is aimed more at processing information,
you need more functions that allow interaction with your system.
After you have set up the application's structure, you must work out how
navigation through the system is possible. After someone enters the
"Products" area, what other areas can he access from there? What
happens if he accesses the "Employees Only" area? Working out the
navigation and flow can be one of the most time-consuming jobs when designing
the application.
A typical application will be a combination of published data and data that
flows from the user to the application. After you have defined the site's
structure, it is time to think about the content.
The data you want to publish must come from somewhere. Either it is already
stored in a file or database, or it will be obtained from external sources at
runtime. An area such as "Jobs" will access the current job openings
from a database. An application area such as "Industry News" will
probably access a news provider to obtain news about the current state of the
Rewinder industry. The authentication data is also probably contained in a
database. You will need access to it to check such things as the password when a
user wants to access the "Employees Only" area.
As soon as you know where the data you want to publish comes from, you need
to determine what format it is available in. Of course, it is ideal if the data
is supplied in an XML format.
Next you need to define your output formats. Your imaginary company wants to
publish its web site in HTML first. In addition, some of the documents are to be
in PDF, and you want to offer product descriptions in WML.
Notice that we have not yet talked about a specific technology. Indeed, a
first concept does not require any knowledge of how you will realize your
application. As soon as you have the concept in place, you can decide which
technology to use (in this case, Cocoon) and then move on to defining the actual
system architecture.
Application Architecture
After you have defined and documented the points just discussed, you can
start building the actual architecture for your application using Cocoon. You
need to define the various documents you want to publish through the web site
and work out what sort of pipelines you need in order to generate the different
formats.
Here are some of the types you need for Rewinder Inc.:
Pipelines that obtain data from a file and format that data in HTML or WML (depending on the browser)
An additional pipeline that sends an email if a particular document is chosen (such as the Cool Blue Rewinder)
Pipelines that access data from a database and format it in PDF (for online product handbooks)
A pipeline that accesses online industry data and formats it in HTML
A pipeline that receives the incoming forms data from the feedback form and saves it to a database
As soon as you have laid out the types of pipelines you need and have decided
how many of each you require, you might need to think about splitting them
between sub-sitemaps to ease maintenance of the complete site. Another
alternative might be to use content aggregation to combine separate pipelines
into a single pipeline that is then formatted for output.
Because a complete application architecture is seldom confined to just one
area, such as what you build with Cocoon, you also need to think in advance
about things such as bottlenecks that might occur when you roll out your
solution:
What would happen if all 30,000 employees accessed the "Employees Only" page at the same time?
How will the system react when all the customers hit the "Cool Blue Rewinder" page at exactly the same time?
Will the email system be able to cope with all the emails?
These are the sorts of questions you should ask yourself while designing the
application architecture. This brings up one of the most important aspects of
such a system: performance.
Performance and System Environment
We have been building Internet applications for quite a few years, and it is
our experience that one of the most common problems is that after it is
installed, the solution is always too slow. This is something that all
applications suffer from, as you can see in the various discussion forums of any
software product.
This does not always mean that the programming is sloppy (although perhaps
often it is). There is often a great difference in the speed an application can
actually achieve and the perceived performance that the end-user might
experience. Also, a system that performs well when only one user accesses it
might collapse if several users send requests at the same time.
A system that integrates many different data sources might suffer from bad
performance even though the actual portal application might be fast enough. A
portal's speed is defined to a great extent by the speed at which data from
external systems is delivered. So a portal will be slow if one of the data
sources is slow. Unfortunately, no one will care that it is not your fault if it
takes minutes for the portal to appear in the browser.
When designing a complex software solution, it is always best to define
performance expectations beforehand and to test for performance as early as
possible. This sounds simple, but this point is often forgotten until it is too
late. If nobody takes the time at the beginning of the project to define the
expected performance, the system will always be too slow. It is a lot more
difficult to correct performance problems after the solution is in a production
environment.
When we installed our first online Internet banking solution, very few people
accessed their accounts via the Internet. The application worked well and
delivered the web pages quickly enough. However, no real stress testing was
performed at the beginning, so we did not really know how many requests our
system could handle. Over the months the application was installed, the number
of requests grew slowly but steadily. Still, no stress testing was done. After
all, the system ran OK, didn't it? Then, for some strange reason, the number
of people using the system suddenly exploded overnight! Needless to say, the
whole system collapsed under the load. It was far worse having a nonfunctional
system in this situation than it would have been when Internet banking was still
an exotic application.
How do you define a system's expected performance? It depends on what
the system is supposed to do. The first thing you can do is check out the data
sources and decide what sort of performance you can expect from them. If you are
integrating standard data sources (such as a standard database), you can often
obtain performance data from the vendor. Get that data, but take the
information with a grain of salt. To really check, you need to run your own
isolated tests against the single system if you can. It is much more difficult
to find bottlenecks after everything is integrated.
Before you start evaluating the performance of individual systems, make sure
you also define your computing environment. What's the good of testing the
system on some high-powered system if it will actually be running on a low-end
box? Also make sure you test on the same operating system and using the same
hosting software (such as a servlet engine). The servlet API might be
standardized, but in reality you will find that life is not so simple. And it is
a lame excuse to say, "We didn't test on that system" when a
complaint comes in.
Another way to find out what to expect from your system is to check out other
solutions that might do the same thing you are planning on doing. See how fast
they run, and try to obtain some information on how they work. Check out case
studies, often published on web sites, to find out the architecture used to
build the application. You might also be able to find out by asking whoever
built the system.
As soon as you are satisfied that you know what to expect of your
application, here are some tips on what you can use in Cocoon to achieve the
fastest possible application:
Use the built-in Cocoon caching whenever possible when building your pipelines.
If you need to write your own components, make sure they support the caching interfaces in Cocoon if possible.
Stress-test your application using an available tool, and observe how the performance changes if you adjust the pooling of Cocoon components.
Make sure you are running your application with the least verbose logging level, where only errors are logged.
Another piece of advice when writing components that connect to a specific
data source (especially if it is not your data source) is to make sure you add a
time trace. In other words, trace when you connect to the external data source,
and trace when the data is returned. That is the time someone else has to worry
about.
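The time trace described above can be sketched in plain Java. This is a minimal illustration, not Cocoon code: the class name `TimedFetch`, the `run` method, and the use of `System.out` instead of a real logger are all assumptions made for the example.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of Cocoon): times one call to an external data source. */
final class TimedFetch<T> {
    final T value;         // data returned by the external source
    final long elapsedMs;  // time spent waiting for the external source

    private TimedFetch(T value, long elapsedMs) {
        this.value = value;
        this.elapsedMs = elapsedMs;
    }

    /** Traces when the call starts and when the data is returned, and records the duration. */
    static <T> TimedFetch<T> run(String sourceName, Supplier<T> fetch) {
        long start = System.nanoTime();
        System.out.println("TRACE connecting to " + sourceName);
        T value = fetch.get();  // the call into the external system
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("TRACE " + sourceName + " returned after " + elapsedMs + " ms");
        return new TimedFetch<>(value, elapsedMs);
    }
}
```

A component could wrap its data access as `TimedFetch.run("news-feed", () -> loadNews())`, where `loadNews` stands for whatever call fetches the data. The recorded duration is the part of the response time that belongs to the external system.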
If, after testing with stress tools, you find that your system performance is
not good enough, you will want to look into what else you can do to improve the
response time. Obviously it is a good idea to make sure the system has enough
memory and the processor is fast enough. If you are running in a servlet
environment, you might want to try an alternative servlet engine to see if you
can get better performance.
You might also want to look into front-side and back-side caching. A
front-side cache is placed between Cocoon and the Internet. Any client program
requesting a particular document receives it from the cache, not from Cocoon
itself. The cache can store the complete document and request it from Cocoon
only if it has expired. Cocoon then generates the new document and serves it to
the cache to be stored. Look into how you can control the expiration of
generated documents using the appropriate HTTP headers in your documents. There
are several ways of doing this. For example, the Cocoon reader component allows
you to set HTTP headers. Another way is to write your own component, such as an
action that sets headers when used in a pipeline.
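As a rough sketch of what such a component would have to compute, the following plain Java builds the two standard HTTP headers that tell a front-side cache how long a document stays valid. The class `ExpiryHeaders` and its method are hypothetical names for this example. How the values are attached to the response depends on the component you use and is not shown.

```java
import java.time.Duration;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: computes HTTP expiration headers for a generated document. */
final class ExpiryHeaders {
    // HTTP dates are always expressed in GMT with a two-digit day of month.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH);

    /** Returns Cache-Control and Expires values for a document valid for the given lifetime. */
    static Map<String, String> forLifetime(ZonedDateTime generatedAt, Duration lifetime) {
        ZonedDateTime expires = generatedAt.plus(lifetime).withZoneSameInstant(ZoneOffset.UTC);
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "max-age=" + lifetime.getSeconds());
        headers.put("Expires", HTTP_DATE.format(expires));
        return headers;
    }
}
```

Sending both headers is a common choice: caches that understand `Cache-Control` use the relative lifetime, and older ones fall back to the absolute `Expires` date.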
If you are accessing an external data source that is too slow, you might need
to implement a back-side cache. This type of cache sits between Cocoon and the
external data source. The pipeline requests the data from the cache, not from
the data source itself. There are various ways of implementing the cache. You
can look at the description of how Cocoon caches pipelines to get some ideas on
how to implement your own.
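One of the simplest variants is a cache that keeps each result for a fixed time to live. The sketch below is an assumption-laden illustration in plain Java, not the mechanism Cocoon uses for its pipelines. `BackSideCache`, its constructor parameters, and the injected clock are invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical back-side cache: keeps data from a slow source for a fixed time to live. */
final class BackSideCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMs;
        Entry(V value, long expiresAtMs) {
            this.value = value;
            this.expiresAtMs = expiresAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> slowSource;  // the call into the external data source
    private final long ttlMs;                 // how long a cached value stays valid
    private final LongSupplier clock;         // current time in ms, injectable for testing

    BackSideCache(Function<K, V> slowSource, long ttlMs, LongSupplier clock) {
        this.slowSource = slowSource;
        this.ttlMs = ttlMs;
        this.clock = clock;
    }

    /** Returns the cached value if it is still fresh; otherwise asks the slow source again. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMs) {
            return cached.value;
        }
        V fresh = slowSource.apply(key);
        entries.put(key, new Entry<>(fresh, now + ttlMs));
        return fresh;
    }
}
```

In production you would pass `System::currentTimeMillis` as the clock. Note that two requests arriving at the same moment for an expired key may both reach the data source. Whether that matters depends on how expensive the source is.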
It is a good idea to provide the user with some visual feedback to show what
is going on. If the user cannot see anything happening on the screen, he will
perceive system performance as being too slow, even though it might not be. One
way of doing this is to load an intermediate page that says something like
"Please wait; your data is being fetched" and then let this page call
the function on the server that does this. Presenting the user with something to
read while the work goes on in the background means that by the time the user
has finished reading, part or all of the data will have been retrieved. Look
into redirects and meta tags to do this if you are building a site in HTML.
When designing HTML web sites, one of the mechanisms used most often is
frames. Although this is not a book on magical HTML design, here's a piece
of advice: Remember that each frame in a page causes a new request to be sent to
the server. So if you have a page containing four different parts (header,
footer, navigation, and actual content), that is a total of five requests to the
server (one for the frameset document plus one for each frame) and five pipeline
calls in Cocoon. Try to reduce the use of frames if
possible. One way is by using Cocoon's content aggregation to aggregate the
different parts of a page and then use a stylesheet to format the output.
In addition to the tips just discussed, there are additional areas you will
want to check when you design the output format, which brings us to
presentation.
Presentation
Most applications have some form of presentation. Because presentation in
Cocoon is done using XSL stylesheets, you need a working knowledge of this
technology to be able to author your presentation. You will also want to look at
tools that help you author stylesheets.
One of the major steps is deciding what presentation format you need. Of
course, the advantage of Cocoon is that you can add further types of
presentations by adding stylesheets as you need them. However, this should not
keep you from planning your presentation carefully.
Decide whether you want to support each client application (such as the
different browsers) individually or whether you want to go for a format that
suits them all. Be aware that by the time you have finished your application, a
yet-unknown browser might be the market leader.
Design your presentation for speed. This point is not necessarily limited to
Cocoon applications, but it is worth stressing. If you plan on presenting your
data in HTML, make sure you follow the guidelines as to how you should construct
HTML pages for maximum speed when you author your stylesheets. This can depend
on the browser type, so refer to available information on this subject.
Make sure you follow the Cocoon paradigm of separating concerns. Even though
Cocoon offers you ways of splitting layout and content, it does not force you
to. We have seen Cocoon applications built where XHTML was used as the format
for the data. This might seem like a good idea to start with; after all,
XHTML is an XML format. But imagine trying to then provide a presentation layer
in WML. As mentioned in Chapter 2, "Building the Machine Web with
XML," extracting the actual data from a format like XHTML is quite
difficult.
Decide whether your presentation is static or whether it offers
personalization of some sort. Check out the later section "Portals"
for more information on using personalization to influence the output of your
application.
Think about seasonal changes to your presentation. Make your application
interesting by making small changes to the web site's appearance, depending
on the current season. For example, you could give your site a Christmas feeling
during November and December. Write a component such as a selector that provides
you with this information.
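The decision logic behind such a component is small. The following sketch shows only that logic in plain Java. It does not implement Cocoon's selector interface, and the class name `SeasonalTheme` and the theme names are assumptions for illustration.

```java
import java.time.LocalDate;
import java.time.Month;

/** Hypothetical season lookup; a real selector component would wrap logic like this. */
final class SeasonalTheme {
    /** Returns the name of the look and feel to use on the given date. */
    static String themeFor(LocalDate date) {
        Month month = date.getMonth();
        if (month == Month.NOVEMBER || month == Month.DECEMBER) {
            return "christmas";
        }
        return "default";
    }
}
```

The returned name could then decide which stylesheet a pipeline uses, so that the seasonal design lives in its own stylesheet and the content stays untouched.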
If you already have HTML pages that you want to reuse in your Cocoon
application, this is also possible. You would use the HTML generator to read the
HTML, which enters the pipeline as XHTML, and then have a stylesheet transform
that XHTML into the format you require.
This is a way of easing the migration path to a complete XML/XSL-based solution.
Another way of migrating is to have the Cocoon solution run in parallel to the
application you already have. Cocoon can then generate parts of your site for
you. Any new HTML pages can be authored using stylesheets, and the existing site
can be served as before.
Even though you might have authored your HTML documents using stylesheets,
there will be times when you need to include technologies such as JavaScript in
your pages. Another technology that is often used with HTML is Cascading Style
Sheets (CSS). CSS is often used to achieve dynamic look-and-feel changes on HTML
pages. All of this can be used (or reused) in a Cocoon environment. The sitemap
must be configured to allow the JavaScript (.js) files and the CSS (.css) files
to be served through Cocoon. Look into using a reader to do this. Alternatively,
these files can be served directly from the web server.
It's possible to use other technologies inside your web pages in the
same way. You can use Java applets inside web pages by using the appropriate
tags to include them inside the generated HTML pages. Just make sure your .jar
file can be served either through Cocoon or directly.
While someone is working on the presentation side of the application, someone
else can be defining the content.
Know Your Content
We have already mentioned that the XML parser used in Cocoon can validate the
XML data it parses. It can do this using DTDs or XML Schemas. When building the
application, you will probably not yet have a DTD for all your data. This means
that you cannot use XML validation in Cocoon, because you can only activate it
for all the documents, not for an individual one. Even if you do not use the
parser to validate the data, you should document your XML using either a DTD or
an XML Schema before moving the application into a production environment. (Of
course, the earlier the data's format is documented, the better.)
As more and more XML tools come onto the market, they begin to offer advanced
features such as automatically validating the data you enter into, say, an
editor. Now, suppose you have a Cocoon-based system and have authors who are
writing content for that system. Often, they will use third-party tools to do
this and then upload the content to the system or deploy it through some other
means (perhaps saving it to a database). Obviously, it is ideal if you can
provide these authors with a DTD of the data. They can then use the DTD inside
their editing program, and you know that the data they submit will be in a
format you expect and have written stylesheets for.
While the designers are working on developing the stylesheets that will
present the data, that data also needs to be defined and documented.
Document Your Data Sources
We talked briefly about external data sources when we discussed application
performance. However, other factors also need to be taken into account when data
is obtained from an external provider, such as a news feed.
Obviously, the most important fact is that you know exactly what format the
data will be in. The best way to achieve this is if the data's format is
documented in some way, such as in a DTD. You read about the various ways of
documenting XML data in Chapter 2. It is an enormous advantage if your provider
can send you the data in a standardized format. This becomes a great time-saver
if you have to integrate several sources and they all can provide the data in
the same format. It will then be possible to reuse the stylesheets. This is
true of the news providers we looked at when building the Cocoon news portal in
this book. Because the news is provided in RSS format, you could use the same
stylesheet for several different feeds.
When designing the flow of data through your application, you need to consider
two important points. The first point is the internal data definition. As shown
in Figure 11.1, this is the format of the news data in your application. Every
external data format needs to be converted into this format, so you need a
stylesheet for every data source that uses a different format. Obviously
it makes sense to choose a standardized format as your own internal format.
This reduces the number of transformations you need, because an external
source that already supports your internal format needs no stylesheet transformation.
The next step is to define a logical layout format. News data is not normally
structured for presentation, so you need to think about defining a format that
allows transformations into the end format, such as HTML or PDF. If your
application is not limited to publishing just news data, but it also publishes
other types of information, you will want to look into defining a logical layout
format that is not data-specific. This lets you easily publish different types
of data using the same stylesheets.
If you opt to use a standard format such as WML or XHTML as your logical layout
format, make sure you will still be able to convert this format into a different
layout, as shown on the right side of Figure 11.1.
This concept leaves you with three different transition areas:
Incoming data must be transformed into your news data format.
The news format must then be transformed into the logical layout format.
The last area of transformation is from the logical layout format into the end format, such as HTML or PDF.

Figure 11.1 Format transitions using stylesheets.
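The three transitions can be tried out with nothing more than the XSLT support in the Java standard library. The sketch below is an illustration under stated assumptions: the element names (`feed`, `news`, `list`), the three inline stylesheets, and the class `FormatTransitions` are all invented for this example. In Cocoon the stages would be transformers in a pipeline, not strings passed between method calls.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical three-stage chain: source format, news format, layout format, HTML. */
final class FormatTransitions {
    private static final String HEAD =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";
    private static final String TAIL = "</xsl:stylesheet>";

    // Stage 1: one stylesheet per external source maps its format to the internal news format.
    static final String FEED_TO_NEWS = HEAD
            + "<xsl:template match='/feed'><news><xsl:apply-templates select='entry'/></news></xsl:template>"
            + "<xsl:template match='entry'><item><title><xsl:value-of select='headline'/></title></item></xsl:template>"
            + TAIL;

    // Stage 2: the news format becomes a logical layout that knows nothing about news.
    static final String NEWS_TO_LAYOUT = HEAD
            + "<xsl:template match='/news'><list><xsl:apply-templates select='item'/></list></xsl:template>"
            + "<xsl:template match='item'><entry><xsl:value-of select='title'/></entry></xsl:template>"
            + TAIL;

    // Stage 3: the logical layout is rendered in the end format, here an HTML fragment.
    static final String LAYOUT_TO_HTML = HEAD
            + "<xsl:template match='/list'><ul><xsl:apply-templates select='entry'/></ul></xsl:template>"
            + "<xsl:template match='entry'><li><xsl:value-of select='.'/></li></xsl:template>"
            + TAIL;

    /** Applies a single stylesheet to an XML document and returns the result as a string. */
    static String apply(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    /** Runs the incoming document through all three transition areas in order. */
    static String publish(String incoming) {
        String news = apply(incoming, FEED_TO_NEWS);
        String layout = apply(news, NEWS_TO_LAYOUT);
        return apply(layout, LAYOUT_TO_HTML);
    }
}
```

Adding a second news provider in this arrangement means writing only a new first-stage stylesheet. Adding a new output format means writing only a new third-stage stylesheet.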
Check to see whether your data source is always online. Nothing is more
embarrassing than finding out
that your news provider is online only during the day when your news portal
crashes the first night. Use appropriate selectors in the pipeline to ensure
that you access the online server during the day and perhaps a database
repository at night.
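The choice such a selector has to make can be sketched as follows. This is plain Java and not a Cocoon selector implementation. The class `SourceSchedule`, the service hours passed in, and the two source names are assumptions for the example.

```java
import java.time.LocalTime;

/** Hypothetical schedule check: decides which data source a pipeline should read from. */
final class SourceSchedule {
    /** Returns "online" inside the provider's service hours, otherwise "repository". */
    static String sourceFor(LocalTime now, LocalTime opens, LocalTime closes) {
        // The opening time is inclusive and the closing time is exclusive.
        boolean providerOnline = !now.isBefore(opens) && now.isBefore(closes);
        return providerOnline ? "online" : "repository";
    }
}
```

A clock-based rule like this covers only planned downtime. If the provider can also fail during the day, the component that fetches the data still needs its own error handling.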
Make sure you can obtain the data you need with the least number of requests
possible. We have seen a Cocoon-based application built to present stock
information in which one block of information (such as an overview page)
required the middleware solution to perform more than 20 requests against the
data provider. Even worse, most of these requests had to be sent in order,
because they were dependent on each other. The problem isn't that this
can't be done with Cocoon; it can. But if you remember the earlier tips
on performance, perhaps you will see why this point is worth stressing.
After you've defined the functions your system should have, the layout
you want to present to the user, and the data format that is to be the core of
your application, you need to look at the Cocoon components you can use to do
all this.
Different Technologies
As mentioned at the beginning of this chapter, Cocoon provides many ways of
solving certain problems. People new to Cocoon are sometimes overwhelmed by the
many possibilities. Often, only one type of component is used to solve a problem
when perhaps a different solution would have been better. As an example, when
starting out with Cocoon, we often found ourselves writing new transformers when
it would have been better to use an action or selector instead.
Here are some tips on when to use what:
Using a given component is better than writing your own.
Use generators when you have an identifiable data source that can be used as the starting point for your pipeline.
Use transformers when you need to manipulate the XML data flowing through the pipeline.
Use actions and selectors to influence the pipeline if their results do not need to manipulate the output document.
Use an action if you want to execute a task that does not influence the XML processing pipeline.
Use a selector if you want to choose between different processing pipelines.
Use XSP for rapid development of a custom generator, and transform it later into a real generator.
This section has looked at a few aspects that are important when you design
your Cocoon application. Performance is probably the key factor when the
application is actually finished and installed. A well-thought-out concept is a
necessary starting point for good design. "Program now; think later"
is, in our opinion, not the way to build Cocoon applications. Unfortunately,
even writing a great concept beforehand still might not prevent problems from
occurring.
Solving Problems
So, you've written the concept, designed the architecture, written any
needed components, and built the pipelines, and things still don't work
as you expected. Here is a two-sentence answer to this problem:
Sounds simple, doesn't it? But for many cases, this is true. Problem
solving has become easier with the Internet. When we first started using Usenet
newsgroups (which were exchanged using UUCP back in those days), we could post
our problems, not just to our colleagues in Paderborn, Germany, but to the
whole world! And the Internet has expanded this "knowledge base" so
that now it is very probable that someone out there has already had the same
problem you are trying to solve.
The Cocoon web site is a good starting place for finding information and
help. There you can find mailing lists and archives of past list discussions.
Chances are your question is there somewhere. Subscribe to the mailing lists and
join the Cocoon community. Appendix C, "Links on the Web," lists links
for the Cocoon web site.
Search engines are also a good choice when you are looking for a solution to
your problem. However, if you query a search engine, you probably will be
swamped with thousands of answers that don't really help. If you already
know roughly the area your question applies to, perhaps checking one of the
newsgroups is a better way to go. There are newsgroups for most of the subjects
in this book, such as XML and XSL. However, there is as yet no newsgroup for
Cocoon. Hopefully, you will be able to solve any problem that might arise using
one of the listed methods.
Using the information discussed so far should allow you to complete your
application concept and design the architecture of your solution, complete with
the required Cocoon technologies. Even though most people who look at Cocoon and
read this book will already have an exact idea of the type of application they
want to build, it is always a good idea to see how other people are using the
technology. The following examples might provide some additional ideas for the
types of applications you can build with Cocoon.
Different Types of Applications
Cocoon lends itself to being used to build a variety of solutions. Although
Cocoon is aimed primarily at the XML publishing sector, adding your own
components lets you expand Cocoon into a complete middleware architecture.
In the past we have worked on building a commercial solution that provides
additional (and sometimes customer-specific) components needed to provide a
complete solution. We added components and functionality to Cocoon without
throwing away a single Cocoon concept. This shows the extensibility of the
architecture.
To give you some idea of what you might do to solve a specific problem,
here are some of the extensions we have written to provide the various solutions
we have built with Cocoon:
Components for authentication and user administration
Portal framework components
A complete XML/XSL-based content management system
Integration components for a commercial XML database
System management components
Although these components were not written as part of the Cocoon project,
some of them will find their way back into Cocoon and hopefully will be
available in the not-too-distant future.
Using Cocoon and the additional components allows you to build applications
such as portals, flexible publishing systems, and web sites. Because Cocoon can
process XML data, you can also build solutions that can receive complete XML
documents as input and process them using pipelines.
Let's look at some of these application types in more detail. The most
common Internet application is the web site, where information is published as
HTML. This type of application becomes more complex to develop when the
information is stored in external systems such as databases and when additional
formats such as PDF are required. The web site needs to be extended into a
network publishing application to provide these advanced capabilities. When
several different types of users are accessing the system, some form of
personalization is called for. The term portal is often used to describe this
type of application. This chapter concludes with a look at how to use Cocoon to
build portals.
Using Cocoon to Build Web Sites
One of the most common uses of Cocoon is as a system for building web sites.
After all, that is its main function. Many web sites already use Cocoon; they
are listed on the Cocoon web site. We discussed a web site example earlier in
this chapter. Now we will add to the information that was discussed there.
Remember that Cocoon organizes a web site's content using a sitemap.
Although it is possible to define a pipeline for each document your web site
will serve, this would result in a sitemap that becomes very hard to maintain.
Therefore, you need to define pipelines that can handle similar types of
content, perhaps split into different areas. Look into how you can use wildcards
in the sitemap as a method of combining several documents into one pipeline.
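The wildcard idea can be sketched in a few lines of plain Java. This is only an illustration, not Cocoon's own matcher: the class WildcardPipelines, its methods, and the content/ paths are hypothetical names chosen for this example. The sketch assumes that * matches any text that does not contain a slash and that {1}, {2}, and so on stand for the text matched by each wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: NOT Cocoon's matcher, only an illustration of the idea.
class WildcardPipelines {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> templates = new ArrayList<>();

    // Registers one "pipeline": a wildcard URI pattern and the source it maps to.
    void add(String uriPattern, String sourceTemplate) {
        String[] parts = uriPattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            // Each '*' becomes a capturing group that does not cross a '/'.
            if (i > 0) regex.append("([^/]*)");
            regex.append(Pattern.quote(parts[i]));
        }
        patterns.add(Pattern.compile(regex.toString()));
        templates.add(sourceTemplate);
    }

    // Returns the source for the first matching pattern, or empty if none matches.
    Optional<String> resolve(String uri) {
        for (int i = 0; i < patterns.size(); i++) {
            Matcher m = patterns.get(i).matcher(uri);
            if (!m.matches()) continue;
            String source = templates.get(i);
            for (int g = 1; g <= m.groupCount(); g++) {
                source = source.replace("{" + g + "}", m.group(g));
            }
            return Optional.of(source);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        WildcardPipelines sitemap = new WildcardPipelines();
        sitemap.add("docs/*.html", "content/docs/{1}.xml");
        sitemap.add("news/*/*.html", "content/news/{1}/{2}.xml");
        System.out.println(sitemap.resolve("docs/intro.html").orElse("no pipeline"));
        System.out.println(sitemap.resolve("news/2002/cocoon.html").orElse("no pipeline"));
    }
}
```

Two patterns serve every document in two areas of the site. Adding a new document then requires no change to the mapping at all.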
Make sure the layout developer (the author of the stylesheets) uses a tool
that can perform XSL transformations on sample data, and provide the author
with suitable sample data to use. Testing individual stylesheets this way is
easier than having to use Cocoon each
time.
Another important point is to make sure the layout developers use a tool that
either already uses the Xalan XSLT component or lets you plug it in. If the
tool allows a version of Xalan to be used, make sure you
use the same version as the one in the Cocoon you will be running. Which tool is
best suited for the job depends largely on exactly who will be using it and for
what purpose. We have provided a list of relevant links to tools in Appendix
C.
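If no dedicated tool is at hand, a stylesheet can also be tried out against sample data with the standard Java XSLT API (javax.xml.transform). The following is a minimal sketch under stated assumptions: the class name StylesheetCheck and the sample document and stylesheet are made up for this example. Which XSLT processor actually runs depends on what the JVM finds on its classpath, so the sketch also prints the factory class. That makes it easy to see whether it is the same processor as the one in your Cocoon installation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for trying a stylesheet on sample data outside Cocoon.
class StylesheetCheck {
    // Applies the given XSLT to the given XML and returns the serialized result.
    static String transform(String xml, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Stylesheet could not be applied", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data and stylesheet, for illustration only.
        String sample = "<article><title>Hello Cocoon</title></article>";
        String stylesheet =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"html\"/>"
          + "<xsl:template match=\"/article\">"
          + "<h1><xsl:value-of select=\"title\"/></h1>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(transform(sample, stylesheet));
        // Shows which XSLT processor implementation was picked up.
        System.out.println(TransformerFactory.newInstance().getClass().getName());
    }
}
```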
Although your first-version web site might only read its content from XML
files and publish to a single format such as HTML, one day you will want to use
something more advanced to store your data, such as a database. You might also
need to integrate external systems such as mainframes into your application. In
addition, there might be demand for additional formats as users use devices such
as mobile phones to access your solution. The web site must therefore be
extended into a network publishing application.
Network Publishing Applications
Although this is only a different way of defining something, we use the term
publishing application to emphasize that the data you want to display is
actually stored somewhere, and we don't mean in a file. A publishing system
might generate reports from data that is obtained from a database, for example.
It then might manipulate the data in some way, perhaps to generate different
views and then publish that data in one or more formats.
Areas you will want to look into include the Cocoon components that allow you
to access data from a database or external systems such as a remote XML server
via HTTP. You will also want to learn more about standards such as XSL:FO. After
it is formatted this way, your data can be laid out in different output formats,
such as PDF or PostScript.
A publishing system might be the first application in which what you publish
depends on the type of end device. For example, you could allow mobile phone
users to access only the most important information while allowing browser users
to access the full beauty of your web site.
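A device-dependent choice like this comes down to a small decision function. The sketch below is plain Java and does not use Cocoon's selector interface. The class DeviceSelector, the stylesheet paths, and the simple User-Agent test are assumptions made for illustration, and real device detection is considerably more involved.

```java
import java.util.Locale;

// Hypothetical sketch: picks a stylesheet depending on the requesting device.
class DeviceSelector {
    // Returns the stylesheet to use for the given User-Agent header value.
    static String stylesheetFor(String userAgent) {
        String ua = userAgent == null ? "" : userAgent.toLowerCase(Locale.ROOT);
        // Assumption: phones announce themselves with "mobile" or "wap".
        boolean phone = ua.contains("mobile") || ua.contains("wap");
        return phone ? "styles/headlines-only.xsl" : "styles/full-site.xsl";
    }

    public static void main(String[] args) {
        System.out.println(stylesheetFor("ExamplePhone/1.0 (WAP browser)"));
        System.out.println(stylesheetFor("Mozilla/5.0 (X11; Linux x86_64)"));
    }
}
```

The same XML data feeds both stylesheets. Only the presentation differs, so the pipeline before the stylesheet does not need to know about the device.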
In our experience, using Cocoon as a publishing system for specific data is
an ideal way to introduce the technology into a new area. Applications such as a
report generator, which reads data from a database, consolidates it, and then
presents that information in HTML and PDF, can be built in an isolated fashion
that does not intrude on given software structures. The first little application
we built with Cocoon was a front end to an internal database we had at that time
containing work reports. The solution read the data from the database according
to a query parameter and then presented an overview of the data in the various
formats. As a prototype showing what could be done with Cocoon and how flexible
it was, this was an ideal solution.
Publishing systems might be the first time you also need to integrate
something like user authentication and personalization, allowing only
certain people to access the data. This brings us to the next application
form: the portal.
Portals
Although you probably think of something like myYahoo or myAOL when the term
portal is used, portals can actually be a lot simpler. We refer to this type of
application whenever some form of user authentication is necessary to access
information or when information can be individually personalized. This
personalization can range from changing the color of a single document to
configuring external news sources in a news portal.
In our portal example, built over several chapters, we have already seen how
it is possible to build a portal using Cocoon. Nevertheless, and because we know
that some readers might jump right to this section, we will go over some of the
main points again and in a more general context.
In order for personalization to be possible, we need to be able to recognize
the user when he accesses the portal. Most portals require some form of
authentication, such as entering a user ID and password. This data is then
matched against a repository, such as a database, and the user is rejected if
there is no match. Each user therefore requires an entry in the database, and
the application perhaps also needs to cater to an anonymous user (a user without
a login). After the user is authenticated, the application will want to allow
the user to access the different areas in the portal without having to log in
again. Look into ways of creating a session when running inside a servlet engine
in order to do this. It will also be necessary to recognize a returning portal
user so that he does not have to log in each time he accesses some part of the
portal. An appropriate action component can solve this problem.
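The login flow described here can be sketched independently of Cocoon and of any servlet engine. In the following illustration, the class PortalLogin and its methods are hypothetical, a map stands in for the user database, and a second map stands in for the session store that a servlet engine would normally provide. Passwords are compared as plain text only to keep the sketch short. A real repository would store salted hashes instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch of authentication plus session handling for a portal.
class PortalLogin {
    // Stand-in for the user repository (for example, a database table).
    private final Map<String, String> passwordByUser = new HashMap<>();
    // Stand-in for the sessions a servlet engine would manage.
    private final Map<String, String> userBySession = new HashMap<>();

    void register(String userId, String password) {
        passwordByUser.put(userId, password);
    }

    // Checks the credentials; on success creates a session and returns its ID.
    Optional<String> login(String userId, String password) {
        String expected = passwordByUser.get(userId);
        if (expected == null || !expected.equals(password)) {
            return Optional.empty(); // no match: the user is rejected
        }
        String sessionId = UUID.randomUUID().toString();
        userBySession.put(sessionId, userId);
        return Optional.of(sessionId);
    }

    // Resolves a session to its user; unknown sessions are treated as anonymous.
    String currentUser(String sessionId) {
        return userBySession.getOrDefault(sessionId, "anonymous");
    }

    public static void main(String[] args) {
        PortalLogin portal = new PortalLogin();
        portal.register("alice", "secret");
        String session = portal.login("alice", "secret").orElse("");
        System.out.println(portal.currentUser(session));
        System.out.println(portal.currentUser("unknown-session"));
    }
}
```

After a successful login, every later request only needs to present the session ID. The anonymous user is simply the fallback for requests without a valid session.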
Another important step is to define the portal structure. What information
will be available to the user after he has logged in? Will each user have an
individual profile, or will the portal cater to only specific groups of users?
As soon as this has been decided, a suitable XML format for the profiles can be
defined. The profile should then contain information relevant to the
personalization (such as colors) or to the individual preferences in regard to
the types of information to be displayed.
Therefore, the first step of building the portal is to define where the user
data and the portal profile are to be stored. Then the application needs to
define and set up a pipeline in Cocoon for the authentication. One way of doing
this is to have an HTML form send the user ID and password to Cocoon and then
use the SQL transformer to select the user and profile from the
database.
If the portal profile contains data on the types of information that are to
be displayed, this information must be fetched and integrated into the profile
so that it is complete before it reaches the stylesheet. Look into using content
aggregation as a way of doing this. Each different data source will then return
information that is added to the user's profile, so that the end result
will be a complete portal in XML.
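The effect of aggregation can be shown with the standard Java DOM API. This sketch does not use Cocoon's own aggregation mechanism. The class PortalAggregator, the root element name portal, and the sample fragments are assumptions made for illustration. It simply places the profile and the XML returned by each data source under one common root, so that a single document reaches the stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch: combines a profile and several sources into one document.
class PortalAggregator {
    static String aggregate(String profileXml, List<String> sourceXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document portal = builder.newDocument();
            Element root = portal.createElement("portal"); // assumed root name
            portal.appendChild(root);
            append(builder, portal, root, profileXml);
            for (String xml : sourceXml) {
                append(builder, portal, root, xml);
            }
            // An identity transformation serializes the DOM tree to text.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(portal), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Aggregation failed", e);
        }
    }

    // Parses one fragment and copies its root element under the given parent.
    private static void append(DocumentBuilder builder, Document target,
                               Element parent, String xml) throws Exception {
        Document part = builder.parse(new InputSource(new StringReader(xml)));
        parent.appendChild(target.importNode(part.getDocumentElement(), true));
    }

    public static void main(String[] args) {
        String profile = "<profile><color>blue</color></profile>";
        List<String> sources = Arrays.asList(
            "<news><item>Cocoon released</item></news>",
            "<weather><city>Paderborn</city></weather>");
        System.out.println(aggregate(profile, sources));
    }
}
```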
After the profile has been selected and all the data fetched from the various
sources, the complete profile can then be transformed into a specific look and
feel using a stylesheet. The stylesheet can access specific details contained in
the individual profile and format the output as necessary.
If the personalization is based on the user who accesses the site, you need
to define what types of information the user can change and how the presentation
should be affected by, say, his age. If you will be providing a different layout
for teenagers than for middle-aged people, you will need to define the criteria
by which this can be decided. Writing a new component such as a selector is an
ideal way of doing this.
Think about whether you want to change the presentation depending on other
factors, such as the time of day or the weather. Suppose you are building a
stock-quote portal and you present the current market chart (for example,
NASDAQ) on your front page. After the NASDAQ closes for the day, it might be a
good idea to present a different chart, such as one from Asia. So if you want to
switch content and presentation depending on the time of day, look into the
Cocoon selector component as a way of doing this.
If you are thinking about building a late-night portal, in which the
presentation changes after a certain hour, remember that your user might be
living in a different time zone, so it might be the middle of the day for him
when you select the late-night presentation.
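Both time-dependent decisions can be sketched with the java.time API. The class TimeOfDaySelector and the late-night threshold (22:00 to 05:00) are assumptions made for this example. The chart decision uses the exchange's own time zone and NASDAQ's regular trading hours (9:30 to 16:00 Eastern), ignoring weekends and holidays. The presentation decision uses the time zone of the user, which would have to come from the user's profile.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;

// Hypothetical sketch: time-dependent selection of content and presentation.
class TimeOfDaySelector {
    private static final ZoneId EXCHANGE_ZONE = ZoneId.of("America/New_York");

    // Chooses the chart by the exchange's local time, not the server's.
    // Simplification: weekends and market holidays are not considered.
    static String chartFor(Instant now) {
        LocalTime t = now.atZone(EXCHANGE_ZONE).toLocalTime();
        boolean open = !t.isBefore(LocalTime.of(9, 30)) && t.isBefore(LocalTime.of(16, 0));
        return open ? "nasdaq" : "asia";
    }

    // Chooses the presentation by the user's local time.
    // Assumption: "late night" means 22:00 up to 05:00.
    static String themeFor(Instant now, ZoneId userZone) {
        int hour = now.atZone(userZone).getHour();
        return (hour >= 22 || hour < 5) ? "late-night" : "daytime";
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(chartFor(now));
        System.out.println(themeFor(now, ZoneId.of("Europe/Berlin")));
        System.out.println(themeFor(now, ZoneId.of("America/Los_Angeles")));
    }
}
```

At the same instant, two users can legitimately receive different presentations, while the chart shown depends only on the exchange.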
Summary
This completes this chapter on Cocoon application design. As we said at the
beginning, you can build many different types of applications with the current
version of Cocoon. Although Cocoon's main focus currently is on web sites,
as more components are built that integrate into the Cocoon architecture, it
will expand and become a platform for other types of applications as well.
This is one of the great advantages of using Cocoon as a base for XML
applications. Because of the way new components can be easily added, there is
really no limit as to how you can use Cocoon as the platform for your solution.
As an open-source project, it has much support from individuals and companies.
Several firms have donated components to the Cocoon project and in so doing have
helped the software become better suited for application scenarios such as the
network publishing system and portal described in this chapter. The next chapter
outlines some of the directions Cocoon might go in as XML and XML applications
become more widespread. It also provides some additional ideas as to where
Cocoon can be used, perhaps in your particular environment.