Lab: XMLTree RSS Processing
The Problem
RSS (Really Simple Syndication) is an XML application for
distributing web content that changes frequently. Many news-related
sites, weblogs, and other online publishers syndicate their content
as an RSS feed to whoever wants it. In this lab, you will write code
that extracts information from an RSS (version 2.0) document loaded
into an XMLTree object.
RSS 2.0 documents have the following format:
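As an illustration, a minimal RSS 2.0 document containing the tags discussed in this lab might look like the sample embedded in the sketch below. The feed contents and the example.com URLs are made up, and the class name SampleFeed is hypothetical. The sample is held in a Java string and parsed with the JDK's DOM parser, which is used here only as a stand-in for XMLTree so that the sketch runs without the course library.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class SampleFeed {

    // Hypothetical feed: every title, date, and URL below is invented.
    static final String SAMPLE =
          "<rss version=\"2.0\">"
        + "<channel>"
        + "<title>Example News</title>"
        + "<link>http://example.com/</link>"
        + "<description>Made-up headlines</description>"
        + "<item>"
        + "<title>First story</title>"
        + "<description>Summary of the first story</description>"
        + "<link>http://example.com/1</link>"
        + "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>"
        + "<source url=\"http://example.com/src\">Example Wire</source>"
        + "</item>"
        + "</channel>"
        + "</rss>";

    /** Parses an XML string into a DOM document (JDK parser, not XMLTree). */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // The root of an RSS 2.0 document is the rss tag with a version attribute.
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getDocumentElement().getAttribute("version"));
    }
}
```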
Note the following properties of RSS 2.0 XML documents:
- The children of the <channel> tag and of the
<item> tag can occur in any order; do not assume
they will appear in the order above. Furthermore, there can be
children of other types not listed above.
- <title>, <link>, and <description>
are required children of the <channel> tag, i.e.,
you should assume they are present. However, <title>
and <description> may be blank, i.e., they may not
have any text child.
- All the children of the <item> tag are optional,
i.e., do not assume they are present; however, at least one of
<title> or <description> must be present. Even when present,
the <title> and/or <description> tags may be
blank, i.e., they may not have any text child.
- If a <source> tag appears as a child of an <item>
tag, it must have a url attribute.
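The properties above suggest looking children up by tag name rather than by position, and checking for a text child before reading it. A minimal sketch of that idea follows, assuming the JDK's org.w3c.dom classes as a stand-in for XMLTree; the class TagChecks and the helpers firstChild and textOr are hypothetical names, not part of the lab skeleton.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TagChecks {

    /** Parses an XML string and returns its root element (JDK parser). */
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
    }

    /** Returns the first direct child element named tag, or null if none. */
    static Element firstChild(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(tag)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Returns the text of tag, or fallback if the tag is absent or blank. */
    static String textOr(Element parent, String tag, String fallback) {
        Element e = firstChild(parent, tag);
        // A blank tag such as <title></title> has no child node at all.
        if (e == null || e.getFirstChild() == null) {
            return fallback;
        }
        return e.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Children deliberately out of order, with a blank title.
        Element item = parse("<item><link>http://example.com/1</link>"
            + "<title></title><description>Only a description</description></item>");
        System.out.println(textOr(item, "title", "(blank)"));
        System.out.println(textOr(item, "description", "(blank)"));
        System.out.println(textOr(item, "pubDate", "(absent)"));
    }
}
```

Returning a caller-supplied fallback keeps the "absent" and "blank" cases on one code path; code that needs to tell them apart can call firstChild directly.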
Setup
Follow these steps to set up a project for this lab.
- Create a new Eclipse project by copying ProjectTemplate.
Name the new project RSSProcessing.
- Open the src folder of this project and then open
(default package). As a starting point, pick any one of
the Java files there, rename it RSSProcessing, and delete the
other files from the project.
- Follow the link to RSSProcessing.java,
select all the code on that page (click and hold the left mouse
button at the start of the program and drag the mouse to the end
of the program) and copy it to the clipboard (right-click the
mouse on the selection and choose Copy from the contextual
pop-up menu), then come back to this page and continue with these
instructions.
- Finally, in Eclipse, open the RSSProcessing.java file;
select all the code in the editor, right-click on it and select Paste
from the contextual pop-up menu to replace the existing code with
the code you copied in the previous step. Save your file.
Method
- Implement the following static method that, given an
XMLTree and a tag name (a String), searches the children of the
XMLTree for the given tag and returns the index of the first
occurrence of the tag, or -1 if no child has that tag.
- Review the main method skeleton and modify it to
output the title, description, and link of
the RSS channel. Each element in the output should be preceded by
a descriptive label, e.g.,
Title: Yahoo! News - Latest News & Headlines
Description: The latest news and headlines from Yahoo! News.
Link: http://news.yahoo.com/
Run the program and test your
implementation. As input you can use any URL of a valid RSS 2.0
feed, e.g., https://news.yahoo.com/rss/.
- Once you are confident that your implementations above are
correct, implement the following static method that, given an
XMLTree whose root is an <item> tag and an output stream,
outputs the title (or the description, if the title is not
available) and the link, if available.
Here is an example of what the output might look like:
Title: Tropical Storm Leslie churns northward in Atlantic
Link: http://news.yahoo.com/storm-churns-northward-winds-buffeting-bermuda-144218080.html
- Back in the main method, add code so that it
prints all items in the RSS channel by repeatedly calling processItem.
Then run and test your code to make sure it works as intended.
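The steps above can be sketched end to end as follows. This is one possible shape, not the lab's reference solution: it uses the JDK's org.w3c.dom and java.io.PrintStream in place of XMLTree and the lab's output stream, and the names RssSketch, getChildElement, textOf, and printFeed are assumptions (only processItem is named in the text). Indices here count every child node, including text nodes. Printing "(blank)" for an empty channel field and "Title: (none available)" for an item with neither text is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RssSketch {

    /** Index of the first child of xml labeled tag, or -1 if there is none. */
    static int getChildElement(Node xml, String tag) {
        NodeList kids = xml.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE
                    && kid.getNodeName().equals(tag)) {
                return i;
            }
        }
        return -1;
    }

    /** Text of the first child labeled tag, or null if absent or blank. */
    static String textOf(Node parent, String tag) {
        int i = getChildElement(parent, tag);
        if (i < 0) {
            return null;
        }
        Node n = parent.getChildNodes().item(i);
        return n.hasChildNodes() ? n.getTextContent() : null;
    }

    /** Prints the title (or, failing that, the description) and the link. */
    static void processItem(Node item, PrintStream out) {
        String title = textOf(item, "title");
        String description = textOf(item, "description");
        if (title != null) {
            out.println("Title: " + title);
        } else if (description != null) {
            out.println("Description: " + description);
        } else {
            out.println("Title: (none available)"); // arbitrary placeholder
        }
        String link = textOf(item, "link");
        if (link != null) {
            out.println("Link: " + link);
        }
    }

    /** Prints the channel header, then every item. Assumes a valid feed. */
    static void printFeed(String xml, PrintStream out) throws Exception {
        Node rss = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        Node channel = rss.getChildNodes().item(getChildElement(rss, "channel"));
        // title, description, and link are required, but the first two may be blank.
        String title = textOf(channel, "title");
        String description = textOf(channel, "description");
        out.println("Title: " + (title != null ? title : "(blank)"));
        out.println("Description: " + (description != null ? description : "(blank)"));
        out.println("Link: " + textOf(channel, "link"));
        NodeList kids = channel.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeName().equals("item")) {
                processItem(kids.item(i), out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        printFeed("<rss version=\"2.0\"><channel>"
            + "<link>http://example.com/</link><title>Example News</title>"
            + "<description></description>"
            + "<item><description>No title here</description></item>"
            + "</channel></rss>", System.out);
    }
}
```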
Additional Activities
- Modify processItem (and update its comments) so that, in addition
to the title (or description) and link, it also outputs the
publication date (tag pubDate) and the source (tag source) together
with the source URL (attribute url of the source tag). If any of
these elements is not present, output <element> not present (where
<element> is replaced by the name of the missing tag).
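A minimal sketch of the extra output, again assuming the JDK's org.w3c.dom as a stand-in for XMLTree; the class ItemDetails, the helpers parse and child, the method printDetails, and the label wording are all hypothetical. It covers only the two new elements, pubDate and source, and would be combined with the existing title and link output.

```java
import java.io.ByteArrayInputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ItemDetails {

    /** Parses an XML string and returns its root element (JDK parser). */
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
    }

    /** Returns the first direct child element named tag, or null if none. */
    static Element child(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(tag)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Prints pubDate and source (with its url attribute), or a notice if missing. */
    static void printDetails(Element item, PrintStream out) {
        Element pubDate = child(item, "pubDate");
        if (pubDate != null) {
            out.println("Publication date: " + pubDate.getTextContent());
        } else {
            out.println("pubDate not present");
        }
        Element source = child(item, "source");
        if (source != null) {
            // A source tag is required to carry a url attribute.
            out.println("Source: " + source.getTextContent()
                + " (" + source.getAttribute("url") + ")");
        } else {
            out.println("source not present");
        }
    }

    public static void main(String[] args) throws Exception {
        Element item = parse("<item><title>First story</title>"
            + "<source url=\"http://example.com/src\">Example Wire</source></item>");
        printDetails(item, System.out);
    }
}
```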