Writing a Custom RSS Reader Web Part

June 17, 2010

For this entry, I’m exploring a very specific case that uses techniques that are practical in many situations. Specifically, I’m developing a Web Part that will integrate a user-defined number of stories from a Xigla Absolute News Manager system into a custom JavaScript news slideshow of sorts. To do this, I’m going to parse an RSS feed, pull out the data I want, and convert it to the format the JavaScript slideshow expects.

Step 1: Reading RSS
After a little bit of research, this turns out not to be too hard. In fact, Microsoft Support has an article with everything you need for this, but I’ll share some code and elaborate a bit anyway. In my case, the News Manager can generate an RSS feed with a set number of items, which is perfect! I’m building the RSS URL and storing it in rssURL. Then, I’m loading that RSS feed into something usable with System.Xml.XmlTextReader. The variable articleCount is a user-defined value representing the number of articles to display.

// Pull in data from RSS feed
// ?h is the number of articles to display
string rssURL = "http://wwwdev.co.boulder.co.us/newstest/rss.aspx?z=1&h="
   + articleCount;
XmlTextReader reader = new XmlTextReader(rssURL);

Now, as the Microsoft article suggests, I’m going to read the feed one node at a time in a while() loop, like so:

while (reader.Read())
{
}

Inside this while() statement, I need to accomplish the following things:

  1. Determining whether or not the program has reached the first <item>, and only processing data if it has.
  2. Removing HTML tags from CDATA’d <description>s, because they contain a lot of Microsoft Word-generated HTML and life will be better without it.
  3. Converting newlines to paragraphs to re-institute some valid HTML.
  4. Ignoring lines that start with “Contact” or “Updated”, because these are included in most articles but should not be in the summaries.
  5. Limiting the number of sentences displayed to a user-defined value, with a sentence being recognized by a period followed by a space (unless the period belongs to a.m. or p.m.).
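The first item on that list can be tried out in isolation. The following is a minimal sketch, not the Web Part code itself: the class name FeedTitles and the method ReadItemTitles are made up for illustration, and the feed is read from any TextReader so the example runs without a network connection.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FeedTitles
{
    // Hypothetical helper: returns the text of every <title> element that
    // appears after the first <item>, skipping the channel-level <title>.
    public static List<string> ReadItemTitles(TextReader source)
    {
        var titles = new List<string>();
        bool reachedContent = false;   // becomes true at the first <item>
        string thisElement = "";       // name of the most recent start tag

        using (XmlTextReader reader = new XmlTextReader(source))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    thisElement = reader.Name;
                    if (thisElement == "item")
                        reachedContent = true;
                }
                else if (reader.NodeType == XmlNodeType.Text
                    && reachedContent && thisElement == "title")
                {
                    titles.Add(reader.Value);
                }
            }
        }
        return titles;
    }
}
```

Passing in a StringReader that wraps a small hand-written feed is an easy way to check that the channel title is skipped and only the item titles come back.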

Step 2: Format the content
As I’ve mentioned in a previous post, script.Append(); just adds text to the variable that will be output at the end of my program, so for those of you keeping score at home, you can treat it as Console.Write(); or anything else that outputs text. To strip HTML tags, I’m going to use a regular expression method as described on csharp-online.net (thank you!). This is the code from that website; I adapted it into my while() loop, as you’ll see below.

using System.Text.RegularExpressions;
...
const string HTML_TAG_PATTERN = "<.*?>";

static string StripHTML (string inputString)
{
   return Regex.Replace
     (inputString, HTML_TAG_PATTERN, string.Empty);
}
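One thing worth knowing about this pattern: by default "." does not match a newline, so a tag that is broken across two lines (which Word-generated markup sometimes is) slips through. Here is a small sketch of a variant that adds RegexOptions.Singleline to handle that case. The class name HtmlStripper and the sample input mentioned afterwards are my own, for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    const string HTML_TAG_PATTERN = "<.*?>";

    // Same pattern as above, plus RegexOptions.Singleline so that "." also
    // matches newlines and a tag split across lines is still removed.
    public static string StripHtml(string inputString)
    {
        return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty,
            RegexOptions.Singleline);
    }
}
```

For example, an opening <p> tag whose attributes continue on a second line is removed as a single tag here, whereas the pattern without Singleline leaves it in place. Note that entities such as &amp;nbsp; are not tags, so they still need to be handled separately.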

Step 3: Put it together
And now the whole thing together in my while() loop:

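// Loop state. These declarations are assumed, since they are not shown
// elsewhere in this entry; script (presumably a StringBuilder) and summarize
// (the user-defined sentence count) are defined elsewhere in the Web Part.
string thisElement = "";
bool reachedContent = false;
string articleSummary = "";
string articleProcessed = "";
string articleOutput = "";
int loopcount = 0;

while (reader.Read())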
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:
            // Set thisElement to the name of the current tag we're in
            thisElement = reader.Name;
            if (thisElement == "item")
            {
                // We have reached the first <item>; it's ok to process data
                reachedContent = true;
            }
            break;
        case XmlNodeType.Text:
            if(thisElement == "title" && reachedContent == true)
            {
                // Article title
                script.Append("<p><strong>" + reader.Value + "</strong></p>\n");
            }
            break;
        case XmlNodeType.CDATA:
            if (reachedContent == true)
            {
                // Remove HTML tags:
                articleSummary = Regex.Replace(reader.Value,
                    HTML_TAG_PATTERN,
                    string.Empty);

                // Replace &nbsp; with " "
                articleSummary = articleSummary.Replace("&nbsp;", " ");

                // Split up article by newlines:
                String[] articleParts = articleSummary.Split('\n');
                int customSummarize = summarize;

                foreach(string articleLine in articleParts)
                {
                    // Do first two lines start with "Contact" or "Updated"?
                    // If not, add the line to articleOutput.

                    if (articleLine.Length >= 7 && loopcount < 2)
                    {
                        string firstSeven = articleLine.ToLower().Substring(0, 7);
                        if (firstSeven.Contains("contact")
                            || firstSeven.Contains("updated"))
                            customSummarize++;
                        else
                            articleProcessed += "<p>" + articleLine + " </p>\n";
                    }

                    else if (articleLine.Contains("-END-"))
                    {
                        // Skip the "-END-" marker line; it is not article text
                    }

                    else
                        articleProcessed += "<p>" + articleLine + " </p>\n";

                    loopcount++;
                }

                // Cut the summary down to customSummarize number of sentences
                string[] articleOutputParts = Regex.Split(articleProcessed,
                    "\\.[\\s]");

                // Check if one of the ". " found was actually from a.m. or p.m.
                // If so, increase customSummarize to compensate. Working on the
                // per-article copy leaves the user's summarize setting intact,
                // and the length check stops at the last available sentence.
                for (int i = 0;
                    i < customSummarize && i < articleOutputParts.Length; i++)
                {
                    articleOutput += articleOutputParts[i] + ". ";

                    if (articleOutputParts[i].Length >= 3)
                    {
                        // Last three characters before the split point
                        string preceding = articleOutputParts[i].Substring(
                            articleOutputParts[i].Length - 3).ToLower();

                        if (preceding == "a.m" || preceding == "p.m")
                            customSummarize++;
                    }
                }

                script.Append(articleOutput);
                // Reset variables for next time through
                articleOutput = "";
                articleProcessed = "";
                loopcount = 0;
            }
            break;
    }
}
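The sentence-limiting step is the fiddliest part of that loop, so here it is pulled out into a standalone helper that can be tested on its own. This is a sketch under my own assumptions rather than the Web Part code: the names Summarizer and LimitSentences are hypothetical, and it uses EndsWith instead of the substring comparison above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Summarizer
{
    // Hypothetical helper: returns at most maxSentences sentences from text.
    // A sentence ends at a period followed by whitespace; a split that lands
    // right after "a.m" or "p.m" is not counted as a sentence boundary.
    public static string LimitSentences(string text, int maxSentences)
    {
        string[] parts = Regex.Split(text, @"\.\s");
        var output = new StringBuilder();
        int limit = maxSentences;   // local copy, so the caller's value is untouched

        for (int i = 0; i < limit && i < parts.Length; i++)
        {
            output.Append(parts[i]);

            // Regex.Split removed the ". " separators; put one back after
            // every part except the last, which keeps its own ending.
            if (i < parts.Length - 1)
                output.Append(". ");

            if (parts[i].EndsWith("a.m", StringComparison.OrdinalIgnoreCase)
                || parts[i].EndsWith("p.m", StringComparison.OrdinalIgnoreCase))
                limit++;
        }
        return output.ToString();
    }
}
```

Keeping the limit in a local variable means one article full of a.m. and p.m. times cannot inflate the count for the next article. The heuristic still has a blind spot: a sentence that genuinely ends in "a.m." or "p.m." gets merged with the one that follows it.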

I apologize for the gratuitous line breaks; code doesn’t wrap well. There you have it, though. That code does my five things pretty well, and hopefully you can glean something useful from it, even if it’s just how to use substrings or regular expressions. The next step would be to apply styles to the output; right now it’s a bold title with the content preview in paragraphs. If you have any input on the code or better ways to do things, feel free to comment and help me and anybody else who happens upon this entry.
