A Simple Twitter Search Parser with PHP

This article is obsolete. Twitter now has a more complex API for tweets.

I was recently asked to aggregate tweets based on their hashtags using PHP (no Ajax), so I decided to turn this into a small tutorial that will hopefully enable you to build all sorts of XML parsers in PHP.

PHP Twitter Reader

If you read my posts, you can see that I love simplicity. In all of my solutions, tips and tutorials, I strive for the simplest code that gets the job done and for the most straightforward explanation. This tutorial is no exception. It is my sincere hope that you’ll not just copy & paste the code in your project, but you will actually understand it as well and you’ll be able to modify and extend it to your purposes.

Twitter has a search service at search.twitter.com. The search results are available as an Atom feed and this is how we’re going to use it. If you’re wondering why Atom instead of RSS, one can argue that despite the popularity of RSS 2.0, Atom is a superior format.

Building the parser

My goals for this little parser were as follows:

  • Show the tweets in the format “Full Name: text – time”
  • Show the sender’s avatar
  • Show relative time, e.g. “5 minutes ago”.
  • Open links in a new window
  • Limit the number of results (and process just the first page of results)
  • Filter tweets containing profanity
  • Style everything with CSS
  • Work with PHP 5.

First, I should stress that this code is written for PHP 5; it was not tested with PHP versions prior to 5.2.0.

I made this into a class, so that you can easily use it in your project.

To load and parse an XML file, the easiest method is simplexml_load_file(). However, Twitter is rather picky about request headers and doesn’t like it when the user agent is not set to a value it expects, so we’ll use curl instead.

    $ch = curl_init($this->searchURL . urlencode($q));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
    $response = curl_exec($ch);

Pretty simple. The search term is URL-encoded and appended to the Twitter search URL, and the result is loaded into the $response variable as a string. Also note that we’re making the request using the browser’s user agent.
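To make the encoding step concrete, here is a minimal sketch of how the request URL is put together. The helper names buildSearchUrl() and pickUserAgent() and the fallback user agent string are my own, for illustration only; they are not part of the class, and I have not checked which user agents Twitter accepted.

```php
<?php
// Hypothetical helper: builds the request URL the same way the snippet above does.
function buildSearchUrl($baseUrl, $query)
{
    // urlencode() turns ":" into "%3A", "#" into "%23" and spaces into "+"
    return $baseUrl . urlencode($query);
}

// Hypothetical helper: $_SERVER['HTTP_USER_AGENT'] is not set when the script
// runs outside a browser request (e.g. from the command line), so fall back
// to a fixed string in that case. The fallback value is an assumption.
function pickUserAgent($server, $fallback = 'Mozilla/5.0 (compatible; TweetReader)')
{
    return !empty($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : $fallback;
}

$url = buildSearchUrl('http://search.twitter.com/search.atom?lang=en&q=', 'from:nytimes');
```

Because the encoding happens inside the helper, the search term should be passed in its plain form (“from:nytimes”), not pre-encoded.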

Parsing the resulting string could not be easier:

      $xml = simplexml_load_string($response);
      $output = '';
      $tweets = 0;

      for($i=0; $i<count($xml->entry); $i++)
      {
        $crtEntry = $xml->entry[$i];
        $account  = $crtEntry->author->uri;
        $image    = $crtEntry->link[1]->attributes()->href;
        $tweet    = $crtEntry->content;
      }

So we can get the link to the poster account, the image and the tweet itself right away.
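If you want to try the parsing step without making a request, the sketch below runs the same lookups against a small hand-written feed. The sample XML is hypothetical: I wrote it to match the element layout the code above relies on (second link element holding the avatar), and it is not a captured Twitter response. parseEntries() is likewise an illustrative name.

```php
<?php
// Hypothetical sample, shaped like the Atom entries the parser expects.
$sampleAtom = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <published>2010-01-01T12:00:00Z</published>
    <link type="text/html" href="http://twitter.com/jdoe/statuses/1" rel="alternate"/>
    <link type="image/png" href="http://example.com/avatar.png" rel="image"/>
    <content type="html">Hello &lt;b&gt;world&lt;/b&gt;</content>
    <author>
      <name>jdoe (John Doe)</name>
      <uri>http://twitter.com/jdoe</uri>
    </author>
  </entry>
</feed>
XML;

// Returns one array per entry; the casts turn SimpleXMLElement objects into plain strings.
function parseEntries($atom)
{
    $xml = simplexml_load_string($atom);
    if ($xml === false) {
        return array(); // not well-formed XML
    }

    $entries = array();
    foreach ($xml->entry as $entry) {
        $entries[] = array(
            'account' => (string) $entry->author->uri,
            'image'   => (string) $entry->link[1]['href'],
            'tweet'   => (string) $entry->content,
        );
    }
    return $entries;
}

$entries = parseEntries($sampleAtom);
```

Casting to string early is a design choice: it avoids carrying SimpleXMLElement objects around when all you need is the text.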

To get the name, we need a little parsing. The name is sent this way: “username (Full Name)”. I prefer to show just the full name, so I’m using a simple regexp:

        $this->realNamePattern = '/\((.*?)\)/';
        preg_match($this->realNamePattern, $crtEntry->author->name, $matches);
        $name = $matches[1];
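The same regexp can be wrapped in a small function that also copes with names that have no parentheses. This is only a sketch; extractRealName() is a hypothetical name, and falling back to the whole string is my assumption about what you would want to show in that case.

```php
<?php
// Hypothetical helper: returns the text inside the first pair of parentheses,
// or the whole (trimmed) name when there are no parentheses at all.
function extractRealName($authorName)
{
    if (preg_match('/\((.*?)\)/', $authorName, $matches)) {
        return $matches[1];
    }
    return trim($authorName);
}

$name = extractRealName('jdoe (John Doe)');
```

Checking the return value of preg_match() avoids reading $matches[1] when nothing matched.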

Next comes showing relative time instead of absolute time. This is a matter of personal taste, but considering how quickly new tweets are added, it’s worth doing.

For this we’ll use two arrays, one with various interval names, the other with the number of seconds in that interval, e.g. an hour has 3600 seconds and so on.

    $this->intervalNames   = array('second', 'minute', 'hour', 'day', 'week', 'month', 'year');
    $this->intervalSeconds = array( 1,        60,       3600,   86400, 604800, 2630880, 31570560);

The idea is this: we calculate the difference in seconds between the current time and the tweet time, and then we walk the interval array from the largest to the smallest value until we reach an interval that fits into our difference at least once. For example, if our calculated difference is 173000 seconds, we start with the last value in the array, that is 31570560, and keep looking until we find the value 86400, which corresponds to the ‘day’ interval. Now we know our difference is more than one day but less than one week. By dividing the difference by the interval length, that is 173000/86400, we get 2.002, which is just a little over two days, and we round it down to 2. If the rounded result is exactly 1, we must use the singular form, i.e. ‘day’, otherwise the plural, ‘days’.

So here’s the code that does all that:

        $time = 'just now';
        $secondsPassed = time() - strtotime($crtEntry->published);
        if ($secondsPassed>0)
        {
          // see what interval we are in
          for($j = count($this->intervalSeconds)-1; ($j >= 0); $j--)
          {
            $crtIntervalName = $this->intervalNames[$j];
            $crtInterval = $this->intervalSeconds[$j];

            if ($secondsPassed >= $crtInterval)
            {
              $value = floor($secondsPassed / $crtInterval);
              if ($value > 1)
                $crtIntervalName .= 's';

              $time = $value . ' ' . $crtIntervalName . ' ago';

              break;
            }
          }
        }
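For experimenting outside the class, the same logic can be written as a standalone function that takes the number of seconds directly. relativeTime() is a hypothetical name; the interval tables are the ones from the article.

```php
<?php
// Standalone sketch of the relative-time logic (hypothetical function name).
function relativeTime($secondsPassed)
{
    $names   = array('second', 'minute', 'hour', 'day', 'week', 'month', 'year');
    $seconds = array(1, 60, 3600, 86400, 604800, 2630880, 31570560);

    // zero or negative differences (e.g. a wrong server clock) count as "just now"
    if ($secondsPassed <= 0) {
        return 'just now';
    }

    // walk from the largest interval down to the smallest
    for ($j = count($seconds) - 1; $j >= 0; $j--) {
        if ($secondsPassed >= $seconds[$j]) {
            $value = (int) floor($secondsPassed / $seconds[$j]);
            $name  = $names[$j] . ($value > 1 ? 's' : '');
            return $value . ' ' . $name . ' ago';
        }
    }

    return 'just now'; // only reached for fractions of a second
}

echo relativeTime(173000); // the example from the text: "2 days ago"
```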

Finally, the filtering. Depending on your site’s audience you may or may not need such a filter; I’m including it just in case.

You’d have a list of banned words in an array, like this:

    $this->badWords = array('bannedword', 'anotherbannedword');

and the code:

        $foundBadWord = false;
        foreach ($this->badWords as $badWord)
        {
          if(stristr($tweet, $badWord) !== FALSE)
          {
            $foundBadWord = true;
            break;
          }
        }

        // skip this tweet containing a banned word
        if ($foundBadWord)
          continue;
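As a standalone sketch, the check can live in its own function. containsBannedWord() is a hypothetical name, and I use stripos() here instead of stristr() because only the yes/no answer is needed; both are case-insensitive.

```php
<?php
// Hypothetical helper: true if any banned word occurs anywhere in the text.
function containsBannedWord($text, $badWords)
{
    foreach ($badWords as $badWord) {
        // skip empty entries so a stray '' in the list cannot match everything
        if ($badWord !== '' && stripos($text, $badWord) !== false) {
            return true;
        }
    }
    return false;
}

$blocked = containsBannedWord('This has a BannedWord in it', array('bannedword', 'anotherbannedword'));
```

Keep in mind that this is plain substring matching, so a short banned word will also match inside longer, innocent words.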

Now let’s put everything together:

The complete class

<?php

class twitter_class
{	
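	// __construct() works on PHP 5 and later; a method named after the class is no longer treated as a constructor on PHP 8
	function __construct()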
	{
		$this->realNamePattern = '/\((.*?)\)/';
		$this->searchURL = 'http://search.twitter.com/search.atom?lang=en&q=';
		
		$this->intervalNames   = array('second', 'minute', 'hour', 'day', 'week', 'month', 'year');
		$this->intervalSeconds = array( 1,        60,       3600,   86400, 604800, 2630880, 31570560);
		
		$this->badWords = array('somebadword', 'anotherbadword');
	}

	function getTweets($q, $limit=15)
	{
		$output = '';

		// get the search result
		$ch= curl_init($this->searchURL . urlencode($q));

		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
		$response = curl_exec($ch);

		if ($response !== FALSE)
		{
			$xml = simplexml_load_string($response);
	
			$output = '';
			$tweets = 0;
			
			for($i=0; $i<count($xml->entry); $i++)
			{
				$crtEntry = $xml->entry[$i];
				$account  = $crtEntry->author->uri;
				$image    = $crtEntry->link[1]->attributes()->href;
				$tweet    = $crtEntry->content;
	
				// skip tweets containing banned words
				$foundBadWord = false;
				foreach ($this->badWords as $badWord)
				{
					if(stristr($tweet, $badWord) !== FALSE)
					{
						$foundBadWord = true;
						break;
					}
				}
				
				$tweet = str_replace('<a href=', '<a target="_blank" href=', $tweet);
				
				// skip this tweet containing a banned word
				if ($foundBadWord)
					continue;

				// don't process any more tweets if at the limit
				if ($tweets==$limit)
					break;
				$tweets++;
	
				// name is in this format "accountname (Real Name)"
				preg_match($this->realNamePattern, $crtEntry->author->name, $matches);
				$name = $matches[1];
	
				// get the time passed between now and the time of tweet, don't allow for negative
				// (future) values that may have occurred if server time is wrong
				$time = 'just now';
				$secondsPassed = time() - strtotime($crtEntry->published);

				if ($secondsPassed>0)
				{
					// see what interval we are in
					for($j = count($this->intervalSeconds)-1; ($j >= 0); $j--)
					{
						$crtIntervalName = $this->intervalNames[$j];
						$crtInterval = $this->intervalSeconds[$j];
							
						if ($secondsPassed >= $crtInterval)
						{
							$value = floor($secondsPassed / $crtInterval);
							if ($value > 1)
								$crtIntervalName .= 's';
								
							$time = $value . ' ' . $crtIntervalName . ' ago';
							
							break;
						}
					}
				}
				
				$output .= '
				<div class="tweet">
					<div class="avatar">
						<a href="' . $account . '" target="_blank"><img src="' . $image .'"></a>
					</div>
					<div class="message">
						<span class="author"><a href="' . $account . '"  target="_blank">' . $name . '</a></span>: ' . 
						$tweet . 
						'<span class="time"> - ' . $time . '</span>
					</div>
				</div>';
			}
		}
		else
			$output = '<div class="tweet"><span class="error">' . curl_error($ch) . '</span></div>';
		
		curl_close($ch);
		return $output;
	}
}

?>

To use the class in another PHP file, you’d use it like this:

<?php
  require('twitter.class.php');
  $twitter = new twitter_class();
  echo $twitter->getTweets('search term', 10);
?>

This will show the latest 10 tweets for your query.

You can style the results any way you want. Styling is outside the scope of this tutorial but you can look at the end of the class to see the html tags and classes that are generated.
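For reference, the markup generated per tweet can be sketched as a standalone function, which makes the tags and class names easier to see. renderTweet() is a hypothetical name and not part of the class. Unlike the class, this sketch passes the account URL, image URL and name through htmlspecialchars(); that is my addition. The tweet text is left as-is because the feed already delivers it as HTML.

```php
<?php
// Hypothetical helper that produces the same structure the class emits:
// div.tweet > div.avatar + div.message (span.author, tweet text, span.time)
function renderTweet($account, $image, $name, $tweetHtml, $time)
{
    $account = htmlspecialchars($account, ENT_QUOTES, 'UTF-8');
    $image   = htmlspecialchars($image, ENT_QUOTES, 'UTF-8');
    $name    = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $time    = htmlspecialchars($time, ENT_QUOTES, 'UTF-8');

    return '<div class="tweet">'
         . '<div class="avatar"><a href="' . $account . '" target="_blank"><img src="' . $image . '"></a></div>'
         . '<div class="message">'
         . '<span class="author"><a href="' . $account . '" target="_blank">' . $name . '</a></span>: '
         . $tweetHtml
         . '<span class="time"> - ' . $time . '</span>'
         . '</div>'
         . '</div>';
}

$html = renderTweet('http://twitter.com/jdoe', 'http://example.com/a.png', 'John "JD" Doe', 'Hello', '5 minutes ago');
```

The classes you would target in your stylesheet are tweet, avatar, message, author and time.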

Further improvement

Given the quasi-real-time nature of Twitter (depending on the topic, tweets get published every moment), you may want to use Ajax to load new tweets. You can give an id to each tweet (usually the timestamp) and modify the PHP to return only tweets newer than that timestamp. You can use either an Ajax library like jQuery or Flash to load and show the new tweets, and then make a new request a few seconds later specifying the latest id.
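On the PHP side, the “only newer than” part could look roughly like the sketch below. It assumes the entries have already been turned into arrays with a 'published' key holding the date string from the feed; both that array shape and the name filterNewerThan() are hypothetical and not part of the class.

```php
<?php
// Hypothetical helper: keeps only the entries published after $sinceTimestamp
// (a Unix timestamp, e.g. the one the client sent with its last request).
function filterNewerThan($entries, $sinceTimestamp)
{
    $newer = array();
    foreach ($entries as $entry) {
        $published = strtotime($entry['published']);
        // strtotime() returns false for dates it cannot parse; skip those
        if ($published !== false && $published > $sinceTimestamp) {
            $newer[] = $entry;
        }
    }
    return $newer;
}

$entries = array(
    array('published' => '2010-01-01T12:00:00Z', 'tweet' => 'first'),
    array('published' => '2010-01-01T12:05:00Z', 'tweet' => 'second'),
);
$fresh = filterNewerThan($entries, strtotime('2010-01-01T12:00:00Z'));
```

In a real page, the client would send the timestamp of the newest tweet it already shows, and the PHP would return only what filterNewerThan() keeps.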

Armand Niculescu

Senior Full-stack developer and graphic designer with over 25 years of experience, Armand took on many challenges, from coding to project management and marketing.

24 Responses

    1. When you do a normal search, you can use the simple NEAR operator; however, when using the ATOM feed, it always expects to have the geocode parameter, otherwise it throws an error.

      The only way I see to do it dynamically is to use a free Geolocation service like https://www.geonames.org, parse the NEAR parameter, make a request to their web service, get the coordinates and then make the Twitter search. Not really worth it in my opinion.

      However, if the search is always the same, you can edit the twitter.class.php and on line 8, hardcode the coordinates like this:
      $this->searchURL = 'http://search.twitter.com/search.atom?geocode=40.75604%2C-73.986941%2C50.0mi&q=near%3Anyc+within%3A50mi';
      and when you make the search, just send an empty string – echo $twitter->getTweets('', 15)

      It’s not an ideal solution, especially if you need more than one hardcoded search.

  1. I love it, I just wish there were a way to specify which size of avatar you want to pull… As it is now, it’s pulling a 48×48 avatar but I need to pull a 44×44 instead.

    I tried using timthumb, but even after adding a1.twimg.com, a2.twimg.com, and a3.twimg.com to the list of remote sites in the timthumb script, it still won’t work.

    oh well.

    1. I am not familiar with timthumb but it would seem a serious overhead to resize the images on the fly.
      You could use CSS to either resize the thumbnails in the browsers or to clip/mask parts of the thumbnail…

  2. Hi, this works wonderfully for normal search terms, but I’m having trouble getting tweets from a single user using the search operator “from:user” It returns a url like this:

    http://search.twitter.com/search.atom?q=from%3Anytimes

    I’m plugging in ‘from%3Anytimes’ as my search term, but I’m not getting any results. Am I doing something stupid?

    1. Sorry for my late reply, Jay.
      You should use the normal search term, e.g. “from:nytimes” as the php class does the URL encoding for you (%3A is the encoded value for “:”)

      echo $twitter->getTweets('from:nytimes', 15);
      will work just fine.

  3. great post!!

    Is there a way to get an array of found tweets, so that I can count() them? All I want is to output the number of found tweets against a search term.

    1. You can simply edit the twitter.class.php file.
      At line 30 you can simply write return count($xml->entry) and the class will return the number of tweets rather than the contents.

  4. I was leaning toward client side javascript/Ajax to accomplish this, but I love what you have done with PHP instead. This is really clean code. Do I have permission to reuse and style it to for use with CSS? Also, Is there anyway to increase the tweet posts to last more than 2 days? I would like to see tweets on a #hashtag or keyword last for more than a week. Excellent work!

    1. Hi Jason,

      yes, feel free to use the code for any purpose, commercial or not. The timespan is dependent only on the number of tweets you want displayed. If you want to see old tweets, you’ll have to increase the number of displayed tweets.

  5. Thank you Armand! I increased the number of tweets to 50:

    getTweets(‘ifest’, 50);
    ?>

    However, it appears that after 48 hours the tweets begin to drop off the list. Here is the example I created: http://jhaag.us/twitter_class/ for the search term “ifest”. There were 8 tweets on there yesterday and today there are only two.

    1. Hi Jason,

      First of all, Twitter limits the search results set to 15. It’s possible to get more, but it’s a bit more complicated as a request has to be made for each 15 results page and we also need to keep track of IDs to prevent duplication of results. I will add such a feature if requested by more people.

      Second, your query for ‘ifest‘ returns less than 15 results because by default the class filters for English language only. However, I see that people tweet in English even if the language code is set to something else, probably due to their twitter clients. So, in twitter.class.php, at line 8, remove the lang=en& part.

  6. I’ve been looking for something like this for a while. Thanks a lot!
    Your tutorial was very useful for me.
    I am also kind of interested in more than 15 posts, but I may try integrating this with a MySql database so that I can store tweets and display them as needed.
    Thanks again!

  7. Actually, if possible I would love to know how to style the “Twitter Search Term”. I can style everything else with CSS, but I can’t seem to find a way to style only the search term.

  8. Thank you so much!!!!!! I was trying to use php to style it! This is much easier 🙂

  9. Love your work…

    I’ve got as far as reloading a div with AJAX and having that ‘div class = the time’ – any tips on the PHP for only fetching a tweet newer than the timestamp? Would appreciate any help 🙂

  10. Very simple and really helpful !!! Thanx lot Armand
    keep it up this good work… 🙂

  11. hi, As you said, can you provide that mechanism in details(may be a new post) to get all the tweets for a search or geocode api rather than limiting it to 15. Really urgent !!!
    Thanks again…

  12. This appears to have been affected by changes to the Twitter API. I got it working again once I removed lang=en from line 8, so it just reads:

    $this->searchURL = 'http://search.twitter.com/search.atom?&q=';
