don't_panic
personal and professional blog of mike pultz, technology specialist and serial entrepreneur.

8 Jan 2012

How To Mine Twitter Streams from PHP in Real Time

Need to mine Twitter for tweets related to certain keywords?

No problem.

Twitter provides a pretty simple streaming interface to the onslaught of tweets it receives: you specify the keywords you want to track, and matching tweets are delivered to you live, in real time.

To do this, I created a simple PHP class that can run in the background, collecting tweets for certain keywords:

ctwitter_stream.php

<?php

class ctwitter_stream
{
    private $m_username;
    private $m_password;

    public function __construct()
    {
        //
        // set a time limit to unlimited
        //
        set_time_limit(0);
    }

    //
    // set the login details
    //
    public function login($_username, $_password)
    {
        $this->m_username = $_username;
        $this->m_password = $_password;
    }

    //
    // process a tweet object from the stream
    //
    private function process_tweet(array $_data)
    {
        print_r($_data);

        return true;
    }

    //
    // the main stream manager
    //
    public function start(array $_keywords)
    {
        while(1)
        {
            $fp = fsockopen("ssl://stream.twitter.com", 443, $errno, $errstr, 30);
            if (!$fp)
            {
                echo "ERROR: Twitter Stream Error: failed to open socket: $errstr ($errno)\n";
            } else
            {
                //
                // build the request
                //
                $request  = "GET /1/statuses/filter.json?track=";
                $request .= urlencode(implode(',', $_keywords)) . " HTTP/1.1\r\n";
                $request .= "Host: stream.twitter.com\r\n";
                $request .= "Authorization: Basic ";
                $request .= base64_encode($this->m_username . ':' . $this->m_password);
                $request .= "\r\n\r\n";

                //
                // write the request
                //
                fwrite($fp, $request);

                //
                // set it to non-blocking
                //
                stream_set_blocking($fp, 0);

                while(!feof($fp))
                {
                    $read   = array($fp);
                    $write  = null;
                    $except = null;

                    //
                    // select, waiting up to 10 minutes for a tweet; if we don't get one,
                    // then reconnect, because it's possible something went wrong.
                    //
                    $res = stream_select($read, $write, $except, 600, 0);
                    if ( ($res == false) || ($res == 0) )
                    {
                        break;
                    }

                    //
                    // read the JSON object from the socket
                    //
                    $json = fgets($fp);
                    if ( ($json !== false) && (strlen($json) > 0) )
                    {
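                        //
                        // decode the JSON to a PHP array; HTTP header lines fail to
                        // decode, and numeric chunk-size lines decode to a non-array,
                        // so both are skipped here
                        //
                        $data = json_decode($json, true);
                        if (is_array($data))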
                        {
                            //
                            // process it
                            //
                            $this->process_tweet($data);
                        }
                    }
                }

                //
                // close the socket; only done when the open succeeded
                //
                fclose($fp);
            }

            sleep(10);
        }

        return;
    }
};

The process_tweet() method is called for each matching tweet; just modify it to process the tweet however you want (load it into a database, print it to screen, email it, etc). The keyword matching isn't perfect: if you search for a string of words, it won't necessarily match the words in that exact order, but you can check that yourself from the process_tweet() method.
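
One way to do that exact-order check is sketched below. The helper name tweet_contains_phrase() is hypothetical and not part of the class above; it assumes the decoded status array carries the message body in its 'text' field, and it does a simple case-insensitive substring match.

```php
<?php

//
// hypothetical helper: returns true only when the tweet text contains
// the phrase with its words in exactly the order given
//
function tweet_contains_phrase(array $_tweet, string $_phrase): bool
{
    //
    // an empty phrase can't match anything
    //
    if ($_phrase === '')
    {
        return false;
    }

    //
    // not every message on the stream is a status with a 'text' field
    //
    if (!isset($_tweet['text']) || !is_string($_tweet['text']))
    {
        return false;
    }

    //
    // case-insensitive substring search
    //
    return stripos($_tweet['text'], $_phrase) !== false;
}
```

Calling it at the top of process_tweet() and returning early on false would drop tweets that only match the individual words.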

Then create a simple PHP application to run the collector:

<?php

require 'ctwitter_stream.php';

$t = new ctwitter_stream();

$t->login('your twitter username', 'your twitter password');

$t->start(array('facebook', 'fbook', 'fb'));

Just provide your Twitter account username/password, and then an array of keywords/strings to search for.

Since this application runs continuously in the background, it's obviously not meant to be run via a web request, but meant to be run from the command line of your Unix or Windows box.
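
If you want the script to refuse to run under a web server, a small guard at the top is one option. This is only a sketch; running_from_cli() is a hypothetical name, and it relies on the built-in PHP_SAPI constant being 'cli' for command line runs.

```php
<?php

//
// hypothetical guard for the top of the collector script
//
function running_from_cli(): bool
{
    //
    // PHP_SAPI is 'cli' when the script is started from a shell
    //
    return PHP_SAPI === 'cli';
}

if (!running_from_cli())
{
    exit("This collector must be run from the command line.\n");
}
```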

According to the Twitter documentation, the default access level allows up to 400 keywords, so you can track all sorts of things at the same time. If you need more details about the Twitter streaming API, it's available here.
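
To avoid sending a request that goes over that limit, the keyword list can be checked before it is encoded. The sketch below is an assumption-laden example: build_track_parameter() and MAX_TRACK_KEYWORDS are hypothetical names, and the value 400 is simply the figure quoted above, so check the documentation for your own access level.

```php
<?php

//
// assumed limit, taken from the text above
//
const MAX_TRACK_KEYWORDS = 400;

//
// hypothetical helper: validates the keyword list and returns the
// URL-encoded value for the "track" query parameter
//
function build_track_parameter(array $_keywords, int $_limit = MAX_TRACK_KEYWORDS): string
{
    //
    // trim each entry, then drop empty strings and duplicates
    //
    $keywords = array_map('trim', $_keywords);
    $keywords = array_values(array_unique(array_filter($keywords, 'strlen')));

    if (count($keywords) === 0)
    {
        throw new InvalidArgumentException('no keywords provided');
    }
    if (count($keywords) > $_limit)
    {
        throw new InvalidArgumentException('too many keywords: ' . count($keywords));
    }

    //
    // comma separated, then URL encoded as one value
    //
    return urlencode(implode(',', $keywords));
}

echo build_track_parameter(array('facebook', 'fbook', 'fb')), "\n";
```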

This class connects through PHP's ssl:// socket transport, so you'll need the OpenSSL extension enabled for it to work.
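
A quick way to confirm that before starting the collector is sketched below; ssl_transport_available() is a hypothetical name, while extension_loaded() and stream_get_transports() are standard PHP functions.

```php
<?php

//
// hypothetical pre-flight check: true when PHP can open ssl:// sockets
//
function ssl_transport_available(): bool
{
    return extension_loaded('openssl')
        && in_array('ssl', stream_get_transports(), true);
}

echo ssl_transport_available()
    ? "ssl:// transport is available\n"
    : "ssl:// transport is missing; enable the OpenSSL extension\n";
```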

Tagged as: development, PHP, twitter No Comments
4 Jan 2012

Fonolo Consumer Service Rebranded As DeepDial.com

I just wanted to post a quick note about the Fonolo Consumer service. After much internal discussion, we've decided to rebrand and simplify the consumer offering, and move the service to a new domain - deepdial.com.

We did this to reduce confusion with the Fonolo Enterprise product, which has become the primary focus of our company.

We're also streamlining and simplifying the service. At deepdial.com you'll be able to "Deep Dial" to hundreds of companies (bypassing their dreaded phone menus) as before. Now, you'll be able to do that without having to log in. We've heard from users that creating an account (and remembering another password) was a barrier to using the service, so we're removing that barrier!

Filed under: DeepDial, Fonolo, Telephony No Comments
23 Dec 2011

Net_DNS2 Version 1.2.0

I've released a new version of the PEAR Net_DNS2 library; you can install it now through the command line PEAR installer:

pear install Net_DNS2

Or download it directly from the Google Code page here.

This release includes a significant speed-up of the local cache by using JSON to encode the cache data, rather than the PHP serialize function. Using JSON loses the class information of the objects, but the data remains the same, and the serialization time is about half.

A Query lookup against Google DNS- NO cache

time: 0.0340800285339

        Net_DNS2_RR_A Object
        (
            [address] => 199.59.148.82
            [name] => twitter.com
            [type] => A
            [class] => IN
            [ttl] => 28
            [rdlength] => 4
            [rdata] =>
        )

with cache + serialize

time: 0.00258994102478

        Net_DNS2_RR_A Object
        (
            [address] => 199.59.148.82
            [name] => twitter.com
            [type] => A
            [class] => IN
            [ttl] => 28
            [rdlength] => 4
            [rdata] =>
        )

with cache + json

time: 0.00178384780884

        stdClass Object
        (
            [address] => 199.59.148.82
            [name] => twitter.com
            [type] => A
            [class] => IN
            [ttl] => 28
            [rdlength] => 4
            [rdata] =>
        )
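
The switch from a Net_DNS2_RR_A object to a stdClass object in the last listing comes straight from how the two encoders work. The sketch below reproduces the effect with a stand-in class; example_record is hypothetical and is not the real Net_DNS2 record class.

```php
<?php

//
// stand-in for a cached resource record; not the real Net_DNS2 class
//
class example_record
{
    public $address;
    public $name;
    public $ttl;

    public function __construct($_address, $_name, $_ttl)
    {
        $this->address = $_address;
        $this->name    = $_name;
        $this->ttl     = $_ttl;
    }
}

$record = new example_record('199.59.148.82', 'twitter.com', 28);

//
// serialize() stores the class name, so the original type comes back
//
$from_serialize = unserialize(serialize($record));

//
// json_encode() stores only the public properties, so json_decode()
// has nothing to rebuild the class from and returns a stdClass
//
$from_json = json_decode(json_encode($record));

echo get_class($from_serialize), "\n";  // example_record
echo get_class($from_json), "\n";       // stdClass
echo $from_json->address, "\n";         // 199.59.148.82
```

The data is identical either way; only code that relies on the object's class or methods needs to care which serializer the cache uses.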

Version 1.2.0

This version changes the way some exceptions are thrown, and may break your code!

  • added numeric error codes to the Lookups class, and had each method that throws an exception throw a numeric error code along with the message.
  • dropped all references to InvalidArgumentException; we only use the Net_DNS2_Exception from now on.
  • added the CAA, URI, TALINK, CDS and TA resource records. Some of these are experimental, but are pretty straightforward.
  • fixed a bug in formatString(); my version was only putting double quotes around strings that have spaces, but apparently ALL strings should have double quotes around them. This is how BIND does it.
  • re-organized the Net_DNS2_Lookups initialization code; it no longer creates a global object of itself.
  • fixed a bug in the caching code; in some cases it wouldn't cache the same content more than once.
  • added an option to use JSON to serialize the cache data rather than using the PHP serialize function. JSON is much faster, but loses the class definition, and becomes a stdClass object.
  • fixed a handful of cases where I was using double quotes (") where a single quote (') would be fine.
Filed under: Development, Net_DNS2 2 Comments
19 Dec 2011

#onholdwith – Tracking Complaints About Waiting On Hold With Companies

We're extremely excited to announce the launch of our public service site: #onholdwith.

#onholdwith tracks complaints about waiting on hold with companies, by looking through public Twitter feeds and aggregating the results by company and by industry. This lets us generate some pretty interesting statistics, and keep a consistent history of complaints over time.
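
As a rough illustration of aggregating by company, the sketch below tallies hashtagged tweets against a list of company names. It is not how the #onholdwith site actually works; the function name, the company names, and the simple substring matching are all assumptions made for the example.

```php
<?php

//
// hypothetical aggregation step: counts #onholdwith tweets per company
//
function tally_onholdwith(array $_tweets, array $_companies): array
{
    $counts = array_fill_keys($_companies, 0);

    foreach ($_tweets as $text)
    {
        //
        // only count tweets that carry the hashtag
        //
        if (stripos($text, '#onholdwith') === false)
        {
            continue;
        }

        foreach ($_companies as $company)
        {
            if (stripos($text, $company) !== false)
            {
                $counts[$company]++;
            }
        }
    }

    //
    // most complained-about company first
    //
    arsort($counts);

    return $counts;
}

$counts = tally_onholdwith(
    array(
        '45 minutes #onholdwith Acme Air',
        'still #OnHoldWith acme air, send help',
        'I love Globex',
        '#onholdwith Globex again'
    ),
    array('Acme Air', 'Globex')
);

print_r($counts);
```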

Why are we doing this?

We are providing this as a public service so that companies can see how much goodwill they are losing from their customers and so that people will feel like their complaints are not going unheard.

Want to be included in the stats?

Just use the #onholdwith hashtag and the company name when you complain about being on hold, and our system will automatically include your tweet in real time!

Filed under: #onholdwith, Fonolo No Comments
6 Nov 2011

Fonolo – Put a New Face on Your Call Center

Filed under: Fonolo No Comments