Monday, September 29, 2014

How to capture all tweets via PHP?

What's the method to get all tweets for a given query?

  • You need to log in to http://dev.twitter.com, create an application, and capture the elements below, storing them in a file [app_tokens.php]:
<?php
    $consumer_key    = '';
    $consumer_secret = '';
    $user_token      = '';
    $user_secret     = '';
?>
<?php
// Load the stored credentials and the tmhOAuth library
require 'app_tokens.php';
require 'tmhOAuth-master/tmhOAuth.php';

// Read the search term from the query string, defaulting to "ModiInAmerica"
$query = isset($_GET['query']) ? htmlspecialchars($_GET['query']) : '';
if (empty($query)) {
    $query = "ModiInAmerica";
}
// Create an OAuth connection using the app and user credentials
$connection = new tmhOAuth(array(
    'consumer_key' => $consumer_key,
    'consumer_secret' => $consumer_secret,
    'user_token' => $user_token,
    'user_secret' => $user_secret
));
// Search for matching tweets with the Twitter API
$http_code = $connection->request('GET',
    $connection->url('1.1/search/tweets'),
    array('q' => $query, 'count' => 100, 'lang' => 'en'));
// Request was successful
if ($http_code == 200) {
    // Extract the tweets from the API response
    $response = json_decode($connection->response['response'],true);
    $tweet_data = $response['statuses'];

    // Accumulate tweets from results
    $tweet_stream = '[';
    foreach ($tweet_data as $tweet) {
        // Add this tweet's text to the results
        $tweet_stream .= ' { "tweet": ' . json_encode($tweet['text']) . ' },';
    }
    // Remove the trailing comma and close the JSON array
    $tweet_stream = rtrim($tweet_stream, ',');
    $tweet_stream .= ']';
    // Send the tweets back to the Ajax request
    print $tweet_stream;
}
// Handle errors from API request
else {
    if ($http_code == 429) {
        print 'Error: Twitter API rate limit reached';
    }
    else {
        print 'Error: Twitter was not able to process that request';
    }
}
?>

  • In a browser, open http://localhost/~username/search.php (optionally passing ?query=YourSearchTerm); a quick command-line test is sketched below.
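As a quick sanity check without any Ajax wiring, you can hit the script directly from the PHP CLI. This is a minimal sketch; the hostname/path and the query value are assumptions based on the URL above:

<?php
// Minimal test harness for search.php (path and query value are assumptions)
$url = 'http://localhost/~username/search.php?query=' . urlencode('ModiInAmerica');

// Fetch the JSON that search.php prints
$json = file_get_contents($url);
if ($json === false) {
    die("Could not reach search.php\n");
}

// Decode the array of {"tweet": "..."} objects and print each tweet's text
$tweets = json_decode($json, true);
foreach ($tweets as $item) {
    echo $item['tweet'], "\n";
}
?>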

Sunday, September 14, 2014

How fast is AWS RedShift?

Everyone is buzzing Big Data, Big Data, Big Data... but how big is your data really, and how soon can you load it to deliver insightful information to your end users?
I am working on a project where the data velocity is about 10~15 GB daily in a structured format [~35 million records].

Before being exposed to Redshift@AWS, I had been using Amazon's cloud computing stack and was quite happy with the bandwidth and performance it offered for ETL purposes. That setup was achieved via an S3 > EC2 > RDS stack.

Soon after getting exposed to Redshift@AWS, I found that loading, and then querying, a column-store based DB was sparkling fast; it was just fascinating. It took only 120 minutes to load about 20 days of data at the velocity mentioned above, with no parallel activities running during the load. There were neither sort keys provided, nor any indexes or constraints. This was with just one cluster and the most basic way of using this column-store fashioned DB.
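For context, the load itself boils down to a single COPY from S3 into the cluster. Here is a minimal sketch of issuing such a load from PHP over PDO's PostgreSQL driver (Redshift speaks the Postgres wire protocol); the cluster endpoint, table name, bucket path, and credentials below are placeholders, not the actual project setup:

<?php
// Connect to the Redshift cluster over the PostgreSQL protocol.
// Endpoint, database, user and password are placeholders.
$db = new PDO(
    'pgsql:host=my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com;port=5439;dbname=mydb',
    'myuser',
    'mypassword'
);
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Bulk-load one day of pipe-delimited, gzipped files straight from S3.
// Table name, bucket path and AWS keys are placeholders.
$db->exec("
    COPY daily_facts
    FROM 's3://my-bucket/daily/2014-09-14/'
    CREDENTIALS 'aws_access_key_id=<access-key>;aws_secret_access_key=<secret-key>'
    DELIMITER '|'
    GZIP
");
?>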

I am quite interested in a bunch of mathematics-based usage when querying such data, so I could freely use set-based operations {the syntax is slightly different from Oracle, but mostly the same}. I could also make use of analytic queries, and they are really, really fast; a small sketch follows.
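As an illustration, here is a sketch of the kind of analytic (window) query that runs comfortably on the cluster, again through PDO; the table and column names are hypothetical:

<?php
// Placeholder connection details, as in the load sketch above.
$db = new PDO(
    'pgsql:host=my-cluster.xxxxxx.us-east-1.redshift.amazonaws.com;port=5439;dbname=mydb',
    'myuser',
    'mypassword'
);

// Rank each record by amount within its category using a window function.
// Table and column names (daily_facts, category, record_date, amount) are hypothetical.
$sql = "
    SELECT category,
           record_date,
           amount,
           RANK() OVER (PARTITION BY category ORDER BY amount DESC) AS amount_rank
    FROM daily_facts
    WHERE record_date >= '2014-09-01'
";

foreach ($db->query($sql) as $row) {
    echo $row['category'], ' #', $row['amount_rank'], ' ', $row['amount'], "\n";
}
?>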