tag:blogger.com,1999:blog-49033845439535992142024-02-06T18:23:14.775-08:00aModerateAnonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.comBlogger14125tag:blogger.com,1999:blog-4903384543953599214.post-26173924907029537302014-03-21T16:34:00.001-07:002014-03-21T16:34:54.692-07:00Updated Prediction app<br />
<span style="color: #cc0000; font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Noise</span><br />
<br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;">Over the last few months I have been developing a new application. Its purpose is to visualize Premier League trends and predictions by leveraging the freakishly awesome D3.js libraries found at http://d3js.org/</span><span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;">. I call the app <span style="color: #cc0000;">noise</span></span><span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;">, as it is intended to counteract the noise associated with sports punditry while recognizing the inherent complexity of tracking and measuring team performance and fan engagement. You can view the site <a href="http://noise-roc.appspot.com/" target="_blank">here</a>.</span><br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span><span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;">I have built two features, with a third feature in the works. The first is a simple social media tracker (currently Twitter only; I plan to add additional sites such as Reddit in the near future). It provides a snapshot of the current volume of information being exchanged about a team. </span><br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;">Additionally, I have trained a statistical model to predict the outcome of each match. The model compares the home team's home performance (average wins, goals, corners, and shots on target) against the away team's away performance across the same metrics. I calculate both teams' statistics as an expanding average over the season and as a rolling average over the last three games. </span><br />
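To make the two averages concrete, here is a minimal sketch in plain Python. The function names and the sample goal counts are assumptions for illustration, not code from the project; the window of three matches comes from the description above.

```python
def expanding_average(values):
    """Average of all values seen so far, one result per match."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


def rolling_average(values, window=3):
    """Average of the most recent `window` values (fewer early in the season)."""
    averages = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical goals scored by one team in its first five home games.
home_goals = [2, 0, 3, 1, 4]
season_form = expanding_average(home_goals)
recent_form = rolling_average(home_goals, window=3)
```

When these values are used as model inputs for a given match, it is probably safest to use the averages computed up to the previous match, so the result being predicted does not leak into its own features.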
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;">If anyone is interested in the model used, or wants to add something to the code, the project is located on <a href="https://github.com/amoderate/epl_predict" target="_blank">GitHub</a>.</span><br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span><br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span>Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-26201801632402920352013-09-11T21:09:00.000-07:002013-12-09T22:29:22.636-08:00New EPL predictionsI spent the last week and a half reworking the model and building out the skeleton of a web app to better convey the results. I will expand on the web app in the near future to include a more holistic, unsupervised look at the match.<br />
<div>
<br /></div>
<div>
<a href="http://noise-roc.appspot.com/prototype">http://noise-roc.appspot.com/prototype</a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
For those interested: the tools in use here include Python for the data munging, Orange (a Python machine learning library) for the modeling, and D3.js for the visual.</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-91764919987810837162013-09-02T07:35:00.002-07:002013-09-02T08:22:30.455-07:00Premier League Predictions<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEG4iP4kkyeVS-mGWfFgbMZpaT9ittJYYvUSe7nR2dWhObjG9WCvEOvyZYfvJA_Y0J8ADULjAu6m9lTzM85hvdBmNFoo6pAJ2XUm4Jj1l7IwCdGXUjhBj_vSGfE_OKDuivUEdXII_kv7ch/s1600/epl_2013_09_01_16.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEG4iP4kkyeVS-mGWfFgbMZpaT9ittJYYvUSe7nR2dWhObjG9WCvEOvyZYfvJA_Y0J8ADULjAu6m9lTzM85hvdBmNFoo6pAJ2XUm4Jj1l7IwCdGXUjhBj_vSGfE_OKDuivUEdXII_kv7ch/s640/epl_2013_09_01_16.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<span class="Apple-style-span" style="color: #444444;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span class="Apple-style-span" style="color: #444444;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span class="Apple-style-span" style="color: #444444; font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;">This is my first attempt at using machine learning to predict EPL matches. There are significant improvements to be made - which I will gradually incorporate in future updates.</span></div>
<div class="separator" style="clear: both; text-align: left;">
<span class="Apple-style-span" style="color: #444444; font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: left;">
<span class="Apple-style-span" style="color: #444444; font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;">A brief walkthrough of the visual - the model results describe the likelihood of a favorable outcome - </span><span class="Apple-style-span" style="color: #cc0000;"><span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;">z</span><span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;">ero representing a low probability of success</span></span><span class="Apple-style-span" style="color: #444444; font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;">. The</span><span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif;"><span class="Apple-style-span" style="color: #cc0000;"> <a href="http://www.bet365.com/en/" target="_blank">365 Odds</a></span><span class="Apple-style-span" style="color: #cc0000;"> </span><span class="Apple-style-span" style="color: #444444;">and the data for the model itself can be sourced from <a href="http://www.football-data.co.uk/englandm.php" target="_blank">this website</a>.</span></span></div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-10849069278355141962013-09-01T09:23:00.000-07:002013-09-01T17:52:46.991-07:00Arsenal and Tottenham - Full<div class="separator" style="clear: both; text-align: center;">
</div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTJF9K0pJ_k410IYuKJPiY7QPiJkfEd36EeaFAdFcs9M9N9yFzvP46fzCy4SwpukFX49JGeIHp4BAfxYspnae_hBkuA1WjPM8RYuz5vgYg5Y37gH0UQbG6LhUtDFaTxSGSsK6azgH8Cvzm/s1600/arsenal_v_tot_2013_09_01_final.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTJF9K0pJ_k410IYuKJPiY7QPiJkfEd36EeaFAdFcs9M9N9yFzvP46fzCy4SwpukFX49JGeIHp4BAfxYspnae_hBkuA1WjPM8RYuz5vgYg5Y37gH0UQbG6LhUtDFaTxSGSsK6azgH8Cvzm/s640/arsenal_v_tot_2013_09_01_final.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><div style="text-align: left;">
<span class="Apple-style-span" style="color: #444444; font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: small;">The tweets are aggregated by positive and negative sentiment</span></div>
</td></tr>
</tbody></table>
<div style="text-align: left;">
<br /></div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-2910539107990218672013-08-25T21:57:00.002-07:002013-08-25T22:04:46.142-07:00Twitter During the Arsenal Game on Saturday<div>
A visualization of Twitter and the English Premier League:</div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhANFC89G2WFUlUme0EKuUng5yXTa6aowkjIdMPrqZpwNfx4ngfT1tUzSULasmYFvvlBvNgScscM4iZbgz-1PXUZe_jIpnshyphenhyphencbYkoMXJ5A9UJ2JWc6TJGtfdcPvorzat7a0F2RcAvDlUkg/s1600/arsenal_line_viz_2013_08_26.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhANFC89G2WFUlUme0EKuUng5yXTa6aowkjIdMPrqZpwNfx4ngfT1tUzSULasmYFvvlBvNgScscM4iZbgz-1PXUZe_jIpnshyphenhyphencbYkoMXJ5A9UJ2JWc6TJGtfdcPvorzat7a0F2RcAvDlUkg/s640/arsenal_line_viz_2013_08_26.png" width="640" /></a></div>
<br />Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-68184851221906819672013-08-18T08:46:00.000-07:002013-08-18T08:46:00.069-07:00Extracting Sentiment<span style="font-size: large;">This is just a quick follow-up to the Twitter sentiment visualization. The following is a description of some of the technical challenges I faced. This is by no means a complete analysis -<span style="color: #cc0000;"><i> some might even consider it naive</i></span> - but the purpose here is not to build the world's best Twitter analyzer but rather to build a framework with which one can extract tweets, process them, and begin to derive meaning. </span><br />
<span style="font-size: large;"><br /></span>
<span style="font-size: large;">The tools in use here include:</span><br />
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;"><a href="http://www.python.org/" target="_blank">Python</a>, the Twitter APIs, <a href="http://nltk.org/" target="_blank">NLTK</a>, a word sentiment corpus (I am using the one available via the <a href="https://www.coursera.org/" target="_blank">Coursera Data Science</a> course), and, for the visualization, <a href="http://nodebox.net/download/" target="_blank">Nodebox</a>.</span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;">I began by extracting tweets. Here I pretty much just followed the instructions from the Coursera Data Science course; for detailed steps on setting up the OAuth2 protocol and the necessary dependencies on a Mac, check out this <a href="http://amoderate.blogspot.com/2013/05/coursera-data-science-101.html" target="_blank">earlier post</a>. </span></div>
<div>
<span style="font-size: large;"><br /></span></div>
<div>
<span style="font-size: large;">Once I had tweets, I had to normalize them. <span style="color: #cc0000;">Tweets are messy</span>; they feature an extravagant use of vowels, nonstandard English, and special characters:</span></div>
<div>
<br /></div>
<div>
<div>
<span style="font-size: large;"> <b> #Convert to lower case</b></span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"> tweet<span style="color: #cc0000;"> =</span> tweet.lower()</span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"> <b>#Convert www.* or https?://* to URL</b></span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"> tweet<span style="color: #cc0000;"> =</span> re.sub('((www\.[^\s]+)|(https?://[^\s]+))','URL',tweet)</span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"> <b>#Convert @username to AT_USER</b></span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"> tweet <span style="color: #cc0000;">=</span> re.sub('@[^\s]+','AT_USER',tweet)</span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"> <b> #Remove additional white spaces</b></span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"> tweet <span style="color: #cc0000;">=</span> re.sub('[\s]+', ' ', tweet)</span></div>
<span style="font-size: large;">
</span>
<br />
<div>
<span style="font-size: large;"><br /></span></div>
<span style="font-size: large;">
</span>
<div>
<span style="font-size: large;"> <b>#Replace #word with word</b></span></div>
<span style="font-size: large;">
<div>
tweet<span style="color: #cc0000;"> =</span> re.sub(r'#([^\s]+)', r'\1', tweet)</div>
<div>
<br /></div>
<div>
<b>#Trim</b></div>
<div>
tweet <span style="color: #cc0000;">=</span> tweet.strip('\'"')<br />
<br />
 <b> #Check if the word starts with a letter</b><br />
val = re.search(r"^[a-zA-Z][a-zA-Z0-9]*$", word)<br />
<br />
 <b> #Look for a pattern of 2 or more repeated characters and replace it with the character itself</b><br />
pattern <span style="color: #cc0000;">=</span> re.compile(r"(.)\1{1,}", re.DOTALL)<br />
<br /></div>
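For anyone who wants to run the cleaning steps end to end, here is a minimal sketch that gathers them into functions. The function names are my own, and the `www` branch uses `[^\s]+` so that it matches the rest of the address. The snippet above only compiles the repeated-character pattern and never applies it, so collapsing each run to two characters (so that a word like "good" survives) is an assumption on my part.

```python
import re


def normalize_tweet(tweet):
    """Apply the cleaning steps listed above, in the same order."""
    tweet = tweet.lower()
    # Replace links (www.* or http(s)://*) with the placeholder URL.
    tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', tweet)
    # Replace @username mentions with the placeholder AT_USER.
    tweet = re.sub(r'@[^\s]+', 'AT_USER', tweet)
    # Collapse runs of whitespace into a single space.
    tweet = re.sub(r'\s+', ' ', tweet)
    # Drop the '#' from hashtags but keep the word.
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    # Trim surrounding quote characters.
    return tweet.strip('\'"')


def squeeze_repeats(word):
    """Collapse a run of one repeated character to two (assumed behavior)."""
    return re.sub(r'(.)\1{1,}', r'\1\1', word, flags=re.DOTALL)


def is_wordlike(word):
    """True if the token starts with a letter and has only letters and digits."""
    return re.search(r'^[a-zA-Z][a-zA-Z0-9]*$', word) is not None


cleaned = normalize_tweet('Gooooal!!   @Arsenal #COYG http://t.co/abc')
```

Lowercasing runs first, which is why the `URL` and `AT_USER` placeholders stay in upper case in the result.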
<div>
<hr color="red" />
Because I have no idea how to format code for a blog, I will refrain from pasting more code here and instead just describe the process in detail.<br />
<br />
Once I have a "clean" tweet, I use the following steps to process it:<br />
<br />
<ol>
<li>First I remove all "stop words": words such as 'is', 'are', and 'the'; basically, any word that has no inherent emotional value. With the stop words omitted, I match the tweet against the word sentiment corpus mentioned earlier and, based on the total sentiment value of the tweet, assign it a 'positive', 'negative', or 'neutral' label. <i><span style="color: #cc0000;"> This was my hacked-up way of coming up with a training set (examples with which to build a model), and it could use a lot of improvement. More on that in the future.</span></i></li>
<li>I take all tweets that have either a positive or negative sentiment and a geotag and append them to a list</li>
<li>Now it is time to use the NLTK tools to extract a feature list. For more on that - see their documentation <a href="http://nltk.googlecode.com/svn/trunk/doc/book/ch06.html" target="_blank">here</a>.</li>
<li>With features in hand, you can go ahead and train a classifier - a good example of this can be found <a href="http://nltk.org/_modules/nltk/classify/naivebayes.html" target="_blank">here</a>.</li>
</ol>
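The labeling and feature steps above can be sketched roughly as follows. The stop-word set, the word scores, and the function names are all made up for illustration (the real run used the full sentiment corpus), and the geotag filter from step 2 is left out.

```python
# Hypothetical stop words and word scores; the post uses a full sentiment corpus.
STOP_WORDS = {'is', 'are', 'the', 'a', 'to', 'and'}
WORD_SCORES = {'great': 3, 'win': 2, 'awful': -3, 'lose': -2}


def content_words(tweet):
    """Split a cleaned tweet into words and drop the stop words."""
    return [w for w in tweet.split() if w not in STOP_WORDS]


def label_tweet(tweet):
    """Sum the word scores and map the total to a sentiment label."""
    total = sum(WORD_SCORES.get(w, 0) for w in content_words(tweet))
    if total > 0:
        return 'positive'
    if total < 0:
        return 'negative'
    return 'neutral'


def word_features(tweet):
    """Bag-of-words feature dict: one True entry per remaining word."""
    return {'contains(%s)' % w: True for w in content_words(tweet)}


tweets = ['the win is great', 'awful to lose', 'the match is on']
# Keep only positive/negative examples, as in step 2 (geotag filtering omitted).
training_set = [(word_features(t), label_tweet(t))
                for t in tweets if label_tweet(t) != 'neutral']
```

A list of (feature dict, label) pairs like `training_set` is the shape that NLTK's `NaiveBayesClassifier.train` takes, so the classifier from step 4 can be trained on it directly.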
<div>
Once I have a satisfactory classifier, I store the model using Python's pickle module and start classifying new tweets. In the coming weeks I will upload the full code <span style="color: #cc0000;">(a hack job if there ever was one</span>) to GitHub.</div>
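Storing and reloading a model with pickle looks roughly like this. The dictionary is only a stand-in for a trained classifier, and the file name is an assumption; any picklable object is saved and restored the same way.

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier; any picklable object works the same way.
model = {'great': 'positive', 'awful': 'negative'}

path = os.path.join(tempfile.mkdtemp(), 'sentiment_model.pkl')

# Save the model once training is done.
with open(path, 'wb') as f:
    pickle.dump(model, f)

# Later (or in another script), load it back and reuse it.
with open(path, 'rb') as f:
    restored = pickle.load(f)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.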
<br />
<br /></div>
</span></div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-86727136359016220122013-08-11T17:12:00.001-07:002013-08-11T17:12:12.368-07:00<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">An average day on twitter</span><br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3q-WQrDnqHRph9ZWOSz2Hp1xbXNJrNp4h4h72MIpWpFZM64QX3Ev6psKjenSrJZ8TFKNpzI_wCqLNPVgRuCdElaOrFulhwGZhmf7V938h_j8B9mRVkhoKYsvXLFsCJ0gJafH64QYVYxxH/s1600/average_tweet.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi3q-WQrDnqHRph9ZWOSz2Hp1xbXNJrNp4h4h72MIpWpFZM64QX3Ev6psKjenSrJZ8TFKNpzI_wCqLNPVgRuCdElaOrFulhwGZhmf7V938h_j8B9mRVkhoKYsvXLFsCJ0gJafH64QYVYxxH/s640/average_tweet.png" width="640" /></a><span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"></span></div>
<div class="separator" style="clear: both; text-align: center;">
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span></span></div>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Twitter on July 24 2013</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKUlknOvqPDDM7Z0aRHxSBUrLQWSTJ-stZMnuytQrIP-wKGVyZCE1_vhznJYx7tj1CC8AfXsNcYso1BdW1wY_DHt81HZatyAht3eFOXwHQEDPVgrNaTX3dPIY5EEv4ukci1XwEe_hyCGID/s1600/filtered_tweet_small.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKUlknOvqPDDM7Z0aRHxSBUrLQWSTJ-stZMnuytQrIP-wKGVyZCE1_vhznJYx7tj1CC8AfXsNcYso1BdW1wY_DHt81HZatyAht3eFOXwHQEDPVgrNaTX3dPIY5EEv4ukci1XwEe_hyCGID/s640/filtered_tweet_small.png" width="640" /></a></div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-74585356446029668982013-08-05T09:15:00.001-07:002013-08-05T09:15:55.594-07:00Twitter on the Royal Baby<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-_0B6mnM-DhI5EDwCscY2yfn5Mre1ttXaf5Zk98zYPc7tWUDBh8d6w5pidlierJEEa68EIvvXKwQ4VqbVswoME2bApzY6rw-O0xP4gzqRmHFu2C04UPfegx5Gm8iZgnc1S7hT_0KbmkNI/s1600/tweet_iter2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-_0B6mnM-DhI5EDwCscY2yfn5Mre1ttXaf5Zk98zYPc7tWUDBh8d6w5pidlierJEEa68EIvvXKwQ4VqbVswoME2bApzY6rw-O0xP4gzqRmHFu2C04UPfegx5Gm8iZgnc1S7hT_0KbmkNI/s400/tweet_iter2.png" width="397" /></a></div>
<br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;">I just completed a trial Twitter sentiment analysis. The larger circle represents the positive tweets associated with <span style="color: #cc0000;">#Royalbaby</span>; the smaller circle represents negative sentiment. I used Python to extract an hour's worth of tweets and<span style="color: #cc0000;"> Nodebox</span> to construct the visual.</span>Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-8564006030983281652013-05-19T21:41:00.001-07:002013-05-19T21:41:55.459-07:00Thank you JJFinally a solid sci-fi concept that balances fun, action, and philosophy into a visually amazing package. The bonus? It is accessible to people who adhere to accepted hygiene practices, and have plans for the weekend that don't involve the words "game" and "workshop". A solid movie; looking forward to seeing it once more.Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-87434575463235511482013-05-19T18:21:00.000-07:002013-05-19T18:21:25.489-07:00Coursera - Data Science 101I am participating in the data science course, freely available at coursera.org. I wanted to set up the environment on my Mac and bypass the virtual machine environment (I hate working on a virtual machine). Here are some of the extra steps needed to get the course working on a Mac; I am using Snow Leopard.<br />
<div>
<br /></div>
<div>
<ol>
<li>Download and install Python 2.7 from <a href="http://www.python.org/download/releases/2.7.5/" target="_blank"><span style="color: #cc0000;">the Python website</span></a></li>
<li>You will need to install <span style="color: #cc0000;">oauth2-1.5.211</span> in order to access the Twitter stream. Download it <span style="color: red;"><a href="https://pypi.python.org/pypi/oauth2/" target="_blank">here</a></span></li>
<li>Install the new library by navigating, in the command line, to the directory that contains the file "setup.py" inside the oauth2 folder and typing:<span style="color: #cc0000;"> sudo python setup.py install </span>(enter your password when prompted)</li>
<ol>
<li>I received an error at this point complaining about not being able to locate the setuptools package. If you also see this error, use the following steps to rectify:</li>
<ol>
<li>Search for and download setuptools-0.6c11-py2.7.egg</li>
<li>In the command line run<span style="color: red;"> </span><span class="pln" style="border: 0px; color: #cc0000; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; line-height: 18px; margin: 0px; padding: 0px; vertical-align: baseline;">sudo sh setuptools</span><span class="pun" style="border: 0px; color: #cc0000; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; line-height: 18px; margin: 0px; padding: 0px; vertical-align: baseline;">-</span><span class="lit" style="border: 0px; color: #cc0000; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; line-height: 18px; margin: 0px; padding: 0px; vertical-align: baseline;">0.6c11</span><span class="pun" style="border: 0px; color: #cc0000; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; line-height: 18px; margin: 0px; padding: 0px; vertical-align: baseline;">-</span><span class="pln" style="border: 0px; color: #cc0000; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; line-height: 18px; margin: 0px; padding: 0px; vertical-align: baseline;">py2</span><span class="pun" style="border: 0px; color: #cc0000; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; line-height: 18px; margin: 0px; padding: 0px; vertical-align: baseline;">.</span><span class="lit" style="border: 0px; font-family: Consolas, 
Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, serif; font-size: 14px; line-height: 18px; margin: 0px; padding: 0px; vertical-align: baseline;"><span style="color: #cc0000;">7.egg </span>(password again)</span></li>
</ol>
</ol>
<li>Once setuptools has been installed, retry the installation of oauth2-1.5.211.</li>
<li>Follow the rest of the directions as outlined on the course website.</li>
</ol>
<div>
Hope this helps anyone who had trouble! </div>
<div>
<br /></div>
<br /><ol><ol><ol>
</ol>
</ol>
</ol>
<br /></div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-74719954911873339722013-01-13T19:05:00.000-08:002013-01-13T19:05:11.452-08:00/a week for work and ransac<div>
<br /></div>
<h3>
Too Much Work</h3>
<h4>
<i><span style="color: #cc0000;">My first week back at Big Blue in 2013</span></i></h4>
Last week was my first week back at Big Blue in 2013, and as one can imagine I was fairly busy playing catch-up: mostly model production readiness tests, sprinkled with some meetings and emails. Production readiness in this case entailed pushing as much code as possible to run "in database", and there was a lot of code. Go PostgreSQL...<div>
<i style="color: #cc0000;"><br /></i></div>
<div>
<i style="color: #cc0000;"><b>/ransac</b></i></div>
<div>
<i style="color: #cc0000;"><b><br /></b></i></div>
<div>
As a result of the stupid hours put in at work, I contributed only minimal time to iMobi, split between two things: the video tutorial (partially complete), and an algorithm called<i><span style="color: #cc0000;"> ransac </span></i>(http://en.wikipedia.org/wiki/RANSAC) that uses the Kinect's point cloud to identify planes and, hopefully, where the floor is.</div>
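For a feel of how RANSAC finds a plane, here is a minimal sketch in plain Python. It is not the iMobi code: the function names, the iteration count, the distance threshold, and the synthetic "point cloud" are all assumptions for illustration. The idea is to repeatedly pick three random points, fit the plane through them, and keep the plane that has the most points lying close to it.

```python
import random


def plane_from_points(p1, p2, p3):
    """Return (normal, d) for the plane n.x + d = 0, or None if collinear."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # The cross product of two in-plane vectors is the plane normal.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = sum(c * c for c in n) ** 0.5
    if length < 1e-12:
        return None
    n = [c / length for c in n]
    d = -sum(n[i] * p1[i] for i in range(3))
    return n, d


def ransac_plane(points, iterations=200, threshold=0.02, seed=0):
    """Keep the candidate plane with the most points within `threshold`."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [p for p in points
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


# Synthetic stand-in for a point cloud: a flat floor at z = 0 plus clutter.
rng = random.Random(1)
floor = [(rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0) for _ in range(80)]
clutter = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0.2, 1.0))
           for _ in range(20)]
plane, inliers = ransac_plane(floor + clutter)
```

With real Kinect data the largest plane is not guaranteed to be the floor (it could be a wall or a table), so some extra check on the normal direction would likely be needed.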
<div>
<br /></div>
<div>
More on those two next week.</div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-73759170968245313002013-01-02T13:43:00.001-08:002013-01-02T13:43:09.932-08:00Modeling American Football<h4>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">/the Problem</span></h4>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Analysts in my company received a challenge to build a model that can predict wins and losses in the NFL. </span><br />
<h3>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Gain Understanding</span></h3>
<h4>
<span style="color: #cc0000; font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><i>A crucial step</i></span></h4>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhULZd0NulJbJV78MtAVJoQBru41cuYdduB5n8F_w4sJpYcgGT8UCqaNclEYMH1QKNNYdMPOgiaDB5Ry8xHqjLhimdHsoZqRPWM_bh1oQN4uukXoEG5uYgJzElKQLTxTSIlrBSHcfLhaGYv/s1600/NNSPSS.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="173" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhULZd0NulJbJV78MtAVJoQBru41cuYdduB5n8F_w4sJpYcgGT8UCqaNclEYMH1QKNNYdMPOgiaDB5Ry8xHqjLhimdHsoZqRPWM_bh1oQN4uukXoEG5uYgJzElKQLTxTSIlrBSHcfLhaGYv/s320/NNSPSS.png" width="320" /></a></div>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Knowing absolutely nothing about the sport, I decided to try my hand at the problem;<span style="color: #cc0000;"><i> how hard could it be?</i></span> My first task was to collect enough data for training, test, and validation sets. I started by extracting outcomes for the 2008 - 2011 seasons from Pro-Football-Reference.com. Using the same website I also gathered basic statistics for each team. The variables from the SRS (Simple Rating System), which measure team offensive and defensive strength relative to average NFL team performance, became the first data points. Before any modeling could take place, I needed to understand the game more. I spent a couple of hours reading blogs on which statistics best represent a team's likely performance, and a couple more hours just reading about various aspects of the game.</span><br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span>
<br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">In an ideal world my model would include individual player level data, but the scope of this type of collection exercise quickly exceeded my available bandwidth. Instead I decided to consider only aggregate team statistics. </span><br />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">The term "</span><span style="font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;">SRS" seemed to pop up a lot, so I wanted to start there. I organized the outcome of each game into a Y vector. Next I arranged the data I had gathered into a matrix where each column represented a feature (SRS, SoS, OSRS, DSRS, etc.) and each row contained those statistics for both teams in a game. So one row might look like T1_SRS, T2_SRS, T1_Home, T2_Home, etc.</span><br />
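As a rough sketch of that layout, the snippet below builds the feature rows and the outcome vector in plain Python. The team ratings, games, and names are made up for illustration, and the column order (all team 1 stats, then all team 2 stats, then the home flags) is an arbitrary choice.

```python
# Hypothetical per-team ratings and games; the numbers are made up.
team_stats = {
    'NE':  {'SRS': 12.6, 'OSRS': 9.0, 'DSRS': 3.6},
    'NYJ': {'SRS': -5.1, 'OSRS': -3.2, 'DSRS': -1.9},
}
games = [
    {'team1': 'NE', 'team2': 'NYJ', 'team1_home': 1, 'team1_won': 1},
    {'team1': 'NYJ', 'team2': 'NE', 'team1_home': 1, 'team1_won': 0},
]
STATS = ['SRS', 'OSRS', 'DSRS']

columns = (['T1_' + s for s in STATS] + ['T2_' + s for s in STATS]
           + ['T1_Home', 'T2_Home'])


def game_row(game):
    """One feature row: team 1 stats, team 2 stats, then the home flags."""
    t1, t2 = team_stats[game['team1']], team_stats[game['team2']]
    return ([t1[s] for s in STATS] + [t2[s] for s in STATS]
            + [game['team1_home'], 1 - game['team1_home']])


X = [game_row(g) for g in games]       # one row per game
y = [g['team1_won'] for g in games]    # outcome vector
```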
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span>
<h3>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Exploration</span></h3>
<h4>
<span style="color: #cc0000; font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><i>Follow the white rabbit</i></span></h4>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKFnWxuG8ewKBg5zSHceHklEQcpxPQ-Dl9z7gHy7BNocHYa-1-kn_DIdT28hx3bgzp-PgQv1-GXMmv9tnryru5xYMmdMBk5mq-yJ_JNDZU8LrnTLeb78bHiPu5Nz0P0_uNTgJzMt0T1aSb/s1600/NNImport.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiKFnWxuG8ewKBg5zSHceHklEQcpxPQ-Dl9z7gHy7BNocHYa-1-kn_DIdT28hx3bgzp-PgQv1-GXMmv9tnryru5xYMmdMBk5mq-yJ_JNDZU8LrnTLeb78bHiPu5Nz0P0_uNTgJzMt0T1aSb/s1600/NNImport.png" /></a></div>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><div>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span></div>
<div>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span></div>
<div>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span></div>
<div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdOp6ETGtO0mCL15SnJu2lyHIBp73ZaPgNygJNNZqEPreY5AQ9_ezMnCsYWNzkLmlCbw-sHzIli6y5pYFIAs-dZQ_YtB1dHtWnR9orOMlb4uW3-8xDnvsVfxZIydFcu7qbPhT1t9GW5FzY/s1600/NNResults.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="262" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdOp6ETGtO0mCL15SnJu2lyHIBp73ZaPgNygJNNZqEPreY5AQ9_ezMnCsYWNzkLmlCbw-sHzIli6y5pYFIAs-dZQ_YtB1dHtWnR9orOMlb4uW3-8xDnvsVfxZIydFcu7qbPhT1t9GW5FzY/s320/NNResults.png" width="320" /></a></div>
I tried several variations of two machine learning algorithms: first I constructed a decision tree that uses information entropy to prune. Next I tried a neural network. I have coded a custom neural network that uses backpropagation to learn the weights, but for exploration I just threw the data into SPSS. I like SPSS because it is user friendly and I can get results quickly, but I also found myself severely limited by the software. For example, I was unable to add more than two layers, and I did not see a way to play around with the bias term. That being said, the neural network outperformed the decision tree, so I decided to work on refining the model using neural networks.</span><br />
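For readers unfamiliar with the entropy measure, here is a small worked sketch. The post does not say exactly how the tree applied entropy, so this shows the standard information gain calculation for a single candidate split; the outcomes and the split rule are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(labels, left, right):
    """Entropy removed by splitting `labels` into `left` and `right`."""
    total = len(labels)
    remainder = ((len(left) / total) * entropy(left)
                 + (len(right) / total) * entropy(right))
    return entropy(labels) - remainder


# Hypothetical outcomes split on "team 1 has the higher SRS".
outcomes = ['win', 'win', 'win', 'loss', 'loss', 'loss']
higher_srs = ['win', 'win', 'win', 'loss']
lower_srs = ['loss', 'loss']
gain = information_gain(outcomes, higher_srs, lower_srs)
```

An even mix of wins and losses has an entropy of 1 bit, and a split that separates the two outcomes more cleanly yields a larger gain.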
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3DR4NzjO2A-__rBnUUaZ5iNC1RXyKw3Rvg1tILxM1piDSJhw6Ds-CCEoYksy475ew6yMYJOaYlpU1rWBLqPPGYfZ5yZ6WBIyUX1c-dIxG220GED1jjtLJHa9EdgsOjwicttZiei2hm8su/s1600/NN.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="286" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3DR4NzjO2A-__rBnUUaZ5iNC1RXyKw3Rvg1tILxM1piDSJhw6Ds-CCEoYksy475ew6yMYJOaYlpU1rWBLqPPGYfZ5yZ6WBIyUX1c-dIxG220GED1jjtLJHa9EdgsOjwicttZiei2hm8su/s320/NN.png" width="320" /></a><span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">While I was able to obtain a 93% "accuracy" on the Training and Test sets, I only saw a slight improvement over a naive model with the holdout sample. <i><span style="color: #cc0000;">This was proving to be more difficult than I first imagined</span></i>. A quick look at the learning curves revealed the problem: I was severely overfitting the training data. This is a problem often caused by "variance" or "noise" in the target, and sometimes the quickest way to improve the model is to gather more data... so back to Google.</span><br />
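The comparison described here can be sketched in a few lines. The outcomes and predictions below are made up for illustration, and "naive model" is assumed to mean always predicting the most common outcome.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def naive_accuracy(actual):
    """Accuracy of always predicting the most common outcome."""
    return Counter(actual).most_common(1)[0][1] / len(actual)


# Made-up outcomes (1 = home win) and predictions, for illustration only.
train_actual = [1, 0, 1, 1, 0, 1, 0, 1]
train_predicted = [1, 0, 1, 1, 0, 1, 0, 0]
holdout_actual = [1, 1, 0, 1, 0, 1, 1, 0]
holdout_predicted = [1, 0, 0, 1, 1, 1, 1, 0]

train_acc = accuracy(train_predicted, train_actual)
holdout_acc = accuracy(holdout_predicted, holdout_actual)
baseline = naive_accuracy(holdout_actual)
# A large gap between train and holdout accuracy is the overfitting signal.
gap = train_acc - holdout_acc
```

A model is only worth keeping if its holdout accuracy clearly beats the baseline; a high training accuracy on its own says little.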
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">First I extracted three additional years of data from the initial website. Next I pulled in the Vegas odds on each of the games, hoping to make use of the professional betting establishment's sentiment. Finally I dug deeper into the football blogosphere and came across the website http://www.advancednflstats.com/, which includes both team efficiency ratings and predictions in an easy-to-extract format. The inclusion of this new data gave my models a significant boost, and I am now correctly identifying 100% of the cases in the training and test sets, with a percentage on the validation set good enough to go to Vegas with. </span><br />
<h3>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Next steps</span></h3>
<h4>
<span style="color: #cc0000; font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;"><i>Refinement</i></span></h4>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">There is likely a lot of collinearity in the data. I want to reconstruct the model using my neural network in Octave so I can have more flexibility with the architecture, and try unsupervised ML techniques to address the collinearity. Hopefully these steps will improve the model's performance. </span><br />
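<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">One simple way to check for collinearity before reaching for heavier unsupervised techniques is to look at pairwise correlations between the input features. The Python sketch below is only an illustration of that diagnostic: the feature names, the numbers, and the 0.9 threshold are all hypothetical and are not taken from the actual data set.</span><br />

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def collinear_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| at or above the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical feature columns, one value per game.
features = {
    "points_scored": [10, 20, 30, 40, 50],
    "yards_gained": [105, 198, 310, 395, 502],  # nearly linear in points_scored
    "turnovers": [2, 0, 3, 1, 2],
}

flagged = collinear_pairs(features)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Only the first two columns are flagged here, which suggests they carry nearly the same information and that one of them (or a combined component) could be dropped.</span><br />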
<br />Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-6594252455159968852012-12-23T17:17:00.001-08:002012-12-23T17:21:39.117-08:00a new sensor<h2>
<span style="font-size: x-large; font-weight: normal;">/theKinect</span></h2>
<h3>
<span style="color: #cc0000; font-size: large;"><span style="font-weight: normal;"><i>First Impressions</i></span></span></h3>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMt829vtOuz9eJqR7AB5D5ryIeT3OjIDsFSGEZ_rsWa7Br1oXBJ7YUyjJZaqchCC_6BEhthlIKja8FgP5yp9af_ibrhj5BTJJWRSBsbQ95PaaemW8vYMWJqiIIxuNig8tRU4JMTjHeb7yK/s1600/IMG_20121223_154421.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMt829vtOuz9eJqR7AB5D5ryIeT3OjIDsFSGEZ_rsWa7Br1oXBJ7YUyjJZaqchCC_6BEhthlIKja8FgP5yp9af_ibrhj5BTJJWRSBsbQ95PaaemW8vYMWJqiIIxuNig8tRU4JMTjHeb7yK/s640/IMG_20121223_154421.jpg" width="640" /></a></div>
<br />
<span style="font-size: large;">I just received the <i>Microsoft Kinect </i>for Developers from Amazon. I intend to use it as the primary sensor for my robot; it has a lot of interesting features including an IR sensor, depth sensors, audio, camera, and a fantastic set of APIs. At a cost of 200.00 U.S. dollars, it seems like the easiest place to start. After the unboxing I downloaded the </span><span style="font-size: large;">Kinect SDK and Developers Toolkit</span><span style="color: #cc0000;"><span style="font-size: large;"> </span></span><br />
<span style="color: #cc0000;">-http://www.microsoft.com/en-us/kinectforwindows/develop/developer-downloads.aspx-</span><br />
<span style="font-size: large;">and spent a couple days playing with the code. The two main development languages are C# and C++, and there are numerous examples using both. I chose C# to get started. </span><br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijbXBgcKAbTlBiIc4VvG7QIPLPiuzrCOGWvP2Ea6gDcqp404T4wZf3fUkB0hgLGxprMQyGbSVQhmEpGuUV5GD_A3GL2A9E_fEY5OxQN79G4Xlg1vMN7oHTWtFtT4saJA0Npv2EXCtp479G/s1600/IMG_20121223_154442.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijbXBgcKAbTlBiIc4VvG7QIPLPiuzrCOGWvP2Ea6gDcqp404T4wZf3fUkB0hgLGxprMQyGbSVQhmEpGuUV5GD_A3GL2A9E_fEY5OxQN79G4Xlg1vMN7oHTWtFtT4saJA0Npv2EXCtp479G/s400/IMG_20121223_154442.jpg" width="400" /></a><span style="font-size: large;"><br /></span><br />
<span style="font-size: large;">As a side note, I had never coded in C# before, but I was able to get up to speed with the help of some tutorials posted by Microsoft </span><br />
<span style="color: #cc0000;">-http://channel9.msdn.com/Series/C-Sharp-Fundamentals-Development-for-Absolute-Beginners. </span><br />
<span style="font-size: large;">I spent a couple hours and was able to figure the rest out from there.</span><br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<span style="font-size: large;">My first step was to just turn the sensors on and load the video into a window; next I played around with the <i><span style="color: #cc0000;">face tracking APIs</span></i>. In all, it was fairly easy to get things up and running and to start doing some really cool things with the data streams the Kinect has on offer. Later on I will post some videos with actual code.</span><br />
<a name='more'></a><object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://i.ytimg.com/vi/4uiJcP_bff0/0.jpg" height="266" style="clear: left; float: left;" width="320"><param name="movie" value="http://www.youtube.com/v/4uiJcP_bff0?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" /><param name="bgcolor" value="#FFFFFF" /><param name="allowFullScreen" value="true" /><embed width="320" height="266" src="http://www.youtube.com/v/4uiJcP_bff0?version=3&f=user_uploads&c=google-webdrive-0&app=youtube_gdata" type="application/x-shockwave-flash" allowfullscreen="true"></embed></object><br />
<br />Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0tag:blogger.com,1999:blog-4903384543953599214.post-71464617595050812992012-12-22T11:08:00.000-08:002012-12-22T11:08:09.316-08:00iMobi<h3>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Winter [vacation] is coming </span></h3>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">iMobi: This idea has been forming over the past two years. I took the <i><span style="color: #cc0000;"><a class="g-profile" href="http://plus.google.com/111950594039269281469" target="_blank">+Coursera</a> <a class="g-profile" href="http://plus.google.com/117950991762523582075" target="_blank">+Machine Learning</a></span></i> course by Andrew Ng and later two <a class="g-profile" href="http://plus.google.com/116286004036789369492" target="_blank">+Udacity</a> courses (AI for Robotics and CS101).</span><br />
<div>
<h3>
<hr />
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">Concept</span></h3>
<h4>
<span style="color: #cc0000; font-family: 'Helvetica Neue', Arial, Helvetica, sans-serif; font-size: large;"><i>A to B to C</i> </span></h4>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUVC9F_szB9PJv0F1_1dH8lprsTYAviqy9LJR1qbFtvfs-vCpPwpn9MY0UTBPUkKabY1q28e5dfkSh8ppfRXAHs_kY-puOSyqGqxsOWagk4KWptE182Nauc0TEDOxlpDnGxNG-NULl13_r/s1600/iMOBI.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><span style="font-size: large;"><img border="0" height="171" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUVC9F_szB9PJv0F1_1dH8lprsTYAviqy9LJR1qbFtvfs-vCpPwpn9MY0UTBPUkKabY1q28e5dfkSh8ppfRXAHs_kY-puOSyqGqxsOWagk4KWptE182Nauc0TEDOxlpDnGxNG-NULl13_r/s320/iMOBI.png" width="320" /></span></a></div>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">A robot that can sense its surroundings, build models based on those sensor inputs and <span style="color: #cc0000;"><i>make decisions based on those models</i>.</span> I want to drop it in a room, or outside in a desert, or on another planet... just kidding, but that would be cool... and it should start learning about its environment. I have no idea if I can ever finish this alone, but I hope my rather simplistic outline will provide a structure for the systems I will need to develop. </span><br />
<span style="font-size: large;"><span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;">Here is what I have in mind: Picture a human brain and its perceptions of the world. It is able to perform tasks ranging from the relatively simple, such as recognizing a handwritten "t", to the fairly complex, like understanding irony or making an apple pie. One might be tempted to think the brain's circuitry operates on a scale where less complex tasks are assigned to less complex circuitry while the more complex tasks are left to complicated computational units. As it turns out this is not the case; each computational unit in the brain is no more or less complicated than the others. The brain makes sense of the world by layering simple computations. The computational unit that recognizes irony relies on input from dozens of other units that have already done their job. Something had to process the visual images, something had to recognize a spoken word, something had to recognize grammar, etc. Each unit performs its task and eventually we understand irony.</span></span><br />
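<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">The idea of layering simple computations can be shown with a toy Python sketch. This is not a model of the brain or of any particular architecture; the <i>unit</i> function, its weights, and the XOR task are assumptions picked purely for illustration. Every unit performs the same simple computation (a weighted sum and a threshold), yet stacking them produces a result that no single unit of this kind can compute.</span><br />

```python
def unit(inputs, weights, bias):
    """One simple computational unit: fire (1) if the weighted sum plus bias is above zero."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

def xor(a, b):
    """Exclusive-or built by layering three identical simple units."""
    # First layer: two units that each look at the raw inputs.
    either = unit([a, b], [1, 1], -0.5)  # fires if at least one input is on
    both = unit([a, b], [1, 1], -1.5)    # fires only if both inputs are on
    # Second layer: one unit that only sees the outputs of the first layer.
    return unit([either, both], [1, -1], -0.5)  # "either, but not both"

truth_table = [(a, b, xor(a, b)) for a in (0, 1) for b in (0, 1)]
for a, b, out in truth_table:
    print(a, b, out)
```

<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">The second-layer unit never sees the raw inputs; it relies entirely on units that have already done their job, which is the same pattern as the irony example.</span><br />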
<span style="font-size: large;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif; font-size: large;">I will borrow an idea from <a class="g-profile" href="http://plus.google.com/101992784223411115531" target="_blank">+Ray Kurzweil</a>'s book, <i><span style="color: #cc0000;">How to Create a Mind</span>,<span style="color: #cc0000;"> </span></i>and call these computational units <i>recognizers. </i>In a similar way, perhaps the only path from input to model to decision is to take a page out of nature's book and construct separate models or "recognizers" for different tasks<i>. </i></span><br />
<span style="font-size: large;"><span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><i><br /></i></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><i><br /></i></span></span></div>
<div>
<span style="font-size: large;"><br /></span>
<span style="font-family: Helvetica Neue, Arial, Helvetica, sans-serif;"><br /></span></div>
Anonymoushttp://www.blogger.com/profile/16711436771614036591noreply@blogger.com0