Lesson learned. Don’t say what your next topic is going to be in a post. I’ve skipped around so much in the last few weeks of research and at the moment, I’m not interested in the installation of HBase/Hadoop.
So, what am I interested in at the moment?
- NewSQL vs. NoSQL.
- AWS S3 Cloud Storage
- Sample big data for download (here, and here)
- PHP CodeIgniter framework
Just to name a few…
Dev Environment Setup
So, to catch up… I purchased a used commodity desktop PC for $70 on eBay (an IBM Lenovo MT-M, CoreDuo, 2GB RAM, 80GB hard drive) and installed Ubuntu Linux 12.04.2 LTS on it. Installation was easy… mostly just follow the prompts. I’ve been re-learning the Linux command line and dusting off my vi editor skills. I’ve set up vsftpd (FTP server), x11vnc (for remote access), a LAMP server (Apache httpd web server, MySQL and PHP), and phpMyAdmin (for web-based MySQL administration).
Why did I abandon the Oracle VirtualBox environment that I previously set up for my Linux sandbox? Primarily because I had set up multiple sandboxes on the different laptops that I use, and I wanted one central, always-on server to consolidate my work. With multiple sandboxes, it was getting harder to keep the configurations in sync and to share code and databases between virtual boxes. One central Linux server (built out of a cheap used PC) was a worthwhile $70 investment.
I then set up port-forwarding on my home router to allow me to connect to certain ports (ssh, http, https, ftp, vnc and the like). I’d love to figure out how to set up a home VPN, to eliminate the port forwarding rules and the Passive FTP requirements. But, this is a project for another day. For now, I have my hello-world Linux development environment set up and accessible to me wherever I go.
HBase and Hadoop frustration
I spent a bit of time following online tutorials on setting up HBase and Hadoop. Perhaps I wasn’t patient enough. I wanted it to be as easy as MySQL to set up (sudo apt-get install mysql-server). LOL. Didn’t happen. There’s a boat-load of configuration to do at the OS level and the app level to get it to work. Now, I’m no stranger to working through error messages, config edits, stackoverflow.com searches, and general block-breaking to make something work. HBase just irked me enough that I decided to take a break from it for a while. Don’t get me wrong, I will go back to it and push through. I just needed to get some space from it before I started saying things about it that might hurt its feelings. I figure I also need to read up on it more before I just hack my way blindly through online installation tutorials. So, that’s a topic for another day…
PHP and CodeIgniter
My goal with PHP is to get back to the point where I can productively and efficiently whip up scripts to scrape the web for monstrous amounts of big data to stuff into my own little hidey hole for experimentation. Of course, I expect to build some web code and traditional SQL front-end code to get by.
I didn’t want to invest heavily in this area (time-wise), so I researched open source frameworks that I could use to get up and running fast. Symfony and Zend look awesome (and prolific), but the entry price (time) looked higher than I wanted to spend. So, I went with CodeIgniter, which has its limitations and imperfections, but for a soon-to-be ex-PHP-noob, it was perfect.
Within an hour I had a basic MVC hello-world web site working. Within another hour, I had it talking to a MySQL database.
The biggest challenge I had was getting the URL rewriting working so I could eliminate the “index.php” segment from my URIs. I kept following online advice to modify my .htaccess file with various rules. Nothing worked. I finally found a hint about setting these rules in my /etc/apache2/sites-available/default config file. I was able to set up an Alias for my /var/www/ci folder and set up the Rewrite conditions there. I also had to run a2enmod to enable the “rewrite” module in Apache. This probably burned through about 3-4 hours in various hack sessions before I finally figured it out and got it working.
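For anyone wondering what those Rewrite conditions actually do, here is a small Python model of the logic. It is only a sketch: the directives quoted in the comments are the commonly published CodeIgniter rule set, not necessarily the exact lines in my config, and BASE and EXISTING_PATHS are made-up stand-ins for the Alias and for the files on disk.

```python
import re

# Assumed rule set (the commonly published CodeIgniter rewrite, shown for reference):
#   RewriteCond %{REQUEST_FILENAME} !-f
#   RewriteCond %{REQUEST_FILENAME} !-d
#   RewriteRule ^(.*)$ index.php/$1 [L]
# BASE and EXISTING_PATHS are hypothetical stand-ins for the Alias and the filesystem.
BASE = "/ci/"
EXISTING_PATHS = {"", "index.php", "css/site.css"}


def rewrite(uri):
    """Return the internal path a request would be rewritten to."""
    if not uri.startswith(BASE):
        return uri  # outside the aliased folder: left untouched
    relative = uri[len(BASE):]
    # The two RewriteCond lines: leave real files and directories alone.
    # A request already routed through index.php resolves to a real file too.
    if relative in EXISTING_PATHS or relative.startswith("index.php/"):
        return uri
    # The RewriteRule: capture the whole relative path and route it via index.php.
    match = re.match(r"^(.*)$", relative)
    return BASE + "index.php/" + match.group(1)


if __name__ == "__main__":
    for uri in ("/ci/welcome/index", "/ci/css/site.css", "/ci/index.php/welcome"):
        print(uri, "->", rewrite(uri))
```

The point is that only requests that don’t match a real file or folder get routed through index.php, which is why static assets keep working after the rule is in place.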
So, now I’m practicing my skills at fundamentals (like building a basic web-based database administration site that uses reflection to discover data structures and generates basic editor screens for the database). This is my equivalent of what typing students do when they practice typing the phrase (The quick brown fox jumps over the lazy dog.) or (Now is the time for all good men to come to the aid of their country.). It gets me familiar with the database manipulation framework calls, the MVC model of the CodeIgniter framework, and the basic language constructs (conditionals, loops, variable scoping, function variables, object orientation), and it lets me dive into some of the inner workings of the language (reflection, dynamic code generation/execution, and security). I’m sure this will take me a few weeks, as I’ve only been able to make time for “geek out” sessions (as my wife calls them) about 4 hours a week.
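To give a feel for the “discover the structure, then generate an editor screen” idea, here is a rough sketch in Python. It is not my CodeIgniter code: it swaps in Python’s built-in sqlite3 module and an in-memory database so it runs anywhere, and the person table and the describe_table and render_editor helpers are invented purely for illustration.

```python
import sqlite3
from html import escape


def describe_table(conn, table):
    """Return (column_name, declared_type) pairs by asking the database itself."""
    # PRAGMA table_info is SQLite's introspection call; each row is
    # (cid, name, type, notnull, dflt_value, pk).
    quoted = table.replace('"', '""')
    rows = conn.execute('PRAGMA table_info("{}")'.format(quoted)).fetchall()
    return [(row[1], row[2]) for row in rows]


def render_editor(conn, table):
    """Generate a bare-bones HTML form with one input per discovered column."""
    fields = []
    for name, col_type in describe_table(conn, table):
        # Crude type mapping: integer columns get a number input, the rest text.
        input_type = "number" if "INT" in col_type.upper() else "text"
        fields.append(
            '<label>{0} <input type="{1}" name="{0}"></label>'.format(
                escape(name), input_type
            )
        )
    return "<form>\n" + "\n".join(fields) + "\n</form>"


# Hypothetical demo table, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
print(render_editor(conn, "person"))
```

In MySQL, the same discovery step would go through something like DESCRIBE or the information_schema tables instead of the SQLite pragma.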
A couple of friends of mine suggested that I take a look at a few technologies they’ve been looking at for Big Data. They tend to fall into the NewSQL category. I’m most excited about NuoDB, as it seems to be one of the most innovative and disruptive of the NewSQL databases out there. It is highly elastic (cloud scale), compatible with Amazon Web Services, and free for personal use and for developers. However, it has a price tag (unknown to me as of yet) for commercial/enterprise use. I love the concept of intelligent, self-replicating, self-preserving atoms that they use as a fundamental architectural design concept.
VoltDB seems amazing in its own right, and it makes me think about how enamored I was with SAP HANA just a few weeks ago. VoltDB may very well be more impressive than HANA, but now that I’ve peeked at some of these NewSQL technologies (like Clustrix and NuoDB), I find VoltDB to be a mere “major step forward” and not a quantum leap forward. Funny how unimpressed I now am with technologies that are merely cloud scalable and use in-memory columnar data for fast big data access. There’s so much more to Big Data than just columnar data and multi-node, replicated storage.
I was going to say that next time I will talk about NewSQL technologies in more depth, but I’ve learned my lesson. I have no idea what I’ll talk about next. So, until next time, do whatever you were going to do…