So a friend of mine (Srini Reddy of Promorphics) started talking to me about Big Data. I knew that he and his company specialized in Business Intelligence technologies and in bringing their analytics magic to big enterprises. I didn't realize that they were starting to move down the Big Data path.
What to Learn – So much about big data!!!
He turned me on to some information about SAP HANA. I dove into some online training on SAP HANA, and I have to admit I was quite impressed with how easily I could apply big data principles in an OLTP database technology. It appears to blur the lines between OLTP/OLAP and Big Data. Pretty cool.
Before going much deeper into SAP's tool, I wanted to widen my view and understand the field of Big Data more broadly. So Hadoop, HBase, Hive, Pig and the whole Apache open source suite came into view. Yes, I want to do it the hard way before I get too cozy with the soft, cushy, easy way that I found so tantalizing in the SAP HANA space.
New Sandbox – Setting up a Linux playground
Well, as a longtime Windows geek and a former Linux hack, I decided it was time to get back to my hardcore self. I needed to set up an environment to play in. I looked at Amazon Web Services (AWS) as an on-demand Linux hosting option and toyed with it a bit. In the end, I wanted the freedom to install my own Linux kernel and to do whatever hacking it took to overcome hurdles and fully understand the inner workings of the Apache Big Data tech underneath.
Lacking a personal server farm in my home, I decided to set up an open-source virtual machine host on my laptop. I downloaded and installed the Windows 64-bit edition of Oracle VirtualBox, which turned out to be easy to use and quite versatile. At first (eager to make fast progress), I downloaded a ready-made VirtualBox image with 64-bit Ubuntu Linux and some other goodies (a LAMP server and such). However, I immediately ran into problems with the drivers and the kernel build. So I decided to do a fresh install: after first verifying that my laptop was indeed 64-bit, I downloaded the free, open-source 64-bit Ubuntu Server ISO image.
Once VirtualBox was up and running, I was able to easily point and click to create a new virtual machine with 768MB RAM, 2 CPU cores and 24MB video RAM (pretty light for an initial setup). When I started the VM, it immediately prompted me for the ISO image on my Windows hard disk and booted the VM in setup mode. I simply followed the prompts and, poof, I had a new Linux box running at a command prompt.
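If you'd rather script that point-and-click step, VirtualBox ships a command-line tool, `VBoxManage`. Here is a minimal Python sketch that only builds the argument lists for registering a 64-bit Ubuntu VM and applying the same 768MB RAM / 2 CPU / 24MB video RAM sizing. The VM name `ubuntu-bigdata` is hypothetical, and creating the virtual disk and attaching the ISO (which the GUI wizard handles for you) are left out.

```python
import shlex

# Hypothetical VM name; the post does not say what the VM was called.
VM_NAME = "ubuntu-bigdata"


def create_vm_commands(name: str, memory_mb: int = 768, cpus: int = 2,
                       vram_mb: int = 24) -> list[list[str]]:
    """Build VBoxManage commands that register a 64-bit Ubuntu VM and size it.

    Creating a virtual disk and attaching the Ubuntu ISO are separate steps
    that this sketch leaves out.
    """
    return [
        ["VBoxManage", "createvm", "--name", name,
         "--ostype", "Ubuntu_64", "--register"],
        ["VBoxManage", "modifyvm", name,
         "--memory", str(memory_mb),  # RAM in MB
         "--cpus", str(cpus),         # virtual CPU cores
         "--vram", str(vram_mb)],     # video RAM in MB
    ]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True)
    # to execute them on a machine with VirtualBox installed.
    for cmd in create_vm_commands(VM_NAME):
        print(shlex.join(cmd))
```

Building argument lists instead of one big shell string keeps the VM name safe even if it contains spaces.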
Honestly, I then spent about 3 weeks (a few hours at a time) re-familiarizing myself with the joys of Linux systems administration (chown, sudo, apt-get, umask, vi, *.conf, /etc, /var, /tmp, oh boy!). I installed basic services (httpd, vsftpd, openssh, php, mysql) and configured the VM's networking so that applications on my Windows host could reach my Ubuntu Linux server, running as a guest VM, through NAT port forwarding. I also installed some tools on my Windows box to make GUI client access to my Linux services easier.
- HeidiSQL is nice as a GUI client for MySQL.
- PuTTY is nice as a terminal client (SSH to openssh).
- NetBeans is nice as a developer IDE for PHP, Java, HTML and the like.
- FileZilla is nice for PASV (passive) mode FTP, which I have to use to get through the VM's NAT port forwarding.
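The NAT port forwarding can also be scripted. Here is a minimal Python sketch that builds one `VBoxManage modifyvm <vm> --natpf1 <rule>` command per service, using VirtualBox's `name,proto,hostip,hostport,guestip,guestport` rule format. The VM name, the host-side ports and the passive FTP range are all hypothetical, since the post doesn't list the real ones. Passive-mode FTP opens its data connections on extra ports, so the sketch forwards a small range one-to-one, and that range would have to match `pasv_min_port`/`pasv_max_port` in vsftpd.conf. `modifyvm` expects the VM to be powered off.

```python
import shlex

# Hypothetical VM name and host-side ports: the post does not list the real ones.
VM_NAME = "ubuntu-bigdata"

# (rule name, host port on Windows, guest port on Ubuntu) for each client tool.
SERVICE_RULES = [
    ("ssh", 2222, 22),      # PuTTY -> openssh
    ("http", 8080, 80),     # browser / NetBeans -> httpd
    ("mysql", 3306, 3306),  # HeidiSQL -> mysql
    ("ftp", 2121, 21),      # FileZilla control connection -> vsftpd
]

# Hypothetical passive-mode data port range; it must match pasv_min_port and
# pasv_max_port in vsftpd.conf.
PASV_PORTS = range(40000, 40011)


def natpf_rule(name: str, host_port: int, guest_port: int, proto: str = "tcp") -> str:
    """Format a rule as <name>,<proto>,<hostip>,<hostport>,<guestip>,<guestport>.

    The host IP and guest IP fields are left empty so VirtualBox's defaults apply.
    """
    return f"{name},{proto},,{host_port},,{guest_port}"


def port_forward_commands(vm: str, rules, pasv_ports) -> list[list[str]]:
    """Build one 'VBoxManage modifyvm' command per forwarding rule on NIC 1."""
    commands = []
    for name, host_port, guest_port in rules:
        commands.append(["VBoxManage", "modifyvm", vm, "--natpf1",
                         natpf_rule(name, host_port, guest_port)])
    # Passive FTP data connections arrive on separate ports; forward each 1:1.
    for port in pasv_ports:
        commands.append(["VBoxManage", "modifyvm", vm, "--natpf1",
                         natpf_rule(f"ftp-pasv-{port}", port, port)])
    return commands


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True)
    # to apply them while the VM is powered off.
    for cmd in port_forward_commands(VM_NAME, SERVICE_RULES, PASV_PORTS):
        print(shlex.join(cmd))
```

With rules like these in place, PuTTY would connect to `localhost` on port 2222 rather than port 22, and each rule name must be unique on the adapter.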
So, now I had a personalized, familiar, tooled-up, ready-to-go, Ubuntu Linux sandbox on which to set up my Apache open source Big Data playground.
Next Blog Post – Installing Hadoop and HBase on my Ubuntu Linux box…