Course Syllabus

Big Data & Open Source Software Projects V594/37186

 

BUEX-V594 Section 37186:  Big Data & Open Source Software Projects (Offered through School of Informatics & Computing) – Geoffrey Fox

This course studies the software used in many commercial activities to analyze Big Data. The backdrop for the course is the ~300 software subsystems illustrated at http://hpc-abds.org/kaleidoscope/. We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing - enhanced Apache Big Data Stack).

  • The cloud computing architecture underlying ABDS and how it contrasts with HPC.
  • The software architecture with its different layers at http://hpc-abds.org/kaleidoscope/, covering the broad functionality of and rationale for each layer.
  • We will then go through selected software systems, about 5% of those in the Kaleidoscope, which have already been deployed on the FutureSystems (previously called FutureGrid) cloud using OpenStack and Chef recipes.
  • Students will each choose one other open source member of the Kaleidoscope and deploy it as illustrated in class.
  • The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
  • Projects will be suggested, or students can choose their own.
  • For more information, see: http://bigdataopensourceprojects.soic.indiana.edu/
 
Prerequisites
  • Elementary knowledge of a scripting language is needed (if not available, this can be acquired as part of this course)
  • Basic knowledge of Python is desirable (if not available, this can be acquired as part of this course)
  • Ability to use, or learn to use, the Linux/Unix command shell (we will have a lesson on this)
  • Basic understanding of how to install packages and programs on Linux (we will have a lesson on this)
 
As part of the course, you will gain experience in:
 
  • DevOps: "software deployment automation"
  • Linux command shell
  • Elementary usage of ssh
  • Use of GitHub to store software packages and documentation
  • The reproducible installation of sophisticated platforms on virtual clusters. This is facilitated by scripts developed in Python, OpenStack Heat, or a DevOps framework such as Ansible, Chef, or Puppet. The choice of framework will depend on the student's experience level.

  • You will learn the utility of the key parts of the Big Data Stack
