Course Syllabus
Its Big Data & Open Source Software Projects V594/37186
BUEX-V594 Section 37186: Big Data & Open Source Software Projects (Offered through School of Informatics & Computing) – Geoffrey Fox
This course studies software used in many commercial activities to study Big Data. The backdrop for the course is the ~300 software subsystems illustrated at http://hpc-abds.org/kaleidoscope/. We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing - enhanced Apache Big Data Stack).
- The cloud computing architecture underlying ABDS, and how it contrasts with HPC.
- The software architecture with its different layers at http://hpc-abds.org/kaleidoscope/, covering the broad functionality of and rationale for each layer.
- Then we will go through selected software systems – about 5% of those in the Kaleidoscope – that have already been deployed on the FutureSystems (previously called FutureGrid) cloud using OpenStack and Chef recipes.
- Each student will choose one other open source member of the Kaleidoscope and deploy it as illustrated in class.
- The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
- Projects will be suggested, or students can choose their own.
- For more information, see: http://bigdataopensourceprojects.soic.indiana.edu/
Prerequisites
- Elementary knowledge of a scripting language is needed (if not available, this can be acquired as part of this course)
- Basic knowledge of Python is desirable (if not available, this can be acquired as part of this course)
- Ability to use, or learn to use, the Linux/Unix command shell (we will have a lesson on this)
- Basic understanding of how to install packages and programs on Linux (we will have a lesson on this)
As part of the course, you will gain experience in:
- DevOps: "software deployment automation"
- Linux command shell
- Elementary usage of ssh
- Use of GitHub to store software packages and documentation
- The reproducible installation of sophisticated platforms on virtual clusters. This is facilitated by scripts developed in Python, by OpenStack Heat, or by a DevOps framework such as Ansible, Chef, or Puppet. The choice of framework will depend on the experience level of the student.
- The utility of the key parts of the Big Data Stack