Big Data & Open Source Software Projects V594/37186
BUEX-V594 Section 37186: Big Data & Open Source Software Projects (Offered through School of Informatics & Computing) – Geoffrey Fox
This course covers software used in many commercial activities to study Big Data. The backdrop for the course is the ~300 software subsystems illustrated at http://hpc-abds.org/kaleidoscope/. We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing - enhanced Apache Big Data Stack).
- The cloud computing architecture underlying ABDS and how it contrasts with HPC.
- The software architecture and its different layers at http://hpc-abds.org/kaleidoscope/, covering the broad functionality of and rationale for each layer.
- We will then go through selected software systems – about 5% of those in the Kaleidoscope – which have already been deployed on the FutureSystems (previously called FutureGrid) cloud using OpenStack and Chef recipes.
- Students will each choose one other open source member of the Kaleidoscope and deploy it as illustrated in class.
- The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
Projects will be suggested, or students can choose their own.
- For more information, see: http://bigdataopensourceprojects.soic.indiana.edu/
- Elementary knowledge of a scripting language is needed (if not available, this can be acquired as part of this course)
- Basic knowledge of Python is desirable (if not available, this can be acquired as part of this course)
- Ability to (learn to) use the Linux/Unix command shell (we will have a lesson on this)
- Basic understanding of how to install packages and programs on Linux (we will have a lesson on this)
- DevOps: "software deployment automation"
- Linux command shell
- Elementary usage of ssh
- Use of GitHub to store software packages and documentation
- The reproducible installation of sophisticated platforms on virtual clusters. This is facilitated by scripts developed in Python, by OpenStack Heat, or by a DevOps framework such as Ansible, Chef, or Puppet. The framework chosen will depend on the experience level of the student.
- You will learn the utility of the key parts of the Big Data Stack.