storage, along with the difficulties of achieving consensus in enormous stores that use large quantities of processors,
memories, and disks. We also present distributed key-value stores and in-memory databases (e.g., Memcached and Redis)
that data centers use as caching layers to improve performance. Next, we present NewSQL and NoSQL databases in the cloud.
We visit HBase, the scalable, low-latency database that supports database operations in applications that use Hadoop. Then,
we show how Spark SQL can run SQL queries on huge datasets and present distributed publish/subscribe systems using Kafka,
a distributed log messaging system that is widely used to connect Big Data and streaming applications into complex
systems.
Right after the midterm exam (which is proctored by the ProctorU service), we switch to higher-end applications in the cloud. We
start by exploring how the Cloud opens up data analytics to huge volumes of data that are static or streamed at high velocity and
that represent an enormous variety of information. Cloud applications and data analytics represent a disruptive change in the
ways that society is informed by and uses information. We also introduce some common enterprise-level analytics applications
offered by major cloud providers. We then look at graph processing, graph databases, and machine learning in the cloud. We
introduce the ideas behind graph processing and present Pregel, Giraph, and Spark GraphX. For machine learning, Spark ML
and MLlib continue the theme of programmability and application construction. We also cover the machine learning lifecycle
and how different cloud services contribute to it.
We then turn our attention to Fast Data systems, such as Apache Storm and Flux. We discuss real-time data streaming and
introduce Storm, a technology widely used in industry. We continue with Spark Streaming, the Lambda and Kappa
architectures, and a lesson on a complete streaming ecosystem. After that, we move on to virtualization and containers,
which are fundamental technologies behind many cloud-based services. We cover virtualization and containers in greater
depth, including lectures on Docker, Docker Compose, ECS, Kubernetes, and Infrastructure as Code. Finally, we wrap up
the course by discussing future trends and closing with an interview with an industry-expert cloud architect.
Course Goals and Objectives
Upon successful completion of this course, you will be able to:
• Understand what cloud computing is and why it is important.
• Gain an understanding of the economics of cloud computing.
• Describe Big Data and the challenges of working with it.
• Learn about many fundamental technologies that enable cloud computing, such as software-defined architectures,
virtualization, and containers.
• Learn about many “glue” technologies that enable access to clouds, such as web middleware, JSON, REST APIs, and
RPC.
• Learn about the different levels of cloud services, which include IaaS (Infrastructure as a Service), PaaS (Platform as a
Service), SaaS (Software as a Service), MaaS (Metal as a Service), FaaS (Function as a Service) and MBaaS (Mobile
Backend as a Service), both forms of serverless architecture, as well as AWS Lambda.
• Learn about many types of cloud-based storage services, including object storage, block-level storage, archival storage,
and Big Data file systems.
• Become familiar with the key concepts underlying Big Data and data streaming applications on the Cloud.
• Describe the concerns of storage, processing, parallelism, distribution, consensus, and scalability as they relate to the
Cloud.
• Understand key benefits and limitations of the various technologies available in the Cloud.
• Utilize the course content to select technologies you wish to use in your work or company.
Textbook
There are no required textbooks for this course.
Course Components
Lecture videos