Course Synopsis

CDS501/4 – Principles & Practices of Data Science & Analysis

This course introduces the basic goals and techniques of the data science and analytics process, together with theoretical foundations that include useful statistical and machine learning concepts, so that the process can transform hypotheses and data into actionable predictions. The course covers the basic principles of the important steps in the process: collecting, curating and analysing data, building predictive models, and reporting and presenting results to audiences of all levels. The R programming language and statistical analysis techniques are introduced through examples from areas such as marketing, business intelligence and decision support.

At the end of this course, the students will be able to:

  • Organize effectively all the necessary steps in any real-world data science and analytics project.
  • Adapt the R programming language and useful statistical and machine learning techniques in data science and analytics projects.
  • Practice all the skills needed by the data scientist, which include acquiring the data, managing the data, choosing the modelling technique, writing the code, and verifying and presenting the results.

CDS502/4 – Big Data Storage and Management

Storing and managing big data raises different issues from those of conventional databases. Big data involves a huge amount of data (volume), supports heterogeneous data formats (variety) and can be accessed at high speed (velocity). The course covers the fundamentals of big data storage and management and related issues. It examines various storage infrastructures, with technologies ranging from traditional storage to cloud-based storage. The course also provides exposure to recent technologies for manipulating, storing and analyzing big data. These technologies include, but are not limited to, Hadoop, MongoDB and Apache Cassandra.

At the end of this course, the students will be able to:

  • Compare the various data storage infrastructures, advanced concepts and technologies.
  • Build a database to support big data using a related big data storage system.
  • Identify and master the rules of modern and traditional approaches to storing and managing large data.

CDS503/4 – Machine Learning

Upon successful completion of the course, students will have a broad understanding of machine learning algorithms. Students will acquire skills in applying relevant machine learning techniques to address real-world problems, and will be able to adapt or combine key elements of existing machine learning algorithms. Topics covered in this course include supervised and unsupervised learning techniques, parametric and non-parametric methods, Bayesian learning, kernel machines, and decision trees. The course will also discuss recent applications of machine learning. Students are expected to obtain hands-on experience during labs and assignments that address practical challenges. An understanding of the current state of the art in machine learning is developed through a review of key research papers, allowing students to pursue further research in machine learning.

At the end of this course, the students will be able to:

  • Apply relevant machine learning algorithms to typical real-world problems.
  • Manipulate machine learning algorithms which can be adapted to more complex scenarios.
  • Synthesize findings and recommendations.


CDS504/4 – Enabling Technologies & Infrastructures for Big Data

Data science is advancing the inductive conduct of science and is driven by big data available on the Internet. This course will explain the technologies and techniques to improve the access, security, and performance of big data processing and storage systems. This course will help students:

  • Acquire the necessary skills as an analyst for big data systems.
  • Identify the security aspects of the data and determine the appropriate measures to protect it.
  • Gain exposure to and training in designing basic infrastructure for big data applications, taking into account the sensitive nature of low-power edge devices.

This course includes parallel and distributed processing, grid and cloud computing, big data tools, big data processing techniques, network infrastructure and architecture, network performance and security for big data.

At the end of this course, the students will be able to:

  • Distinguish the major concepts of data science, namely high-performance parallel and distributed computing, computing with emerging technologies, and network performance.
  • Identify the needs and issues in big data security, including the protection of sensitive data and suitable access controls.
  • Design a cloud platform and efficient techniques that can support end-users running latency-sensitive big data applications on low-powered edge devices.

CDS505/4 - Data Visualisation & Visual Analytics

This course discusses the use of computer-supported, interactive and visual representations of data to amplify cognition, help people reason effectively about information, find patterns and meaning in the data, and easily explore datasets from different perspectives, particularly in data-intensive environments. The course covers techniques from two branches of visual representation of data, namely data visualization and visual analytics. In data visualization, the course covers scientific visualisation techniques (representations of empirically gathered scientific datasets) such as contours, isosurfaces and volume rendering, as well as specific techniques in information visualisation (representations of abstract datasets), which include tables, networks and trees, and colour mapping. In visual analytics, a visualization process features a significant amount of computational analysis and human-computer interaction. The topics covered in this part of the course therefore include view manipulation, multiple views, reduction of items and attributes, and focus + context, as well as analysis case studies involving a visualization system or tool.

At the end of this course, the students will be able to:

  • Select the right visualization techniques for a given problem or application.
  • Adapt visualization techniques for a particular application.
  • Apply several techniques either by designing or developing specific visualization techniques or using existing tools.

CDS506/4 - Research, Consultancy and Professional Skills

The course provides the knowledge and effective skills required in research, consultancy and professional practice. The research section covers literature review, development of research questions, usage of theories, research design, data collection and related analysis techniques. In the consultancy section, students will be equipped with the mindset, tools and skills to provide effective consulting advice to clients. The final section discusses professional issues, including the ethical, legal and social aspects of conducting research and consultancy.

At the end of this course, the students will be able to:

  • Combine theory and consultation techniques to effectively meet clients' needs.
  • Adapt a structured and effective research method in data science and analytics research.
  • Correlate professional issues inherent in research methods and consultancy.

CDS511/4 - Consumer Behavioural and Social Media Analytics

This course provides a broad, interdisciplinary view of research and practice in two areas: behaviour analytics, and web and social media analytics. Specifically, behaviour analytics concerns the process of systematically converting multimodal human behavioural cues (facial, speech, textual, etc.) into machine-readable form in order to model human behaviour automatically. The focus is on humans as consumers. This involves human-computer interaction (HCI), user behaviour modelling, computational models of emotions, and emotion sensing and recognition. Web and social media analytics concerns strategies for leveraging powerful social media data on customer needs, behaviour and preferences. Students will learn strategies for deriving insights from these data that are crucial for business decisions. Students will be encouraged to explore statistical, machine learning and analytical tools such as SPSS, R, WEKA, Google Analytics, TrueSocial Metrics and Clicky for analysis.

It is worth noting that an understanding of the current state of the art in consumer behavioural and social media analytics is developed through a review of key research papers and book chapters, allowing students to pursue further research in this area if needed.

At the end of this course, the students will be able to:

  • Distinguish the suitable metrics for assessing multimodal human behavioural cues from a consumer perspective.
  • Identify human behavioural cues across a variety of contexts with state-of-the-art tools to facilitate better interaction and decision making.
  • Construct predictive models (by extracting, analyzing and deriving insights) from the related web and social media data for data-informed decision-making from a business perspective.

CDS512/4 - Business Intelligence & Decision Analytics

The course will focus on the knowledge and skills needed to select, apply and evaluate business intelligence and decision analytics techniques that discover knowledge which can add value to a company. The course will also discuss innovative applications and exploitation of current techniques and approaches related to business intelligence and performance measurement, and mathematical models that facilitate the decision-making process in business and operations.

At the end of this course, the students will be able to:

  • Elaborate concepts, technologies and theories related to business intelligence and decision analytics.
  • Integrate the use of different types of business intelligence models and tools, and decision analytics models, in various real-life problems.
  • Propose improvement strategies for enhancing business performance by applying business intelligence and decision analytics techniques.

CDS513/4 - Predictive Business Analytics

The course provides the theory behind predictive analytics, along with methods, principles and techniques for conducting predictive business analytics projects. The course introduces the underlying algorithms as well as the principles and best practices that govern the art of predictive analytics and translate big data into meaningful, usable business information. The course also explores the tips and tricks that are essential for successful predictive modelling in areas such as business performance, the pharmaceutical industry, finance, accounting, and organization management. The course takes a technology approach to addressing a big data analytics challenge by applying the concepts taught in the course in the context of the predictive analytics project lifecycle. Students will be exposed to a predictive business analytics tool.

At the end of this course, the students will be able to:

  • Apply appropriate predictive business analytics techniques and tools to effectively interpret big data.
  • Revise and adapt insights that can lead to actionable results and pragmatic business solutions.
  • Construct a business challenge as a predictive business analytics challenge.

CDS521/4 - Multimodal Information Retrieval

This course provides the basic concepts, principles and applications of multimodal (text, image, video and audio) retrieval. It covers basic techniques for content processing, indexing, representation, ranking, querying, and evaluation in multimodal information retrieval. In addition, advanced techniques such as large-scale retrieval, multimodal analysis, and cross-media retrieval will be covered in current contexts such as mobile devices, social media and big data.

At the end of this course, the students will be able to:

  • Summarize and criticize the state of the art of multimodal information retrieval.
  • Adapt the framework, models and techniques of multimodal information retrieval.
  • Solve problems in emerging multimodal applications using the learned techniques.

CDS522/4 - Text and Speech Analytics

A great deal of information resides in document and speech formats. This information, however, is not directly usable because it is unstructured. The course focuses on the theory and applications of natural language processing and speech processing to retrieve linguistic knowledge from these sources. Linguistic knowledge of the words, syntax and semantics of sentences will be combined with machine learning algorithms and statistical approaches to find, organize, categorize, analyze and interpret unstructured and semi-structured text, allowing users to obtain advice that supports decision making.

At the end of this course, the students will be able to:

  • Describe basic concepts and algorithms in natural language and speech processing, for example tokenization, morphological analysis, n-grams, tagging, parsing, word sense disambiguation and decoding.
  • Manipulate natural language processing and speech processing approaches to obtain different levels of linguistic information, such as word, sentence and semantics, for text analytics.
  • Design custom solutions using natural language processing and speech processing techniques for text and speech analytics problems in organizations.

CDS523/4 - Forensic Analytics and Digital Investigations

This course introduces fundamental knowledge and techniques of computer forensics and digital investigations. Starting from an overview of the profession of digital investigator, the course explains issues in digital forensics and investigations on big data, and the current practices for processing crime and incident scenes. Next, the principles of interpretation of evidence, ways of controlling and preserving evidence, and techniques for manual interpretation of raw binary data will be detailed. The students will learn advanced techniques in forensic investigations on big data: methods to identify big data evidence, collect the data and perform analysis on it, and then the proper techniques to report and present the forensic findings, as well as the proper way to act as an expert witness in reporting the results of investigations.

In addition, technical and legal difficulties involved in searching, extracting, maintaining and storing digital evidence will be explained along with the legal implications of such investigations and the rules of legal procedure relevant to electronic evidence.

At the end of this course, the students will be able to:

  • Conduct digital investigations that conform to accepted professional standards and are based on the investigative process: identification, preservation, examination, analysis and reporting.
  • Identify and document potential security breaches of computer data that suggest violations of legal, ethical, moral, policy and/or societal standards.
  • Master the principles and practices of big data forensics and digital investigations.
  • Access and critically evaluate relevant technical and legal information and emerging industry trends.

CDS590/8 - Consultancy Project & Practicum

This experiential, work-based learning course prepares students to be data scientists or analytics consultants by enhancing their knowledge and skills in the research, planning and implementation of a consultancy project in the field of data science/analytics that can be applied to real-life situations. Students are required to complete the practicum at their respective workplaces or at their chosen or assigned organisations. Students work under the supervision of a lecturer and an industry supervisor. During the practicum, students are required to solve a real-world problem or tap opportunities related to data science and analytics.

School of Computer Sciences, Universiti Sains Malaysia, 11800 USM Penang, Malaysia
Tel: +604-653 3647 / 2158 / 2155  |  Fax: +604-653 3684