Ansible Playbook for Preparing Servers - Cloudera

Ansible Playbook for Setting Up Cloudera Hadoop Clusters.

Posted by Aravind Nuthalapati on May 23, 2016

This Ansible playbook makes it easy to set up a Cloudera Hadoop cluster.

1. Ansible Introduction

Ansible is a radically simple IT automation system. It handles configuration management, application deployment, cloud provisioning, ad-hoc task execution, and multi-node orchestration, and it makes tasks such as zero-downtime rolling updates with load balancers trivial.

Read the documentation and more at https://ansible.com/

Installation instructions for a variety of platforms are available in the Ansible documentation. Most users should install a released version of Ansible from pip, a package manager, or the Ansible release repository. Officially supported builds of Ansible are also available. Some power users run directly from the development branch; significant effort goes into keeping devel reasonably stable, but you are more likely to encounter breaking changes when running Ansible this way.

2. Ansible Playbook: Server Preparation for Cloudera

Purpose:

Configure all nodes with necessary system settings, packages, and networking for Cloudera CDH deployment.


---
- name: Prepare servers for Cloudera Hadoop installation
  hosts: all
  become: yes

  vars:
    java_package: java-1.x.0-openjdk-devel
    cloudera_user: cloudera-scm

  tasks:
    - name: Install required packages
      yum:
        name:
          - "{{ java_package }}"
          - ntp
          - wget
          - curl
          - bind-utils
        state: present

    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname }}"

    - name: Update /etc/hosts
      blockinfile:
        path: /etc/hosts
        block: |
          192.168.1.10 master01
          192.168.1.11 data01
          192.168.1.12 data02

    - name: Disable SELinux
      selinux:
        state: disabled

    - name: Stop and disable firewalld
      service:
        name: firewalld
        state: stopped
        enabled: no

    - name: Create cloudera-scm user
      user:
        name: "{{ cloudera_user }}"
        shell: /bin/bash
        state: present

    - name: Set vm.swappiness to 1
      sysctl:
        name: vm.swappiness
        value: "1"
        state: present
        reload: yes

    - name: Ensure NTP is enabled and started
      service:
        name: ntpd
        state: started
        enabled: yes
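
Once the preparation play has run, a short verification play can confirm that the settings took effect. The following is a minimal sketch and not part of the original playbook: the play name and the registered variable names are illustrative, and it assumes systemd-based nodes where `systemctl` is available.

```yaml
---
# Sketch only: verifies two of the settings applied by the preparation play.
- name: Verify node preparation
  hosts: all
  become: yes

  tasks:
    - name: Read the running vm.swappiness value
      command: sysctl -n vm.swappiness
      register: swappiness_check   # illustrative variable name
      changed_when: false

    - name: Confirm vm.swappiness is 1
      assert:
        that:
          - swappiness_check.stdout | trim == "1"

    - name: Confirm firewalld is not running
      command: systemctl is-active firewalld
      register: firewalld_check    # illustrative variable name
      changed_when: false
      # is-active exits non-zero for a stopped unit, so judge by the output.
      failed_when: firewalld_check.stdout | trim == "active"
```

Both checks are read-only, so the play can be re-run at any time without changing the nodes.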

3. Ansible Playbook: Cloudera CDH Cluster Setup via Cloudera Manager

Pre-conditions:

  • Cloudera Manager is installed on the master node (manually or from a pre-built image).
  • API port 7180 is reachable from the Ansible control node.
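
The second pre-condition can be checked before the install playbook is started. The sketch below is one possible way to do it, using the `wait_for` module delegated to the control node. It assumes that `master01` resolves from the control node, and the timeout value is illustrative.

```yaml
---
# Sketch only: checks from the control node that the Cloudera Manager port is open.
- name: Check that Cloudera Manager is reachable
  hosts: master
  gather_facts: no

  vars:
    cm_host: "master01"   # same value as in the install playbook

  tasks:
    - name: Wait for port 7180 to accept connections
      wait_for:
        host: "{{ cm_host }}"
        port: 7180
        timeout: 120        # seconds; illustrative value
      delegate_to: localhost
      become: no
```

An open port only shows that something is listening; it does not prove that the API credentials are valid.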

---
- name: Install Cloudera CDH cluster via Cloudera Manager API
  hosts: master
  become: yes
  vars:
    cm_host: "master01"
    cm_user: "admin"
    cm_password: "admin"
    cluster_name: "Test-CDH-Cluster"
    parcel_version: "CDH-version.x.0"

  tasks:
    - name: Add host templates for roles
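      uri:
        url: "http://{{ cm_host }}:7180/api/v14/cm/deployment"
        method: PUT
        user: "{{ cm_user }}"
        password: "{{ cm_password }}"
        force_basic_auth: yes
        # Assumed path; adjust to wherever cluster_template.json is kept.
        body: "{{ lookup('file', playbook_dir ~ '/../templates/cluster_template.json') }}"
        body_format: json
        status_code: 200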

    - name: Trigger installation
      uri:
        url: "http://{{ cm_host }}:7180/api/v14/cm/commands/installCluster"
        method: POST
        user: "{{ cm_user }}"
        password: "{{ cm_password }}"
        force_basic_auth: yes
        status_code: 200

Note:

You must define a cluster_template.json file containing the Cloudera Manager deployment template (including hosts, roles, and services such as HDFS, YARN, Hive, and Impala).
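
Because a malformed template only surfaces as an API error, it may help to validate the file locally first. The play below is a minimal sketch under stated assumptions: the file lives in a `templates/` directory one level above the playbook, and `template_path` and `cluster_template` are illustrative variable names. It checks only that the file parses as a JSON object, not that Cloudera Manager will accept its contents.

```yaml
---
# Sketch only: fails early if the template is missing or is not valid JSON.
- name: Validate cluster_template.json
  hosts: localhost
  connection: local
  gather_facts: no

  vars:
    # Assumed location; adjust to wherever the file is kept.
    template_path: "{{ playbook_dir }}/../templates/cluster_template.json"

  tasks:
    - name: Parse the template
      set_fact:
        cluster_template: "{{ lookup('file', template_path) | from_json }}"

    - name: Confirm the template is a JSON object
      assert:
        that:
          - cluster_template is mapping
```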

4. Recommended Directory Structure


cloudera-ansible/
├── inventory/
│   ├── hosts.ini
│   └── group_vars/
│       └── all.yml
├── playbooks/
│   ├── prepare-nodes.yml
│   └── install-cdh.yml
└── templates/
    └── cluster_template.json
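
Variables shared by both playbooks could be collected in `group_vars/all.yml`. Ansible loads `group_vars` from a directory next to the inventory file or next to the playbook, so the directory has to sit in one of those two places. The sketch below simply copies the values from the `vars` blocks shown earlier; the placeholder values are kept as they appear in the post.

```yaml
---
# group_vars/all.yml (sketch): values copied from the vars blocks of the two playbooks.
java_package: java-1.x.0-openjdk-devel   # placeholder from the post; set a real package name
cloudera_user: cloudera-scm

cm_host: "master01"
cluster_name: "Test-CDH-Cluster"
parcel_version: "CDH-version.x.0"        # placeholder from the post

# Credentials are better kept out of plain text, for example in a file
# encrypted with ansible-vault.
cm_user: "admin"
cm_password: "admin"
```

Play-level `vars` take precedence over `group_vars`, so the `vars` blocks in the playbooks would need to be removed for these values to apply.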

5. Additional Tips

  • Use ansible-pull for isolated environments without centralized control.
  • Set up an internal YUM repository to speed up package installation.
  • Use screen or tmux when running playbooks on jump boxes.
  • Test role-to-host assignments thoroughly before calling the installCluster API.
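
As an illustration of the internal repository tip, the `yum_repository` module can point every node at a local mirror. This is a sketch only: the repository id and the base URL are hypothetical and must be replaced with the details of a real mirror.

```yaml
---
# Sketch only: points every node at an internal package mirror.
- name: Configure an internal YUM repository
  hosts: all
  become: yes

  tasks:
    - name: Add the internal repository definition
      yum_repository:
        name: internal-mirror                             # hypothetical repository id
        description: Internal package mirror
        baseurl: http://repo.example.internal/centos/7/   # hypothetical URL
        gpgcheck: yes
        enabled: yes
```

Running this before the package installation task lets the `yum` module resolve packages from the mirror.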

6. Summary

With these playbooks, Big Data administrators and DevOps teams can automate and accelerate Cloudera CDH cluster deployment, improving consistency, reducing errors, and saving operational time. The playbooks can also serve as a foundation for on-prem automation strategies.