Prerequisites¶
There are two main ways to launch a logisland job:
- within Docker containers
- within a Hadoop distribution (Cloudera, Hortonworks, …)
1. Through a Docker container (for testing)¶
Logisland is packaged as a Docker container that you can build yourself or pull from Docker Hub.
To facilitate integration testing and to easily run tutorials, you can use docker-compose with the following docker-compose.yml.
Once you have this file, run docker-compose to launch all the required services (ZooKeeper, Kafka, Elasticsearch, Kibana and logisland).
Elasticsearch on Docker needs a special kernel setting tweak, as described here.
# set vm.max_map_count kernel setting for elasticsearch
sudo sysctl -w vm.max_map_count=262144
#
cd /tmp
wget https://raw.githubusercontent.com/Hurence/logisland/master/logisland-framework/logisland-resources/src/main/resources/conf/docker-compose.yml
docker-compose up
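Once the containers are up, it can be worth sanity-checking that the kernel setting actually took effect. A minimal sketch, reading the value straight from /proc on a Linux host:

```shell
# Check that vm.max_map_count meets the minimum Elasticsearch requires.
required=262144
current=$(cat /proc/sys/vm/max_map_count)
if [ "$current" -ge "$required" ]; then
    echo "vm.max_map_count=$current (ok)"
else
    echo "vm.max_map_count=$current is below $required; re-run the sysctl command above"
fi
```

If the value is too low, Elasticsearch will typically refuse to bootstrap inside the container.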
Note
you should add entries for sandbox and kafka (pointing to the container IP) in your /etc/hosts,
as this makes it easier to access the web services running in the logisland container.
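For example, assuming the container's IP address is 172.17.0.2 (a hypothetical value; retrieve the real one with `docker inspect logisland`), the /etc/hosts entries would look like:

```
172.17.0.2   sandbox
172.17.0.2   kafka
```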
Any logisland job can now be launched by running the logisland.sh script inside the logisland Docker container, as in the example below, which launches the index-apache-logs.yml job:
docker exec -i -t logisland bin/logisland.sh --conf conf/index-apache-logs.yml
2. Through a Hadoop cluster (for production)¶
Now that you have played with the tool, you’re ready to deploy your jobs to a real distributed cluster. From an edge node of your cluster:
- download and extract the latest logisland release
- export the SPARK_HOME and HADOOP_CONF_DIR environment variables
- run the logisland.sh launcher script with your job configuration file
cd /opt
# download and extract the logisland release
sudo wget https://github.com/Hurence/logisland/releases/download/v1.1.1/logisland-1.1.1-bin.tar.gz
sudo tar -xzf logisland-1.1.1-bin.tar.gz
# point Spark at your installation and your Hadoop client configuration
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/
export HADOOP_CONF_DIR=$SPARK_HOME/conf
# launch the job
sudo /opt/logisland-1.1.1/bin/logisland.sh --conf /home/hurence/tom/logisland-conf/v0.10.0/future-factory.yml
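Before submitting on a real cluster, it can help to validate the environment first. Below is a hypothetical pre-flight helper; the function name and demo paths are illustrative, not part of logisland:

```shell
# Hypothetical pre-flight check before launching a logisland job on a cluster.
check_logisland_env() {
    # both variables must be set and point to existing directories
    [ -n "$SPARK_HOME" ] && [ -d "$SPARK_HOME" ] || { echo "bad SPARK_HOME"; return 1; }
    [ -n "$HADOOP_CONF_DIR" ] && [ -d "$HADOOP_CONF_DIR" ] || { echo "bad HADOOP_CONF_DIR"; return 1; }
    echo "environment OK"
}

# Demo: point the variables at example locations, then verify them
export SPARK_HOME=/tmp/spark-demo
export HADOOP_CONF_DIR=/tmp/spark-demo/conf
mkdir -p "$HADOOP_CONF_DIR"
check_logisland_env
```

If the check passes, run the logisland.sh launcher as shown above; if not, the message tells you which variable to fix.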