EnginePlus is a big data cluster management service that helps users to manage and operate clusters such as Hadoop.
- One-click cluster creation (Hadoop).
- Health-status detection in the cluster.
- Auto-scaling for cluster nodes.

Beyond these features, we have done in-depth optimization of cluster performance and related parameters; compared with similar instances, performance is better than AWS EMR and Aliyun EMR.
With some simple operations, users can start a service to maintain a high performance cluster.
Assumption: You are familiar with AWS.

Prepare:
- SSH key (the user is `ec2-user`).
- IAM role: `ssm:*`, `aws-marketplace:*`, `rds:CreateDBInstance`.
- Security Group Ids: one for the EnginePlus server instance, another for the cluster instances.
- MySQL RDS (currently, EnginePlus-Enterprise creates the MySQL RDS automatically through CloudFormation; the RDS can be reached from your VPC network).
- VPC subnetIds (at least 2 availability zones are needed for the DBSubnetGroup when creating the MySQL RDS, but the EnginePlus server only needs one subnetId).
By the way, please sign in to AWS Marketplace using an IAM role that includes `ssm:*`, `aws-marketplace:*`, and `rds:CreateDBInstance`. If the AWS account you use for Marketplace is not the IAM role you will use in EnginePlus, please grant the `rds:CreateDBInstance` permission to that AWS account.
You can create all of these in AWS.
Notice:
As shown above, the EnginePlus server manages all host resources in cluster units.
- For EnginePlus, the instances are mainly divided into two categories: server instance and cluster instance.
- All instances use the same SSH key for authentication.
- Server and cluster are recommended to use different security group configurations. Because the server needs to access GitHub and all machines of the cluster, all outbound rules must be open.
- Inbound access on ports 22 and 443 should be allowed in the server security group.
- Because the cluster instances need to communicate with each other, all ports must be open between them; for security reasons, open them to the intranet only.
- The cluster and the server instance use the same IAM role. You need to enable `ssm:*` and `aws-marketplace:*` for this IAM role.
  Please add the Systems Manager service policy (`ssm:*`) and import the managed policy AWSMarketplaceFullAccess (`aws-marketplace:*`), as follows:

  - `ssm:*`: used to send commands to the cluster nodes.
  - `aws-marketplace:*`: used to start, stop, and terminate cluster instances. If you subscribe to EnginePlus-Enterprise, the `aws-marketplace:MeterUsage` action is used to send metering records.

  If the policies are not correct, EnginePlus stops and restarts automatically after 30 s, repeating until you set the correct IAM policies.
  Also, if the metering service of EnginePlus-Enterprise has exceptions, the server keeps restarting until the metering service returns to normal, but the clusters you have created stay alive.
- Since the cluster meta-information needs to be managed, it is recommended that users provide a third-party MySQL database (you can use AWS RDS; EnginePlus-Enterprise creates an AWS RDS `db.t3.medium` instance automatically) to maintain this meta-information. You only need to provide the database address, username, and password here. The user must have permission to create databases, tables, indexes, etc. But please don't use root credentials.
- Currently only us-east-1 is supported. When you select a subnetId, take the security group into consideration.
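
The IAM permissions listed in the notes above can be written down as a single policy document. The following is a minimal sketch, not the official EnginePlus policy: the helper name `engineplus_policy_json` is hypothetical, and `"Resource": "*"` is an assumption that you may want to narrow for your account. The `aws iam put-role-policy` call in the comment uses placeholder role and policy names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints an IAM policy document that allows the
# actions this guide asks for (ssm:*, aws-marketplace:*, rds:CreateDBInstance).
# "Resource": "*" is an assumption; restrict it if your setup allows.
engineplus_policy_json() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:*",
        "aws-marketplace:*",
        "rds:CreateDBInstance"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Print the document so it can be reviewed or redirected to a file.
engineplus_policy_json

# One possible way to attach it as an inline policy (placeholders in <>):
#   engineplus_policy_json > engineplus-policy.json
#   aws iam put-role-policy \
#     --role-name <your-iam-role> \
#     --policy-name <your-policy-name> \
#     --policy-document file://engineplus-policy.json
```

Keeping the document in a function makes it easy to diff against the role you already have before changing anything.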
AWS CloudFormation --> Start Service --> Create cluster --> Run Application
Subscribe to EnginePlus.
Use AWS CloudFormation to deploy the service automatically (the AMI is provided through the subscription in AWS Marketplace). Please fill out the CloudFormation form as follows; all fields are required:
When the instance is launched, you can access it at https://{host} (port 443).
We recommend that you open port 443 for external network access, or access it through a proxy.
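
As an illustration of the inbound rules for the server security group (ports 22 and 443, as stated in the notes earlier in this guide), the sketch below only prints the AWS CLI commands instead of running them. The helper name, the security group id, and the CIDR range are all placeholders chosen for this example; substitute the address range you actually want to allow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) one
# `aws ec2 authorize-security-group-ingress` command per port that the
# server security group should accept: 22 (SSH) and 443 (HTTPS).
print_server_ingress_rules() {
  local group_id="$1"   # server security group id
  local cidr="$2"       # source address range allowed to connect
  local port
  for port in 22 443; do
    echo "aws ec2 authorize-security-group-ingress --group-id ${group_id} --protocol tcp --port ${port} --cidr ${cidr}"
  done
}

# Placeholder values for illustration only.
print_server_ingress_rules "sg-0123456789abcdef0" "203.0.113.0/24"
```

Review the printed commands, then run them yourself once the group id and CIDR match your environment.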
Form Description
| Field | Description |
|---|---|
| InstanceType | EnginePlus server instance type. |
| KeyName | SSH key used to access the EnginePlus and cluster instances. |
| SubnetIds | SubnetIds of the instances' VPC network. Please provide subnetIds that cover at least 2 AZs. |
| serverSecurityGroupIds | List of security group ids for the EnginePlus instance. |
| serverUserName | Authentication user of EnginePlus. |
| serverUserPassword | Authentication password of EnginePlus. |
| iamRoleName | Name of the IAM role for all instances. |
| clusterSecurityGroupIds | List of security group ids for the cluster instances. |
| metaHost | Address of the RDS that stores the cluster meta info. This field does not appear in EnginePlus-Enterprise. |
| metaUserName | Meta database authentication user. |
| metaPassword | Meta database authentication password. |
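
If you prefer to fill in the form fields from the command line, the field names in the table can be turned into the `ParameterKey=...,ParameterValue=...` form that `aws cloudformation create-stack --parameters` accepts. This is only a sketch: the helper `to_cfn_parameters` is hypothetical, the values are placeholders, and the stack name and template location are not given in this guide. List-valued fields such as SubnetIds contain commas, which the CLI shorthand syntax treats specially, so a JSON parameters file may be easier for those.

```bash
#!/usr/bin/env bash
# Hypothetical helper: converts "Key=Value" arguments into
# "ParameterKey=Key,ParameterValue=Value" lines, one per argument.
to_cfn_parameters() {
  local pair
  for pair in "$@"; do
    # Split on the first "=" so values may themselves contain "=".
    printf 'ParameterKey=%s,ParameterValue=%s\n' "${pair%%=*}" "${pair#*=}"
  done
}

# Placeholder values; replace them with your own.
to_cfn_parameters \
  "InstanceType=m5.xlarge" \
  "KeyName=my-key" \
  "iamRoleName=my-engineplus-role"

# Possible use (stack name and template URL are placeholders):
#   aws cloudformation create-stack \
#     --stack-name <stack-name> \
#     --template-url <template-url> \
#     --parameters $(to_cfn_parameters "KeyName=my-key" "iamRoleName=my-engineplus-role")
```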
When EnginePlus is deployed successfully, you can access https://{host} to sign in to the management page. Users can choose Install Cluster to install a new cluster.

- When you click Install Cluster, you need to fill out the installation form:
| Field | Description |
|---|---|
| Cluster Name | EnginePlus creates a database named `{ClusterName}` to store the cluster meta. |
| @User/@Password | Login username/password credentials used by the Ambari-based cluster management. |
| SubnetId | The cluster's VPC subnetId. |
| Company | Cloud provider; only AWS is supported currently. |
| Instance | Instance type. |
- After the user submits the form, the installation progress can be observed on the left side. Once the installation is completed, the cluster is displayed in the middle form. The installation takes about 20 minutes.
Users can install multiple clusters, but must wait for the previous installation to finish before submitting the next one.

- Once the installation is successful, we can check the details through the Detail button.
When the cluster is installed, there is only one NodeManager in the Hadoop cluster. We can set resource requirements manually or define auto-scaling rules as needed.
When the installation is successful, we can click the Ambari access link in the upper right corner. We use Ambari for cluster management and installation.
After the installation is completed, the components are:
Now we can submit the application on the Ambari server instance:
```bash
#!/usr/bin/env bash
# Input and output locations on S3 (s3a:// URIs).
INPUT_PATH="s3a://mob-emr-test/dongtao/data/"
OUTPUT_PATH="s3a://mob-emr-test/dongtao/output/MapReduce"

# Remove any previous output so the job can write to a clean path.
hadoop fs -rm -r "${OUTPUT_PATH}"

# Run the job: <jar> <main class> <input> <output>.
hadoop jar \
  hadoop-study-1.0-selfcontained.jar \
  job.HadoopSaligia \
  "${INPUT_PATH}" \
  "${OUTPUT_PATH}"
```

Of course, the cluster environment files are all located in the '/usr/HDP/' directory; users can download them and put them into their own instance's environment if they want to submit applications from other instances.
Note:
- For security reasons, the remove-cluster operation only detaches the cluster from monitoring; the instances and the meta-information database remain in use. To remove the cluster completely, you need to manually release the instances and delete the corresponding MySQL database.
- If the installation fails, check whether the database has been created. If it was created automatically, delete it manually so that it does not block subsequent installations.
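
The manual database cleanup mentioned in the notes above could look like the sketch below. It assumes the leftover database is named exactly after the cluster (as the installation form describes) and only prints the SQL statement, so nothing is deleted until you run it yourself. The helper name `drop_cluster_db_sql` and the sample cluster name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a DROP DATABASE statement for the database
# that was created for a cluster. It does not connect to MySQL.
drop_cluster_db_sql() {
  local name="$1"
  # Accept only letters, digits, and underscores to keep the statement safe.
  if [[ ! "$name" =~ ^[A-Za-z0-9_]+$ ]]; then
    echo "unexpected database name: $name" >&2
    return 1
  fi
  printf 'DROP DATABASE IF EXISTS `%s`;\n' "$name"
}

# Placeholder cluster name for illustration.
drop_cluster_db_sql "my_cluster"

# Possible use against the meta database (placeholders in <>):
#   mysql -h <metaHost> -u <metaUserName> -p -e "$(drop_cluster_db_sql my_cluster)"
```

Double-check the name before running the statement, since dropping a database cannot be undone.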







