
Resource Manager API
The Resource Manager is the primary point of contact for any application, and therefore it holds around 80% of the information that can be accessed via YARN's REST API. The API supports many retrieval operations, which are explained as follows:
- Retrieving cluster information: The basic API is used to access cluster information such as the cluster ID, when the cluster started, the current state of the cluster, the versions of Hadoop and the Resource Manager, and so on. The curl request on the REST API looks as follows:
curl -X GET http://localhost:8088/ws/v1/cluster/info
The default response format is JSON, but you can request XML instead by setting the Accept header of the request to application/xml.
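The response format is negotiated through the standard HTTP Accept header. The following is a minimal sketch, assuming a Resource Manager reachable at localhost:8088; the RM_URL variable and the accept_header and rm_get helpers are hypothetical names introduced here for illustration and are not part of YARN:

```shell
# Assumed base address of the Resource Manager web service; adjust as needed.
RM_URL="${RM_URL:-http://localhost:8088}"

# Map a short format name to an Accept header (JSON unless "xml" is given).
accept_header() {
  case "$1" in
    xml) printf 'Accept: application/xml\n' ;;
    *)   printf 'Accept: application/json\n' ;;
  esac
}

# Hypothetical helper: GET a path under /ws/v1/cluster in the requested format.
rm_get() {
  curl -s -H "$(accept_header "${2:-json}")" -X GET "$RM_URL/ws/v1/cluster/$1"
}

# Usage (needs a running Resource Manager):
#   rm_get info        # JSON response
#   rm_get info xml    # XML response
```

Keeping the header logic in its own function lets the same helper serve every GET endpoint described in this section.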
- Retrieving cluster metrics: The cluster metrics contain detailed information such as the total number of applications submitted; the counts of failed, running, killed, and completed applications; memory and container usage; the number of active and inactive nodes; and so on. You can use the following HTTP request to get the information:
curl -X GET http://localhost:8088/ws/v1/cluster/metrics
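A monitoring script often needs only one or two counters from the metrics response. The following sketch pulls a single numeric field out of a JSON response using sed; json_number is a hypothetical helper, and the field name appsRunning is assumed to be present in the clusterMetrics object of your Hadoop version. Pattern matching of this kind is fragile, so a real JSON parser is preferable where one is available:

```shell
# Hypothetical helper: print the numeric value of the named field from JSON on stdin.
# It works line by line, so the key and its value must sit on the same line.
json_number() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# Usage (needs a running Resource Manager):
#   curl -s -X GET http://localhost:8088/ws/v1/cluster/metrics | json_number appsRunning
```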
- Retrieving application information: YARN keeps information about all applications, whether they finished successfully, failed because of an error, were killed forcefully, are pending execution, and so on. The following REST API returns information about each application, with details such as ID, username, start and finish time, resource allocation, container log location, and so on:
curl -X GET http://localhost:8088/ws/v1/cluster/apps
We can also retrieve the information for a specific application by appending its application ID to the preceding request, as follows:
curl -X GET http://localhost:8088/ws/v1/cluster/apps/{APP_ID}
Replace {APP_ID} with the actual application ID; for example, the request URL could be http://localhost:8088/ws/v1/cluster/apps/application_14151592305_01.
YARN can make multiple attempts to run an application, for a number of reasons. We can retrieve detailed information about each attempt, including the location of its logs, in order to debug a failure and identify its root cause. The request looks like the following:
curl -X GET http://localhost:8088/ws/v1/cluster/apps/{APP_ID}/appattempts
The APP_ID should be replaced by a valid application ID.
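On a busy cluster, the full application list can be large. The applications endpoint accepts query parameters such as states and limit to narrow the result. The sketch below only builds the request URLs; RM_URL, apps_url, and attempts_url are hypothetical names used for illustration:

```shell
# Assumed base address of the Resource Manager web service; adjust as needed.
RM_URL="${RM_URL:-http://localhost:8088}"

# Build an application-list URL filtered by state ($1) and capped at $2 results.
apps_url() {
  printf '%s/ws/v1/cluster/apps?states=%s&limit=%s\n' "$RM_URL" "$1" "$2"
}

# Build the URL that lists the attempts of one application ($1 = application ID).
attempts_url() {
  printf '%s/ws/v1/cluster/apps/%s/appattempts\n' "$RM_URL" "$1"
}

# Usage (needs a running Resource Manager):
#   curl -s -X GET "$(apps_url RUNNING 10)"
#   curl -s -X GET "$(attempts_url application_1412438797841_0001)"
```

Quoting the URL in the curl call matters here, because an unquoted & would be interpreted by the shell.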
- Retrieving node information: A YARN cluster consists of multiple nodes, and each node may have a different configuration and type. YARN provides an API to extract information about all the nodes configured in the cluster. The response includes information such as node ID, rack information, status, memory and container information, and so on. We can use the following curl request to retrieve the information:
curl -X GET http://localhost:8088/ws/v1/cluster/nodes
We can also get the information for a specific node by appending its node ID to the request. To retrieve the information of a node with the ID node1, the request would look like the following:
curl -X GET http://localhost:8088/ws/v1/cluster/nodes/node1
- Application API: YARN also provides a REST API for submitting new applications to the Resource Manager. We first request a new application, and then use the application ID from the response to submit it. The two steps are as follows:
- Creating new application request: The first step is to request a new application from YARN. YARN responds with a new application ID, which will be used for the new application, along with the maximum resource capability available to it. The request is as follows:
curl -X POST http://localhost:8088/ws/v1/cluster/apps/new-application
The response for the preceding HTTP request will be as follows:
{ "application-id":"application_1412438797841_0001", "maximum-resource-capability": { "memory":10456, "vCores":40 } }
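When scripting the submission, we need the application ID from this response as a plain string. The following sketch extracts it with sed; extract_app_id is a hypothetical helper, and a real JSON parser is a more robust choice where one is available:

```shell
# Hypothetical helper: print the "application-id" value from a response on stdin.
extract_app_id() {
  sed -n 's/.*"application-id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (needs a running Resource Manager):
#   APP_ID=$(curl -s -X POST http://localhost:8088/ws/v1/cluster/apps/new-application | extract_app_id)
```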
- Submitting new application: Once we have a new application ID available, we can use it to submit a new application using the job submit API. The POST request for the job submit API carries a request body with the detailed information about the application. In the following command, the @ prefix tells curl to read the body from the named file rather than sending the filename itself:
curl -v -X POST -d @new_application.json -H "Content-Type: application/json" 'http://localhost:8088/ws/v1/cluster/apps'
The request body file, new_application.json, contains the following:
{
  "application-id": "application_1412438797841_0001",
  "application-name": "new_application",
  "am-container-spec": {
    "local-resources": {
      "entry": [
        {
          "key": "AppMaster.jar",
          "value": {
            "resource": "hdfs://hdfs-namenode:9000/user/packt/DistributedShell/demo-app/AppMaster.jar",
            "type": "FILE",
            "visibility": "APPLICATION",
            "size": "43004",
            "timestamp": "1405452071209"
          }
        }
      ]
    },
    "commands": {
      "command": "{{JAVA_HOME}}/bin/java -Xmx10m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --container_vcores 1 --num_containers 1 --priority 0 1><LOG_DIR>/AppMaster.stdout 2><LOG_DIR>/AppMaster.stderr"
    },
    "environment": {
      "entry": [
        { "key": "DISTRIBUTEDSHELLSCRIPTTIMESTAMP", "value": "1405459400754" },
        { "key": "CLASSPATH", "value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties" },
        { "key": "DISTRIBUTEDSHELLSCRIPTLEN", "value": "6" },
        { "key": "DISTRIBUTEDSHELLSCRIPTLOCATION", "value": "hdfs://hdfs-namenode:9000/user/packt/example/shellCommands" }
      ]
    }
  },
  "unmanaged-AM": "false",
  "max-app-attempts": "2",
  "resource": { "memory": "1024", "vCores": "1" },
  "application-type": "YARN",
  "keep-containers-across-application-attempts": "false"
}
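The application-id in the request body has to be the ID that YARN issued for this submission, so a reusable template needs that value replaced each time. The following sketch rewrites the field with sed; set_app_id and the submit.json filename are hypothetical, and APP_ID is assumed to hold the ID returned by the new-application request:

```shell
# Hypothetical helper: replace the "application-id" value in JSON read from stdin
# with the ID given as $1, and write the result to stdout.
set_app_id() {
  sed 's/\("application-id"[[:space:]]*:[[:space:]]*"\)[^"]*"/\1'"$1"'"/'
}

# Usage (needs a running Resource Manager for the second command):
#   set_app_id "$APP_ID" < new_application.json > submit.json
#   curl -v -X POST -d @submit.json -H "Content-Type: application/json" \
#     'http://localhost:8088/ws/v1/cluster/apps'
```

Application IDs contain only letters, digits, and underscores, so the ID can be placed in the sed replacement without escaping.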
- Retrieving application status: We can retrieve the current state of an application by requesting the state resource under its application ID, as follows:
curl -X GET 'http://localhost:8088/ws/v1/cluster/apps/application_1412438797841_0001/state'
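After submitting an application, we usually want to wait until it reaches a final state. The following sketch polls the state resource; it assumes the response has the form {"state":"ACCEPTED"} and treats FINISHED, FAILED, and KILLED as final. The helper names, the RM_URL variable, and the five-second default interval are illustrative choices, not part of YARN:

```shell
# Assumed base address of the Resource Manager web service; adjust as needed.
RM_URL="${RM_URL:-http://localhost:8088}"

# Print the "state" value from a response on stdin.
extract_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed if the given state is one the application will not leave.
is_terminal_state() {
  case "$1" in
    FINISHED|FAILED|KILLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical polling loop: $1 = application ID, $2 = seconds between polls.
# Prints each observed state; fails if no state could be read.
wait_for_app() {
  while :; do
    state=$(curl -s -X GET "$RM_URL/ws/v1/cluster/apps/$1/state" | extract_state)
    if [ -z "$state" ]; then
      return 1
    fi
    printf '%s\n' "$state"
    if is_terminal_state "$state"; then
      return 0
    fi
    sleep "${2:-5}"
  done
}

# Usage (needs a running Resource Manager):
#   wait_for_app application_1412438797841_0001 10
```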
- Killing application: Sometimes we may want to kill an application, for example because it is taking too long to execute, its code contains a mistake, or it is producing undesired output. Any application that is in a running or pending state can be killed, and YARN provides a REST API for this: we send a PUT request to the application's state resource with the target state KILLED. The application submitted previously can be killed using the following request:
curl -v -X PUT -H "Content-Type: application/json" -d '{"state": "KILLED"}' 'http://localhost:8088/ws/v1/cluster/apps/application_1412438797841_0001/state'
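In a script, it is convenient to wrap the kill request in a function that reports only the HTTP status code. The following is a minimal sketch, assuming the kill is requested by a PUT on the application's state resource; RM_URL, app_state_url, and kill_app are hypothetical names:

```shell
# Assumed base address of the Resource Manager web service; adjust as needed.
RM_URL="${RM_URL:-http://localhost:8088}"

# Build the state URL of an application ($1 = application ID).
app_state_url() {
  printf '%s/ws/v1/cluster/apps/%s/state\n' "$RM_URL" "$1"
}

# Hypothetical helper: request a kill, discard the body, and print the HTTP status code.
kill_app() {
  curl -s -o /dev/null -w '%{http_code}\n' -X PUT \
    -H "Content-Type: application/json" \
    -d '{"state":"KILLED"}' "$(app_state_url "$1")"
}

# Usage (needs a running Resource Manager):
#   kill_app application_1412438797841_0001
```

The kill may not take effect immediately, so we can read the application state again afterwards to confirm that it has become KILLED.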