Configuration File
Location
All commands in the GC3Apps and GC3Utils software read two configuration files at startup:
- a system-wide one located at /etc/gc3/gc3pie.conf, and
- a user-private one at ~/.gc3/gc3pie.conf.
Both files use the same format. The system-wide one is read first, so that users can override the system-level configuration in their private file. Configuration data from corresponding sections in the two configuration files is merged; the value in the user-private file overrides the one from the system-wide configuration.
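As an illustration of the merge, consider the following pair of files, shown here in a single listing. This is only a sketch: the resource name mycluster, the host name and all values are hypothetical, and an [auth/ssh1] section is assumed to be defined elsewhere.

```ini
# --- /etc/gc3/gc3pie.conf (system-wide, read first) ---
[resource/mycluster]
type = sge
auth = ssh1
transport = ssh
frontend = mycluster.example.org
max_walltime = 12

# --- ~/.gc3/gc3pie.conf (user-private, read second) ---
[resource/mycluster]
# overrides only this key; all other keys come from the system-wide file
max_walltime = 2
```

Under the merge rule described above, the resource would keep type, auth, transport and frontend from the system-wide file, while max_walltime would take the user's value of 2.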
If you try to start any GC3Utils command without having a configuration file, a sample one will be copied to the user-private location ~/.gc3/gc3pie.conf and an error message will be displayed, directing you to edit the sample file before retrying.
Configuration file format
The GC3Pie configuration file follows the format understood by Python ConfigParser, which is very close to the syntax used in MS-Windows .INI files.
See http://docs.python.org/library/configparser.html for reference.
The GC3Libs configuration file consists of several configuration blocks. Each configuration block (section) starts with a keyword in square brackets and contains the configuration options for a specific part.
The following sections are used by the GC3Apps/GC3Utils programs:
- [DEFAULT] – this is for global settings.
- [auth/name] – these are for settings related to identity/authentication (identifying yourself to clusters & grids).
- [resource/name] – these are for settings related to a specific computing resource (cluster, grid, etc.)
Sections with other names are allowed but will be ignored.
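The overall layout of a configuration file can be sketched as follows. The section names ssh1, mycluster and notes, the user name and the host name are hypothetical, and the resource section is abridged (further resource keys are described later in this document).

```ini
[DEFAULT]
# global settings, inserted into every other section

[auth/ssh1]
# identity/authentication settings
type = ssh
username = jdoe

[resource/mycluster]
# settings for one computing resource (abridged)
type = slurm
auth = ssh1
transport = ssh
frontend = mycluster.example.org

[notes]
# a section with any other name is allowed but ignored
```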
The DEFAULT section
Values defined in the [DEFAULT] section will be inserted in any other section (unless they are explicitly overridden).
For example, the following will add a debug=1 line to any section in the configuration file:
[DEFAULT]
debug=1
See documentation of the Python SafeConfigParser object at http://docs.python.org/library/configparser.html for an explanation.
The [DEFAULT] section is optional.
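A short sketch of how an explicit setting takes precedence over a [DEFAULT] value. The section names quiet and verbose are hypothetical and the sections are otherwise left empty for brevity.

```ini
[DEFAULT]
debug = 1

[resource/quiet]
# an explicit setting overrides the [DEFAULT] value in this section only
debug = 0

[resource/verbose]
# no explicit setting here, so debug = 1 is taken from [DEFAULT]
```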
auth sections
There can be more than one [auth] section.
Each authentication section must begin with a line of the form:
[auth/name]
where the name portion is any alphanumeric string.
You can have as many [auth/name] sections as you want; any name is allowed provided it’s composed only of letters, numbers and the underscore character _.
This allows you to define different auth methods for different resources. Each [resource/name] section will reference one (and one only) authentication section.
Authentication types
Each auth section must specify a type setting.
type defines the authentication type that will be used to access a resource. There are four supported authentication types:
ssh
: use this for resources that will be accessed by opening an SSH connection to the front-end node of a cluster.
voms-proxy
: uses voms-proxy-init to generate a proxy; use for resources that require a VOMS-enabled Grid proxy.
grid-proxy
: uses grid-proxy-init to generate a proxy; use for resources that require a Grid proxy (but no VOMS extensions).
ec2
: use this for an EC2-compatible cloud resource.
For the ssh-type auth, the following keys must be provided:
type
: must be ssh
username
: must be the username to log in as on the remote machine
For the ec2-type auth, the following keys can be provided. If they are not found, the value of the corresponding environment variable will be used instead, if found; otherwise an error will be raised.
ec2_access_key
: Your personal access key to authenticate against the specific cloud endpoint. If not found, the environment variable EC2_ACCESS_KEY will be used.
ec2_secret_key
: Your personal secret key associated with the above ec2_access_key. If not found, the environment variable EC2_SECRET_KEY will be used.
Any other key/value pair will be ignored.
For the voms-proxy type auth, the following keys must be provided:
type
: must be voms-proxy
vo
: the VO to authenticate with (passed directly to voms-proxy-init as argument to the --vo command-line switch)
cert_renewal_method
: see below.
remember_password
: optional; see below.
Any other key/value pair will be ignored.
For the grid-proxy type auth, the following keys must be provided:
type
: must be grid-proxy
cert_renewal_method
: see below.
remember_password
: optional; see below.
Any other key/value pair will be ignored.
For the voms-proxy and grid-proxy authentication types, the cert_renewal_method setting specifies whether GC3Libs should attempt to get a certificate if the current one is expired or otherwise invalid.
Currently there are two supported cert_renewal_method types:
slcs
: user certificate is generated through an invocation of the slcs-init program.
manual
: user certificate is generated/renewed through an external process and has to be performed by the user outside of the scope of GC3Pie. In this case, if the user certificate is expired, invalid or non-existent, GC3Pie will fail to authenticate.
For the slcs certificate renewal method, the following keys must be provided:
aai_username
: passed directly to slcs-init as argument to the --user command-line switch.
idp
: passed directly to slcs-init as argument to the --idp command-line switch.
For the manual certificate renewal method, no additional keys are required.
The remember_password entry is optional and defaults to “false”. If present, it must be set to a boolean value (the strings 1, yes, true and on are interpreted as boolean “true”; any other value counts as “false”). If set to a true value, the remember_password entry instructs GC3Pie to keep the password used for this authentication in the program’s main memory; this implies that you will be asked for the password at most once per program invocation. Keeping passwords in memory is bad security practice; do not set this option to “true” unless you understand the implications.
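As a sketch of the simplest proxy-based setup, the following section combines the grid-proxy type with the manual renewal method. The section name mygrid is hypothetical.

```ini
[auth/mygrid]
type = grid-proxy
# the certificate is generated/renewed by the user outside of GC3Pie
cert_renewal_method = manual
# optional; "false" is already the default
remember_password = false
```

With the manual method no further keys are needed; with slcs, aai_username and idp would have to be added as well.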
Example 1. The following example auth section shows how to configure GC3Pie for using SWITCHaai SLCS services to generate a certificate and a VOMS proxy to access the Swiss National Distributed Computing Infrastructure SMSCG:
[auth/smscg]
type = voms-proxy
cert_renewal_method = slcs
aai_username = <aai_user_name> # SWITCHaai/Shibboleth user name
idp= uzh.ch
vo = smscg
Example 2. The following configuration sections set up two different accounts that GC3Pie programs can use. Which account should be used on which computational resource is defined in the resource sections (see below).
[auth/ssh1]
type = ssh
username = murri # your username here
[auth/ssh2] # I use a different account name on some resources
type = ssh
username = rmurri
Example 3. The following configuration section is used to access an EC2 resource (access and secret keys are of course invalid :)):
[auth/hobbes]
type=ec2
ec2_access_key=1234567890qwertyuiopasdfghjklzxc
ec2_secret_key=cxzlkjhgfdsapoiuytrewq0987654321
resource sections
Each resource section must begin with a line of the form:
[resource/name]
You can have as many [resource/name] sections as you want; this allows you to define many different resources. Each [resource/name] section must reference one (and one only) [auth/name] section (by its auth key).
Resources currently come in several flavours, distinguished by the value of the type key:
- If type is arc1, then the resource is accessed using the ARC grid middleware (version 1.1.x/1.0.x);
- If type is arc0, then the resource is accessed using the ARC grid middleware (version 0.8.x);
- If type is sge, then the resource is a Grid Engine batch system, to be accessed by an SSH connection to its front-end node.
- If type is pbs, then the resource is a Torque/PBS batch system, to be accessed by an SSH connection to its front-end node.
- If type is lsf, then the resource is an LSF batch system, to be accessed by an SSH connection to its front-end node.
- If type is slurm, then the resource is a SLURM batch system, to be accessed by an SSH connection to its front-end node.
- If type is shellcmd, then the resource is the computer where the GC3Pie script is running and applications are executed by just spawning a local UNIX process.
- If type is ec2+shellcmd, then the resource is a cloud with EC2-compatible APIs, and applications are run on Virtual Machines spawned on the cloud.
All [resource/name] sections (except those of shellcmd type) must reference a valid [auth/***] section. Resources of sge, pbs, lsf and slurm type can only reference ssh type sections; resources of type arc0 or arc1 can only reference [auth/***] sections whose type is voms-proxy or grid-proxy.
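The pairing rules above can be sketched as follows. All section names, the user name, the host name and the LDAP URL are hypothetical, and both resource sections are abridged.

```ini
[auth/ssh1]
type = ssh
username = jdoe

[auth/grid1]
type = grid-proxy
cert_renewal_method = manual

[resource/batch_cluster]
# sge, pbs, lsf and slurm resources can only reference an ssh-type auth
type = pbs
auth = ssh1
transport = ssh
frontend = batch.example.org

[resource/some_grid]
# arc0 and arc1 resources need a voms-proxy or grid-proxy auth
type = arc0
auth = grid1
arc_ldap = ldap://giis.example.org:2135/o=grid/mds-vo-name=Example
```

Note that the auth key holds only the part of the section name after the slash (ssh1, not auth/ssh1).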
Some configuration keys are common to all resource types:
type
: Resource type, see above.
auth
: the name of a valid[auth/name]
section; only the authentication section name (after the/
) must be specified.
max_cores_per_job
: Maximum number of CPU cores that a job can request; a resource will be dropped during the brokering process if a job requests more cores than this.
max_memory_per_core
: Max amount of memory (expressed in GBs) that a job can request.
max_walltime
: Maximum job running time (in hours).
max_cores
: Total number of cores provided by the resource.
architecture
: Processor architecture. Should be one of the strings x86_64 (for 64-bit Intel/AMD/VIA processors), i686 (for 32-bit Intel/AMD/VIA x86 processors), or x86_64,i686 if both architectures are available on the resource.
time_cmd
: Used only when type is shellcmd. The time program is used as a wrapper for the application in order to collect information about the execution when running without a real LRMS.
prologue
: Used only when type is pbs, lsf, slurm or sge. The content of the prologue script will be inserted into the submission script and executed before the real application. It is intended to execute some shell commands needed to set up the execution environment before running the application (e.g. running a module load ... command). The script must be a valid, plain /bin/sh script.
epilogue
: Used only when type is pbs, lsf, slurm or sge. The content of the epilogue script will be inserted into the submission script and executed after the real application. The script must be a valid, plain /bin/sh script.
<application_name>_prologue
: Same as prologue, but it is used only when <application_name> matches the name of the application. Valid application names are: zods, gamess, turbomole, codeml, rosetta, rosetta_docking, geotop.
<application_name>_epilogue
: Same as epilogue, but it is used only when <application_name> matches the name of the application. Valid application names are: zods, gamess, turbomole, codeml, rosetta, rosetta_docking, geotop.
arc0 resources
The arc_ldap key should be set to the LDAP URL of an ARC GIIS or GRIS. If, in addition, the frontend key is also defined, then only queues belonging to the specified frontend will be considered for brokering.
When a job has just been submitted, the ARC information system does not immediately report about it: the job will appear at the next cache update. This creates a time window during which no information is reported about the job by ARC, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allows a “grace time”: job information lookups are allowed to fail for a certain time span after submission. The duration of this time span is set with the optional lost_job_timeout parameter, whose default is 4 times the ARC default cache time; this parameter should not be lower than twice the update interval of the information system.
lost_job_timeout
: Time (in seconds) during which a failure in job lookup in the information system will not be considered critical.
arc1 resources
The arc_ldap key should be set to a valid ARC1 information system URL.
When a job has just been submitted, the ARC information system does not immediately report about it: the job will appear at the next cache update. This creates a time window during which no information is reported about the job by ARC, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allows a “grace time”: job information lookups are allowed to fail for a certain time span after submission. The duration of this time span is set with the optional lost_job_timeout parameter, whose default is 4 times the ARC default cache time; this parameter should not be lower than twice the update interval of the information system.
lost_job_timeout
: Time (in seconds) during which a failure in job lookup in the information system will not be considered critical.
sge resources
The following configuration keys are required in a sge-type resource section:
frontend
: should contain the FQDN of the SGE front-end node. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
transport
: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute SGE commands. If local, the SGE commands are run directly on the machine where GC3Pie is installed.
To submit parallel jobs to SGE, a “parallel environment” name must be specified. You can specify the PE to be used with a specific application using a configuration parameter whose name is the application name followed by _pe (e.g., gamess_pe, zods_pe); the default_pe parameter dictates the parallel environment to use if no application-specific one is defined. If neither the application-specific nor the default_pe parallel environment is defined, then it will not be possible to submit parallel jobs.
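A sketch of the parallel environment settings in an sge resource section. The resource name, host name and the PE names smp and mpi are hypothetical (actual PE names depend on how the SGE cluster is configured), and the section is abridged.

```ini
[resource/mysge]
type = sge
auth = ssh1
transport = ssh
frontend = mysge.example.org
# used for any application without its own <application>_pe setting
default_pe = smp
# used only for the gamess application
gamess_pe = mpi
```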
When a job has finished, the SGE batch system does not (by default) immediately write its information into the accounting database. This creates a time window during which no information is reported about the job by SGE, as if it never existed. In order not to mistake this for a “job lost” error, GC3Libs allows a “grace time”: qacct job information lookups are allowed to fail for a certain time span after the first time qstat failed. The duration of this time span is set with the sge_accounting_delay parameter, whose default is 15 seconds (matches the default in SGE, as of release 6.2):
sge_accounting_delay
: Time (in seconds) during which a failure in qacct will not be considered critical.
GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched for using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the SGE backend:
qsub
: submit a job.
qacct
: get info on resources used by a job.
qdel
: cancel a job.
qstat
: get the status of a job or the status of available resources.
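For illustration, a fragment of an sge resource section that lengthens the accounting grace time and points GC3Pie at commands installed outside of the PATH. The resource name and the /opt/sge/bin paths are hypothetical, and the remaining keys of the section are omitted.

```ini
[resource/mysge]
type = sge
# tolerate qacct failures for 30 seconds instead of the default 15
sge_accounting_delay = 30
# full paths to commands that are not found through PATH
qsub = /opt/sge/bin/qsub
qacct = /opt/sge/bin/qacct
```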
pbs resources
The following configuration keys are required in a pbs-type resource section:
transport
: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute Torque/PBS commands. If local, the Torque/PBS commands are run directly on the machine where GC3Pie is installed.
frontend
: should contain the FQDN of the Torque/PBS front-end node. This configuration item is only relevant if transport is ssh. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched for using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the PBS backend:
queue
: the name of the queue to which jobs are submitted. If empty (the default), no queue will be specified during submission.
qsub
: submit a job.
qdel
: cancel a job.
qstat
: get the status of a job or the status of available resources.
tracejob
: get info on resources used by a job.
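A sketch of a pbs resource section using these options. The resource name, host name, queue name and the tracejob path are hypothetical, and the common resource keys are omitted.

```ini
[resource/mypbs]
type = pbs
auth = ssh1
transport = ssh
frontend = mypbs.example.org
# submit to this queue instead of leaving the queue unspecified
queue = short
# full path to a command that is not found through PATH
tracejob = /usr/local/sbin/tracejob
```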
lsf resources
The following configuration keys are required in a lsf-type resource section:
transport
: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute LSF commands. If local, the LSF commands are run directly on the machine where GC3Pie is installed.
frontend
: should contain the FQDN of the LSF front-end node. This configuration item is only relevant if transport is ssh. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched for using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the LSF backend:
bsub
: submit a job.
bjobs
: get the status and resource usage of a job.
bkill
: cancel a job.
lshosts
: get info on available resources.
slurm resources
The following configuration keys are required in a slurm-type resource section:
transport
: Possible values are: ssh or local. If ssh, we try to connect to the host specified in frontend via SSH in order to execute SLURM commands. If local, the SLURM commands are run directly on the machine where GC3Pie is installed.
frontend
: should contain the FQDN of the SLURM front-end node. This configuration item is only relevant if transport is ssh. An SSH connection will be attempted to this node, in order to submit jobs and retrieve status info.
GC3Pie uses standard command line utilities to interact with the resource manager. By default these commands are searched for using the PATH environment variable, but you can specify the full path of these commands and/or add some extra options. The following options are used by the SLURM backend:
sbatch
: submit a job.
scancel
: cancel a job.
squeue
: get the status of a job or of the available resources.
sacct
: get info on resources used by a job.
shellcmd resources
The following optional configuration keys are available in a shellcmd-type resource section:
transport
: As for any other resource, possible values are ssh or local. Default value is local.
frontend
: If transport is ssh, then frontend is the FQDN of the remote machine where the jobs will be executed.
time_cmd
: ShellcmdLrms needs the GNU implementation of the command time in order to get resource usage of the submitted jobs. time_cmd must contain the path to the binary file if this is different from the standard (/usr/bin/time).
override
: ShellcmdLrms by default will try to gather information on the system the resource is running on, including the number of cores and the available memory. These values may be different from the values stored in the configuration file. If override is True, then the values automatically discovered will be used. If override is False, the values in the configuration file will be used regardless of the real values discovered by the resource.
spooldir
: Path to a filesystem location where to create temporary working directories for processes executed through this backend. The default value None means to use $TMPDIR or /tmp (see tempfile.mkdtemp for details).
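A minimal sketch of a shellcmd resource that runs jobs on the local machine. The resource name and all numeric values are hypothetical; no auth key is given, since shellcmd resources are exempt from referencing an auth section.

```ini
[resource/localhost]
type = shellcmd
transport = local
# path to GNU time; this is the standard location, shown for clarity
time_cmd = /usr/bin/time
# keep the values below even if the discovered ones differ
override = False
max_cores = 2
max_cores_per_job = 2
max_memory_per_core = 2
max_walltime = 8
architecture = x86_64
```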
ec2+shellcmd resources
The following configuration options are available for a resource of type ec2+shellcmd. If these options are omitted, then the default of the boto Python library will be used, which at the time of writing means using the default region on Amazon.
ec2_url
: The URL of the EC2 frontend. On a typical OpenStack installation this will look like https://cloud.gc3.uzh.ch:8773/services/Cloud, while for Amazon it’s something like https://ec2.us-east-1.amazonaws.com (this is valid for the region us-east-1 of course). If no value is specified, the environment variable EC2_URL will be used, and if not found an error is raised.
ec2_region
: the region you want to access. Most OpenStack installations only have one region called nova.
keypair_name
: the name of the keypair to use when creating a new instance on the cloud. If it’s not found, a new keypair with this name will be created from the key stored in public_key. Please note that if the keypair exists already on the cloud but the associated public key is different from the one stored in public_key, then an error is raised and the resource will not be used.
public_key
: public key to use when creating the keypair. Please note that GC3Pie will assume that the corresponding private key is stored in a file with the same path but without the .pub extension. This private key is necessary in order to access the virtual machines created on the cloud. Amazon users: Please note that Amazon does not accept DSA keys; use RSA keys only for Amazon resources.
vm_auth
: the name of a valid auth stanza used to connect to the virtual machine.
instance_type
: the instance type (aka flavor, aka size) you want to use for your virtual machines by default.
<application>_instance_type
: you can override the default instance type for a specific application by defining an entry in the configuration file for that application. For example:
instance_type=m1.tiny
gc_gps_instance_type=m1.large
will use instance type m1.large for the gc_gps GC3Pie application, and m1.tiny for all the other applications.
image_id
: the ami-id of the image you want to use. OpenStack users: please note that the ID you will find on the web interface is not the ami-id. To get the ami-id of an image you have to use the command euca-describe-images from the euca2ools package. For Hobbes users: all virtual machines distributed by the GC3 team are in this list of appliances with the corresponding ami-id.
<application>_image_id
: you can override the default image id for a specific application by defining an entry in the configuration file for that specific application. For example:
image_id=ami-00000048
gc_gps_image_id=ami-0000002a
will use the image ami-0000002a for gc_gps and image ami-00000048 for all other applications.
security_group_name
: the name of the security group to use. If not found, it will be created using the rules found in security_group_rules. If the security group is found but some of the rules in security_group_rules are not present, they will be added to the security group. Please note that if the security group defines some rule which is not listed in security_group_rules it will not be removed from the security group.
security_group_rules
: comma separated list of security rules the security group must have. Each rule is in the form:
PROTOCOL:PORT_RANGE_START:PORT_RANGE_END:IP_NETWORK
where:
- PROTOCOL can be one of tcp, udp, icmp
- PORT_RANGE_START and PORT_RANGE_END are integers and define the range of ports to allow. If PROTOCOL is icmp please use -1 for both values since in icmp there is no concept of port.
- IP_NETWORK is a range of IP addresses to allow, in the form A.B.C.D/N.
For instance, to allow access to the virtual machine from any machine on the internet you can use:
tcp:22:22:0.0.0.0/0
Please note that in order to be able to access the created virtual machines GC3Pie needs to be able to connect via ssh, so the above rule is probably necessary in any GC3Pie configuration (of course, you can allow only your IP address or the IPs of your institution).
vm_pool_max_size
: the maximum number of Virtual Machines GC3Pie will start on this cloud. If 0, there is no predefined limit to the number of virtual machines GC3Pie will spawn.
user_data
: the content of a script that will run after the startup of the machine. For instance, to automatically upgrade an Ubuntu machine after startup you can use:
user_data=#!/bin/bash
  aptitude -y update
  aptitude -y safe-upgrade
Please note that if you need to span over multiple lines you have to indent the lines after user_data, as any indented line in a configuration file is interpreted as a continuation of the previous line.
<application>_user_data
: you can override the default userdata for a specific application by defining an entry in the configuration file for that specific application. For example:
# user_data=
warholize_user_data = #!/bin/bash
  aptitude update && aptitude -y install imagemagick
will install imagemagick only for the warholize application.
Example resource sections
Example 1. This configuration stanza defines a resource smscg representing the whole SMSCG infrastructure, accessed through the ARC (version 0.8.x) middleware:
[resource/smscg]
# A whole ARC-based Grid
type = arc0
auth = <voms_auth_name>
arc_ldap = ldap://giis.smscg.ch:2135/o=grid/mds-vo-name=Switzerland
# These values are correct as of 2011-02-28; please
# ask on the SMSCG mailing list if unsure.
max_cores_per_job = 256
max_memory_per_core = 3
max_walltime = 9999
max_cores = 1200
architecture = x86_64, i686
Example 2. This configuration stanza shows how to access a single cluster through the ARC middleware (version 1.x) using the name idgc3grid01 (which is also the internet host name of the cluster front-end):
[resource/idgc3grid01]
# A single cluster, accessed through the ARC middleware
type = arc1
auth = <auth_name> # pick a voms-proxy type auth
frontend = idgc3grid01.uzh.ch
name = gc3
arc_ldap = ldap://idgc3grid01.uzh.ch:2135/mds-vo-name=local,o=grid
max_cores_per_job = 32
max_memory_per_core = 2
max_walltime = 12
max_cores = 80
Example 3. This configuration stanza defines a resource to submit jobs to the Grid Engine cluster whose front-end host is ocikbpra.uzh.ch:
[resource/ocikbpra]
# A single SGE cluster, accessed by SSH'ing to the front-end node
type = sge
auth = <auth_name> # pick an ssh type auth, e.g., "ssh1"
transport = ssh
frontend = ocikbpra.uzh.ch
gamess_location = /share/apps/gamess
max_cores_per_job = 80
max_memory_per_core = 2
max_walltime = 2
max_cores = 80
Example 4. This configuration stanza defines a resource to submit jobs on virtual machines that will be automatically started by GC3Pie on Hobbes, the private OpenStack cloud of the University of Zurich:
[resource/hobbes]
enabled=yes
type=ec2+shellcmd
ec2_url=http://hobbes.gc3.uzh.ch:8773/services/Cloud
ec2_region=nova
auth=ec2hobbes
# These values may be overwritten by the remote resource
max_cores_per_job = 8
max_memory_per_core = 2
max_walltime = 8
max_cores = 32
architecture = x86_64
keypair_name=my_name
# If keypair does not exist, a new one will be created starting from
# `public_key`. Note that if the desired keypair exists, a check is
# done on its fingerprint and a warning is issued if it does not match
# with the one in `public_key`
public_key=~/.ssh/id_dsa.pub
vm_auth=gc3user_ssh
instance_type=m1.tiny
warholize_instance_type = m1.small
image_id=ami-00000048
warholize_image_id=ami-00000035
security_group_name=gc3pie_ssh
security_group_rules=tcp:22:22:0.0.0.0/0, icmp:-1:-1:0.0.0.0/0
vm_pool_max_size = 8
user_data=
warholize_user_data = #!/bin/bash
  aptitude update && aptitude -y install imagemagick
Enabling/disabling selected resources
Any resource can be disabled by adding a line enabled = false to its configuration stanza. Conversely, a line enabled = true will undo the effect of an enabled = false line (possibly found in a different configuration file).
This way, resources can be temporarily disabled (e.g., the cluster is down for maintenance) without having to remove them from the configuration file.
You can selectively disable or enable resources that are defined in the system-wide configuration file. Two main use cases are supported: the system-wide configuration file /etc/gc3/gc3pie.conf lists and enables all available resources, and users can turn them off in their private configuration file ~/.gc3/gc3pie.conf; or the system-wide configuration can list all available resources but keep them disabled, and users can enable those they prefer in the private configuration file.
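The second use case can be sketched as follows, with both files shown in a single listing. The resource name and host name are hypothetical, and the system-wide section is abridged.

```ini
# --- /etc/gc3/gc3pie.conf (system-wide): listed but kept disabled ---
[resource/mycluster]
type = slurm
auth = ssh1
transport = ssh
frontend = mycluster.example.org
enabled = false

# --- ~/.gc3/gc3pie.conf (user-private): the user opts in ---
[resource/mycluster]
enabled = true
```

For the first use case the two enabled values would simply be swapped.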