Reliability, Risk and Safety in Engineering


Availability, Reliability, Maintainability

Note: The information below is provided as an overview of availability, reliability and maintainability. More detailed and reliable information is provided at the sites linked at the bottom of this page.


An item or system is specified, procured, and designed to a functional requirement and it is important that it satisfies this requirement.  However, it is also desirable that the item or system should be predictably available, and this depends upon its reliability and maintainability.   For some disposable products in our modern society the availability requirement may be acceptably low.   For a large range of consumer products the availability, based on high reliability, is an important selling point.   For items and systems used in critical areas, including military equipment, process plant and the nuclear industry, the availability, reliability and maintainability considerations are vital.

The economic justification for a project is generally based on the lifetime cost of the project.  A major contribution to this cost involves an evaluation of the availability, reliability and maintainability of the equipment.


Availability is the ability of an item to be in a state to perform a required function under given conditions at a given instant of time or during a given time interval, assuming that the required external resources are provided.

At its simplest level:

Availability = Uptime / (Downtime + Uptime)

The time units are generally hours and the time base is one year.  There are 8760 hours in a (non-leap) year.

From a design point of view this equation translates to:

Availability (Intrinsic) A_i = MTBF / (MTBF + MTTR)

MTBF = Mean time between failures
MTTR = Mean time to repair / Mean time to replace

Operational availability is defined differently:

Availability (Operational) A_o = MTBM / (MTBM + MDT)

MTBM = Mean time between maintenance
MDT = Mean down time
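The three availability expressions above can be sketched in a few lines of Python. The function names and the numerical values are illustrative assumptions only; they are not taken from any standard or supplier data.

```python
HOURS_PER_YEAR = 8760  # non-leap year, the time base used in the text


def availability(uptime_hours, downtime_hours):
    """Availability = Uptime / (Downtime + Uptime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return uptime_hours / total


def intrinsic_availability(mtbf, mttr):
    """A_i = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def operational_availability(mtbm, mdt):
    """A_o = MTBM / (MTBM + MDT)."""
    return mtbm / (mtbm + mdt)


def expected_downtime_per_year(avail):
    """Hours per year the item is expected to be unavailable."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical figures, for illustration only
a_i = intrinsic_availability(mtbf=2000.0, mttr=8.0)
a_o = operational_availability(mtbm=500.0, mdt=24.0)
print(f"Intrinsic availability:   {a_i:.4f}")
print(f"Operational availability: {a_o:.4f}")
print(f"Expected downtime at A_o: {expected_downtime_per_year(a_o):.0f} h/year")
```

The operational figure is normally the lower of the two because MDT includes logistic and administrative delays as well as the repair itself.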


Reliability is the ability of an item to perform a required function under given conditions for a given time interval.

The reliability is expressed as a probability (0 to 1, or 0 to 100%).  Thus the reliability of a component may be expressed as a 99% probability that it will work successfully for one year.  The reliability is essentially an indication of the probability that the item will not fail in the given time period.

A very generalised curve for the failure rates of components over time is the bathtub curve.   This shows that in the early period a number of failures result from manufacturing, assembly, commissioning and setting-to-work problems.  When all of the teething problems have been eliminated the remaining population has a useful life over which the items fail at a relatively low rate.  After a long operating time interval the items will fail at an increasing rate due to wear and other time-related effects.   This curve applies mostly to electronic components, which is why electronic products are operated continuously for set times (burn-in) prior to delivery to users.

The bathtub curve for mass produced mechanical items is controlled to minimise the initial early failure period by use of quality control to ensure uniformity of production of high reliability items.  Before items are introduced onto the market they are rigorously tested to identify and correct design and manufacturing problems.   A prime target of design, manufacturing and operation is to ensure that the useful life is extended by attention to the following factors.

  • Strength / life safety factors
  • Tribology considerations (lubrication and prevention of wear)
  • Corrosion prevention
  • Protection against environmental effects (temperature / humidity)
  • Fatigue
  • Vibration
  • Regular servicing (or elimination) of short-life components (filters, brake pads etc.)

For systems with items in series the overall reliability is the product of the reliabilities of the individual components.

For systems with active items in parallel the resulting reliability is improved.  For example, if there are two items in parallel, A (reliability Ra) and B (reliability Rb), the overall reliability is 1 - (1 - Ra)*(1 - Rb).
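A minimal sketch of the series and parallel rules follows. It assumes the component failures are statistically independent, which the text does not state explicitly; the function names and sample reliabilities are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """All items must work: the product of the individual reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Active redundancy: the system fails only if every item fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Two hypothetical items, each with a reliability of 0.9
ra, rb = 0.9, 0.9
print(f"Series:   {series_reliability([ra, rb]):.2f}")
print(f"Parallel: {parallel_reliability([ra, rb]):.2f}")
```

Both functions accept any number of items, so the two-item parallel formula in the text is the special case of a list of length two.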


Maintainability is the ability of an item, under given conditions of use, to be retained in, or restored to, a state in which it can perform a required function, when maintenance is performed under given conditions and using stated procedures and resources.

When a piece of equipment has failed it is important to get it back into an operating condition as soon as possible; the ease with which this can be done is known as maintainability.   To calculate the Mean Time To Repair (MTTR) of an item, the time required to perform each anticipated repair task is weighted by the relative frequency with which that task is performed (e.g. number of times per year) and the weighted times are summed.    MTTR data supplied by manufacturers will be purely repair time, which assumes the fault is correctly identified and the required spares and personnel are available.   The MTTR to the user will include the logistic delays listed below.  The user MTTR should also allow for factors such as the skill of the maintenance engineers.

MTTR user factors:

  • Detection of fault
  • Start up maintenance team
  • Diagnose fault
  • Obtain spare parts
  • Repair (MTTR - manufacturer's information)
  • Test and accept repair
  • Start up equipment
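The frequency-weighted MTTR calculation and the build-up of the user's downtime from the factors listed above can be sketched as follows. All task times, frequencies and stage durations are hypothetical values chosen for illustration.

```python
def mean_time_to_repair(tasks):
    """Frequency-weighted mean repair time.

    tasks: iterable of (repair_hours, times_per_year) pairs.
    """
    tasks = list(tasks)
    total_frequency = sum(freq for _, freq in tasks)
    if total_frequency <= 0:
        raise ValueError("at least one task must have a positive frequency")
    return sum(hours * freq for hours, freq in tasks) / total_frequency


def user_downtime(stage_hours):
    """Total downtime seen by the user: the sum of all stages."""
    return sum(stage_hours.values())


# Hypothetical repair tasks: (hours per repair, repairs per year)
repair_tasks = [(2.0, 4), (6.0, 1), (1.0, 5)]
mttr_manufacturer = mean_time_to_repair(repair_tasks)

# Hypothetical durations (hours) for the user factors listed above
stages = {
    "detect fault": 0.5,
    "start up maintenance team": 1.0,
    "diagnose fault": 1.5,
    "obtain spare parts": 4.0,
    "repair": mttr_manufacturer,
    "test and accept repair": 0.5,
    "start up equipment": 0.6,
}
print(f"Manufacturer MTTR: {mttr_manufacturer:.2f} h")
print(f"User downtime:     {user_downtime(stages):.2f} h")
```

In this made-up case the repair itself is under a fifth of the downtime the user experiences, which is why the two figures should not be confused.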

Safety Factors

Basic Notes on Factor of Safety

The factor of safety, also known as the safety factor, is used to provide a design margin over the theoretical design capacity to allow for uncertainty in the design process.   The uncertainty could be in any one of a number of the components of the design process, including calculations, material strengths, duty and manufacturing quality.  The value of the safety factor is related to the lack of confidence in the design process.   The simplest interpretation of the factor of safety is:

FoS = Strength of Component / Load on component

If a component needs to withstand a load of 100 Newtons and a FoS of 4 is selected, then it is designed with the strength to support 400 Newtons.
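The arithmetic above is simple enough to write directly. The sketch below uses hypothetical function names and assumes strength and load are expressed in the same units.

```python
def factor_of_safety(strength, load):
    """FoS = strength of component / load on component."""
    if load <= 0:
        raise ValueError("load must be positive")
    return strength / load


def required_strength(load, fos):
    """Strength the component must be designed for at a chosen FoS."""
    if fos <= 0:
        raise ValueError("factor of safety must be positive")
    return load * fos


# The example from the text: 100 N load with a FoS of 4
print(required_strength(100.0, 4.0), "N")
```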

The selection of the appropriate factor of safety to be used in the design of components is essentially a compromise between the associated additional cost and weight and the benefit of increased safety and/or reliability.   Generally an increased factor of safety results from a heavier component, a component made from a more exotic material, and/or an improved component design.

The factors of safety listed below are based on the yield strength.

Item  Factor of Safety  Application
1     1.25 - 1.5        Material properties known in detail.  Operating conditions known in detail.  Loads and resultant stresses and strains known with a high degree of certainty.  Material test certificates, proof loading, regular inspection and maintenance.  Low weight is important to the design.
2     1.5 - 2           Known materials with certification under reasonably constant environmental conditions, subjected to loads and stresses that can be determined using qualified design procedures.  Proof tests, regular inspection and maintenance required.
3     2 - 2.5           Materials obtained from reputable suppliers to relevant standards, operated in normal environments and subjected to loads and stresses that can be determined using checked calculations.
4     2.5 - 3           For less tried materials or for brittle materials under average conditions of environment, load and stress.
5     3 - 4             For untried materials used under average conditions of environment, load and stress.
6     3 - 4             Should also be used with better-known materials that are to be used in uncertain environments or subject to uncertain stresses.

Repeated cyclic loads:
The factors established above must be based on the endurance limit (fatigue strength) rather than the yield strength of the material.  The strength calculations should also allow for stress concentration factors.

Impact shock forces:
The factors given in items 3 to 6 are acceptable, but an impact factor (a dynamic magnification factor) should be included.

Brittle materials:
Where the ultimate strength is used as the theoretical maximum, the factors presented in items 1 to 6 should be approximately doubled.

Impact shock forces:
The higher factors of safety given above (2.5 to 4) may be used, but applied to stress levels calculated from the energy dissipated at impact.

Where higher factors might appear desirable, a more thorough analysis of the problem should be undertaken before deciding on their use.

Extreme care must be used in dealing with vibration loads, more so if the vibrations approach resonant frequencies.   The vibrations resulting from seismic disturbances are often important and need to be considered in detail.

Use of Standards and Codes

A convenient method of ensuring a safe, confident design is to use design codes.  A good standard used by mechanical engineers is:

BS 2573-Pt 1:1983 Rules for Design of Cranes.   Specification for classification, stress calculations and design criteria for structures.

This standard (together with BS 2573 Part 2) includes rules for completing calculations and applying factors, and the relevant allowable stresses to be used for the different grades of materials.   This standard is primarily used for the design of cranes and associated equipment but it is used widely for the design of similar mechanical systems.   When designing systems using the rules from this standard it is not generally necessary to include additional margins of safety.

When designing engineering structures using structural steel sections a useful standard is:

BS 5950-1:2000-Structural use of steelwork in building. Code of practice for design. Rolled and welded sections.

This standard, together with BS 5950 Parts 2, 3-1, 4, 5, 6, 7, 8 and 9, provides service factors and design stresses relevant to structural design.

In designing many equipment items including vessels, pumps, valves, piping systems there are equivalent standards and codes which should be followed.  These documents generally identify the necessary design procedures and the safety margins to be included.

Use of Proprietary Items
A mechanical design often includes rolling element bearings, gearbox units, shaft couplings, belt / chain drives etc.  When using these items it is necessary to strictly follow the design rules provided in the supplier's technical documents.  The operating duties and service factors to be used are generally clearly specified.   It is not correct to simply use oversized equipment for convenience.  It is also recommended that the supplier is consulted on the duty.

Failure Distributions

Note: The information below is provided as an overview of failure distributions. More detailed and reliable information is provided at the sites linked at the bottom of this page.


In determining the lifetime reliability of a population of components (bearings, seals, gears etc.), sample information is obtained from testing programmes and operational feedback on the failure history of components belonging to the population.  From the information obtained it is possible to produce a graph of the probability density function (pdf) f(t).   This is a plot, against time, of the fraction of the whole population that fails per unit time.

The pdf has the property that the total area under the curve is unity:

$$\int_0^\infty f(t)\,dt = 1$$

Associated with the pdf is the Cumulative Distribution Function (CDF) F(t).  This is simply a plot of the cumulative fraction of the population that has failed against time.  It is the integral of f(t) with respect to time t:

$$F(t) = \int_0^t f(\tau)\,d\tau$$

The CDF has the properties

$$F(0) = 0$$

This effectively means that at time 0 no failures have occurred.

$$F(t) \to 1 \quad \text{as} \quad t \to \infty$$

At infinity the whole population of components will have failed.


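The reliability function R(t) = 1 - F(t) gives the probability that an item survives to time t.  For example, the reliability may be expressed as: for time t = a (e.g. 10 years) there is a 90% chance of the item surviving (not failing), i.e. 1 in 10 is likely to fail.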

Hazard Rate

The hazard rate may be expressed as, for example, a failure rate of 2 x 10^-4 failures per unit time, or 2 failures per 10^4 time units.
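The link between a failure history, the CDF, the reliability and the hazard rate can be illustrated with a small sketch. The failure times are hypothetical, every unit is assumed to have been run to failure (no censoring), and the function names are illustrative only.

```python
from bisect import bisect_right


def empirical_cdf(failure_times, t):
    """Fraction of the population that has failed by time t."""
    ordered = sorted(failure_times)
    return bisect_right(ordered, t) / len(ordered)


def empirical_reliability(failure_times, t):
    """Fraction of the population still working at time t."""
    return 1.0 - empirical_cdf(failure_times, t)


def interval_hazard_rate(failure_times, t_start, t_end):
    """Failures in (t_start, t_end] per surviving unit per unit time."""
    survivors = sum(1 for x in failure_times if x > t_start)
    if survivors == 0:
        raise ValueError("no units survive to the start of the interval")
    failed = sum(1 for x in failure_times if t_start < x <= t_end)
    return failed / (survivors * (t_end - t_start))


# Hypothetical times to failure (hours) for a population of ten items
sample = [120, 340, 560, 780, 910, 1050, 1300, 1600, 2100, 2900]
print("F(1000 h) =", empirical_cdf(sample, 1000))
print("R(1000 h) =", empirical_reliability(sample, 1000))
print("Average hazard rate, 0 to 1000 h =",
      interval_hazard_rate(sample, 0, 1000), "failures per unit per hour")
```

The hazard rate is referred to the units still surviving at the start of the interval, whereas the pdf is referred to the whole original population.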

Mean Life Function

The mean life is the average life to failure of the components.  It is also called the Mean Life Between Failures (MLBF) or the Mean Time To Failure (MTTF).

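The MTTF / MTBF may be expressed as, say, 1,000 hours.  For a symmetrical distribution such as the normal distribution this is the time at which 50% of units have failed; for a constant failure rate about 63% of units have failed by the mean life.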

Failure Distributions

The pdf curve can take many forms.  Some of the different distributions are listed below.

Normal Distribution

One curve representing purely random events is the normal (Gaussian) curve.
This is shown below with the associated CDF.

The equation for the normal distribution is:

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)$$

  • μ = the mean (MTTF, MTBF)
  • σ = the standard deviation

Both of these parameters are estimated from the data, i.e. the mean and standard deviation of the data.   From these parameters f(t) is fully defined enabling evaluation of f(t) from any value of t.

Note: The standard deviation is a measure of the scatter of the data.    A small standard deviation gives a narrower, taller bell and a large standard deviation gives a wider, flatter bell.

Normal Distributions are appropriate in the following conditions

  • There is a strong tendency for the variable to take a central value;
  • Positive and negative deviations from this central value are equally likely;
  • The frequency of deviations falls off rapidly as the deviations become larger.
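As an illustration, the normal pdf, CDF and reliability can be evaluated with the Python standard library alone. The mean and standard deviation used here are hypothetical, and the function names are not from any particular reliability package.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(t, mu, sigma):
    """Normal probability density f(t) for mean mu and std deviation sigma."""
    z = (t - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))


def normal_cdf(t, mu, sigma):
    """Cumulative fraction failed F(t), via the error function."""
    return 0.5 * (1.0 + erf((t - mu) / (sigma * sqrt(2.0))))


def normal_reliability(t, mu, sigma):
    """Fraction surviving R(t) = 1 - F(t)."""
    return 1.0 - normal_cdf(t, mu, sigma)


# Hypothetical wear-out data: mean life 1000 h, standard deviation 100 h
mu, sigma = 1000.0, 100.0
for t in (800.0, 900.0, 1000.0, 1100.0):
    print(f"t = {t:6.0f} h   F(t) = {normal_cdf(t, mu, sigma):.4f}"
          f"   R(t) = {normal_reliability(t, mu, sigma):.4f}")
```

Because the curve is symmetrical about the mean, half of the population has failed at t = μ.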

The Lognormal Distribution

The lognormal distribution is commonly used for general reliability analysis, cycles to failure in fatigue and material strengths and loading.

The data follow the lognormal distribution when the natural logarithms of the times-to-failure are normally distributed.  The pdf is:

$$f(t) = \frac{1}{t\,\sigma'\sqrt{2\pi}} \exp\left(-\frac{(\ln t - \mu')^2}{2(\sigma')^2}\right)$$

  • σ' = standard deviation of the natural logarithms of the times-to-failure
  • μ' = mean of the natural logarithms of the times-to-failure
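A minimal sketch of estimating the two lognormal parameters from a set of failure times, and then evaluating the pdf and CDF, is given below. The sample data and function names are hypothetical, and the sample (n - 1) standard deviation is an assumed choice of estimator.

```python
from math import erf, exp, log, pi, sqrt
from statistics import mean, stdev


def fit_lognormal(times_to_failure):
    """Return (mu_log, sigma_log): mean and std deviation of ln(t)."""
    logs = [log(t) for t in times_to_failure]
    return mean(logs), stdev(logs)


def lognormal_pdf(t, mu_log, sigma_log):
    """Lognormal density; zero for non-positive times."""
    if t <= 0:
        return 0.0
    z = (log(t) - mu_log) / sigma_log
    return exp(-0.5 * z * z) / (t * sigma_log * sqrt(2.0 * pi))


def lognormal_cdf(t, mu_log, sigma_log):
    """Cumulative fraction failed by time t."""
    if t <= 0:
        return 0.0
    return 0.5 * (1.0 + erf((log(t) - mu_log) / (sigma_log * sqrt(2.0))))


# Hypothetical cycles-to-failure from a fatigue test
cycles = [41000, 56000, 62000, 75000, 98000, 130000]
mu_log, sigma_log = fit_lognormal(cycles)
print(f"mu' = {mu_log:.3f}, sigma' = {sigma_log:.3f}")
print(f"Median life = {exp(mu_log):.0f} cycles")
print(f"F(50000) = {lognormal_cdf(50000, mu_log, sigma_log):.3f}")
```

The median life is exp(μ'), since half of the logarithms lie below their mean.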

Weibull Distribution

The Weibull distribution is a general-purpose reliability distribution used to model material strength, times-to-failure of electronic and mechanical components, equipment, or systems.   In its most general case, the three-parameter Weibull pdf is defined by:

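$$f(t) = \frac{\beta}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{\beta}\right]$$

with three parameters, where: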

  • β = shape parameter
  • γ = location parameter
  • η = scale parameter

If the location parameter γ is assumed to be zero then the distribution is known as the two-parameter Weibull distribution.

The shape parameter β gives an indication of the prevalent failure modes.

  • β < 1 indicates 'infant mortality' due to poor production quality or insufficient burn-in
  • β = 1 indicates random failures which are independent of time: human errors, natural events etc.
  • β = 1 to 4 indicates early wear-out, i.e. erosion, corrosion, early fatigue
  • β > 4 indicates old age and rapid wear-out: bearing failures, corrosion, erosion, fatigue etc.
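The effect of the shape parameter can be seen by evaluating the Weibull hazard rate for different values of β. The sketch below uses the standard Weibull reliability and hazard expressions; the parameter values are hypothetical, and for simplicity the functions are only intended for times later than the location parameter γ.

```python
from math import exp


def weibull_reliability(t, beta, eta, gamma=0.0):
    """R(t) = exp(-((t - gamma) / eta) ** beta); equal to 1 up to t = gamma."""
    if t <= gamma:
        return 1.0
    return exp(-(((t - gamma) / eta) ** beta))


def weibull_hazard(t, beta, eta, gamma=0.0):
    """h(t) = (beta / eta) * ((t - gamma) / eta) ** (beta - 1), for t > gamma."""
    if t <= gamma:
        return 0.0  # simplification: the sketch is intended for t > gamma only
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1.0)


def weibull_pdf(t, beta, eta, gamma=0.0):
    """f(t) = h(t) * R(t)."""
    return weibull_hazard(t, beta, eta, gamma) * weibull_reliability(t, beta, eta, gamma)


# Hypothetical scale parameter of 1000 h, two-parameter form (gamma = 0)
eta = 1000.0
for beta in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, beta, eta)
    late = weibull_hazard(900.0, beta, eta)
    trend = "falling" if late < early else "rising" if late > early else "constant"
    print(f"beta = {beta}: hazard rate is {trend} with time")
```

A falling hazard rate corresponds to the early-failure end of the bathtub curve, a constant rate to the useful-life period, and a rising rate to wear-out.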

The Exponential Distribution

The exponential distribution is a commonly used distribution in reliability engineering.   Mathematically, it is a fairly simple distribution, which sometimes leads to its use in inappropriate situations.   This distribution is used to model the behavior of units that have a constant failure rate.

An exponential distribution can easily be described as follows...
Consider a thousand items with a constant failure rate of 10% per month.  After the first month 100 items have failed (0.1 x 1000) leaving 900 items.  After the second month a further 90 items will have failed (0.1 x 900) leaving 810 items, and so on.  In the twelfth month 31 items will fail leaving 282 items.
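The month-by-month arithmetic of this example can be reproduced with a short loop. The function name is hypothetical and fractional items are kept during the calculation and only rounded for display.

```python
def surviving_population(initial, failure_fraction, periods):
    """Apply a constant failure fraction to the survivors in each period.

    Returns a list of (period, failed_in_period, survivors_at_end) tuples.
    """
    history = []
    survivors = float(initial)
    for period in range(1, periods + 1):
        failed = failure_fraction * survivors
        survivors -= failed
        history.append((period, failed, survivors))
    return history


# The example from the text: 1000 items, 10% failing per month, 12 months
for month, failed, remaining in surviving_population(1000, 0.10, 12):
    print(f"Month {month:2d}: {failed:6.1f} fail, {remaining:6.1f} remain")
```

The number failing each month falls in proportion to the number still surviving, which is what gives the distribution its exponential shape.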

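The two-parameter exponential pdf is

$$f(t) = \lambda e^{-\lambda (t - \gamma)}, \quad t \ge \gamma$$

  • λ = Scaling factor = Failure rate
  • γ = Location factor

The mean time to failure of this distribution is

$$MTTF = \gamma + \frac{1}{\lambda}$$

If the location parameter γ is assumed to be zero then the distribution is called the one-parameter exponential distribution.

The mean time to failure and the reliability of this distribution are

$$MTTF = \frac{1}{\lambda}, \qquad R(t) = e^{-\lambda t}$$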