
Methods

 

 

Using and understanding public health data

 

Data sources are only the start of understanding local populations.  They provide useful information on demography, level of ‘need’ and service utilisation.  Use of appropriate statistical techniques can make this data more meaningful by allowing comparisons with other areas or by analysing trends over time.

 

The Kent & Medway Public Health Observatory makes use of a number of statistical techniques and public health tools such as direct and indirect standardisation; health needs assessment; health equity audit; health surveillance; and health impact assessment.

 

This section describes some of the techniques used to present, analyse and interpret data.

 

Numbers and rates

 

Data can be interpreted differently depending on the way it is presented.  Absolute numbers describe the ‘size’ of something in an abstract sense but do not give the reader any indication of the relative scale of the issue or the extent of the health need.

 

For example, when comparing the number of heart attacks in two wards, Area A (100) and Area B (150), one might assume that heart attacks are a bigger issue in Area B than in Area A.  Although this is correct in purely numerical terms, such a simple assessment does not necessarily indicate which area has the greater need, because it takes no account of the size of each population.  From a public health perspective it is the health needs at a population level which are important.  To identify greater health need it is necessary to use rates.

 

There are occasions when the use of absolute numbers is more appropriate, for example in performance management or contract monitoring, where variance from plan requires monitoring and action needs to be immediate.

 

A ‘crude’ rate is simply the number of events (or people) sharing a common characteristic of interest observed in a given population divided by the total number of events (or people) in that population.  Depending on the size of the rate or the situation, it may be expressed in terms of a percentage, per 1,000, per 10,000, per 100,000 or even per million by multiplying by the respective number.

 

For example, in Practice X there are 175 diabetics and in Practice Y there are 200 diabetics.  The populations are 5,000 and 5,900 respectively.  To compare the relative health needs of the two, we would need to produce a crude rate in the form of a percentage prevalence rate.

 

Practice X – (175 / 5,000) x 100 = 3.5%

 

Practice Y – (200 / 5,900) x 100 = 3.4%

 

What this sort of analysis shows is that although the number of diabetics is 25 higher in Practice Y than in Practice X, when the size of the populations is taken into account the prevalence figures are very similar.  In practice, with this kind of example, it would be appropriate to make some further inquiries into the level of socio-economic deprivation of the catchment area each practice serves, its ethnic breakdown and its age structure, as all of these can influence the prevalence of certain diseases.
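The crude rate calculation above can be sketched in a few lines of Python.  This is a minimal illustration using the figures for Practice X and Practice Y; the function name `crude_rate` and its `per` parameter are assumptions made for this example only.

```python
def crude_rate(events: int, population: int, per: int = 100) -> float:
    """Return events divided by population, scaled by `per` (100 gives a percentage)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


# Figures taken from the Practice X and Practice Y example in the text.
practice_x = crude_rate(175, 5_000)
practice_y = crude_rate(200, 5_900)

print(f"Practice X prevalence: {practice_x:.1f}%")
print(f"Practice Y prevalence: {practice_y:.1f}%")
```

Passing `per=1_000` or `per=100_000` instead would express the same rate per 1,000 or per 100,000 population.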

 

An extension of the crude rate for the whole population is an age-specific rate.  This is particularly useful in either of the following two situations:

 

1) where the characteristic of interest only affects a narrow band of the population, so that restricting the calculation to that band yields more accurate rates.  These may also be further defined by gender – e.g. under 18 conception rates (females aged 15-17) or general fertility rates (females aged 15-44).

 

2) where the likelihood of occurrence of the characteristic of interest varies with age, so a series of crude rates over multiple age bands is produced for comparisons within the population and with other populations – e.g. smoking prevalence from school age upwards.
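As a rough sketch of the second situation, the snippet below calculates a crude rate for each age band separately.  The age bands and counts are entirely hypothetical and are included only to show the shape of the calculation.

```python
# Hypothetical (smokers, population) counts for each age band; illustrative only.
age_bands = {
    "11-15": (40, 1_000),
    "16-24": (300, 1_200),
    "25-44": (500, 2_500),
    "45+": (450, 3_000),
}

# One crude rate (here a percentage prevalence) per age band.
age_specific_rates = {
    band: smokers / population * 100
    for band, (smokers, population) in age_bands.items()
}

for band, rate in age_specific_rates.items():
    print(f"{band}: {rate:.1f}%")
```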

 

It can be quite difficult to make sense of a series of age-specific crude rates and so it is often desirable to calculate an overall summary rate which retains the sensitivity to rate differences among the age bands of the population but which also provides a single summary figure for comparisons over time or with other areas.

 

Such summary rates are known as ‘age-standardised rates’.  These are used most commonly with mortality and hospital admissions.  There are two forms:

 

· directly age-standardised rates

· indirectly age-standardised rates, also known as ‘standardised mortality / morbidity ratios’

 

A directly age-standardised rate is the rate of events that would occur in a chosen standard population if that standard population had the same age-specific rates as the subject population.  Here the ‘standard’ population is usually the European standard population but could be some other large area with which it is appropriate to compare the subject population.
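The definition of direct standardisation can be illustrated with a short Python sketch.  All figures are hypothetical: the three age bands and the standard population weights are invented for the example and are not the real European standard population.  The function name is likewise an assumption.

```python
def directly_standardised_rate(events, populations, standard_population, per=100_000):
    """Apply the subject area's age-specific rates to a standard population."""
    if not (len(events) == len(populations) == len(standard_population)):
        raise ValueError("all inputs must cover the same age bands")
    # Events that would occur in the standard population at the subject area's rates.
    expected = sum(
        d / n * s for d, n, s in zip(events, populations, standard_population)
    )
    return expected / sum(standard_population) * per


# Hypothetical data for three age bands (youngest to oldest).
events = [10, 50, 200]                           # events observed in the subject area
populations = [20_000, 10_000, 5_000]            # subject area population
standard_population = [50_000, 30_000, 20_000]   # invented standard, not the European one

dsr = directly_standardised_rate(events, populations, standard_population)
print(f"Directly standardised rate: {dsr:.1f} per 100,000")
```

Because every area is weighted by the same standard population, rates calculated this way can be compared with one another directly.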

 

An indirectly age-standardised rate is the ratio of the observed total number of events in the subject population to the number of events that would be expected in the subject population if it had the same age-specific rates as the standard population.  Here the ‘standard’ population has a different meaning.  It could be a regional comparator or the same area a number of years apart (if change over time is being analysed), but crucially it must be an area for which it is possible to calculate age-specific rates to then apply to the subject population.
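Indirect standardisation can be sketched in the same way.  Again the numbers are hypothetical, and the function name `standardised_ratio` is an assumption for this example.  The ratio is often multiplied by 100 for presentation, so that 100 means the observed number equals the expected number.

```python
def standardised_ratio(observed, subject_populations, standard_rates):
    """Observed events divided by the events expected at the standard's rates."""
    if len(subject_populations) != len(standard_rates):
        raise ValueError("inputs must cover the same age bands")
    # Events expected in the subject area if it had the standard's age-specific rates.
    expected = sum(n * r for n, r in zip(subject_populations, standard_rates))
    return observed / expected


# Hypothetical data for three age bands (youngest to oldest).
observed = 260                                   # total events in the subject area
subject_populations = [20_000, 10_000, 5_000]    # subject area population by age band
standard_rates = [0.0004, 0.004, 0.05]           # standard's age-specific rates per person

ratio = standardised_ratio(observed, subject_populations, standard_rates)
print(f"Standardised ratio: {ratio:.2f} (or {ratio * 100:.0f} on a scale where 100 = as expected)")
```

Note that only the total number of observed events is needed for the subject area, which is why this method is convenient when age-specific counts for a small area are unavailable or unstable.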

 

Sometimes, these measures of age-standardisation are further adjusted to take account of gender.

 

For further information or assistance, please contact the Kent & Medway Public Health Observatory.

 

 

Presenting data

 

Data may be presented in a number of formats such as charts, maps, tables and as text.  This section takes a look at some of the more commonly used methods highlighting where they may be most effectively used.

 

Charts for categorical data

 

Pie charts and bar charts are the most frequently used methods for representing categorical data.  Pie charts visualise the proportions within each category and always total 100%.  Therefore the categories must be mutually exclusive.  Bar charts however can be used to represent categories that are not mutually exclusive. 

 

Charts for trends and association

 

Line charts are used to represent trends over time.  These can sometimes be difficult to interpret if numbers are small, as the trend will appear erratic with many peaks and troughs.  For indicators such as suicides, where numbers are small, it is better to present the data as three-year rolling averages to minimise year-on-year variation.  Line charts should not be used to represent categorical data.
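A three-year rolling average can be calculated as in the sketch below.  The annual counts are hypothetical, and the helper name `rolling_mean` is an assumption for the example; each value is simply the mean of three consecutive years.

```python
def rolling_mean(values, window=3):
    """Return the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of values")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical annual counts for six consecutive years.
annual_counts = [12, 7, 14, 9, 6, 15]

smoothed = rolling_mean(annual_counts, window=3)
for value in smoothed:
    print(f"{value:.1f}")
```

Six years of data give four overlapping three-year averages, which vary much less than the individual annual counts.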

 

A visual representation of association can be shown in a scatter plot.  The independent (or exposure) variable should be plotted on the x axis and the dependent (or outcome) variable should be plotted on the y axis.  A line of best fit can be drawn to identify the strength and direction of the relationship.
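One common way of fitting such a line is ordinary least squares, sketched below in plain Python.  This is an assumption about method, as the text does not specify how the line should be fitted; the data points are hypothetical.  The sign of the slope indicates the direction of the relationship and the correlation coefficient indicates its strength.

```python
import math


def line_of_best_fit(x, y):
    """Return (slope, intercept, correlation) from ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


# Hypothetical exposure (x axis) and outcome (y axis) values for four areas.
exposure = [10, 20, 30, 40]
outcome = [55, 65, 75, 85]

slope, intercept, correlation = line_of_best_fit(exposure, outcome)
print(f"outcome = {intercept:.1f} + {slope:.2f} x exposure (r = {correlation:.2f})")
```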

 

Geographical information systems (maps)

 

Many publications and documents present data using maps.  Maps are an effective way of displaying large amounts of information, and the reader can see at a glance where geographical differences may exist.  Maps can be used to put the locations of premises and services in the context of need.  However, the geography of an area is determined by households and population size, so an area that appears large on a map could have a similar resident population to one that appears half its size.

 

Confidence intervals

 

Charts are sometimes displayed with confidence intervals to help the reader judge the robustness of the data being presented.  Confidence intervals are a statistical measure of the likely accuracy of the rate.  Where rates are based on small numbers the confidence intervals will be wider.  Confidence intervals are usually presented at the 95% level, although 99% or 90% may sometimes be used.  A 95% confidence interval is interpreted as meaning that we can be 95% confident that the true value lies between the lower and upper limits.

 

Example:

In NHS Eastern and Coastal Kent, the directly age-standardised mortality rate for all circulatory disease in people under 75 years, for the period 2005-07, is 76.96 per 100,000.

 

The lower limit of the 95% confidence interval is 73.49 and the upper limit is 80.43 (when written in publications, this would generally be presented as 76.96 (CI 95% 73.49 to 80.43)).  It can be stated that ‘we can be 95% confident that the true rate is somewhere between 73.49 and 80.43 per 100,000 population’.
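To show why rates based on small numbers have wider confidence intervals, the sketch below uses a simple normal approximation for a crude rate, in which the standard error of the event count is taken to be its square root.  This is one common approximation and is an assumption here; it is not necessarily the method used to produce the figures in the example above, and the counts are hypothetical.

```python
import math


def rate_with_ci(events, population, per=100_000, z=1.96):
    """Return (rate, lower, upper) using a normal approximation; z=1.96 gives 95%."""
    if events <= 0 or population <= 0:
        raise ValueError("events and population must be positive")
    rate = events / population * per
    # Approximate standard error of the count is sqrt(events).
    half_width = z * math.sqrt(events) / population * per
    return rate, rate - half_width, rate + half_width


# Two hypothetical areas with the same underlying rate but very different sizes.
small_area = rate_with_ci(25, 50_000)
large_area = rate_with_ci(2_500, 5_000_000)

for label, (rate, lower, upper) in [("Small area", small_area), ("Large area", large_area)]:
    print(f"{label}: {rate:.2f} (CI 95% {lower:.2f} to {upper:.2f})")
```

Both areas have the same rate, but the interval for the small area is ten times wider because it is based on one hundredth of the number of events.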

 

Further reading

 

The BMJ has produced a novice guide to epidemiology, entitled ‘Epidemiology for the Uninitiated’, which is a useful starting point for anyone interested in finding out more.