Corporations spend billions of dollars ($370B in 2019) on training each year. Given this level of investment, it is surprising that a relatively insignificant portion of the expenditure is spent examining training efforts’ efficacy.

For over 40 years, the Kirkpatrick Model, named for its creator Dr. Donald Kirkpatrick, has provided the most extensively used training evaluation guidance. The original model had four levels, but many researchers refined it in the intervening years. Now the model is often shown with a fifth level.

Model for Measuring Training Effectiveness

The following are the levels in the Kirkpatrick Model that evaluate the outcome of training programs.

L1 – Reaction

Learners’ reaction to the learning intervention (training). Questions are subjective, e.g., do you feel that the training was beneficial?

L2 – Learning

An evaluation of the knowledge transfer achieved by the learning intervention. Questions are objective, e.g., put the steps of this work procedure in their correct order.

L3 – Behavioral Change

An evaluation of whether learners apply the desired behavioral change as part of their job function.

L4 – Business Results

An evaluation of whether the targeted behavioral changes are translating into performance improvements.

L5 – Training ROI

A ratio derived by comparing the cost of training development and administration to the financial benefits derived from the behavioral change.
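As a rough illustration of the Level 5 ratio, the sketch below uses the ROI formula commonly associated with the Phillips methodology: net benefits divided by costs, expressed as a percentage. The function name and all dollar figures are hypothetical and serve only to show the arithmetic.

```python
def training_roi_percent(total_cost: float, financial_benefit: float) -> float:
    """Return ROI as a percentage: (benefit - cost) / cost * 100."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    net_benefit = financial_benefit - total_cost
    return net_benefit / total_cost * 100


# Hypothetical program: $80,000 to develop and administer,
# $200,000 in benefits attributed to the behavioral change.
roi = training_roi_percent(total_cost=80_000, financial_benefit=200_000)
print(f"ROI: {roi:.0f}%")  # prints "ROI: 150%"
```

A result above 0% means the program returned more than it cost. The hard part in practice is agreeing on the benefit figure, not the division.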

Barriers to Implementing Evaluation Programs

Many organizations are unwilling to follow up their training dollars with additional evaluation expenditures. This is both counterproductive and counterintuitive. Only by gathering and analyzing appropriate evaluative data can an organization hope to produce and iterate on effective learning programs. This maxim becomes truer the larger an organization becomes: small behavioral changes at scale can lead to millions in benefits over the life of a learning program. Why are organizations (even large multinationals) so hesitant to employ sound evaluation practices as part of their standard operating procedures? We believe it’s because most training departments find the prospect overwhelming and lack the experience to justify the effort to broader stakeholders. The questions at the outset of any evaluation effort may seem simple but can be daunting to organizations without that experience:

  • Where do we start?
  • Which data do we gather?
  • How do we gather them?
  • How do we analyze them?

Our Approach

The easiest way to ensure your evaluations are providing the requisite data to make decisions is to think about the data at the outset of your initiative. If possible, this should be the first step of program design, just after the gap analysis but before you begin delineating learning objectives.

Figure 1: Continuous improvement model for training effectiveness using analytics

Phase 1 – Data Identification

If you’ve performed a gap analysis, even a relatively informal one, you will have identified improvement areas. It is at this stage that you should identify the evaluative data that you will gather. For example, if the gap was related to accidents on the job, the key performance indicator (KPI) that must be measured is the change in the number of accidents over a given timeframe. A learning program may have many such data points, along with underlying supporting data, that must be gathered to make informed decisions on iteration, expansion, or cancellation of the program.
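The accident example can be expressed as a simple percent change between two equal-length periods. This is a minimal sketch; the function name and the accident counts are hypothetical, not figures from any real program.

```python
def kpi_percent_change(baseline: float, current: float) -> float:
    """Percent change of a KPI relative to its pre-training baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percent change")
    return (current - baseline) / baseline * 100


# Hypothetical counts: 40 accidents in the six months before training,
# 30 accidents in the six months after.
change = kpi_percent_change(baseline=40, current=30)
print(f"Accidents changed by {change:.1f}%")  # prints "Accidents changed by -25.0%"
```

Recording the baseline before the program launches is the point of this phase: without it, there is nothing to compare the post-training figure against.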

This effort is often skipped, but it should take place even if you never intend to evaluate Level 2. The reason relates to one of the most fundamental premises of learning design: the purpose of learning is behavioral change. If you don’t know which metrics you want to affect, you can’t craft an informed behavioral change strategy. Consequently, you cannot create effective learning interventions.

Phase 2 – Learning Design, Development, and Deployment

When armed with clear targets for the metrics to be gathered, learning design becomes much more straightforward. Instructional designers work with subject matter experts to develop an approach that elicits the behavioral changes likely to affect the metrics identified in Phase 1. Only that knowledge directly tied to the identified behaviors through learning objectives should be part of the design; anything not related is extraneous and should be jettisoned.

Phase 3 – Gathering Data in the Field

Implementing all levels of the Kirkpatrick Model can be an expensive and time-consuming process. However, it is unnecessary to measure everything. We follow industry experts such as Leslie Allan, who suggest applying the levels only as appropriate. Our synthesis of this guidance:

  • Level 1 (Reaction) for all programs
  • Level 2 (Learning) for “hard-skills” programs
  • Level 3 (Behavior) for strategic programs
  • Level 4 (Results) for enterprise-wide programs or programs affecting tasks with high-cost impacts
  • Level 5 (ROI) for enterprise-wide programs or programs affecting tasks with high-cost impacts
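The guidance above amounts to a small decision rule. The sketch below encodes it directly; the `Program` class and its attribute names are hypothetical labels for the program characteristics named in the list.

```python
from dataclasses import dataclass


@dataclass
class Program:
    """Hypothetical description of a learning program."""
    name: str
    hard_skills: bool = False
    strategic: bool = False
    enterprise_wide: bool = False
    high_cost_impact: bool = False


def levels_to_evaluate(program: Program) -> list[int]:
    """Map program characteristics to the evaluation levels listed above."""
    levels = [1]  # Level 1 (Reaction) applies to all programs
    if program.hard_skills:
        levels.append(2)
    if program.strategic:
        levels.append(3)
    if program.enterprise_wide or program.high_cost_impact:
        levels.extend([4, 5])
    return levels


safety = Program("Forklift safety", hard_skills=True, high_cost_impact=True)
print(levels_to_evaluate(safety))  # prints [1, 2, 4, 5]
```

Because Level 4 analysis relies on Level 3 behavioral data, a stricter variant might add Level 3 whenever Level 4 is selected. That is a design choice rather than part of the list above.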

Gathering Level 1 and Level 2 data is typically enabled by a learning management system and is relatively straightforward.

Level 3 may involve leveraging existing reporting avenues, or it may require new technology to be put in place to gather the needed data. For example, are workers performing every step in a given work task each time it is performed? There may already be technology to measure this in an automated fashion, or it may require self-reporting, supervisor observation, or a combination of all three.

Level 4 will ultimately require you to gather the Level 3 behavioral data and the data related to the KPI(s) that you identified in Phase 1.

At this stage, the key to success is collecting data from multiple sources, such as (1) the learning management system, (2) service management systems such as ServiceNow, (3) navigation data from UI analytics tools, and (4) post-training surveys. Because the data arrives disjointed and discrete, some knowledge of data aggregation and ingestion is required before data scientists and analysts can draw insights from it.
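One way to picture the aggregation step is to merge records from each source on a shared learner identifier. The sketch below assumes every source can be exported with a `learner_id` field; the field names and sample records are hypothetical and do not reflect any particular product's export format.

```python
from collections import defaultdict

# Hypothetical extracts from the four kinds of sources named above.
lms_records = [{"learner_id": "u1", "quiz_score": 85},
               {"learner_id": "u2", "quiz_score": 70}]
ticket_records = [{"learner_id": "u1", "tickets_opened": 2},
                  {"learner_id": "u2", "tickets_opened": 5}]
ui_records = [{"learner_id": "u1", "help_page_views": 3}]
survey_records = [{"learner_id": "u2", "satisfaction": 4}]


def merge_by_learner(*sources):
    """Combine records from several sources into one profile per learner."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            learner_id = record["learner_id"]
            for key, value in record.items():
                if key != "learner_id":
                    merged[learner_id][key] = value
    return dict(merged)


profiles = merge_by_learner(lms_records, ticket_records, ui_records, survey_records)
for learner_id, profile in sorted(profiles.items()):
    print(learner_id, profile)
```

Fields missing for a learner are simply absent from that learner's profile, which makes gaps in coverage easy to spot before analysis begins.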

Phase 4 – Developing Insights

The Levels 1-3 data gathered in the previous phase include the raw figures, responses, feedback, and other logistical information obtained directly from the digital platform. These data can be overwhelming and may not make sense on their own. They have to be normalized for analysis and fed to analytics platforms to yield insights, and any insights gained should be compared with the program's objectives and goals. This is where the specialized skills of data management, data science, and data analytics are necessary to aggregate, persist, curate, and manage the data and to train models on it. Achieving the desired goals requires discipline and a commitment to constantly collecting and processing the information in a non-intrusive fashion.
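Normalization can take many forms. One simple, commonly used option is to rescale each measure onto a 0-1 range using the known bounds of its scale, so that survey ratings and quiz scores become comparable. The sketch below assumes a 1-5 survey scale and a 0-100 quiz scale; both scales and all sample values are hypothetical.

```python
def rescale(value: float, scale_min: float, scale_max: float) -> float:
    """Rescale a value from [scale_min, scale_max] onto [0, 1]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return (value - scale_min) / (scale_max - scale_min)


# Hypothetical Level 1 survey ratings (1-5) and Level 2 quiz scores (0-100).
survey_ratings = [3, 4, 5, 2]
quiz_scores = [60, 80, 100, 40]

normalized_ratings = [rescale(r, 1, 5) for r in survey_ratings]
normalized_scores = [rescale(s, 0, 100) for s in quiz_scores]

print(normalized_ratings)
print(normalized_scores)
```

Using the fixed bounds of each scale, rather than the minimum and maximum observed in one cohort, keeps values comparable across cohorts and over time.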

Level 4 calls for a more rigorous analysis strategy because one must determine whether the identified behavioral changes positively affect the bottom line. You could have a highly successful training program from a behavioral change standpoint, but it could still fail to close the performance gap. Such an outcome means that the behaviors you targeted are not the ones driving your identified KPI(s) and that the program needs modification.
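The reasoning in this paragraph can be sketched as a two-step check: first whether the behavior was adopted, then whether the KPI moved. The function, the 0.8 adoption threshold, and the wording of the outcomes are all hypothetical. A real Level 4 analysis would also need to rule out other causes of KPI movement.

```python
def diagnose_program(adoption_rate: float, kpi_improved: bool,
                     adoption_target: float = 0.8) -> str:
    """Classify a program from Level 3 adoption and Level 4 KPI movement.

    adoption_target is an assumed threshold, not a standard value.
    """
    if adoption_rate < adoption_target:
        return "behavior not adopted: revisit the learning intervention"
    if not kpi_improved:
        return "behavior adopted, KPI unchanged: revisit the behavior-to-KPI link"
    return "behavior adopted, KPI improved: candidate for continuation or expansion"


# Hypothetical case: 90% of learners apply the new procedure,
# but the accident rate has not moved.
print(diagnose_program(adoption_rate=0.9, kpi_improved=False))
```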

Both Levels 4 and 5 require vetting from a wider stakeholder group with the expertise to reliably agree on the relationships between KPI(s), costs, and supporting behaviors. The effort and time involved make these levels only reasonable for large, high-impact programs.



Phillips, J. J., & Stone, R. D. (2000). How to measure training results: A practical guide to tracking the six key indicators. New York: McGraw Hill.

Mazareanu, E. (2020, May 04). Global workplace training: Market size 2007-2019. Retrieved July 20, 2020, from https://www.statista.com/statistics/738399/size-of-the-global-workplace-training-market/