# DOE for Beginners: Running Your First Designed Experiment
Design of Experiments sounds intimidating. It does not have to be. At its core, DOE is just a structured way to figure out what affects what. Instead of changing one thing at a time and hoping for the best, you change multiple things simultaneously and let the math sort out which ones actually matter.
This guide walks through running your first experiment. No advanced statistics background needed.
## Why One-Factor-at-a-Time Does Not Work
The instinct most people have is to change one variable, hold everything else constant, and see what happens. Then change the next variable. Then the next. This is called OFAT (one-factor-at-a-time), and it has two big problems:
1. **It misses interactions.** Factor A might only matter when Factor B is at a certain level. OFAT will never find that because it does not test combinations.
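2. **It is inefficient.** A 2-factor, 2-level full factorial needs 4 runs, and every one of them contributes to the estimate of each effect. OFAT can get by with 3 runs (a baseline, then change A, then change B), but each effect then rests on a single comparison. To estimate the main effects as precisely as the 4-run factorial, OFAT needs 6 runs, and it still cannot estimate the interaction.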
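With 5 or 6 factors, the difference becomes huge. With 5 factors, a 16-run fractional factorial uses every run to estimate every main effect. Matching that precision with OFAT takes 48 runs (8 repeats each of the baseline and the five single-factor changes), and OFAT still tells you nothing about interactions.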
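A short variance argument shows where OFAT's extra runs come from. The sketch below assumes every run has the same noise level and that "OFAT" means one baseline setting plus one changed setting per factor, each repeated the same number of times. The function name is illustrative.

```python
def ofat_runs_for_equal_precision(k: int, factorial_runs: int) -> int:
    """OFAT runs needed to match the main-effect precision of a
    2-level factorial (full or fractional) with `factorial_runs` runs.

    Assumes constant noise variance sigma^2 and an OFAT plan made of one
    baseline setting plus one changed setting per factor, each repeated r times.
    """
    # Factorial: each effect compares two halves of the N runs,
    #   Var(effect) = sigma^2/(N/2) + sigma^2/(N/2) = 4 sigma^2 / N.
    # OFAT: each effect compares r changed runs with r baseline runs,
    #   Var(effect) = 2 sigma^2 / r.
    # Setting these equal gives r = N / 2.
    r = factorial_runs // 2
    # k + 1 distinct settings (baseline + one per factor), each run r times.
    return (k + 1) * r


print(ofat_runs_for_equal_precision(2, 4))   # 6
print(ofat_runs_for_equal_precision(5, 16))  # 48
```

Even at those run counts, the OFAT plan estimates no interactions at all.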
## Start Simple: The 2-Level Factorial
The most common starting point is a 2-level full factorial. You pick your factors, set each one at a "low" and a "high" level, and run every combination.
**Example:** You are trying to reduce surface roughness in a machining process. You suspect three factors matter:
| Factor | Low (-) | High (+) |
|--------|---------|----------|
| Speed | 800 RPM | 1200 RPM |
| Feed rate | 0.05 mm/rev | 0.15 mm/rev |
| Depth of cut | 0.5 mm | 1.5 mm |
A full factorial with 3 factors at 2 levels needs 2³ = 8 runs. Each run is a unique combination of low/high settings.
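If you build the run sheet yourself rather than in a DOE package, the 8 combinations can be generated directly. This is a minimal sketch using the levels from the table; the dictionary keys and the `full_factorial` helper are hypothetical names, and levels are coded as -1 (low) and +1 (high), the usual convention for 2-level designs.

```python
from itertools import product

# Low/high levels from the table above, in engineering units.
FACTORS = {
    "speed_rpm": (800, 1200),
    "feed_mm_per_rev": (0.05, 0.15),
    "depth_mm": (0.5, 1.5),
}


def full_factorial(factors):
    """Every low/high combination as (coded, actual) pairs.

    Coded levels are -1 (low) and +1 (high). Runs come out in a systematic
    order with the last factor changing fastest; randomize before running.
    """
    names = list(factors)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        actual = {
            name: factors[name][0] if level == -1 else factors[name][1]
            for name, level in zip(names, coded)
        }
        runs.append((coded, actual))
    return runs


design = full_factorial(FACTORS)
for coded, actual in design:
    print(coded, actual)
```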
## Planning the Experiment
This is the part that matters most. A well-planned experiment with simple analysis beats a poorly planned experiment with sophisticated statistics every time.
**1. Define the objective clearly.**
"Reduce surface roughness" is okay. "Identify which factors most affect surface roughness and find the settings that minimize it" is better.
**2. Pick your response variable.**
What are you measuring? Make sure you can measure it reliably (this is where your Gage R&R matters).
**3. Choose your factors and levels.**
Start with what you know. Talk to operators, review process history, look at scrap data. Pick the factors most likely to matter. Set the levels far enough apart that you would expect to see a difference, but not so extreme that you create unsafe or unrealistic conditions.
**4. Identify nuisance factors.**
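These are things that might affect the result but are not what you are studying, such as ambient temperature, material batch, or operator. Block on the ones you can identify and control (for example, run each complete replicate on a single material batch), and randomize to protect against the rest.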
**5. Determine the number of replicates.**
Running each combination once gives you the main effects and interactions but no estimate of pure error. Running each combination twice (2 replicates = 16 total runs) gives you error estimation and more statistical power.
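The run count and the amount of pure error you get from replication follow simple arithmetic. A small sketch (the function name is illustrative):

```python
def run_budget(k: int, replicates: int):
    """Total runs and pure-error degrees of freedom for a 2^k full factorial."""
    cells = 2 ** k                          # unique factor combinations
    total_runs = cells * replicates
    # Each combination's repeats contribute (replicates - 1) df of pure error.
    pure_error_df = cells * (replicates - 1)
    return total_runs, pure_error_df


print(run_budget(3, 1))  # (8, 0)
print(run_budget(3, 2))  # (16, 8)
```

Zero degrees of freedom for pure error is exactly the "no estimate of pure error" situation of a single replicate.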
**6. Randomize the run order.**
This is non-negotiable. If you run all the low-speed tests first and then all the high-speed tests, any drift over time gets confounded with the speed effect. Randomize.
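Most DOE software randomizes for you. If you are building the run sheet by hand, a seeded shuffle is enough. This sketch assumes design cells are numbered 0 to 7 (for example, in the order your design table lists them); the helper name and the seed value are arbitrary.

```python
import random


def randomized_run_order(n_cells, replicates, seed):
    """Shuffle every (design cell, replicate) pair into one random run order.

    Using a fixed, recorded seed makes the order reproducible for the run sheet.
    """
    runs = [(cell, rep) for rep in range(1, replicates + 1) for cell in range(n_cells)]
    rng = random.Random(seed)  # private generator; leaves the global RNG untouched
    rng.shuffle(runs)
    return runs


order = randomized_run_order(n_cells=8, replicates=2, seed=2024)
for run_number, (cell, rep) in enumerate(order, start=1):
    print(f"run {run_number:2d}: design cell {cell}, replicate {rep}")
```

This fully randomizes all 16 runs. If each replicate is a block (say, its own material batch), shuffle the 8 runs within each block instead of across them.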
## Running the Experiment
- Follow the run order exactly. Do not skip ahead because a setup is convenient.
- Record everything. Not just the response, but the actual settings, the time, who ran it, any observations.
- If something goes wrong on a run, note it. Do not throw it out silently and do not re-run it without documenting why.
- Be consistent in how you measure the response. Same person, same method, same timing.
## Analyzing the Results
The analysis of a factorial experiment gives you:
**Main effects:** The average impact of changing each factor from low to high. If speed has a main effect of -2.3 on roughness, that means increasing speed from 800 to 1200 RPM reduces roughness by 2.3 units on average (across all levels of the other factors).
**Interactions:** When the effect of one factor depends on the level of another. If the speed x feed rate interaction is significant, it means the effect of speed is different at low feed rate versus high feed rate. You cannot just say "higher speed is better" without specifying what feed rate you are at.
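Both kinds of effect come from the same calculation: build a ±1 contrast column (for an interaction, the product of its factors' coded levels) and subtract the mean response at -1 from the mean at +1. The sketch below assumes a balanced 2-level full factorial; the function name is hypothetical, and the responses are synthetic values generated from a known model so the result can be checked by hand. They are not real roughness data.

```python
from itertools import combinations, product
from math import prod


def factorial_effects(coded_runs, responses, names):
    """Main effects and interactions of a balanced 2-level full factorial.

    coded_runs: one tuple of -1/+1 levels per run.
    responses:  the measured response for each run, in the same order.
    Each effect = mean response where its contrast column is +1
                  minus mean response where it is -1.
    """
    n = len(responses)
    k = len(names)
    effects = {}
    for size in range(1, k + 1):
        for idx in combinations(range(k), size):
            # Contrast column: product of the coded levels of the factors in idx.
            column = [prod(run[i] for i in idx) for run in coded_runs]
            label = " x ".join(names[i] for i in idx)
            # Half the runs are +1 and half -1, so mean(+) - mean(-)
            # equals 2 * sum(column * y) / n.
            effects[label] = 2 * sum(c * y for c, y in zip(column, responses)) / n
    return effects


# Synthetic, noise-free responses from a known model in coded units, so the
# answer can be checked by hand: each effect is twice its model coefficient.
names = ["speed", "feed", "depth"]
runs = list(product((-1, 1), repeat=3))
y = [10 - 1.15 * s + 0.4 * f + 0.3 * s * f for s, f, d in runs]

for label, effect in factorial_effects(runs, y, names).items():
    print(f"{label:>20}: {effect:+.2f}")
```

Here speed comes out at about -2.3, feed at 0.8, the speed x feed interaction at 0.6, and every term involving depth at 0.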
**Pareto chart of effects:** Ranks all effects by magnitude. This is your quick visual answer to "what matters most?"
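Ranking effects is just a sort by absolute value. The sketch below prints a crude text version of the chart. The effect values are illustrative, not measured, and statistical packages often plot standardized effects with a reference line for significance, whereas this ranks raw effects only.

```python
def pareto_order(effects):
    """Effects sorted by absolute size, largest first (Pareto chart order)."""
    return sorted(effects.items(), key=lambda item: abs(item[1]), reverse=True)


def print_text_pareto(effects, width=30):
    """Crude text Pareto chart: one bar per effect, scaled to the largest."""
    ranked = pareto_order(effects)
    largest = abs(ranked[0][1]) or 1.0  # avoid dividing by zero if all effects are 0
    for label, value in ranked:
        bar = "#" * round(width * abs(value) / largest)
        print(f"{label:>22} {value:+6.2f} {bar}")


# Illustrative effect estimates (not measured data).
example = {"speed": -2.3, "feed": 0.8, "depth": 0.1, "speed x feed": 0.6}
print_text_pareto(example)
```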
**Main effects plot:** Shows the average response at each level of each factor. The steeper the line, the bigger the effect.
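**Interaction plot:** Parallel lines mean no interaction. Non-parallel lines mean the factors interact (check the analysis to see whether the interaction is significant). Crossing lines mean the effect of one factor reverses direction depending on the level of the other, which is a strong interaction.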
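The points on an interaction plot are cell means, and "parallel" means the two lines have the same slope. This sketch computes both and shows that half the difference between the slopes equals the interaction effect. The helper names are hypothetical, and the responses are synthetic values from a known model, used only as a hand check.

```python
from itertools import product


def interaction_means(coded_runs, responses, i, j):
    """Cell means for an interaction plot: factor i on the x-axis,
    one line per level of factor j. Returns {(level_i, level_j): mean}."""
    cells = {}
    for run, y in zip(coded_runs, responses):
        cells.setdefault((run[i], run[j]), []).append(y)
    return {key: sum(values) / len(values) for key, values in cells.items()}


def line_slopes(means):
    """Change in mean response from low to high of factor i, at each level of j."""
    return {level_j: means[(1, level_j)] - means[(-1, level_j)] for level_j in (-1, 1)}


# Synthetic, noise-free responses: speed (index 0) vs feed (index 1).
runs = list(product((-1, 1), repeat=3))
y = [10 - 1.15 * s + 0.4 * f + 0.3 * s * f for s, f, d in runs]
slopes = line_slopes(interaction_means(runs, y, 0, 1))
print(slopes)
# Parallel lines <=> equal slopes. Half the difference is the interaction effect.
print((slopes[1] - slopes[-1]) / 2)
```

In this example, raising speed lowers the response by about 2.9 at low feed but only about 1.7 at high feed: the lines are not parallel, and half the gap (0.6) is the speed x feed interaction effect.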
## Making Decisions
Once you see the results:
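1. **Identify the significant factors.** Use the Pareto chart and p-values. Effects with p < 0.05 are statistically significant at the 0.05 level (95% confidence). With a single replicate there is no pure-error estimate, so p-values rely on pooling negligible higher-order interactions into the error term.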
2. **Check for interactions.** If two factors interact, you need to consider them together, not independently.
3. **Find the best settings.** For a 2-level design, look at which combination of levels gives the best response.
4. **Run confirmation runs.** Set the process to your predicted best settings and run a few parts. Does the actual result match the prediction? If yes, you have found your improvement. If no, dig deeper.
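Steps 3 and 4 can be sketched numerically: predict the response at every corner from the effects you kept, pick the best, then compute a range that the average of the confirmation runs should land in if the model holds. This assumes an orthogonal 2-level design in coded ±1 units and a prediction at a corner. All function names, the effect values, the grand mean, and the error standard deviation are hypothetical. The t value 2.179 is the two-sided 95% value for 12 degrees of freedom (16 runs minus 4 model terms).

```python
from itertools import product
from math import sqrt


def predict(coded, effects, grand_mean, names):
    """Predicted response at one combination of coded levels.

    In coded -1/+1 units each model coefficient is half its effect, and an
    interaction term uses the product of its factors' coded levels.
    """
    y = grand_mean
    for label, effect in effects.items():
        term = 1
        for factor in label.split(" x "):
            term *= coded[names.index(factor)]
        y += effect / 2 * term
    return y


def best_setting(effects, grand_mean, names, minimize=True):
    """Corner of the design with the best predicted response."""
    corners = product((-1, 1), repeat=len(names))
    choose = min if minimize else max
    return choose(corners, key=lambda c: predict(c, effects, grand_mean, names))


def confirmation_interval(prediction, s, n_terms, n_runs, n_confirm, t_crit):
    """Range the average of n_confirm confirmation runs should land in.

    Assumes an orthogonal 2-level design in coded units and a prediction at a
    corner, so Var(prediction) = sigma^2 * n_terms / n_runs (n_terms counts the
    intercept). s is the error standard deviation from the analysis; t_crit is
    the two-sided t value at the error degrees of freedom.
    """
    half_width = t_crit * s * sqrt(n_terms / n_runs + 1 / n_confirm)
    return prediction - half_width, prediction + half_width


# Illustrative model (not measured data): kept effects and grand mean.
names = ["speed", "feed", "depth"]
kept = {"speed": -2.3, "feed": 0.8, "speed x feed": 0.6}
best = best_setting(kept, grand_mean=10.0, names=names)
y_hat = predict(best, kept, 10.0, names)
# Hypothetical s = 0.15 from 16 runs, 12 error df (t = 2.179), 3 confirmation parts.
low, high = confirmation_interval(y_hat, 0.15, n_terms=4, n_runs=16, n_confirm=3, t_crit=2.179)
print(best, round(y_hat, 2), (round(low, 2), round(high, 2)))
```

Depth is not in this model, so its level is a free choice; pick it on cost or cycle time. If the confirmation average falls outside the range, the model is missing something, which is your cue to dig deeper.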
## When You Need More Than 2 Levels
A 2-level factorial tells you direction (more vs. less) but not curvature (is there an optimum in the middle?). If you suspect there is a sweet spot between your low and high, you have two options:
- **Add center points** to your 2-level design. Include a few runs at the midpoint of all factors. If the average center-point response is significantly different from the average of the corners, curvature exists.
- **Move to a Response Surface Design** (Central Composite or Box-Behnken) for full optimization. But do this after the screening phase, not as your first experiment.
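Both options reduce to a little arithmetic. The sketch below shows a curvature check (corner average minus center average, with a t-ratio that uses the spread of the center-point replicates as the noise estimate) and the coded points of a rotatable central composite design. Function names, the default of 4 center points, and the response values are hypothetical, and Box-Behnken designs are not shown. Note that the axial points sit at ±alpha, outside your original low/high range, so confirm those settings are safe. A face-centered design (alpha = 1) stays inside the range.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def curvature_check(factorial_y, center_y):
    """Curvature estimate and its t-ratio from a 2-level design with center points.

    Curvature = mean of corner runs - mean of center runs. The noise estimate
    comes from the center-point replicates (needs at least 2; df = n_center - 1).
    """
    curvature = mean(factorial_y) - mean(center_y)
    se = stdev(center_y) * sqrt(1 / len(factorial_y) + 1 / len(center_y))
    return curvature, curvature / se


def central_composite(k, n_center=4):
    """Coded points of a rotatable central composite design for k factors.

    Corners at +/-1, two axial points per factor at +/-alpha, plus center
    points. alpha = (number of corners) ** 0.25 makes the design rotatable.
    """
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    alpha = len(corners) ** 0.25
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(point)
    centers = [[0.0] * k for _ in range(n_center)]
    return corners + axial + centers, alpha


# Illustrative numbers only: 8 corner runs and 4 center points.
corner_y = [9.1, 10.4, 9.8, 10.7, 9.5, 10.2, 9.9, 10.4]
center_y = [8.0, 8.2, 7.8, 8.0]
print(curvature_check(corner_y, center_y))

points, alpha = central_composite(3)
print(len(points), round(alpha, 3))  # 18 1.682
```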
## Common Mistakes
**Too many factors in the first experiment.** Start with 3-5 factors. If you have 15 possible factors, do a screening design first (Plackett-Burman or fractional factorial) to narrow it down.
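As one example of a screening design, a half fraction of a 5-factor experiment needs 16 runs instead of 32. The sketch uses the common generator E = ABCD. Plackett-Burman designs are a different family and are not shown, and the function name is hypothetical.

```python
from itertools import product


def half_fraction_5_factors():
    """16-run 2^(5-1) screening design in coded units.

    A, B, C, D form a full 2^4 factorial; E is set by the generator E = ABCD
    (defining relation I = ABCDE). That is a resolution V design: no main
    effect or two-factor interaction is aliased with another main effect or
    two-factor interaction.
    """
    return [(a, b, c, d, a * b * c * d) for a, b, c, d in product((-1, 1), repeat=4)]


design = half_fraction_5_factors()
for run in design:
    print(run)
```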
**Levels too close together.** If low speed is 990 RPM and high speed is 1010 RPM, you are unlikely to see a difference. Be bold with your levels.
**No replication.** A single replicate gives you effects but no pure-error estimate, so you have no direct way to judge whether they are real or just noise.
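With replicates, each effect can be turned into a t-ratio against pure error, which is the calculation behind the p-values for a replicated full factorial fitted with all its terms. This sketch assumes every design cell has the same number of replicates (at least 2). The function name and the response values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def effect_t_ratios(cells, effects):
    """t-ratios for effect estimates from a replicated 2-level factorial.

    cells:   {design cell: [replicate responses]}, same number of
             replicates (at least 2) in every cell.
    effects: {label: effect estimate}.
    Returns ({label: t-ratio}, pure-error degrees of freedom).
    """
    groups = list(cells.values())
    n_total = sum(len(g) for g in groups)
    # Pure-error variance: with equal replicates, the average within-cell variance.
    s2 = mean(variance(g) for g in groups)
    df = sum(len(g) - 1 for g in groups)
    # Each effect is a difference of two means of n_total / 2 runs:
    # Var(effect) = 4 * s2 / n_total.
    se_effect = 2 * sqrt(s2 / n_total)
    return {label: e / se_effect for label, e in effects.items()}, df


# Illustrative: 8 cells x 2 replicates, each pair 0.2 apart.
cells = {cell: [5.1, 4.9] for cell in range(8)}
t_ratios, df = effect_t_ratios(cells, {"speed": -2.3, "feed": 0.8})
print(t_ratios, df)
```

Compare each |t| with a t table at `df` degrees of freedom; with SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value.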
**Not randomizing.** Seriously. Randomize. Time-based confounding is one of the most common ways experiments go wrong.
**Skipping confirmation runs.** The model predicts an outcome. Verify it. Prediction without verification is just a hypothesis.