Every article on Peptidings assigns a compound to one of five evidence tiers. Those tiers appear in cards, in tables, on featured images, and in the text of every research overview. But a tier badge is only as useful as your understanding of what it actually means—and what it does not mean.
This guide explains how Peptidings constructs its evidence framework, what each tier designation requires, why the distinctions matter for anyone evaluating research claims about peptides, and how to read a study well enough to form your own assessment rather than taking any site’s summary—including this one—on faith. The last section is the most important: evidence interpretation is a skill, and the most common errors in community discussion of peptide research are all errors in evidence interpretation.
Table of Contents
- Why Evidence Tiers Matter
- The Five Evidence Tiers Explained
- The Evidence Hierarchy: Study Types Ranked
- Understanding Preclinical Evidence: What Animal Studies Can and Cannot Tell You
- Understanding Human Evidence: Trial Phases, Sample Sizes, and What “Statistically Significant” Means
- Common Evidence Gaps in Peptide Research
- How to Read a Study Yourself
- The Most Common Evidence Interpretation Errors
Why Evidence Tiers Matter
Peptide research is plagued by a specific kind of miscommunication: evidence from one context is presented as though it applies to a different context. A compound with striking results in mouse models gets discussed as though those results tell us what will happen in humans. A study measuring a biomarker change gets discussed as though it established clinical benefit. A single small trial with positive results gets discussed as though the compound is proven effective.
Plain English
Evidence tiers are a shorthand for how confident we can be in a claim. “Approved Drug” means it’s been through the full gauntlet of human testing and regulatory review. “Preclinical Only” means it’s only been tested in animals or cells—promising perhaps, but not proven in people.
None of these translations are automatically valid. Each requires a specific set of conditions to be true—conditions that are often not met, and that are rarely stated when the evidence is cited. The result is that community discussions of peptide research are filled with claims that are technically sourced but substantively misleading.
Evidence tiers are a shortcut solution to a real problem: they force every compound into an explicit category that communicates the type and quality of evidence available, so readers can calibrate their interpretation accordingly. A compound in the Preclinical tier has a different evidentiary status than a compound in the Clinical Trials tier—and that difference has direct practical implications for what it is reasonable to expect, what risks are known, and what claims can be made with intellectual honesty.
The Five Evidence Tiers Explained
APPROVED DRUG (Tier 1)
An approved drug has been through the full regulatory review process of a major regulatory agency—the FDA (United States), EMA (European Union), or equivalent—and received marketing authorization for at least one therapeutic indication. Regulatory approval requires demonstrated safety and efficacy in controlled clinical trials that meet the agency’s evidentiary standards. It represents the highest level of evidence for a specific application in a specific population.
On Peptidings, only one compound currently holds Approved Drug status: Thymosin Alpha-1 (Zadaxin/thymalfasin), which is approved in more than 35 countries for hepatitis B and related indications. The GLP-1 receptor agonists (semaglutide, tirzepatide, liraglutide) are also approved drugs. Note that approval status is indication-specific—a compound may be an approved drug for one use while being an investigational compound for another.
CLINICAL TRIALS (Tier 2)
Clinical trials tier compounds have completed at least one Phase I trial in humans and may be progressing through Phase II or Phase III development. This tier represents meaningful human evidence—safety data from Phase I, and potentially preliminary efficacy data from Phase II. These compounds are being evaluated in the standard regulatory pathway; they have not yet completed that process.
The gap between Phase I and Phase III is large and should not be minimized. Phase I demonstrates only that a compound can be given to humans without immediate serious toxicity and establishes basic pharmacokinetics. Phase II begins to examine efficacy but in small populations. Phase III is the large-scale randomized controlled trial that provides the definitive efficacy evidence for regulatory submission. Many compounds fail between Phase I and approval.
PILOT DATA (Tier 3)
Pilot data compounds have some human evidence—small open-label studies, safety observations, single-arm trials, or case series—but lack the controlled trial evidence of Tier 2 compounds. The human evidence is real but limited. It typically demonstrates that the compound has been administered to humans without catastrophic consequences and may show preliminary signals of biological activity, but it does not establish efficacy by controlled trial standards.
BPC-157 is the most prominent Tier 3 compound on this site. Its three human studies (interstitial cystitis, IV safety, knee injection) represent genuine human data but do not constitute controlled efficacy evidence for any indication. GHK-Cu in topical cosmetic formulations also falls here—human data exists for topical use but not for the injectable systemic applications common in self-experimentation.
PRECLINICAL ONLY (Tier 4)
Preclinical compounds have no published human trial data for exogenous administration. Their evidence base consists entirely of in vitro (cell culture) studies and animal models. This tier includes many of the most commonly discussed compounds in self-experimentation communities—TB-500, KPV, LL-37, VIP (for most applications), and many others.
Preclinical tier does not mean “no evidence”—it means no human evidence. A compound can have an extensive, well-replicated, mechanistically sophisticated preclinical record and still be Preclinical tier because it has never been tested in a controlled human study. The distance from a well-characterized rodent model to human clinical application remains large. Preclinical data is a prerequisite for clinical development; it is not a substitute for it.
IT’S COMPLICATED (Tier 5)
The It’s Complicated designation is reserved for compounds where the evidence picture is genuinely nuanced in a way that a simple tier assignment would misrepresent. The most common scenario is route-dependent evidence—GHK-Cu is the clearest example: it has decades of human evidence for topical cosmetic use but no human evidence for injectable systemic use. Assigning it Tier 3 (based on topical data) or Tier 4 (treating it as purely preclinical) would both be misleading. The It’s Complicated designation flags that the evidence applies specifically to one context but not another, and the article explains which is which.
The Evidence Hierarchy: Study Types Ranked
Within and across tiers, not all evidence is equal. The evidence hierarchy ranks study types by their ability to support causal conclusions—to establish that a compound causes an effect rather than merely being associated with it.
| Study Type | Strength | Key Limitation |
|---|---|---|
| Systematic review / meta-analysis of RCTs | Highest | Only as good as the individual RCTs included; garbage in, garbage out |
| Randomized controlled trial (RCT) | High | Small samples, short duration, or specific populations may limit generalizability |
| Non-randomized controlled trial | Moderate | Selection bias in who receives treatment vs. control |
| Cohort study (observational) | Moderate–low | Confounders—other variables that explain outcomes—are not controlled |
| Case series / case reports | Low | No control group; cannot distinguish compound effect from natural history or placebo |
| Animal models (in vivo preclinical) | Low for human applications | Species differences in physiology, metabolism, and disease mechanisms; poor historical translation rate |
| Cell culture (in vitro) | Lowest for human applications | Isolated cells lack the complexity of whole organisms; pharmacokinetics, metabolism, and systemic effects are absent |
Understanding Preclinical Evidence: What Animal Studies Can and Cannot Tell You
Preclinical research—cell culture (in vitro) and animal model (in vivo) studies—is the foundation of drug development. It establishes mechanism, tests biological plausibility, identifies dose ranges, and flags safety signals before human trials begin. Without preclinical research, clinical development would be blind. With only preclinical research, the actual therapeutic potential in humans remains unknown.
Plain English
Animal studies are the starting point, not the finish line. A compound that works in mice may not work in humans—different biology, different metabolism, different doses. Preclinical evidence tells you something is worth investigating further, not that it works for people.
What Preclinical Studies Can Tell You
In vitro studies can establish whether a compound interacts with a specific receptor or enzyme, whether it inhibits or activates a specific pathway, and whether it has cytotoxic effects at given concentrations. They are the right tool for mechanism investigation and early safety screening. A compound that shows no activity in vitro is unlikely to show activity in animals or humans via that mechanism (though pharmacokinetic differences mean a compound can be inactive by one route, such as oral, and still active by another, such as injection).
Animal model studies can establish whether a compound produces measurable effects in a living system at biologically relevant concentrations. A well-designed rodent model—particularly a genetically validated model of a specific disease—can demonstrate pharmacological activity and provide a dose-response relationship. Consistently positive results across multiple independent research groups in well-validated models provide stronger evidence than single-group results in poorly characterized models.
What Preclinical Studies Cannot Tell You
Preclinical studies cannot establish that a compound works in humans at human-relevant doses via human administration routes. This is not a trivial limitation—it is the central limitation that separates preclinical from clinical evidence.
The historical translation rate from animal models to successful human therapeutic applications is poor. Estimates vary by therapeutic area, but across drug development broadly, fewer than 10% of compounds that show promising preclinical results are eventually approved for clinical use. In some therapeutic areas—neurological disease, oncology, sepsis—the translation rate is even lower. IBD drug development in particular has a long record of compounds that worked brilliantly in mouse colitis models and failed in human trials.
The reasons for poor translation are multiple: differences in metabolism between species (a compound metabolized quickly in humans may be slow in mice, producing different exposure profiles); differences in disease mechanisms (mouse experimental colitis shares some features with human IBD but differs in important ways); differences in receptor density and distribution; and the fundamental reality that humans are not large mice.
The Dose Translation Problem
A common practice in community discussions is translating doses directly from rodent studies to humans by body-weight scaling. If a mouse study used 5 mg/kg, the argument goes, a 70 kg person should use 350 mg. This calculation is pharmacologically incorrect. Rodents have substantially higher metabolic rates per unit body weight than humans, and the correct conversion requires allometric scaling, a body surface area-based approach that produces human-equivalent doses substantially lower than naive weight-based calculations. Even after allometric correction, the resulting dose is an estimate that requires clinical pharmacokinetic validation, not an established human dose.
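To make the arithmetic concrete, here is a minimal sketch of the body-surface-area (Km-factor) conversion, using the published FDA Km values; the 5 mg/kg mouse dose and 70 kg body weight are the hypothetical numbers from the example above, and the output is an estimate, not an established human dose.

```python
# Body-surface-area (allometric) dose conversion using the standard FDA Km factors.
# The mouse dose and 70 kg body weight below are hypothetical illustration values.
KM = {"mouse": 3, "rat": 6, "rabbit": 12, "dog": 20, "human": 37}

def human_equivalent_dose(animal_dose_mg_per_kg: float, species: str) -> float:
    """Convert an animal dose (mg/kg) to an estimated human-equivalent dose (mg/kg)."""
    return animal_dose_mg_per_kg * KM[species] / KM["human"]

mouse_dose = 5.0                                    # mg/kg in a hypothetical mouse study
hed = human_equivalent_dose(mouse_dose, "mouse")    # ~0.41 mg/kg

naive_total = mouse_dose * 70                       # naive weight-based "conversion"
allometric_total = hed * 70                         # BSA-corrected estimate

print(f"Naive weight-based dose for 70 kg person:   {naive_total:.0f} mg")   # 350 mg
print(f"Allometric (BSA) estimate for 70 kg person: {allometric_total:.0f} mg")  # ~28 mg
```

Even the ~28 mg figure is only a scaling estimate for a hypothetical study; it says nothing about whether any human dose is safe or effective.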
Understanding Human Evidence: Trial Phases, Sample Sizes, and What “Statistically Significant” Means
Clinical Trial Phases
Phase I trials are primarily safety studies. They enroll a small number of subjects (typically 20–80), often healthy volunteers, and establish the safety and tolerability profile of a compound, characterize its pharmacokinetics (how it is absorbed, distributed, metabolized, and eliminated), and identify any dose-limiting toxicities. Phase I does not establish efficacy. A compound completing Phase I without serious adverse events has passed a safety threshold; it has not been shown to work for anything.
Phase II trials begin to examine efficacy in a target patient population. They are larger than Phase I (typically 100–300 subjects) but still relatively small. Phase II is exploratory—it looks for signals of efficacy and refines the dose and population for Phase III. A positive Phase II result is encouraging but historically unreliable; many compounds that showed efficacy in Phase II failed in Phase III trials with larger, more rigorous designs.
Phase III trials are the pivotal trials—large (hundreds to thousands of subjects), randomized, controlled, and designed to establish efficacy at a standard that meets regulatory requirements for approval. A positive Phase III result in a well-designed, pre-specified trial is the gold standard of clinical evidence. This is what Thymosin Alpha-1’s hepatitis B approvals were based on; this is what TESTS (the sepsis RCT) failed to achieve.
Sample Size and Statistical Power
Sample size matters because small trials are unreliable. A trial with 20 patients showing a positive effect may be a true signal or may be chance. The p < 0.05 threshold caps the false positive rate at 5% for any single well-conducted comparison, but small trials also have low statistical power, and low power inflates the share of positive findings that turn out to be false: if a 20-patient trial has only around 20% power to detect the true effect, and the compound had even odds of working in the first place, roughly one in five of its positive results will be false positives, even when everything is done correctly.
The concept of statistical power—the probability that a trial will detect a real effect if one exists—is equally important. An underpowered trial that shows no effect does not prove the compound is ineffective; it may simply have been too small to detect the real effect. Interpreting a negative result requires knowing whether the trial was adequately powered, which requires knowing the expected effect size and the sample size calculation used in trial design.
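To illustrate both points, here is a minimal simulation sketch; the effect size, sample size, and 50/50 prior are illustrative assumptions, not figures from any specific trial.

```python
# Simulation of many hypothetical 20-patient (10 per arm) two-arm trials,
# showing low power for a modest real effect and the resulting share of
# "significant" results that come from compounds with no effect at all.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_arm = 10          # 20-patient trial
true_effect = 0.5       # assumed real effect, in standard-deviation units
n_sims = 10_000
alpha = 0.05

def trial_is_significant(effect: float) -> bool:
    """Run one simulated two-arm trial and return True if p < alpha."""
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(effect, 1.0, n_per_arm)
    return stats.ttest_ind(treated, control).pvalue < alpha

# Power: how often a real effect of this size reaches significance.
power = np.mean([trial_is_significant(true_effect) for _ in range(n_sims)])

# False positive rate: how often a truly null compound reaches significance.
fpr = np.mean([trial_is_significant(0.0) for _ in range(n_sims)])

# If half of the tested compounds truly work, what fraction of all
# positive results are false positives?
fdr = (fpr * 0.5) / (fpr * 0.5 + power * 0.5)

print(f"Power at n=10 per arm:               {power:.0%}")
print(f"False positive rate (null compound): {fpr:.0%}")
print(f"Share of positives that are false:   {fdr:.0%}")
```

Under these assumptions the simulated power comes out under 20%, which is why small trials of genuinely active compounds still report mostly negative results, and roughly a fifth of the positive results are false positives.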
What “Statistically Significant” Actually Means
Statistical significance (p < 0.05) means only that the observed result is unlikely to have occurred by chance if there were truly no effect—not that the effect is large, clinically meaningful, or reproducible. A trial can achieve statistical significance with a trivially small effect in a large enough sample, or fail to achieve it for a genuinely meaningful effect in a small sample. Statistical significance is a threshold, not a quality judgment.
The clinical significance of a result—whether it is large enough to matter to patients—is a separate question that requires understanding the effect size (how large the observed difference was) and its confidence interval (the range of values within which the true effect is likely to lie). A drug that produces a statistically significant improvement in a biomarker of 2% in a large trial may not be clinically useful; a drug that produces a 40% improvement in symptoms in a small trial may be extremely important if the effect size is real and replicable.
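As a hypothetical illustration of that distinction, the following sketch fabricates a large trial with a 2% biomarker improvement; the numbers are invented purely to show how a trivially small effect can be highly statistically significant while remaining clinically unimpressive.

```python
# Hypothetical large trial: a 2% biomarker improvement (baseline 100, SD 10)
# in 5,000 subjects per arm is highly statistically significant but small
# by any effect-size convention.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(100, 10, 5_000)
treated = rng.normal(102, 10, 5_000)   # 2% improvement over baseline

result = stats.ttest_ind(treated, control)
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se
cohens_d = diff / np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)

print(f"p-value:        {result.pvalue:.1e}")   # far below 0.05
print(f"Difference:     {diff:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
print(f"Effect size d:  {cohens_d:.2f}")        # ~0.2, conventionally "small"
```

Whether a difference of that size matters to patients is a clinical judgment the p-value cannot make for you.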
Common Evidence Gaps in Peptide Research
Several specific evidence gaps recur across the peptide research literature. Recognizing them when you encounter them is more valuable than memorizing a list of safe and unsafe compounds.
The route-of-administration gap is among the most common and most consequential. Evidence from one administration route does not transfer to another. A compound with positive effects when administered intraperitoneally in rodents (a route not used in humans) may or may not show the same effects when administered subcutaneously. A compound with human topical efficacy data may or may not be effective systemically. GHK-Cu’s topical human data does not validate injectable GHK-Cu; KPV’s positive nanoparticle-formulated oral data does not validate free subcutaneous KPV.
The formulation gap is related. Aviptadil (inhaled/IV VIP) clinical trial data does not validate subcutaneous research-grade VIP. The nanoparticle-encapsulated KPV used in the most impressive IBD preclinical studies is not the same as free KPV available through research suppliers. When a compound’s positive evidence comes from a specific formulation, the evidence applies to that formulation.
The indication gap refers to evidence from one application being used to support a different application. BPC-157’s gut injury evidence does not automatically transfer to tendon injury evidence, even if the mechanism is broadly “tissue repair.” Thymosin Alpha-1’s hepatitis B approval does not establish its efficacy for general immune optimization in healthy adults.
The population gap involves evidence from diseased populations being extrapolated to healthy individuals. VIP’s positive RA trial enrolled patients with active rheumatoid arthritis. Whether VIP produces meaningful anti-inflammatory effects in healthy adults without autoimmune disease is not established—the physiological context and the inflammatory signaling environment are fundamentally different.
How to Read a Study Yourself
Reading primary research literature is a learnable skill. It does not require a PhD—it requires a systematic approach and the willingness to ask specific questions of each paper.
Start with the Methods, not the Abstract. Abstracts are written to sell the paper. Methods are where the study design, sample size, controls, and outcome measures are specified. Before reading the Results, know: Was this randomized? Was there a control group? How many subjects? What was the primary endpoint? Was it pre-specified before data collection, or chosen after looking at the data?
Identify the primary endpoint. The primary endpoint is the outcome the study was designed and powered to detect. Secondary endpoints and post-hoc analyses have a much higher false-positive rate and should be treated as hypothesis-generating, not as established findings. If a trial fails its primary endpoint but shows positive results in a subgroup or secondary analysis, the appropriate interpretation is “this needs to be tested prospectively in a trial designed for that specific question”—not “the compound works for this subgroup.”
Look for the control condition. A study without a control group cannot distinguish compound effects from natural history, regression to the mean, or placebo effects. Case series, open-label uncontrolled studies, and before-after comparisons without controls produce findings that are real observations—but they cannot establish causation. People often get better on their own; apparent improvements in uncontrolled settings may have nothing to do with the compound being studied.
Check the sample size and funding source. Is the trial adequately powered for the reported effect size? Was the study funded by a company with a financial interest in the outcome? Industry funding is not automatically disqualifying, but industry-funded trials have historically shown larger effect sizes than independently funded trials of the same compounds—a systematic bias worth factoring into your reading.
Find the limitations section. Every well-written paper includes one, and an honestly written limitations section (the authors’ own assessment of the study’s weaknesses) is among the most useful parts of the paper.
The Most Common Evidence Interpretation Errors
These errors recur constantly in community discussions of peptide research. Recognizing them when you encounter them is more protective than any specific compound knowledge.
Plain English
The most common mistake in peptide discussions: treating animal study results as if they apply directly to humans. “Studies show” often means “mouse studies show”—and that’s a fundamentally different statement than “human trials show.”
Treating mechanism as evidence of effect. “This compound activates VEGF, and VEGF promotes wound healing, therefore this compound heals wounds.” Mechanistic plausibility is a prerequisite for research interest, not evidence of clinical efficacy. Many mechanistically coherent compounds fail in clinical trials because the mechanism operates differently in disease contexts, because the magnitude of the effect is insufficient, or because off-target effects outweigh the intended benefits.
Treating “natural” or “endogenous” as equivalent to “safe.” Endogenous origin does not confer safety at exogenous doses. Insulin is endogenous; administering too much is life-threatening. Cortisol is endogenous; chronic elevated cortisol causes Cushing’s syndrome. LL-37 is endogenous; it drives psoriasis and rosacea pathology. The safety profile of exogenous administration must be established independently of the compound’s natural biological origin.
Treating positive preclinical results as established efficacy. The majority of compounds that succeed in preclinical development fail in human trials. A positive mouse model result is a preliminary signal that warrants further investigation—it is not evidence that the compound works in humans.
Treating absence of reported adverse events as proof of safety. Community self-experimentation reports capture obvious, acute, short-term adverse events. They do not capture subclinical effects, drug interactions, delayed adverse events, or effects that manifest only after chronic exposure. Absence of reported problems in an uncontrolled self-experimentation population is the weakest possible form of safety evidence.
Treating a single positive study as definitive. Individual studies are hypothesis generators and preliminary evidence generators. Replication—independently conducted studies by different groups in different populations—is what builds confidence in a finding. A single positive trial, especially a small one, should be taken as a reason to want more research, not as established fact.
Misreading secondary endpoints or subgroup analyses as primary findings. This is the most common form of selective reporting in the research literature. A trial fails its primary endpoint; the authors report that the compound worked in a specific subgroup or on a secondary outcome. This finding may be real, but it requires prospective validation in a trial designed specifically to test it. Until then, it is a hypothesis.
