Information Theory
#cross-disciplinary #computational-lens
What It Is
Information theory, formalized by Claude Shannon in 1948, quantifies information as reduction in uncertainty measured in bits. One bit of information halves the uncertainty about a system's state. Information has cost (energy/resources required to acquire) and value (reduction in uncertainty enabling better decisions). This framework reveals why "gathering more information" is not always beneficial—information acquisition costs resources, and value depends entirely on whether information changes subsequent action.
The fundamental insight: information is not intrinsically valuable. Information has value only when it reduces uncertainty about decisions that matter. Acquiring information that doesn't change your next action wastes resources regardless of how interesting or comprehensive that information is. This applies directly to startup search strategies and cybernetic control loops.
Shannon's entropy formula quantifies uncertainty:
$$H(X) = -\sum_i P(x_i) \log_2 P(x_i)$$
Where:
H = entropy (uncertainty in bits)
X = random variable (system state)
P(x_i) = probability of state i
Higher entropy means higher uncertainty. Information acquisition reduces entropy by eliminating possibilities. The value of information is the entropy reduction it provides multiplied by the value of making the correct decision.
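As a quick sanity check, the formula can be evaluated directly. A minimal Python sketch (`entropy_bits` is an illustrative helper name, not from this note):

```python
import math

def entropy_bits(probs):
    """Shannon entropy H(X) = -sum p * log2(p), in bits.
    Zero-probability states contribute nothing, by convention."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 8 equally likely hypotheses -> 3 bits of uncertainty
print(entropy_bits([1/8] * 8))   # 3.0

# Narrowed down to 2 equally likely hypotheses -> 1 bit
print(entropy_bits([0.5, 0.5]))  # 1.0
```

This matches the customer-interview example below: going from 8 hypotheses to 2 is a gain of 2 bits.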
Information Cost vs Information Value
Every measurement, experiment, or data-collection effort has an energy cost. The question is whether the value exceeds the cost.
Cost components:
| Cost Type | Startup Example | Personal Optimization Example |
|---|---|---|
| Direct cost | Runway spent on experiment | Time spent tracking |
| Opportunity cost | Alternative experiments not run | Alternative actions not taken |
| Cycle time cost | Delay before next iteration | Delay before course correction |
| Implementation cost | Infrastructure for measurement | Setup overhead for tracking system |
Value calculation:
Value = (Uncertainty_before - Uncertainty_after) × Value_of_correct_decision
Information has value when:
Value > Cost_direct + Cost_opportunity + Cost_delay
Example - Customer Interview:
Before interview:
- Uncertainty about problem severity: H = 3 bits (8 equally likely hypotheses)
After interview (high-quality behavioral data):
- Uncertainty reduced to: H = 1 bit (2 remaining hypotheses)
- Information gained: 2 bits
If the decision at stake is worth ~$100K and the interview costs ~$500 (time + runway):
- Value = 2 bits of uncertainty eliminated on a $100K decision (weighted by probability of using the information) = high
- Cost = $500
- Net value: Positive → run the interview
Counter-example - Feature Polish:
Before polish:
- Uncertainty about user retention: H = 2 bits
After 2 weeks of polish:
- Uncertainty: H = 2 bits (unchanged—polish doesn't test retention drivers)
- Information gained: 0 bits
Cost = 2 weeks runway, Value = 0 (no uncertainty reduction), Net value: Negative → skip the polish
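Both examples reduce to the same net-value check. A sketch in Python, using the note's illustrative figures (the $-per-bit scaling and the ~$40K cost of two weeks of runway are assumptions for illustration):

```python
def net_info_value(h_before, h_after, value_per_bit, cost):
    """Net value of an experiment: bits of entropy removed, times an
    (assumed) dollar value per bit, minus the acquisition cost."""
    return (h_before - h_after) * value_per_bit - cost

# Customer interview: 3 bits -> 1 bit, ~$500 of time + runway
interview = net_info_value(3, 1, 100_000, 500)
# Feature polish: no entropy reduction, ~2 weeks of runway (~$40K assumed)
polish = net_info_value(2, 2, 100_000, 40_000)

print(interview > 0)  # True  -> run the interview
print(polish > 0)     # False -> skip the polish
```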
Information Acquisition Strategies
High Information/Cost Ratio Activities
These provide maximum entropy reduction per resource unit:
Behavioral observation (S/N ratio: High)
- Watch what users actually do vs what they say
- Cost: Low (observation time only)
- Information: High (reveals true preferences)
- Entropy reduction: ~2-3 bits per observation session
Payment behavior (S/N ratio: Very High)
- Test willingness to pay with real pricing
- Cost: Low (pricing page + payment processing)
- Information: Very High (strongest signal of value)
- Entropy reduction: ~3-4 bits (eliminates most hypothetical interest)
Retention curves (S/N ratio: High)
- Measure if users return after first use
- Cost: Medium (requires product deployment + time)
- Information: High (predicts long-term engagement)
- Entropy reduction: ~2 bits per cohort
Low Information/Cost Ratio Activities
These provide minimal entropy reduction despite high resource cost:
Hypothetical surveys (S/N ratio: Low)
- "Would you pay for this?" questions
- Cost: Medium (survey design + analysis time)
- Information: Low (stated preferences unreliable)
- Entropy reduction: ~0.5 bits (minimal signal)
Feature development before validation (S/N ratio: Very Low)
- Build complete feature before testing core value
- Cost: Very High (weeks to months of runway)
- Information: Low (learn only whether complete package works, not which components)
- Entropy reduction: ~1 bit (binary: works or doesn't)
Long beta programs (S/N ratio: Low)
- Months of private testing before public launch
- Cost: High (delayed feedback, slow cycles)
- Information: Medium (contaminated by selection bias of beta users)
- Entropy reduction: ~1-2 bits but with delay cost
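The qualitative rankings above can be made explicit by sorting on bits per dollar. The dollar costs below are rough assumptions (the note only gives Low/Medium/High), so treat the ordering, not the numbers, as the point:

```python
activities = [
    # (name, entropy reduction in bits, assumed cost in $ of runway)
    ("behavioral observation", 2.5, 1_000),
    ("payment test",           3.5, 2_000),
    ("retention cohort",       2.0, 10_000),
    ("hypothetical survey",    0.5, 5_000),
    ("full feature build",     1.0, 60_000),
]

# Rank by bits of uncertainty removed per dollar spent
ranked = sorted(activities, key=lambda a: a[1] / a[2], reverse=True)
for name, bits, cost in ranked:
    print(f"{name}: {bits / cost * 1000:.2f} bits per $1K")
```

With these assumed costs, behavioral observation and payment tests dominate; a full feature build comes last despite being the most expensive item.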
Shannon's Channel Capacity and Feedback Loops
Shannon's channel capacity theorem establishes the maximum rate of information transmission through a noisy channel:
C = B × log₂(1 + S/N)
Where:
C = channel capacity (bits per second)
B = bandwidth (measurements per unit time)
S/N = signal-to-noise ratio (sensor accuracy)
For cybernetic control loops, this means:
- Increase bandwidth (faster measurement cycles)
- Increase signal/noise ratio (better sensors, multi-sensor fusion)
Application to personal cybernetics:
| Feedback Loop | Bandwidth (B) | Signal/Noise (S/N) | Capacity (C) | Example |
|---|---|---|---|---|
| Monthly review | 12/year | 0.6 | Low | Generic "how am I doing?" introspection |
| Weekly review | 52/year | 0.7 | Medium | Weekly pattern analysis |
| Daily tracking | 365/year | 0.8 | High | Daily braindump + whiteboard |
| Real-time biometric | Continuous | 0.9 | Very High | HRV monitoring, sleep tracking |
Higher bandwidth (more frequent measurement) × higher S/N (better sensors) = higher channel capacity = faster learning = more rapid optimization. Daily loops with calibrated sensors dominate monthly introspection with vague feelings.
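Plugging the table's rows into the capacity formula (treating B as measurements per year, per the note's analogy; the S/N figures are the table's illustrative values):

```python
import math

def channel_capacity(b, snr):
    """Shannon-Hartley capacity C = B * log2(1 + S/N),
    with B as measurements per year rather than Hz."""
    return b * math.log2(1 + snr)

loops = [("monthly review", 12, 0.6),
         ("weekly review",  52, 0.7),
         ("daily tracking", 365, 0.8)]

for name, b, snr in loops:
    print(f"{name}: {channel_capacity(b, snr):.0f} bits/year")
```

Capacity rises far faster with measurement frequency than with sensor quality alone, which is why the daily loop dominates.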
Mutual Information and Sensor Correlation
Mutual information I(X;Y) measures how much knowing Y reduces uncertainty about X:
I(X;Y) = H(X) - H(X|Y)
Where:
H(X) = entropy before measurement
H(X|Y) = entropy after knowing Y
For startup sensors:
| Sensor Pair | Mutual Information | Interpretation |
|---|---|---|
| Usage × Revenue | High (I ≈ 2.5 bits) | Strong correlation; one predicts the other |
| Interest × Usage | Low (I ≈ 0.8 bits) | Weak correlation; interest often doesn't convert |
| Team quality × Success | Medium (I ≈ 1.5 bits) | Moderate correlation but noisy |
| Competitor success × Your success | Low (I ≈ 0.5 bits) | Market validation but execution independent |
High mutual information between sensors enables sensor fusion—combining multiple correlated sensors improves accuracy. Low mutual information means sensors measure independent dimensions—both should be monitored separately.
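Mutual information can be computed from a joint probability table via the equivalent identity I(X;Y) = H(X) + H(Y) − H(X,Y). The joint distributions below are hypothetical, chosen to contrast a correlated sensor pair with an independent one:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint probability table
    (rows = states of X, columns = states of Y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    hxy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - hxy

# Hypothetical joints: rows = heavy usage yes/no, cols = paying yes/no
strong = [[0.40, 0.10],
          [0.05, 0.45]]   # usage and revenue mostly agree
weak   = [[0.25, 0.25],
          [0.25, 0.25]]   # independent: knowing one says nothing

print(mutual_information(strong))  # clearly positive
print(mutual_information(weak))    # 0.0
```

The independent case yields exactly zero bits: both sensors would need to be monitored separately.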
Information Acquisition vs Action Execution
The critical trade-off in resource-constrained systems: spending energy gathering information vs spending energy executing based on current information.
Total_runway = E_information_acquisition + E_execution
Optimal split when:
Marginal_value(more info) = Marginal_value(more execution)
Over-information failure mode:
- 90% of runway spent researching, 10% executing
- High confidence in correct direction, insufficient time to capitalize
- "Analysis paralysis" - perpetual information gathering
Under-information failure mode:
- 10% of runway spent researching, 90% executing
- Confidently building in wrong direction
- "Premature commitment" - execution before adequate validation
Optimal balance (varies by uncertainty):
| Initial Uncertainty | Info Acquisition % | Execution % | Reasoning |
|---|---|---|---|
| Very High | 40-50% | 50-60% | Need substantial validation before commitment |
| High | 25-35% | 65-75% | Balance exploration and exploitation |
| Medium | 15-25% | 75-85% | Some validation then mostly execution |
| Low | 5-15% | 85-95% | Confirmed direction, focus on execution |
The 30-day pattern demonstrates this in habit formation: days 1-7 require high information acquisition (learning what works), days 8-30 transition to execution (repeating validated pattern).
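A toy model makes the marginal-value condition concrete. Assume, purely for illustration, that decision quality grows like the square root of the information budget (diminishing returns) while payoff scales with the runway fraction left for execution; the optimum then lands at about one third of runway, consistent with the High-uncertainty row of the table above:

```python
def payoff(frac_info):
    """Toy model (assumption): decision quality ~ sqrt(info budget),
    payoff scales with the runway fraction left for execution."""
    return frac_info ** 0.5 * (1 - frac_info)

# Search whole-percent splits for the best info/execution balance
best = max(range(1, 100), key=lambda pct: payoff(pct / 100))
print(best)  # 33 -- near the continuous optimum of 1/3
```

A steeper or flatter learning curve shifts the optimum, which is exactly why the table indexes the split by initial uncertainty.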
The Value of Information Formula
From decision theory, the expected value of perfect information (EVPI):
$$EVPI = E[\text{Value}_{\text{perfect}}] - E[\text{Value}_{\text{current}}]$$
Where E[Value] = ∑ P(state) × Value(optimal decision | state)
Information worth acquiring when:
EVPI > Cost_acquisition
Practical application:
Should you spend 2 weeks building analytics dashboard?
Value if you have perfect usage data:
Build high-value features = +$200K expected value
Value with current vague intuition:
Random feature work = +$50K expected value
EVPI = $200K - $50K = $150K
Cost = 2 weeks runway ≈ $40K
EVPI > Cost → Yes, build the dashboard
This formalizes the intuition that measurement infrastructure has high ROI when it enables better decisions at scale.
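The dashboard decision can be worked through with the EVPI definition. The per-state values below are hypothetical numbers chosen to reproduce the note's $200K and $50K expectations:

```python
def expected_value(states):
    """E[Value] = sum over states of P(state) * Value(decision taken | state)."""
    return sum(p * v for p, v in states)

# With perfect usage data you pick the best feature in every state
perfect = expected_value([(0.5, 300_000), (0.5, 100_000)])  # 200_000
# With vague intuition you pick one feature regardless of state
current = expected_value([(0.5, 80_000), (0.5, 20_000)])    # 50_000

evpi = perfect - current  # 150_000
cost = 40_000             # two weeks of runway
print(evpi > cost)        # True -> build the dashboard
```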
Integration with Mechanistic Framework
Information theory formalizes several mechanistic concepts:
Question Theory as information acquisition:
- Questions initiate search processes (spend cognitive energy)
- Good questions provide high information/cost ratio
- Bad questions: high computational cost, low entropy reduction
Tracking as sensor system:
- External logs reduce uncertainty about behavioral patterns
- Cost: Setup + maintenance overhead
- Value: Enables data-driven decisions vs mood-based guesses
- Net value positive when: Decision_frequency × Decision_impact > Setup_cost
The Braindump as information processing:
- Externalizes working memory state (reduces internal uncertainty)
- Processing externalized information reveals patterns (entropy reduction)
- Cost: 10 minutes daily
- Value: Clarity on next actions (eliminates ambiguity)
Related Concepts
- Startup as a Bug - Information acquisition under resource constraints
- Cybernetics - Feedback loops and sensor systems
- Optimal Foraging Theory - Search strategies under energy limits
- Expected Value - Decision-making under uncertainty
- Tracking - Sensor systems reducing uncertainty
- Question Theory - Computational cost of information acquisition
- The Braindump - Information processing and uncertainty reduction
Key Principle
Acquire information only when value exceeds cost - Information is reduction in uncertainty measured in bits. Information has value only when it enables better decisions. Acquiring information costs resources (energy, time, runway). Optimal strategy maximizes (Value_decisions_enabled - Cost_acquisition). Use high-S/N sensors (behavioral data, payment) over low-S/N sensors (expressed interest). Increase feedback loop bandwidth (daily tracking) to increase channel capacity. Spend information-acquisition budget on experiments that eliminate maximum uncertainty about critical decisions. "Learning more" is not inherently valuable—learning what changes next action is valuable. The mosquito that survives acquires just enough information to find blood before energy depletion, not perfect information about all possible blood locations.
Information without action is waste. Action without information is gambling. Optimize the trade-off: acquire information that changes decisions, execute based on current information, measure results, update beliefs, repeat.