Information Theory

#cross-disciplinary #computational-lens

What It Is

Information theory, formalized by Claude Shannon in 1948, quantifies information as reduction in uncertainty measured in bits. One bit of information halves the uncertainty about a system's state. Information has cost (energy/resources required to acquire) and value (reduction in uncertainty enabling better decisions). This framework reveals why "gathering more information" is not always beneficial—information acquisition costs resources, and value depends entirely on whether information changes subsequent action.

The fundamental insight: information is not intrinsically valuable. Information has value only when it reduces uncertainty about decisions that matter. Acquiring information that doesn't change your next action wastes resources regardless of how interesting or comprehensive that information is. This applies directly to startup search strategies and cybernetic control loops.

Shannon's entropy formula quantifies uncertainty:

$$H(X) = -\sum P(x_i) \times \log_2(P(x_i))$$

Where:
  H = entropy (uncertainty in bits)
  X = random variable (system state)
  P(x_i) = probability of state i

Higher entropy means higher uncertainty. Information acquisition reduces entropy by eliminating possibilities. The value of information is the entropy reduction it provides, multiplied by the value of making the correct decision.
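The entropy formula can be sketched directly. A minimal Python helper (not from the original text) that reproduces the 8-hypothesis example used later in this note:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) in bits: -sum p * log2(p), skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 8 equally likely hypotheses -> 3 bits of uncertainty
print(entropy([1/8] * 8))   # 3.0

# One bit halves the uncertainty: 4 equally likely hypotheses -> 2 bits
print(entropy([1/4] * 4))   # 2.0
```

For n equally likely states this reduces to log2(n), which is why eliminating half the hypotheses always buys exactly one bit.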

Information Cost vs Information Value

Every measurement, experiment, or data-collection effort has an energy cost. The question is whether value exceeds cost.

Cost components:

| Cost Type | Startup Example | Personal Optimization Example |
|---|---|---|
| Direct cost | Runway spent on experiment | Time spent tracking |
| Opportunity cost | Alternative experiments not run | Alternative actions not taken |
| Cycle time cost | Delay before next iteration | Delay before course correction |
| Implementation cost | Infrastructure for measurement | Setup overhead for tracking system |

Value calculation:

Value = (Uncertainty_before - Uncertainty_after) × Value_of_correct_decision

Information has value when:
  Value > Cost_direct + Cost_opportunity + Cost_delay

Example - Customer Interview:

Before interview:

  • Uncertainty about problem severity: H = 3 bits (8 equally likely hypotheses)

After interview (high-quality behavioral data):

  • Uncertainty reduced to: H = 1 bit (2 remaining hypotheses)
  • Information gained: 2 bits

If decision value = $100K (revenue impact of building the right feature) and interview cost = $500 (time + runway), then:

  • Value = 2 bits × $100K (weighted by probability of using info) = high
  • Cost = $500
  • Net value: Positive → run the interview

Counter-example - Feature Polish:

Before polish:

  • Uncertainty about user retention: H = 2 bits

After 2 weeks of polish:

  • Uncertainty: H = 2 bits (unchanged—polish doesn't test retention drivers)
  • Information gained: 0 bits

Cost = 2 weeks runway, Value = 0 (no uncertainty reduction), Net value: Negative → skip the polish
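The two worked examples above can be sketched numerically. This is a rough operationalization, not a formula from the text: it treats the fraction of uncertainty eliminated as the weight on the decision value, with the hypothetical numbers from the examples.

```python
import math

def entropy_bits(n_equally_likely):
    """Entropy of n equally likely hypotheses, in bits."""
    return math.log2(n_equally_likely)

def net_info_value(h_before, h_after, decision_value, cost):
    """Expected net value of an experiment: fraction of uncertainty
    eliminated, times the value of the correct decision, minus cost."""
    gained = entropy_bits(h_before) - entropy_bits(h_after)       # bits
    value = (gained / entropy_bits(h_before)) * decision_value    # crude weighting
    return value - cost

# Customer interview: 8 -> 2 hypotheses, $100K decision, $500 cost
print(net_info_value(8, 2, 100_000, 500))       # positive -> run the interview

# Feature polish: no hypotheses eliminated, 2 weeks of runway spent
print(net_info_value(4, 4, 100_000, 40_000))    # -40000.0 -> skip the polish
```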

Information Acquisition Strategies

High Information/Cost Ratio Activities

These provide maximum entropy reduction per resource unit:

Behavioral observation (S/N ratio: High)

  • Watch what users actually do vs what they say
  • Cost: Low (observation time only)
  • Information: High (reveals true preferences)
  • Entropy reduction: ~2-3 bits per observation session

Payment behavior (S/N ratio: Very High)

  • Test willingness to pay with real pricing
  • Cost: Low (pricing page + payment processing)
  • Information: Very High (strongest signal of value)
  • Entropy reduction: ~3-4 bits (eliminates most hypothetical interest)

Retention curves (S/N ratio: High)

  • Measure if users return after first use
  • Cost: Medium (requires product deployment + time)
  • Information: High (predicts long-term engagement)
  • Entropy reduction: ~2 bits per cohort

Low Information/Cost Ratio Activities

These provide minimal entropy reduction despite high resource cost:

Hypothetical surveys (S/N ratio: Low)

  • "Would you pay for this?" questions
  • Cost: Medium (survey design + analysis time)
  • Information: Low (stated preferences unreliable)
  • Entropy reduction: ~0.5 bits (minimal signal)

Feature development before validation (S/N ratio: Very Low)

  • Build complete feature before testing core value
  • Cost: Very High (weeks to months of runway)
  • Information: Low (learn only whether complete package works, not which components)
  • Entropy reduction: ~1 bit (binary: works or doesn't)

Long beta programs (S/N ratio: Low)

  • Months of private testing before public launch
  • Cost: High (delayed feedback, slow cycles)
  • Information: Medium (contaminated by selection bias of beta users)
  • Entropy reduction: ~1-2 bits but with delay cost

Shannon's Channel Capacity and Feedback Loops

Shannon's channel capacity theorem establishes the maximum rate at which information can be transmitted reliably through a noisy channel:

C = B × log₂(1 + S/N)

Where:
  C = channel capacity (bits per second)
  B = bandwidth (measurements per unit time)
  S/N = signal-to-noise ratio (sensor accuracy)

For cybernetic control loops, this means:

  1. Increase bandwidth (faster measurement cycles)
  2. Increase signal/noise ratio (better sensors, multi-sensor fusion)

Application to personal cybernetics:

| Feedback Loop | Bandwidth (B) | Signal/Noise (S/N) | Capacity (C) | Example |
|---|---|---|---|---|
| Monthly review | 12/year | 0.6 | Low | Generic "how am I doing?" introspection |
| Weekly review | 52/year | 0.7 | Medium | Weekly pattern analysis |
| Daily tracking | 365/year | 0.8 | High | Daily braindump + whiteboard |
| Real-time biometric | Continuous | 0.9 | Very High | HRV monitoring, sleep tracking |

Higher bandwidth (more frequent measurement) × higher S/N (better sensors) = higher channel capacity = faster learning = more rapid optimization. Daily loops with calibrated sensors dominate monthly introspection with vague feelings.
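A quick sketch of the capacity formula applied to the feedback loops above. The S/N values are the illustrative ones from the table, with B in measurements per year:

```python
import math

def channel_capacity(bandwidth, snr):
    """Shannon-Hartley capacity: C = B * log2(1 + S/N)."""
    return bandwidth * math.log2(1 + snr)

# Feedback loops from the table (illustrative S/N values)
for name, b, snr in [("monthly", 12, 0.6),
                     ("weekly", 52, 0.7),
                     ("daily", 365, 0.8)]:
    print(f"{name}: {channel_capacity(b, snr):.1f} bits/year")
```

Even a modest improvement in sensor quality compounds with frequency: the daily loop's capacity exceeds the monthly loop's by more than an order of magnitude.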

Mutual Information and Sensor Correlation

Mutual information I(X;Y) measures how much knowing Y reduces uncertainty about X:

I(X;Y) = H(X) - H(X|Y)

Where:
  H(X) = entropy before measurement
  H(X|Y) = entropy after knowing Y

For startup sensors:

| Sensor Pair | Mutual Information | Interpretation |
|---|---|---|
| Usage × Revenue | High (I ≈ 2.5 bits) | Strong correlation; one predicts the other |
| Interest × Usage | Low (I ≈ 0.8 bits) | Weak correlation; interest often doesn't convert |
| Team quality × Success | Medium (I ≈ 1.5 bits) | Moderate correlation but noisy |
| Competitor success × Your success | Low (I ≈ 0.5 bits) | Market validation, but execution is independent |

High mutual information between sensors enables sensor fusion—combining multiple correlated sensors improves accuracy. Low mutual information means sensors measure independent dimensions—both should be monitored separately.
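A minimal sketch of the mutual-information formula over a joint probability table. The two toy distributions below (perfectly correlated vs independent binary sensors) are hypothetical illustrations, not the startup sensors above:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x)*p(y)) )
    over a joint probability table given as a 2D list."""
    px = [sum(row) for row in joint]                 # marginal of X
    py = [sum(col) for col in zip(*joint)]           # marginal of Y
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Perfectly correlated sensors: knowing one fully determines the other
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))    # 1.0

# Independent sensors: knowing one says nothing about the other
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```

This is equivalent to I(X;Y) = H(X) - H(X|Y): the correlated pair carries 1 bit of shared signal (fuse them), the independent pair carries none (monitor both).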

Information Acquisition vs Action Execution

The critical trade-off in resource-constrained systems: spending energy gathering information vs spending energy executing based on current information.

Total_runway = E_information_acquisition + E_execution

Optimal split when:
  Marginal_value(more info) = Marginal_value(more execution)

Over-information failure mode:

  • 90% of runway spent researching, 10% executing
  • High confidence in correct direction, insufficient time to capitalize
  • "Analysis paralysis" - perpetual information gathering

Under-information failure mode:

  • 10% of runway spent researching, 90% executing
  • Confidently building in wrong direction
  • "Premature commitment" - execution before adequate validation

Optimal balance (varies by uncertainty):

| Initial Uncertainty | Info Acquisition % | Execution % | Reasoning |
|---|---|---|---|
| Very High | 40-50% | 50-60% | Need substantial validation before commitment |
| High | 25-35% | 65-75% | Balance exploration and exploitation |
| Medium | 15-25% | 75-85% | Some validation, then mostly execution |
| Low | 5-15% | 85-95% | Confirmed direction; focus on execution |

The 30-day pattern demonstrates this in habit formation: days 1-7 require high information acquisition (learning what works), days 8-30 transition to execution (repeating validated pattern).

The Value of Information Formula

From decision theory, the expected value of perfect information (EVPI):

$$EVPI = E[\text{Value}_{\text{perfect}}] - E[\text{Value}_{\text{current}}]$$

Where E[Value] = ∑ P(state) × Value(optimal decision | state)

Information worth acquiring when:

EVPI > Cost_acquisition

Practical application:

Should you spend 2 weeks building analytics dashboard?

Value if you have perfect usage data:
  Build high-value features = +$200K expected value

Value with current vague intuition:
  Random feature work = +$50K expected value

EVPI = $200K - $50K = $150K
Cost = 2 weeks runway ≈ $40K

EVPI > Cost → Yes, build the dashboard

This formalizes the intuition that measurement infrastructure has high ROI when it enables better decisions at scale.
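The dashboard calculation can be sketched as follows; the dollar figures are the hypothetical ones from the example above:

```python
def expected_value(probs, values):
    """E[Value] = sum P(state) * Value(optimal decision | state)."""
    return sum(p * v for p, v in zip(probs, values))

def evpi(ev_perfect, ev_current):
    """Expected value of perfect information."""
    return ev_perfect - ev_current

def worth_acquiring(ev_perfect, ev_current, cost):
    """Acquire information only when EVPI exceeds acquisition cost."""
    return evpi(ev_perfect, ev_current) > cost

# Dashboard example: $200K with perfect usage data vs $50K on intuition,
# at a cost of 2 weeks of runway (~$40K)
print(evpi(200_000, 50_000))                      # 150000
print(worth_acquiring(200_000, 50_000, 40_000))   # True
```

In a fuller treatment, `ev_perfect` would itself come from `expected_value` over the states, acting optimally in each; the example collapses that to single numbers as the text does.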

Integration with Mechanistic Framework

Information theory formalizes several mechanistic concepts:

Question Theory as information acquisition:

  • Questions initiate search processes (spend cognitive energy)
  • Good questions provide high information/cost ratio
  • Bad questions: high computational cost, low entropy reduction

Tracking as sensor system:

  • External logs reduce uncertainty about behavioral patterns
  • Cost: Setup + maintenance overhead
  • Value: Enables data-driven decisions vs mood-based guesses
  • Net value positive when: Decision_frequency × Decision_impact > Setup_cost

The Braindump as information processing:

  • Externalizes working memory state (reduces internal uncertainty)
  • Processing externalized information reveals patterns (entropy reduction)
  • Cost: 10 minutes daily
  • Value: Clarity on next actions (eliminates ambiguity)

Key Principle

Acquire information only when value exceeds cost - Information is reduction in uncertainty measured in bits. Information has value only when it enables better decisions. Acquiring information costs resources (energy, time, runway). Optimal strategy maximizes (Value_decisions_enabled - Cost_acquisition). Use high-S/N sensors (behavioral data, payment) over low-S/N sensors (expressed interest). Increase feedback loop bandwidth (daily tracking) to increase channel capacity. Spend information-acquisition budget on experiments that eliminate maximum uncertainty about critical decisions. "Learning more" is not inherently valuable—learning what changes next action is valuable. The mosquito that survives acquires just enough information to find blood before energy depletion, not perfect information about all possible blood locations.


Information without action is waste. Action without information is gambling. Optimize the trade-off: acquire information that changes decisions, execute based on current information, measure results, update beliefs, repeat.