Information Theory

#cross-disciplinary #computational-lens

What It Is

Information theory, formalized by Claude Shannon in 1948, quantifies information as reduction in uncertainty measured in bits. One bit of information halves the uncertainty about a system's state. Information has cost (energy/resources required to acquire) and value (reduction in uncertainty enabling better decisions). This framework reveals why "gathering more information" is not always beneficial—information acquisition costs resources, and value depends entirely on whether information changes subsequent action.

The fundamental insight: information is not intrinsically valuable. Information has value only when it reduces uncertainty about decisions that matter. Acquiring information that doesn't change your next action wastes resources regardless of how interesting or comprehensive that information is. This applies directly to startup search strategies and cybernetic control loops.

Shannon's entropy formula quantifies uncertainty:

$$H(X) = -\sum P(x_i) \times \log_2(P(x_i))$$

Where:
  H = entropy (uncertainty in bits)
  X = random variable (system state)
  P(x_i) = probability of state i

Higher entropy means higher uncertainty. Information acquisition reduces entropy by eliminating possibilities. The value of information is the entropy reduction it provides, multiplied by the value of making the correct decision.
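The entropy formula can be sketched directly. A minimal Python helper (not from the original text) that reproduces the 8-hypothesis example used later in this note:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) in bits: -sum p * log2(p), skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 8 equally likely hypotheses -> 3 bits of uncertainty
print(entropy([1/8] * 8))   # 3.0

# One bit halves the uncertainty: 4 equally likely hypotheses -> 2 bits
print(entropy([1/4] * 4))   # 2.0
```

For n equally likely states this reduces to log2(n), which is why eliminating half the hypotheses always buys exactly one bit.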

Information Cost vs Information Value

Every measurement, experiment, or data-collection effort has an energy cost. The question is whether value exceeds cost.

Cost components:

| Cost Type | Startup Example | Personal Optimization Example |
|---|---|---|
| Direct cost | Runway spent on experiment | Time spent tracking |
| Opportunity cost | Alternative experiments not run | Alternative actions not taken |
| Cycle time cost | Delay before next iteration | Delay before course correction |
| Implementation cost | Infrastructure for measurement | Setup overhead for tracking system |

Value calculation:

Value = (Uncertainty_before - Uncertainty_after) × Value_of_correct_decision

Information has value when:
  Value > Cost_direct + Cost_opportunity + Cost_delay

Example - Customer Interview:

Before interview:

  • Uncertainty about problem severity: H = 3 bits (8 equally likely hypotheses)

After interview (high-quality behavioral data):

  • Uncertainty reduced to: H = 1 bit (2 remaining hypotheses)
  • Information gained: 2 bits

If decision value = $100K (revenue impact of building the right feature) and interview cost = $500 (time + runway), then:

  • Value = 2 bits × $100K (weighted by probability of using info) = high
  • Cost = $500
  • Net value: Positive → run the interview

Counter-example - Feature Polish:

Before polish:

  • Uncertainty about user retention: H = 2 bits

After 2 weeks of polish:

  • Uncertainty: H = 2 bits (unchanged—polish doesn't test retention drivers)
  • Information gained: 0 bits

Cost = 2 weeks runway, Value = 0 (no uncertainty reduction), Net value: Negative → skip the polish
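The two worked examples above can be sketched numerically. This is a rough operationalization, not a formula from the text: it treats the fraction of uncertainty eliminated as the weight on the decision value, with the hypothetical numbers from the examples.

```python
import math

def entropy_bits(n_equally_likely):
    """Entropy of n equally likely hypotheses, in bits."""
    return math.log2(n_equally_likely)

def net_info_value(h_before, h_after, decision_value, cost):
    """Expected net value of an experiment: fraction of uncertainty
    eliminated, times the value of the correct decision, minus cost."""
    gained = entropy_bits(h_before) - entropy_bits(h_after)       # bits
    value = (gained / entropy_bits(h_before)) * decision_value    # crude weighting
    return value - cost

# Customer interview: 8 -> 2 hypotheses, $100K decision, $500 cost
print(net_info_value(8, 2, 100_000, 500))       # positive -> run the interview

# Feature polish: no hypotheses eliminated, 2 weeks of runway spent
print(net_info_value(4, 4, 100_000, 40_000))    # -40000.0 -> skip the polish
```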

Information Acquisition Strategies

High Information/Cost Ratio Activities

These provide maximum entropy reduction per resource unit:

Behavioral observation (S/N ratio: High)

  • Watch what users actually do vs what they say
  • Cost: Low (observation time only)
  • Information: High (reveals true preferences)
  • Entropy reduction: ~2-3 bits per observation session

Payment behavior (S/N ratio: Very High)

  • Test willingness to pay with real pricing
  • Cost: Low (pricing page + payment processing)
  • Information: Very High (strongest signal of value)
  • Entropy reduction: ~3-4 bits (eliminates most hypothetical interest)

Retention curves (S/N ratio: High)

  • Measure if users return after first use
  • Cost: Medium (requires product deployment + time)
  • Information: High (predicts long-term engagement)
  • Entropy reduction: ~2 bits per cohort

Low Information/Cost Ratio Activities

These provide minimal entropy reduction despite high resource cost:

Hypothetical surveys (S/N ratio: Low)

  • "Would you pay for this?" questions
  • Cost: Medium (survey design + analysis time)
  • Information: Low (stated preferences unreliable)
  • Entropy reduction: ~0.5 bits (minimal signal)

Feature development before validation (S/N ratio: Very Low)

  • Build complete feature before testing core value
  • Cost: Very High (weeks to months of runway)
  • Information: Low (learn only whether complete package works, not which components)
  • Entropy reduction: ~1 bit (binary: works or doesn't)

Long beta programs (S/N ratio: Low)

  • Months of private testing before public launch
  • Cost: High (delayed feedback, slow cycles)
  • Information: Medium (contaminated by selection bias of beta users)
  • Entropy reduction: ~1-2 bits but with delay cost

Shannon's Channel Capacity and Feedback Loops

Shannon's channel capacity theorem establishes the maximum rate at which information can be transmitted reliably through a noisy channel:

C = B × log₂(1 + S/N)

Where:
  C = channel capacity (bits per second)
  B = bandwidth (measurements per unit time)
  S/N = signal-to-noise ratio (sensor accuracy)

For cybernetic control loops, this means:

  1. Increase bandwidth (faster measurement cycles)
  2. Increase signal/noise ratio (better sensors, multi-sensor fusion)

Application to personal cybernetics:

| Feedback Loop | Bandwidth (B) | Signal/Noise (S/N) | Capacity (C) | Example |
|---|---|---|---|---|
| Monthly review | 12/year | 0.6 | Low | Generic "how am I doing?" introspection |
| Weekly review | 52/year | 0.7 | Medium | Weekly pattern analysis |
| Daily tracking | 365/year | 0.8 | High | Daily braindump + whiteboard |
| Real-time biometric | Continuous | 0.9 | Very High | HRV monitoring, sleep tracking |

Higher bandwidth (more frequent measurement) × higher S/N (better sensors) = higher channel capacity = faster learning = more rapid optimization. Daily loops with calibrated sensors dominate monthly introspection with vague feelings.
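A quick sketch of the capacity formula applied to the feedback loops above. The S/N values are the illustrative ones from the table, with B in measurements per year:

```python
import math

def channel_capacity(bandwidth, snr):
    """Shannon-Hartley capacity: C = B * log2(1 + S/N)."""
    return bandwidth * math.log2(1 + snr)

# Feedback loops from the table (illustrative S/N values)
for name, b, snr in [("monthly", 12, 0.6),
                     ("weekly", 52, 0.7),
                     ("daily", 365, 0.8)]:
    print(f"{name}: {channel_capacity(b, snr):.1f} bits/year")
```

Even a modest improvement in sensor quality compounds with frequency: the daily loop's capacity exceeds the monthly loop's by more than an order of magnitude.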

Mutual Information and Sensor Correlation

Mutual information I(X;Y) measures how much knowing Y reduces uncertainty about X:

I(X;Y) = H(X) - H(X|Y)

Where:
  H(X) = entropy before measurement
  H(X|Y) = entropy after knowing Y

For startup sensors:

| Sensor Pair | Mutual Information | Interpretation |
|---|---|---|
| Usage × Revenue | High (I ≈ 2.5 bits) | Strong correlation; one predicts the other |
| Interest × Usage | Low (I ≈ 0.8 bits) | Weak correlation; interest often doesn't convert |
| Team quality × Success | Medium (I ≈ 1.5 bits) | Moderate correlation but noisy |
| Competitor success × Your success | Low (I ≈ 0.5 bits) | Market validation, but execution is independent |

High mutual information between sensors enables sensor fusion—combining multiple correlated sensors improves accuracy. Low mutual information means sensors measure independent dimensions—both should be monitored separately.
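A minimal sketch of the mutual-information formula over a joint probability table. The two toy distributions below (perfectly correlated vs independent binary sensors) are hypothetical illustrations, not the startup sensors above:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x)*p(y)) )
    over a joint probability table given as a 2D list."""
    px = [sum(row) for row in joint]                 # marginal of X
    py = [sum(col) for col in zip(*joint)]           # marginal of Y
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Perfectly correlated sensors: knowing one fully determines the other
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))    # 1.0

# Independent sensors: knowing one says nothing about the other
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
```

This is equivalent to I(X;Y) = H(X) - H(X|Y): the correlated pair carries 1 bit of shared signal (fuse them), the independent pair carries none (monitor both).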

Information Acquisition vs Action Execution

The critical trade-off in resource-constrained systems: spending energy gathering information vs spending energy executing based on current information.

Total_runway = E_information_acquisition + E_execution

Optimal split when:
  Marginal_value(more info) = Marginal_value(more execution)

Over-information failure mode:

  • 90% of runway spent researching, 10% executing
  • High confidence in correct direction, insufficient time to capitalize
  • "Analysis paralysis" - perpetual information gathering

Under-information failure mode:

  • 10% of runway spent researching, 90% executing
  • Confidently building in wrong direction
  • "Premature commitment" - execution before adequate validation

Optimal balance (varies by uncertainty):

| Initial Uncertainty | Info Acquisition % | Execution % | Reasoning |
|---|---|---|---|
| Very High | 40-50% | 50-60% | Need substantial validation before commitment |
| High | 25-35% | 65-75% | Balance exploration and exploitation |
| Medium | 15-25% | 75-85% | Some validation, then mostly execution |
| Low | 5-15% | 85-95% | Confirmed direction; focus on execution |

The 30-day pattern demonstrates this in habit formation: days 1-7 require high information acquisition (learning what works), days 8-30 transition to execution (repeating validated pattern).

The Value of Information Formula

From decision theory, the expected value of perfect information (EVPI):

$$EVPI = E[\text{Value}_{\text{perfect}}] - E[\text{Value}_{\text{current}}]$$

Where E[Value] = ∑ P(state) × Value(optimal decision | state)

Information worth acquiring when:

EVPI > Cost_acquisition

Practical application:

Should you spend 2 weeks building analytics dashboard?

Value if you have perfect usage data:
  Build high-value features = +$200K expected value

Value with current vague intuition:
  Random feature work = +$50K expected value

EVPI = $200K - $50K = $150K
Cost = 2 weeks runway ≈ $40K

EVPI > Cost → Yes, build the dashboard

This formalizes the intuition that measurement infrastructure has high ROI when it enables better decisions at scale.
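The dashboard calculation can be sketched as follows; the dollar figures are the hypothetical ones from the example above:

```python
def expected_value(probs, values):
    """E[Value] = sum P(state) * Value(optimal decision | state)."""
    return sum(p * v for p, v in zip(probs, values))

def evpi(ev_perfect, ev_current):
    """Expected value of perfect information."""
    return ev_perfect - ev_current

def worth_acquiring(ev_perfect, ev_current, cost):
    """Acquire information only when EVPI exceeds acquisition cost."""
    return evpi(ev_perfect, ev_current) > cost

# Dashboard example: $200K with perfect usage data vs $50K on intuition,
# at a cost of 2 weeks of runway (~$40K)
print(evpi(200_000, 50_000))                      # 150000
print(worth_acquiring(200_000, 50_000, 40_000))   # True
```

In a fuller treatment, `ev_perfect` would itself come from `expected_value` over the states, acting optimally in each; the example collapses that to single numbers as the text does.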

Integration with Mechanistic Framework

Information theory formalizes several mechanistic concepts:

Question Theory as information acquisition:

  • Questions initiate search processes (spend cognitive energy)
  • Good questions provide high information/cost ratio
  • Bad questions: high computational cost, low entropy reduction

Tracking as sensor system:

  • External logs reduce uncertainty about behavioral patterns
  • Cost: Setup + maintenance overhead
  • Value: Enables data-driven decisions vs mood-based guesses
  • Net value positive when: Decision_frequency × Decision_impact > Setup_cost

The Braindump as information processing:

  • Externalizes working memory state (reduces internal uncertainty)
  • Processing externalized information reveals patterns (entropy reduction)
  • Cost: 10 minutes daily
  • Value: Clarity on next actions (eliminates ambiguity)

Key Principle

Acquire information only when value exceeds cost - Information is reduction in uncertainty measured in bits. Information has value only when it enables better decisions. Acquiring information costs resources (energy, time, runway). Optimal strategy maximizes (Value_decisions_enabled - Cost_acquisition). Use high-S/N sensors (behavioral data, payment) over low-S/N sensors (expressed interest). Increase feedback loop bandwidth (daily tracking) to increase channel capacity. Spend information-acquisition budget on experiments that eliminate maximum uncertainty about critical decisions. "Learning more" is not inherently valuable—learning what changes next action is valuable. The mosquito that survives acquires just enough information to find blood before energy depletion, not perfect information about all possible blood locations.


Information without action is waste. Action without information is gambling. Optimize the trade-off: acquire information that changes decisions, execute based on current information, measure results, update beliefs, repeat.