Scenario Memory Experiment Plan

Exact files and modules likely to be used

Primary ProsperPath components (a wiring sketch follows this list):

  • pp_query/service.py
    • main graph search entrypoint
  • pp_query/schemas.py
    • search strategy definitions
  • pp_query/query_client.py
    • config-driven query client used for the pilot
  • decision_feat/graph_signals.py
    • graph evidence and signal extraction logic
  • scripts/query_memories.py
    • useful pattern for direct Neo4j access
  • schema/kg_schema.py
    • schema reference for interpreting retrieved entities
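
The sketch below shows one plausible way these pieces could be wired for the pilot. Only the module paths come from the list above; every class, function, and enum name is an assumption that must be checked against the real pp_query and decision_feat code.

```python
# Hypothetical pilot wiring. Only the module paths are taken from this plan;
# QueryClient, SearchStrategy, and extract_signals are assumed names, not
# confirmed entrypoints.
from pp_query.query_client import QueryClient            # assumed class
from pp_query.schemas import SearchStrategy              # assumed enum
from decision_feat.graph_signals import extract_signals  # assumed function

def run_pilot_case(client: QueryClient, case_id: str) -> dict:
    """Run one decision case under both conditions and keep raw results."""
    results = {}
    for strategy in (SearchStrategy.FRAGMENT, SearchStrategy.SCENARIO):
        raw = client.search(case_id=case_id, strategy=strategy)
        results[strategy.name] = {"raw": raw, "signals": extract_signals(raw)}
    return results
```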

Existing data assets (a minimal loader sketch follows):

  • mock_data/chat.json
  • mock_data/ratio.json
  • mock_data/user_memories.json
  • mock_data/loca_trends.json
  • 1001_output.json
  • 1001_recommendations.json
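
A minimal loader for these assets. The paths are copied verbatim from the list above (including loca_trends.json as written); the base directory is an assumption about the repo layout.

```python
import json
from pathlib import Path

# Load the existing assets into one dict. Paths are verbatim from this plan;
# BASE is an assumption about where the files live relative to the repo root.
BASE = Path(".")

ASSET_PATHS = {
    "chat": "mock_data/chat.json",
    "ratios": "mock_data/ratio.json",
    "user_memories": "mock_data/user_memories.json",
    "local_trends": "mock_data/loca_trends.json",
    "output_1001": "1001_output.json",
    "recommendations_1001": "1001_recommendations.json",
}

def load_assets() -> dict:
    return {
        name: json.loads((BASE / rel).read_text(encoding="utf-8"))
        for name, rel in ASSET_PATHS.items()
    }
```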

Neo4j queries needed

Read-only Cypher inspections (sketched in code after the next list):

  • count Episodic nodes for the target group
  • count entities by label
  • count relationships by type
  • count Action, Decision, Recommendation, Challenge, and FinancialPeriod nodes
  • count edges where r.episodes IS NOT NULL
  • sample action-to-goal, action-to-challenge, and recommendation-linked relationships

These queries are sufficient for:

  • feasibility checking
  • selecting usable cases
  • confirming whether episode-linked context is present in the current graph
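
A sketch of these inspections with the Python neo4j driver. The node labels follow the list above; the group_id property, connection details, and result shapes are assumptions to verify against schema/kg_schema.py.

```python
from neo4j import GraphDatabase

# Read-only inspection queries. Labels follow the plan; the group_id
# property name is an assumption.
INSPECTIONS = {
    "episodic_nodes":
        "MATCH (e:Episodic) WHERE e.group_id = $group_id RETURN count(e) AS n",
    "entities_by_label":
        "MATCH (n) RETURN labels(n) AS label, count(*) AS n ORDER BY n DESC",
    "relationships_by_type":
        "MATCH ()-[r]->() RETURN type(r) AS rel_type, count(*) AS n ORDER BY n DESC",
    "episode_linked_edges":
        "MATCH ()-[r]->() WHERE r.episodes IS NOT NULL RETURN count(r) AS n",
}

def run_inspections(uri: str, auth: tuple, group_id: str) -> dict:
    out = {}
    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for name, query in INSPECTIONS.items():
                out[name] = [rec.data() for rec in session.run(query, group_id=group_id)]
    return out
```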

Expected intermediate outputs

From the pilot pipeline (an illustrative record shape is sketched after this list):

  • raw search results for each case and each condition
  • compact baseline context package
  • compact scenario-style context package
  • a scored comparison table
  • a short interpretation note describing what improved and what remained weak
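
One plausible shape for a per-case, per-condition record in the results JSON. Every field name and value below is an illustrative assumption chosen to mirror the rubric dimensions listed later in this plan, not the real output schema.

```python
# Illustrative record shape for scenario_memory_pilot_results.json.
# All names and placeholder values are assumptions, not the actual schema.
example_record = {
    "case_id": "1001",
    "condition": "scenario",        # or "fragment"
    "strategy": "",                 # strategy name from pp_query/schemas.py
    "node_count": 0,                # placeholder; filled by the pipeline
    "edge_count": 0,
    "context_package": {            # the compact context bundle
        "goals": [],
        "challenges": [],
        "actions": [],
        "outcomes": [],
    },
    "scores": {                     # filled in by human raters
        "context_completeness": None,
        "evidence_traceability": None,
        "missing_info_awareness": None,
        "decision_usefulness": None,
    },
}
```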

Current pilot artifacts:

  • output/experiments/scenario_memory_pilot_results.json
  • output/experiments/scenario_memory_pilot_results.csv
  • output/experiments/scenario_memory_examples.md
  • output/experiments/scenario_memory_pilot_summary.md

Evaluation tables to generate

At minimum:

  • case-by-case comparison table with:
    • strategy
    • node count
    • edge count
    • context completeness
    • evidence traceability
    • missing-information awareness
    • decision usefulness

Later, for the paper (see the table-building sketch after this list):

  • a compact task table describing each decision situation
  • a results table aggregating rubric scores across fragment vs scenario conditions
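
Both tables can be derived from a flat list of scored records. The sketch below assumes pandas and assumes each record has been flattened so the rubric scores sit at the top level; the column names mirror the rubric above and are otherwise assumptions.

```python
import pandas as pd

# Rubric columns, mirroring the case-by-case table above (assumed names).
SCORE_COLS = [
    "context_completeness",
    "evidence_traceability",
    "missing_info_awareness",
    "decision_usefulness",
]

def build_tables(records: list[dict]) -> tuple[pd.DataFrame, pd.DataFrame]:
    df = pd.DataFrame(records)
    # Case-by-case comparison: one row per (case, condition) pair.
    case_table = df[
        ["case_id", "condition", "strategy", "node_count", "edge_count", *SCORE_COLS]
    ]
    # Paper results table: mean rubric scores, fragment vs scenario.
    results_table = df.groupby("condition")[SCORE_COLS].mean().round(2)
    return case_table, results_table
```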

Whether human judgment is needed

Yes.

For the current assets, human judgment is necessary for the main pilot scoring because:

  • the target construct is decision-grade context quality
  • automatic scores would overfit to surface structure such as edge count
  • the graph contains noise, so simple proxies would be misleading

Automatic proxy evaluation is realistic only as a secondary signal, covering checks such as the following (sketched in code after the list):

  • whether an output contains explicit problem-action-outcome linkage
  • whether edge-backed evidence is present
  • whether multiple contextual dimensions appear in one bundle
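
Minimal sketches of these proxies, assuming the hypothetical bundle and relation-fact shapes from the record example above. These are secondary signals only, not a replacement for human scoring.

```python
# Secondary automatic proxies; bundle keys mirror the assumed
# context_package shape sketched earlier in this plan.
def has_problem_action_outcome(bundle: dict) -> bool:
    """Explicit problem-action-outcome linkage: all three slots non-empty."""
    return all(bundle.get(k) for k in ("challenges", "actions", "outcomes"))

def has_edge_backed_evidence(relation_facts: list[dict]) -> bool:
    """At least one retrieved relation carries episode references."""
    return any(fact.get("episodes") for fact in relation_facts)

def dimension_count(bundle: dict) -> int:
    """Number of distinct contextual dimensions present in one bundle."""
    return sum(
        1 for k in ("goals", "challenges", "actions", "outcomes") if bundle.get(k)
    )
```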

Error and failure modes

  • search retrieval may surface cross-domain or weakly related facts
  • episode-aware retrieval may still fail to isolate one coherent business scenario
  • some tasks may have recommendations but weak outcome traces
  • some tasks may have goals and outcomes but weak challenge linkage
  • graph relation labels may be semantically weak, for example many RELATES_TO edges

These failure modes do not invalidate the pilot, but they must be reported explicitly, for example with a simple weak-label share as sketched below.
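
A rough check for the last failure mode, assuming relation facts expose their relation type under a hypothetical "type" key; the weak-label set is an assumption to tune against the actual graph.

```python
# Share of retrieved edges that carry only a semantically weak label.
# WEAK_LABELS and the "type" key are assumptions, not confirmed schema.
WEAK_LABELS = {"RELATES_TO"}

def weak_label_share(relation_facts: list[dict]) -> float:
    if not relation_facts:
        return 0.0
    weak = sum(1 for f in relation_facts if f.get("type") in WEAK_LABELS)
    return weak / len(relation_facts)
```

Reporting this share per case alongside the rubric scores would make the weak-label failure mode explicit rather than anecdotal.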

What success would look like for a first paper-quality experiment

Minimum success:

  • at least one or two cases where the scenario-style package is clearly better than the fragment package on completeness and traceability
  • the improvement is inspectable in retrieved nodes and relation facts
  • the limitations are transparent and do not require invented evidence

Stronger success:

  • the same pattern repeats across several cases
  • the scenario-style package consistently surfaces goal-challenge-action-outcome structure more clearly than the baseline
  • the results support a narrow empirical statement in the paper

The current run should therefore be treated as a first inspectable pilot, not as the final paper experiment.