Hippocampal-prefrontal orchestration supports higher-order learning for creative ideation


Participants

All participants were right-handed, native Chinese speakers with normal or corrected-to-normal vision. Sex was determined by self-report. For the fMRI studies (Exps. 1, 2, and S2), initial recruitment included 30, 40, and 27 individuals, respectively. In Exp. 1, four participants were excluded due to equipment malfunction (fMRI data not available), and three others were excluded for extremely poor performance in the creating phase (defined as more than two SD below the group mean). In Exp. 2, one participant withdrew midway due to illness. In Exp. S2, data from four participants were excluded from analysis due to poor performance in the cued-recall test (remembering fewer than ten items out of 100 trials). The final sample sizes were as follows: Exp. 1, n = 23 for fMRI data (15 females, mean age = 23.17 ± 1.64 years) and n = 27 for behavioral data (16 females, mean age = 23.33 ± 1.57 years); Exp. 2, n = 39 (24 females, mean age = 22.13 ± 3.14 years); Exp. S2, n = 23 (16 females, mean age = 22.87 ± 1.92 years). No participant took part in more than one experiment. All received 150 RMB for participation. For the behavioral experiments (Exps. S1a–c and S3–S7), the sample sizes were as follows: Exp. S1a, n = 36 (23 females, mean age = 21.42 ± 0.73 years); Exp. S1b, n = 40 (20 females, mean age = 21.20 ± 2.08 years); Exp. S1c, n = 34 (15 females, mean age = 21.97 ± 2.38 years); Exp. S3, n = 33 (23 females, mean age = 23.30 ± 2.76 years); Exp. S4, n = 35 (28 females, mean age = 22.00 ± 2.98 years); Exp. S5, n = 38 (23 females, mean age = 22.70 ± 2.61 years); Exp. S6, n = 32 (23 females, mean age = 22.53 ± 2.29 years); and Exp. S7, n = 36 (23 females, mean age = 22.16 ± 2.50 years). No participants overlapped across experiments. Compensation ranged from 30 to 60 RMB depending on study duration. No sex- or gender-based analysis was conducted, as the study was not designed or powered to detect sex-related effects.

Experimental procedures

The procedures of Exp. 1

Experimental materials comprised 100 creative exemplars of everyday objects, drawn from a previously established AUT dataset, and presented as item-use pairs (e.g., “whisk-candlestick”). Exemplars were selected based on pre-experimental ratings to ensure moderate-to-high creativity (M = 3.43, SD = 0.27) and novelty (M = 3.85, SD = 0.33), while maintaining high understandability (M = 4.53, SD = 0.28). All ratings were made on a 5-point scale (method. S2). This selection was guided by previous work showing that exposure to creative exemplars robustly enhances subsequent creative performance and associated neural activity compared to common materials^1,3. To minimize material-related confounds, between-item variability was carefully controlled, with coefficients of variation for creativity (CV = 0.08), novelty (CV = 0.09), and understandability (CV = 0.06) all below 0.1. An additional 20 non-creative exemplars (M = 2.20, SD = 0.46) served as filler items in the learning phase only, preventing participants from exclusively encoding creative prototypes. The experiment comprised three consecutive phases, each preceded by standardized instructions and practice trials.
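As a quick check on the reported between-item variability, the coefficients of variation can be recomputed from the means and SDs given above. This is a minimal sketch that assumes the usual definition CV = SD / M; the dictionary and function names are illustrative only.

```python
# Means and SDs are the rating statistics reported in the text (5-point scale).
ratings = {
    "creativity": (3.43, 0.27),
    "novelty": (3.85, 0.33),
    "understandability": (4.53, 0.28),
}


def coefficient_of_variation(mean, sd):
    """Assumed definition: CV = SD / mean (dimensionless)."""
    return sd / mean


# Round to two decimals, matching the precision reported in the text.
cvs = {name: round(coefficient_of_variation(m, sd), 2)
       for name, (m, sd) in ratings.items()}
print(cvs)
```

Under this definition, the rounded values reproduce the reported CVs of 0.08, 0.09, and 0.06.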

Exemplar-learning (phase 1): Phase 1 used an event-related fMRI design distributed over four runs. Each run comprised 25 creative exemplars and 5 uncreative fillers, presented in random order. During each trial (Fig. 2A), an object-usage pair was displayed for 6 s. Participants were asked to attentively comprehend the exemplar and judge its creativity via button press (“yes” for creative, “no” for uncreative). Importantly, participants were not informed about the subsequent self-generated creating (phase 2) or cued-recall (phase 3) tasks, so that they would not deliberately prepare for them during learning. Each decision was followed by a jittered fixation (4–6 s) to maintain attention and prepare for the next trial.

Self-generated creating (phase 2): After the exemplar-learning phase, participants were taken to a quiet room for behavioral testing. A surprise AUT was administered: in each trial, a random object (from the 100 learned creative exemplars) was presented for 12 s, during which participants were asked to generate and verbally report novel and original uses for the object—distinct from any previously learned exemplars. Responses were audio-recorded, and the experimenter also transcribed them. Trials were separated by a 2 s fixation. After the task, participants confirmed the accuracy of the recorded responses.

Cued-recall (phase 3): In phase 3, participants completed a cued-recall test for the exemplars learned in phase 1. Object names from the creative object-use pairs were presented in random order as cues (maximum 4 s each). For each cue, participants indicated whether they could recall the associated creative use; if so, they typed their answer; otherwise, the next object was presented. Each trial was separated by a 2 s fixation. Notably, the self-generated creating task (phase 2) preceded this recall test to allow an accurate assessment of the impact of exemplar-learning on creative output independently of memory retrieval effects.

The procedures of Exps. S1–S7

Exp. S1a–c

To systematically examine the effect of creative exemplar-learning on subsequent creative ideation, we conducted three control experiments (Exp. S1a–c). In Exp. S1a, participants completed the AUT with the same set of 100 target objects as in Exp. 1, but without any prior exemplar-learning. In Exp. S1b, a separate group learned the conventional uncreative uses of the 100 target objects as adopted in Exp. 1 (e.g., “newspaper–reading news”) in random order before performing the AUT. In Exp. S1c, another group learned 200 single words derived from the creative object-use pairs in Exp. 1 (e.g., “newspaper–lampshade” presented separately as “newspaper” and “lampshade”, randomly mixed with other words taken from other pairs), each displayed for 3 s during which participants judged whether it was a real word, followed by the same creative ideation task as in phase 2 of Exp. 1. Across all three experiments, the creative ideation procedures were identical to those in Exp. 1, and none included a cued-recall phase.

Exp. S2

This experiment aimed to identify the neural correlates of successful memory encoding for creative exemplars, referred to as the subsequent memory effect (SME)⁸². The exemplar-learning materials and procedures were identical to those in phase 1 of Exp. 1. Following the learning phase, participants exited the MRI scanner and were taken to a testing room, where they first completed a cued-recall task for the learned exemplars (identical to phase 3 of Exp. 1). Subsequently, they performed a recognition memory test, in which 220 object–usage pairs were presented sequentially for old–new memory judgments. The test set comprised 100 targets (objects paired with their original learned creative uses), 100 lures (objects paired with creative uses that were slightly different from the correct answer), and 20 entirely new object–usage pairs. For each item, participants indicated whether it matched the originally learned exemplar and rated their confidence on a five-point scale (1 = low confidence, 5 = high confidence; see Fig. S9A).

Exp. S3

This behavioral experiment used a directed forgetting paradigm to examine the relationship between subsequent forgetting of learned exemplars and creative ideation for the same target object. Participants were presented with 52 creative exemplars (object-use pairs) of moderately high creativity, selected from the same AUT Creative Uses Dataset as in Exp. 1. For each participant, half of the items were randomly assigned to the directed remembering (DR) condition and the other half to the directed forgetting (DF) condition. During the exemplar-learning phase, each object-use pair was displayed for 4 s, followed by a 0.3 s blank screen. A cue indicator (“√” for remember; “×” for forget) then appeared for 2 s, instructing participants either to intentionally remember or forget the just-presented exemplar. In the test phase, the names of the 52 target objects were presented in random order. For each object, participants first attempted to recall the previously learned creative alternate use (cued recall; 4 s per item) and then generated their own creative ideas for using the same object (self-generated creation; 8 s verbal response). Responses were recorded, and the task structure is illustrated in Fig. S10A.

Exp. S4

This experiment examined whether the presence or absence of recollective experiences during exemplar-learning influenced subsequent creative ideation. Exp. S4 comprised three phases: exemplar-learning, exemplar recognition with Remember/Know (R/K) judgments, and self-generated creation. During the exemplar-learning phase, participants viewed the same 100 creative exemplars (object–use pairs) as in phase 1 of Exp. 1 in random order. Each exemplar was displayed for 4 s, with a 1 s inter-trial interval, during which participants judged its creativity. Following a 7-min numerical distraction task, participants completed the recognition phase, which consisted of 200 trials in which object–use pairs (100 targets and 100 lures) were presented sequentially in random order. For each trial (maximum 6 s), participants judged whether the presented pair matched an exemplar learned earlier. If they responded “yes,” they were prompted to make an R/K judgment: an “R” (remember) response indicated retrieval of vivid episodic details (e.g., visual appearance, thoughts, or feelings at encoding), whereas a “K” (know) response indicated recognition without specific episodic content⁶⁵. This secondary judgment was also allotted up to 6 s, separated by a 0.5 s interval (Fig. S10C). Finally, participants were presented with the 100 target objects and instructed to generate their own creative uses following the same procedure as in Exp. 1.

Exps. S5–S7

Exps. S5–S7 explored the effect of learning-creating incubation delay on creative ideation. Across these three experiments, procedures and materials were closely matched, with critical modifications described below.

Exp. S5 involved 40 moderately high-creative exemplars that served as the exemplar material. The set was divided into four sub-groups, each assigned to one of four conditions: (1) exemplar-learning with incubation (EI); (2) exemplar-learning without incubation (E); (3) incubation without exemplar-learning (I); (4) a control condition (C), with 10 items per group. Statistical analysis confirmed no significant differences among the four sub-groups in novelty, appropriateness, or creativity (all ps > 0.2). Assignment of materials to conditions was counterbalanced across participants, and the presentation order of conditions was determined using a Latin square design. Each trial in all four conditions followed a classic “pretest–incubation–posttest” structure. The trial began with an initial AUT (AUT1), in which participants generated as many creative uses as possible for the target object, without a time limit. Participants then experienced one of the four experimental conditions before completing a second AUT (AUT2), in which they again generated creative uses for the same target object within 20 s.

(1) EI condition (exemplar-learning + incubation delay): Before AUT2, participants learned a creative exemplar (object-usage pair) for 4 s, with instructions to attend carefully to its content. This was followed by a 40 s incubation period in which numbers were presented sequentially at a rate of one per second, interspersed with a 1 s inter-trial interval. During this period, participants performed a 0-back working memory (WM) task, pressing the enter key whenever a number other than “3” appeared. The incubation phase was designed to provide a brief cognitive distraction, thereby facilitating incubation and potentially enhancing creative performance in AUT2.

(2) E condition (exemplar-learning only): This condition mirrored the EI condition, except that the 40 s incubation task was omitted. Participants learned the creative exemplar for 4 s and then proceeded immediately to AUT2, allowing assessment of the effect of exemplar-learning in the absence of incubation.

(3) I condition (incubation delay only): Participants completed the same 40 s 0-back WM incubation task as in the EI condition, but without any prior exemplar-learning. This condition isolated the effect of incubation delay on creative performance, independent of exemplar-learning.

(4) C condition (control): Neither exemplar-learning nor incubation was included. Instead, a 6 s fixation period separated AUT1 and AUT2, providing a baseline for evaluating the independent and combined effects of exemplar-learning and incubation (Fig. S11A).

Exp. S6

Exp. S6 utilized the same materials and procedures as Exp. S5 but compared only two conditions: E and EI_long. The E condition was identical to the E condition in Exp. S5, while the EI_long condition was identical to the EI condition in Exp. S5 except that the incubation interval (0-back WM task) was extended from 40 to 180 s, allowing examination of the effect of prolonged incubation on creative performance (because no significant difference was found between the E and EI conditions in Exp. S5; Fig. S11C).

Exp. S7

Exp. S7 further examined the impact of WM load during the incubation period by expanding on the EI_long condition from Exp. S6, with the hypothesis that high vs. low WM load would respectively produce stronger interference or greater incubation-induced optimization of learned exemplar knowledge. Two incubation conditions were implemented to manipulate WM load. In Low-load incubation (LI), participants performed a 0-back WM task, the same as in the EI_long condition of Exp. S6. In High-load incubation (HI), participants engaged in a high-load 2-back WM task. Here, numbers were presented sequentially, and participants were required to compare each number with the one presented two trials earlier, responding when the current number matched the number two positions back (Fig. S11E).

The procedures of Exp. 2

The procedure of Exp. 2 closely paralleled that of Exp. 1, comprising three phases: exemplar-learning (phase 1), self-generated creating (phase 2), and cued-recall (phase 3). The key distinction was the use of pictorial materials to illustrate creative uses of everyday objects, with both the learning and creating phases conducted during MRI scanning (Fig. 4A). This design enabled in-depth investigation of hippocampal and prefrontal representations acquired during exemplar-learning and their transfer to creative ideation. In Exp. 1, the creating phase was conducted outside the scanner due to technical challenges in recording speech⁹³. To address this limitation, Exp. 2 employed a sparse-sampling fMRI protocol⁹⁴ during the creating phase to enable clear in-scanner audio recording of participants’ verbal responses. During phase 1, participants learned a list of pictorial creative exemplars (object–use pairs) from the established database (method. S3), which were rated highly for novelty (M = 5.78, SD = 0.58), usefulness (M = 5.41, SD = 0.57), and creativity (M = 5.13, SD = 0.54) on a 7-point rating scale (1–7). To prevent habituation, additional non-creative pictorial exemplars were included but not analyzed. Each trial presented an object–use pair for 6 s, during which participants judged its creativity (“yes”/“no” via keypress), followed by a jittered fixation (4–6 s). In phase 2, participants generated creative ideas for the previously learned objects while undergoing fMRI. A sparse-sampling protocol (2 s scanning + 5 s silent gap) ensured accurate capture of verbal responses: each object was displayed for 16 s, during which participants silently conceived creative uses, followed by a 5 s yellow-screen cue (synchronous with scanning silence) during which they verbally reported their ideas. Trials were separated by a 7 s fixation cross. Finally, in phase 3, participants completed a recall test for the learned exemplars, using the same procedure as in Exp. 1.

Image acquisition and preprocessing

Participants in Exps. 1, 2, and S2 underwent MRI scans at the Center for Biomedical Imaging Research, Tsinghua University, using a Philips Achieva 3.0T TX MRI scanner with a 32-channel head coil. Foam padding was used to minimize head movement during scans. The acquisition parameters for T1-weighted anatomical images were: 180 slices encompassing the whole brain, repetition time (TR) = 7.65 ms, echo time (TE) = 3.73 ms, flip angle (FA) = 8°, field of view (FOV) = 230 mm × 230 mm, voxel size = 1 mm × 1 mm × 1 mm, 1 mm thickness. For T2*-weighted functional images, the parameters were: TR = 2000 ms, TE = 35 ms, FA = 90°, FOV = 200 mm × 200 mm, 64 × 64 matrix, voxel size = 2.5 × 2.5 × 4 mm, and 30 slices covering the whole brain with 4 mm thickness and no inter-slice gap. Phase 2 of Exp. 2 employed a sparse-sampling acquisition technique, introducing 5-s silent intervals between the acquisitions of 2-s MR volumes. This modification allowed overt participant responses to be recorded during phase 2 without interference from scanner noise. Visual stimuli were delivered using E-prime 2.0 software and projected onto a screen within the participant’s view via a mirror system.

fMRI data preprocessing was conducted using fMRIPrep 21.0.0, a software package built on Nipype 1.6.1. The pipeline corrected for slice-timing differences, realigned all images to a mean reference image, and registered the data to the MNI152 template at a resolution of 2 mm isotropic voxels. Spatial smoothing was intentionally omitted to preserve the original spatial properties of the data. For a detailed account of the preprocessing steps and results, please refer to the HTML reports generated by fMRIPrep, accessible via the fMRIPrep Documentation.

Data analysis

Behavioral data analysis

Creativity assessment

The creativity of self-generated AUT ideas was evaluated using the Consensual Assessment Technique⁹⁵. Three raters (master’s students specializing in creative psychology research with prior experience evaluating AUT responses) independently assessed each idea offline. Ratings were based on both the novelty and appropriateness of the idea in determining its degree of creativity. In Exp. 1, participants generated a total of 3430 ideas in phase 2 across 100 target objects. Each idea was rated, yielding a high inter-rater reliability, with an intraclass correlation coefficient (ICC) of 0.85 across the three raters. For the other experiments, ICCs were as follows: Exp. 2 = 0.83, Exp. S1a = 0.80, Exp. S1b = 0.74, Exp. S1c = 0.88, Exp. S3 = 0.82, Exp. S4 = 0.81, Exp. S5 = 0.88, Exp. S6 = 0.82, and Exp. S7 = 0.93.

HSC/LSC trial classification (Exps. 1 and 2)

For each trial in phase 2, the creativity score was calculated as the mean rating across the three raters. Trials in which participants either failed to produce a response or simply repeated the learned exemplar were assigned a creativity score of zero. For trials with one or more creative responses, the highest-scoring idea was used as the trial’s creativity index, reflecting the common practice of taking a person’s most creative work as representative of their creative ability in real-world contexts. An individual-specific median split was then applied to each participant’s trial list in the exemplar-learning phase, classifying trials as higher subsequent creation (HSC; above-median creativity score) or lower subsequent creation (LSC; below-median creativity score). To further ensure the robustness of results, we also conducted analyses using the mean creativity scores across all participants as an alternative criterion for HSC/LSC categorization.
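The scoring and median-split steps described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the handling of trials that fall exactly on the median (left unlabeled as "tie" here) is an assumption, since the text does not specify it.

```python
import numpy as np


def trial_creativity(idea_ratings):
    """Creativity index for one trial.

    idea_ratings: list with one tuple of rater scores per valid idea.
    Trials with no response, or only a repeat of the learned exemplar,
    are passed as an empty list and scored zero.
    """
    if len(idea_ratings) == 0:
        return 0.0
    # Mean across raters for each idea, then keep the best idea.
    return float(max(np.mean(r) for r in idea_ratings))


def median_split(scores):
    """Label trials HSC (above median) or LSC (below median) for one participant."""
    scores = np.asarray(scores, dtype=float)
    median = np.median(scores)
    labels = np.full(scores.shape, "tie", dtype=object)  # assumption: ties unlabeled
    labels[scores > median] = "HSC"
    labels[scores < median] = "LSC"
    return labels, median


# Toy example: four trials from one hypothetical participant.
trials = [[], [(3, 4, 5)], [(1, 2, 3), (4, 4, 4)], [(2, 2, 2)]]
scores = [trial_creativity(t) for t in trials]
labels, median = median_split(scores)
print(scores, median, list(labels))
```

The split is computed within each participant, so the HSC/LSC labels reflect relative performance for that individual rather than an absolute creativity threshold.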

Detailed behavioral data analysis

Comprehensive procedures and statistical approaches for behavioral data analysis across all experiments are provided in the Supplementary Information. Specifically, see results. S1–S3 and methods. S1 for Exp. 1; result. S4 for Exp. S1a–c; results S11–S13 for Exps. S3–S7; and result. S14 for Exp. 2.

fMRI data analysis

Single-item response estimation (Exps. 1 and 2)

To investigate the fixed effects of each regressor across participants, general linear model (GLM) analyses were utilized. We adopted a least squares all (LSA) model, which delineates each trial of each condition of interest as a distinct entity within the design matrix. Researchers have advocated for the superiority of LSA over other single-item estimation methods, such as least squares separate (LSS), particularly in scenarios of substantial trial variability⁹⁶. Given the variability inherent in the stimuli presented—novel usages of different objects—the LSA method was selected for its robustness. Regressors were convolved with the canonical hemodynamic response function to model the expected physiological changes. Each GLM also incorporated six head-motion parameters derived from head-motion correction as nuisance regressors. To mitigate low-frequency signal drift, a 128-s high-pass filter was applied to each run.
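To make the LSA structure concrete, the sketch below builds a design matrix with one HRF-convolved regressor per trial plus six motion columns and a constant. It is a simplified illustration under stated assumptions: the double-gamma function is a rough stand-in for the canonical HRF, the onsets and function names are hypothetical, and the high-pass filter is not included.

```python
import math
import numpy as np


def double_gamma_hrf(tr, duration=32.0):
    """Approximate double-gamma HRF sampled every `tr` seconds (assumed shape)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def lsa_design_matrix(onsets, trial_dur, n_scans, tr, motion):
    """One regressor per trial (LSA), then motion parameters, then a constant."""
    hrf = double_gamma_hrf(tr)
    frame_times = np.arange(n_scans) * tr
    columns = []
    for onset in onsets:
        # Boxcar covering the stimulus duration of this single trial.
        boxcar = ((frame_times >= onset) &
                  (frame_times < onset + trial_dur)).astype(float)
        columns.append(np.convolve(boxcar, hrf)[:n_scans])
    return np.column_stack(columns + [motion, np.ones(n_scans)])


# Toy run: 5 trials of 6 s each, TR = 2 s, 40 volumes, zeroed motion estimates.
onsets = np.arange(5) * 11.0          # hypothetical onsets in seconds
motion = np.zeros((40, 6))            # placeholder for six realignment parameters
X = lsa_design_matrix(onsets, trial_dur=6.0, n_scans=40, tr=2.0, motion=motion)
print(X.shape)
```

Single-trial estimates would then come from fitting this matrix to each voxel's time series, for example with `np.linalg.lstsq(X, y, rcond=None)`, and keeping the first coefficients (one per trial).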

Definition of ROIs (Exps. 1 and 2)

Our analysis focused on the encoding features of the hippocampus and its interaction with the inferior frontal gyrus (IFG; ventrolateral prefrontal cortex, vlPFC) in fostering creative ideation. Both the hippocampus and IFG were delineated using the AAL atlas⁶⁴. To segment the hippocampus into anterior and posterior sections, we applied an MNI coordinate-based segmentation method. This division was based on the location of the uncal apex within the MNI space, specifically at a y-coordinate of −21 mm⁹⁷. Additionally, vision-related brain areas were identified using the Neurosynth meta-analytic database by searching for the keyword “vision”⁹⁸. The association test map for “vision” from Neurosynth was thresholded at q < 0.01 (FDR-corrected) and used as the mask for these areas.
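The coordinate-based split can be illustrated with a small sketch that converts voxel indices to MNI y-coordinates through the image affine and divides the mask at y = −21 mm. The function name, the toy affine, and the choice to assign voxels at exactly −21 mm to the anterior part are assumptions made for illustration.

```python
import numpy as np


def split_hippocampus(mask, affine, y_cut=-21.0):
    """Split a boolean ROI mask into anterior/posterior parts at an MNI y-coordinate.

    mask: 3-D boolean array; affine: 4x4 voxel-to-MNI (mm) matrix.
    """
    voxel_ijk = np.argwhere(mask)                        # (n_voxels, 3) indices
    y_mm = voxel_ijk @ affine[1, :3] + affine[1, 3]      # MNI y of each voxel
    anterior = np.zeros(mask.shape, dtype=bool)
    posterior = np.zeros(mask.shape, dtype=bool)
    # Larger y is more anterior in MNI space; y == y_cut goes to anterior (assumption).
    anterior[tuple(voxel_ijk[y_mm >= y_cut].T)] = True
    posterior[tuple(voxel_ijk[y_mm < y_cut].T)] = True
    return anterior, posterior


# Toy example: 2 mm grid with a hypothetical affine and a six-voxel strip along y.
affine = np.array([[2.0, 0.0, 0.0, -90.0],
                   [0.0, 2.0, 0.0, -126.0],
                   [0.0, 0.0, 2.0, -72.0],
                   [0.0, 0.0, 0.0, 1.0]])
mask = np.zeros((91, 109, 91), dtype=bool)
mask[30, 50:56, 30] = True                               # y = -26 ... -16 mm
anterior, posterior = split_hippocampus(mask, affine)
print(anterior.sum(), posterior.sum())
```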

Estimation of low-dimensional representation (Exps. 1 and 2)

We hypothesized that hippocampal representational dimensionality reduction during exemplar-learning correlates with enhanced performance in subsequent creation. To quantify the extent of this neural compression, we measured the representational dimensionality of hippocampal neural activity patterns within PCA space, based on unsmoothed functional data. Activation patterns for each exemplar-learning trial were estimated individually, and activation vectors spanning all voxels within the hippocampus were extracted for both HSC and LSC trials. As shown in Fig. 3A, representational dimensionality for the HSC and LSC conditions was estimated separately using the following procedure: To ensure robust dimensionality estimation—given that the number of HSC and LSC trials for each participant may differ slightly (k₁ ≠ k₂)—a 1000-sample bootstrap procedure was employed. An equal number of trials, determined by the smaller trial count across conditions, were randomly sampled from each condition to construct an n × p (voxels × trials) activity matrix. PCA was then performed to estimate the number of principal components (PCs) required to explain each variance threshold from 70% to 80% in 1% increments. The number of PCs at each threshold was averaged across bootstrap samples, and the final dimensionality estimate for each condition was obtained by averaging across the 70–80% explained variance range⁹⁹. Dimensionality estimates were compared between HSC and LSC conditions using a group-level paired t-test. Given that 80% explained variance remains a widely used criterion in neural dimensionality studies^56,57—for its effective balance between capturing task-relevant signals and minimizing noise—we also estimated hippocampal representational dimensionality as the number of PCs required to explain 80% of the variance in the activation patterns. This estimation was performed using the same bootstrap approach described above to ensure statistical robustness.
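The dimensionality estimate described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it treats trials as observations (rows) and voxels as features, and it draws the equal-sized samples without replacement; both choices are assumptions, as are the function names and the simulated data.

```python
import numpy as np


def pcs_needed(patterns, thresholds):
    """Number of PCs needed to reach each cumulative explained-variance threshold.

    patterns: array of shape (trials, voxels).
    """
    centered = patterns - patterns.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    # searchsorted gives the first index whose cumulative variance >= threshold.
    return np.searchsorted(explained, thresholds) + 1


def bootstrap_dimensionality(hsc, lsc, n_boot=1000, seed=0):
    """Mean number of PCs over resamples and over the 70-80% thresholds."""
    thresholds = np.linspace(0.70, 0.80, 11)             # 1% increments
    rng = np.random.default_rng(seed)
    k = min(len(hsc), len(lsc))                          # equalize trial counts
    estimates = {}
    for name, data in (("HSC", hsc), ("LSC", lsc)):
        counts = np.empty((n_boot, thresholds.size))
        for b in range(n_boot):
            idx = rng.choice(len(data), size=k, replace=False)
            counts[b] = pcs_needed(data[idx], thresholds)
        estimates[name] = float(counts.mean())
    return estimates


# Simulated data: "HSC" patterns lie near a 3-D subspace, "LSC" patterns are unstructured.
rng = np.random.default_rng(1)
hsc_demo = (rng.normal(size=(40, 3)) @ rng.normal(size=(3, 200))
            + 0.01 * rng.normal(size=(40, 200)))
lsc_demo = rng.normal(size=(45, 200))
demo = bootstrap_dimensionality(hsc_demo, lsc_demo, n_boot=50)
print(demo)
```

A lower value indicates that fewer components capture the same share of variance, which is the sense in which a representation is described as more compressed. Per-participant estimates from such a procedure would then enter the paired comparison between conditions.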

Estimation of global pattern similarity (Exps. 1 and 2)

To assess the relationship between the neural representations of different exemplars, pairwise Pearson correlations were calculated. For each exemplar i (denoted as A_i), its activation pattern was compared to the activation pattern of each other exemplar j in the set of all exemplars (k_total) learned during phase 1. This method was systematically repeated for each exemplar, providing a comprehensive matrix of correlation coefficients, according to the formula:

$${{{\rm{GPS}}}}(i)={\sum}_{j=1}^{{k}_{{{{\rm{total}}}}}}{{corr}}({A}_{i},{A}_{j})$$

(1)

To normalize the distribution of similarity scores, we transformed these values into Fisher’s z-scores. These transformed scores were then averaged to calculate the global pattern similarity value for each trial type (HSC and LSC). These global pattern similarity values for each trial type were then averaged and statistically contrasted at the individual level to investigate differences in neural representation. For group-level analysis, we utilized a random-effects model. This approach was chosen to accommodate inter-subject variability, ensuring robust statistical inference across the study population.
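A compact sketch of the global pattern similarity computation is given below. It assumes that self-correlations are excluded and that the Fisher-transformed correlations are averaged rather than summed, which matches the prose description; the function name and the simulated data are illustrative.

```python
import numpy as np


def global_pattern_similarity(patterns):
    """Per-trial mean Fisher-z correlation with every other trial.

    patterns: array of shape (trials, voxels).
    """
    r = np.corrcoef(patterns)          # trial-by-trial Pearson correlations
    np.fill_diagonal(r, np.nan)        # drop self-correlations (assumption)
    z = np.arctanh(r)                  # Fisher z-transform
    return np.nanmean(z, axis=1)


# Simulated example: six trials, three labeled HSC and three LSC.
rng = np.random.default_rng(0)
patterns_demo = rng.normal(size=(6, 20))
labels_demo = np.array(["HSC", "LSC", "HSC", "LSC", "HSC", "LSC"])
gps_demo = global_pattern_similarity(patterns_demo)
condition_means = {c: float(gps_demo[labels_demo == c].mean())
                   for c in ("HSC", "LSC")}
print(condition_means)
```

The two condition means per participant are the quantities that would be contrasted at the group level.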

Psychophysiological interaction (PPI) analysis (Exp. 1)

We conducted a PPI analysis¹⁰⁰ with the left hippocampus as the seed region. The analysis workflow began with smoothing each subject’s preprocessed fMRI data using a Gaussian kernel with a full width at half-maximum (FWHM) of 6 mm. The BOLD signals were then extracted from the seed region to construct the PPI design matrix. This matrix included the seed region’s time series (physiological variable), the task-related time series (psychological variable), and their interaction term. The interaction term was derived by element-wise multiplication of the seed region’s time course with the task time course, representing a specific task condition (HSC vs. LSC), and subsequently convolved with the canonical HRF. Additionally, a high-pass filter with a cut-off of 128 s was applied, and six motion parameters obtained from realignment were incorporated as nuisance regressors. The PPI model estimation and statistical map generation were then performed for each subject individually. These subject-level statistical maps were entered into a second-level group analysis in SPM12, with a cluster-wise FWE correction at p < 0.05 and a cluster-forming threshold of p < 0.001.

Representational connectivity analysis (Exps. 1 and 2)

To evaluate the degree to which the left hippocampus and the left IFG/vlPFC encode similar information, we compared the representational dissimilarity matrices (RDMs) of these two regions. RDMs were constructed separately for trials under HSC and LSC conditions in both the hippocampus and IFG (anatomical ROI). The similarity between the RDMs for the two regions was quantified by calculating the Pearson correlation coefficient (r) between the lower triangles of the respective matrices (see Fig. 3E). For each participant, correlation coefficients were converted to Fisher’s z-scores to normalize the distribution. Group-level t-tests were then performed to assess differences in representational similarity between the HSC and LSC conditions.
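The RDM comparison can be illustrated as follows. The dissimilarity metric (1 − Pearson r between trial patterns) is an assumption, since the text does not state which metric was used; function names and simulated data are likewise illustrative.

```python
import numpy as np


def rdm(patterns):
    """Trial-by-trial dissimilarity matrix; 1 - Pearson r is an assumed metric."""
    return 1.0 - np.corrcoef(patterns)


def lower_triangle(matrix):
    """Off-diagonal lower-triangle entries as a vector."""
    rows, cols = np.tril_indices(matrix.shape[0], k=-1)
    return matrix[rows, cols]


def representational_connectivity(patterns_a, patterns_b):
    """Fisher-z Pearson correlation between the RDMs of two regions."""
    r = np.corrcoef(lower_triangle(rdm(patterns_a)),
                    lower_triangle(rdm(patterns_b)))[0, 1]
    return float(np.arctanh(r))


# Simulated example: regions A and B share trial structure; region C does not.
rng = np.random.default_rng(0)
latent = rng.normal(size=(10, 3))                        # shared trial geometry
region_a = latent @ rng.normal(size=(3, 50)) + 0.1 * rng.normal(size=(10, 50))
region_b = latent @ rng.normal(size=(3, 40)) + 0.1 * rng.normal(size=(10, 40))
region_c = rng.normal(size=(10, 40))
rc_ab = representational_connectivity(region_a, region_b)
rc_ac = representational_connectivity(region_a, region_c)
print(rc_ab, rc_ac)
```

Because only the trial-by-trial geometry is compared, the two regions may contain different numbers of voxels.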

Regression analysis (Exp. 1)

We employed a regression analysis using the robust regression toolbox to identify voxels in the neocortex that are coupled with the left hippocampus and contribute to neural compression. The searchlight-based representational connectivity analysis yielded searchlight maps for two conditions (HSC and LSC). We generated representational connectivity difference maps by subtracting LSC maps from HSC maps. Within the left hippocampus, the representational dimensionality score for the LSC condition was subtracted from that for the HSC condition, each calculated at 80% explained variance, yielding a representational dimensionality difference score. We then regressed the voxel values from the representational connectivity difference maps onto these dimensionality difference scores. Guided by the representational connectivity results, we first applied a small volume correction (SVC) restricted to the bilateral prefrontal cortex and subsequently extended the analysis to the whole brain. Significant clusters were identified using voxel-wise FDR correction (q < 0.05), which was the default setting for multiple comparisons correction in the robust regression toolbox.

Comparison of cross-phase low-dimensional representational transfer (Exp. 2)

We used PCA to analyze the low-dimensional representational pattern similarity between learning and creating phases. Let the neural data matrix be:

$${{{\bf{X}}}}\in {{{{\bf{R}}}}}^{n\times p}$$

(2)

where n is the number of voxels and p is the number of trials within the left hippocampus. PCA decomposes X as

$${{{\bf{X}}}}={{{{\bf{Z}}}}{{{\bf{V}}}}}^{{{{\bf{T}}}}},{{{\bf{Z}}}}={{{\bf{XV}}}}$$

(3)

where ${{{\bf{V}}}}\in {{{{\bf{R}}}}}^{p\times k}$ contains the first k principal axes (loadings), and ${{{\bf{Z}}}}\in {{{{\bf{R}}}}}^{n\times k}$ contains the corresponding PC scores, ordered by explained variance. The number of retained PCs (k) was chosen to capture a fixed proportion of explained variance (70–90%).

(1) Within a unified PCA-space: We combined activity patterns of HSC and LSC trials within the left hippocampus during both the learning and creating phases into a single matrix X_all. PCA was then performed on X_all:

$${{{{\bf{X}}}}}_{{{{\bf{all}}}}}={{{\bf{Z}}}}{{{{\bf{V}}}}}^{{{{\bf{T}}}}}$$

(4)

From the resulting score matrix Z, we extracted trial-wise PC scores for HSC-learning, HSC-creating, LSC-learning, and LSC-creating. For each PC i, Spearman correlations were computed between learning-phase and creating-phase scores within each condition. The similarity score was defined as

$${{{{\rm{Loading\; similarity}}}}}_{{{{\rm{HSC}}}}/{{{\rm{LSC}}}}}=\frac{1}{k}{\sum}_{i=1}^{k}{{{\rm{corr}}}}({L}_{{{{\rm{HSC}}}}/{{{\rm{LSC}}}}}(:,i),{C}_{{{{\rm{HSC}}}}/{{{\rm{LSC}}}}}(:,i))$$

(5)

where $L_{\mathrm{HSC/LSC}}(:,i)$ and $C_{\mathrm{HSC/LSC}}(:,i)$ denote the PC scores of learning and creating trials for condition HSC or LSC, respectively. Loading similarity scores were computed across explained-variance thresholds (70–90%) and compared between HSC and LSC using a paired t-test (Fig. 5A).
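The unified-space comparison can be sketched as below for a single explained-variance threshold. This is an illustrative sketch under several assumptions: patterns are arranged as trials × voxels, learning and creating rows are aligned by target object, and Spearman correlation is computed from simple ranks (adequate when scores have no ties). Function names and simulated data are hypothetical.

```python
import numpy as np


def spearman(x, y):
    """Spearman correlation via ranks (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


def unified_space_similarity(learn, create, var_threshold=0.8):
    """Mean per-PC learning/creating score correlation within each condition.

    learn, create: dicts mapping "HSC"/"LSC" to (trials, voxels) arrays whose
    rows are aligned by target object across the two phases.
    """
    blocks = [learn["HSC"], create["HSC"], learn["LSC"], create["LSC"]]
    x_all = np.vstack(blocks)
    centered = x_all - x_all.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, var_threshold) + 1)
    scores = u[:, :k] * s[:k]                            # trial-wise PC scores
    edges = np.cumsum([0] + [len(b) for b in blocks])
    parts = [scores[edges[i]:edges[i + 1]] for i in range(4)]
    similarity = {}
    for name, (L, C) in (("HSC", parts[0:2]), ("LSC", parts[2:4])):
        similarity[name] = float(np.mean(
            [spearman(L[:, i], C[:, i]) for i in range(k)]))
    return similarity


# Simulated example: HSC creating patterns resemble learning; LSC ones do not.
rng = np.random.default_rng(0)
learn_demo = {"HSC": rng.normal(size=(20, 50)), "LSC": rng.normal(size=(20, 50))}
create_demo = {"HSC": learn_demo["HSC"] + 0.1 * rng.normal(size=(20, 50)),
               "LSC": rng.normal(size=(20, 50))}
sim_demo = unified_space_similarity(learn_demo, create_demo)
print(sim_demo)
```

In practice the same computation would be repeated across the 70–90% thresholds before the paired comparison.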

(2) Within category-specific PCA-space: Based on the category-specific view of neural representation abstraction¹⁸, we also used PCA to project the activation patterns of the HSC and LSC conditions into their respective PCA spaces, calculated the pattern similarity within each condition, and then compared the two conditions. For each condition (HSC or LSC), hippocampal activation during the learning phase (X₁) was decomposed as

$${{{{\bf{X}}}}}_{1}={{{{\bf{Z}}}}}_{1}{{{{\bf{V}}}}}_{1}^{{{{\bf{T}}}}},{{{{\bf{Z}}}}}_{1}={{{{\bf{X}}}}}_{1}{{{{\bf{V}}}}}_{1}$$

(6)

The first k PCs were retained, yielding a score matrix:

$${{{\bf{L}}}}={{{{\bf{Z}}}}}_{{{{\bf{1}}}}}[:,1:k]$$

(7)

where each column $\mathbf{L}_{:,i}$ represents the trial-wise scores of the ith PC in the learning phase.

The creating phase data (X₂) were then projected onto the same set of principal axes (V₁) that had been derived from the learning phase, resulting in:

$${{{\bf{C}}}}={{{{\bf{X}}}}}_{{{{\bf{2}}}}}{{{{\bf{V}}}}}_{{{{\bf{1}}}}}[:,1:k]$$

(8)

where each column $\mathbf{C}_{:,j}$ represents the trial-wise scores of the jth PC in the creating phase.

To quantify low-dimensional representational transfer from learning to creating, we computed neural pattern similarity (NPS) scores. Specifically, for each PC i from the learning phase, we calculated the correlation between its score vector $\mathbf{L}_{:,i}$ and the score vectors of all k PCs from the creating phase ($\mathbf{C}_{:,j}$):

$${{\mbox{NPS}}}(i)={\sum}_{j=1}^{k}{{{\rm{corr}}}}\left({{{{\bf{L}}}}}_{:,{{{\bf{i}}}}},{{{{\bf{C}}}}}_{:,{{{\bf{j}}}}}\right)$$

(9)

This procedure ensures that each learning-phase PC is not only compared to its “matched” PC in the creating phase, but also to all possible PCs, thereby capturing a more generalizable measure of exemplars’ low-dimensional representational overlap between learning and creating. Finally, the overall learning–creating similarity index (LC-NPS) was obtained by averaging across the k PCs:

$${{{\rm{LC}}}}-{{{\rm{NPS}}}}=\frac{1}{k}{\sum}_{i=1}^{k}{{{\rm{NPS}}}}(i)$$

(10)

Thus, LC-NPS reflects the mean cross-phase similarity between the low-dimensional representations formed during learning and those engaged during creating. LC-NPS values were calculated for multiple explained-variance thresholds (70–90%), and the averaged similarity scores across thresholds were statistically compared between HSC and LSC conditions using paired t-tests (Fig. 5C).
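The steps in Eqs. (6)–(10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lc_nps` is hypothetical, both phases are centered with the learning-phase voxel means, the two phases are assumed to contain the same number of trials in matched order, and Pearson correlation is assumed for corr(·,·) because the text does not specify the coefficient for this analysis.

```python
import numpy as np

def lc_nps(X1, X2, var_threshold=0.80):
    """X1, X2: (trials x voxels) learning and creating patterns of one condition.
    Returns (LC-NPS, k), where k is the number of retained PCs."""
    mu = X1.mean(axis=0)                 # assumption: center on learning-phase means
    X1c, X2c = X1 - mu, X2 - mu
    # PCA of the learning phase via SVD; rows of Vt are the principal axes
    _, s, Vt = np.linalg.svd(X1c, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(explained, var_threshold)) + 1, len(s))
    V1 = Vt[:k].T                        # voxels x k
    L = X1c @ V1                         # Eq. (7): learning-phase scores
    C = X2c @ V1                         # Eq. (8): creating phase on the same axes
    # Correlate every learning PC with every creating PC (k x k block)
    R = np.corrcoef(L.T, C.T)[:k, k:]
    nps = R.sum(axis=1)                  # Eq. (9): NPS(i)
    return float(nps.mean()), k          # Eq. (10): LC-NPS
```

A useful sanity check is that passing the same matrix for both phases returns 1, because PC scores from a single decomposition are mutually uncorrelated and each correlates perfectly with itself.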

Comparison of cross-phase full-dimensional representational connectivity and transfer (Exp. 2)

To examine the cross-phase differences in representational connectivity between the hippocampus and IFG/vlPFC under HSC and LSC conditions, we conducted cross-phase representational similarity analysis in uncompressed (full-dimensional) spaces. RDMs were computed separately for HSC and LSC conditions during both the exemplar-learning and creating phases. Pearson correlation coefficients were then calculated between RDMs from the learning and creating phases within each condition. These coefficients were Fisher z-transformed and subjected to group-level t-tests (Fig. 6A). Cross-phase representational similarity between HSC and LSC conditions in the uncompressed space was also compared, allowing us to directly assess the transfer of exemplar representations from learning to creative ideation in both the hippocampus and IFG.
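A minimal sketch of the cross-phase RDM comparison for one participant, region, and condition is given below. The function names are hypothetical, and two details are assumptions not stated in this passage: the RDM uses correlation distance (1 minus Pearson r) between trial patterns, and the learning and creating phases contain the same items in matched order.

```python
import numpy as np

def rdm(patterns):
    """Representational dissimilarity matrix from (trials x voxels) patterns.
    Assumption: correlation distance, 1 - Pearson r between trial patterns."""
    return 1.0 - np.corrcoef(patterns)

def cross_phase_rsa(learn, create):
    """Fisher z-transformed Pearson correlation between the upper triangles
    of the learning-phase and creating-phase RDMs."""
    iu = np.triu_indices(learn.shape[0], k=1)   # off-diagonal upper triangle
    r = np.corrcoef(rdm(learn)[iu], rdm(create)[iu])[0, 1]
    return float(np.arctanh(r))                 # Fisher z-transform
```

The resulting z values, one per participant and condition, would then enter the group-level t-tests described above.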

Cross-phase multivoxel pattern decoding and the contribution of low-dimensional hippocampal representations (Exp. 2)

We first trained a linear decoder to distinguish HSC and LSC trials during the exemplar-learning phase, and tested its classification performance on the creating phase, which was previously unseen by the classifier (Fig. 6F). For each participant, we implemented a whole-brain searchlight analysis to assess local activation patterns associated with HSC and LSC trials across both phases. Within each searchlight (spherical mask, radius ~100 voxels), a linear support vector machine (SVM) classifier was trained to discriminate HSC from LSC trials in the learning phase. The trained classifier was then tested on data from the creating phase to evaluate cross-phase generalization. Classification accuracy was defined as the proportion of correctly predicted labels in the creating phase, with above-chance performance determined by subtracting the empirical chance level (50%):

$${{{{\rm{Accuracy}}}}}_{{{{\rm{cross}}}}-{{{\rm{phase}}}}}=\frac{1}{{n}_{{{{\rm{c}}}}}}\displaystyle {\sum}_{i=1}^{{n}_{{{{\rm{c}}}}}}{{{\rm{I}}}}(f({x}_{i}^{{{{\rm{c}}}}})={y}_{i}^{{{{\rm{c}}}}})$$

(11)

where n_c denotes the total number of trials in the creating phase, ${x}_{i}^{{{{\rm{c}}}}}$ is the multivoxel activation pattern for the ith creating-phase trial, ${y}_{i}^{{{{\rm{c}}}}}$ is the ground-truth class label (i.e., HSC or LSC), f (⋅) represents the classifier trained on learning phase data, and I (⋅) is the indicator function returning 1 if the predicted label matches the true label, and 0 otherwise. This metric quantifies the fraction of creating-phase trials that were correctly classified based on neural representations learned during the exemplar-learning phase.
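The train-on-learning, test-on-creating logic of Eq. (11) can be illustrated with the short sketch below. To keep it dependent on NumPy only, a nearest-centroid rule stands in for the linear SVM; this is a deliberate substitution for illustration and is not the classifier used in the study. The function names and the 0/1 label coding are hypothetical.

```python
import numpy as np

def fit_nearest_centroid(X, y):
    """Stand-in linear classifier (NOT the SVM used in the study):
    assigns each pattern to the class with the closest mean pattern."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])

    def predict(X_new):
        # Squared Euclidean distance of each new pattern to each centroid
        d = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return classes[np.argmin(d, axis=1)]

    return predict

def cross_phase_accuracy(X_learn, y_learn, X_create, y_create):
    """Eq. (11): fraction of creating-phase trials labeled correctly by a
    classifier f trained only on learning-phase trials."""
    f = fit_nearest_centroid(X_learn, y_learn)
    return float(np.mean(f(X_create) == y_create))
```

In a searchlight setting, this function would be evaluated once per sphere, with the rows of `X_learn` and `X_create` restricted to the voxels in that sphere.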

To address potential imbalances in HSC and LSC trials, the dataset was partitioned into multiple balanced subsets using the cosmo_balance_partitions function in the CoSMoMVPA toolbox, ensuring equal representation of each label and comprehensive inclusion of all trials. Classification accuracies were averaged across all subsets. Individual classification maps were subjected to a group-level random-effects analysis (one-sample t-test), with multiple comparison correction performed using cluster-level FDR (q < 0.05) as implemented in SnPM13 (10,000 permutations). At the group level, we identified a peak in the left vlPFC (x = –38, y = 16, z = 28) showing robust cross-phase decoding performance. For each participant, mean classification accuracy was extracted from a 10 mm spherical ROI centered on this peak, and these values were correlated with post-learning creativity scores to assess behavioral relevance.

To determine whether low-dimensional hippocampal representations formed during the learning phase support vlPFC cross-phase decoding ability, we conducted the following analyses. For each participant, low-dimensional hippocampal features were extracted using PCA, retaining PCs explaining 80% of the variance. These signals were regressed out from trial-wise vlPFC activation patterns (within the same ROI) by fitting, for each voxel j, the linear model:

$${{{{\bf{P}}}}}_{{{{\bf{j}}}}}={{{{\bf{H}}}}}^{{{{\bf{PC}}}}}{\beta }_{j}+{\varepsilon }_{j}$$

(12)

where P_j is the activation vector for voxel j across trials, H^PC is the matrix of retained hippocampal low-dimensional representations (trials × PCs), β_j is the regression coefficient vector, and ε_j denotes the residuals. The residuals (vlPFC activity patterns not explained by hippocampal low-dimensional signals) were then used for cross-phase decoding. A linear SVM classifier was again trained on residual vlPFC activity patterns from the learning phase and tested on the creating phase. The same cross-phase decoding procedure was also applied within the left hippocampus (anatomical ROI). All decoding accuracies were statistically compared to chance (50%) using permutation tests. All analyses were conducted using the CoSMoMVPA toolbox, custom MATLAB (2020a) scripts, and Python (3.8) packages, including scikit-learn (version 0.24.1) and nilearn (version 0.8.1).
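The voxel-wise regression in Eq. (12) can be written compactly by solving all voxels at once with ordinary least squares. The sketch below is illustrative only: the function name is hypothetical, and it follows Eq. (12) literally by fitting no intercept term, so any centering of the inputs is left to the caller.

```python
import numpy as np

def residualize(P, H_pc):
    """Remove hippocampal low-dimensional signals from vlPFC patterns.
    P:    (trials x voxels) vlPFC activation patterns
    H_pc: (trials x PCs) retained hippocampal PC scores
    Returns the (trials x voxels) residuals of Eq. (12)."""
    # One least-squares fit per voxel; column j of beta is the vector beta_j
    beta, *_ = np.linalg.lstsq(H_pc, P, rcond=None)
    return P - H_pc @ beta
```

By construction, the residual patterns are orthogonal to every retained hippocampal PC across trials, so any decoding that survives residualization cannot be attributed to a linear contribution of those components.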

Ethics approval

All experiments received approval from the Institutional Review Board of Capital Normal University and/or the Ethics Committee of the Center for Biomedical Imaging Research at Tsinghua University. All participants provided informed consent in accordance with the guidelines approved by the Institutional Review Board.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
