medeq.MED

class medeq.MED(parameters, response_names=None, sampler=<class 'medeq.med.DVASampler'>, scheduler=LocalScheduler(python_executable=['/home/docs/checkouts/readthedocs.org/user_builds/med/envs/latest/bin/python']), seed=None, verbose=3)

Bases: object

Autonomously explore system responses and discover underlying physical laws or correlations.

Exploring system responses can be done in one of two ways:

- Locally / manually: running experiments / simulations, then feeding results back to MED.
- Massively parallel: for complex simulations that can be launched from Python, MED can automatically change simulation parameters and run them in parallel on OS processes (locally) or SLURM jobs (distributed clusters).

A typical local workflow is:

Define free parameters to explore as a pd.DataFrame - you can use the medeq.create_parameters function for this.

>>> import medeq
>>> parameters = medeq.create_parameters(
...     ["A", "B"],
...     minimums = [-5., -5.],
...     maximums = [10., 10.],
... )
>>> print(parameters)
   value  min   max
A    2.5 -5.0  10.0
B    2.5 -5.0  10.0
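
The printed table shows the layout of the parameters DataFrame: one row per free parameter, with `value`, `min` and `max` columns. The sketch below rebuilds that layout with plain pandas. The helper name `make_parameters` is hypothetical, and taking the initial value as the midpoint of the bounds is an assumption inferred from the output above, not a description of how medeq.create_parameters is implemented.

```python
import pandas as pd


def make_parameters(names, minimums, maximums):
    """Hypothetical stand-in mirroring the printed layout (not medeq's code)."""
    # Assumption: the initial value sits midway between the bounds,
    # as in the printed example (2.5 for bounds -5 and 10).
    values = [(lo + hi) / 2 for lo, hi in zip(minimums, maximums)]
    return pd.DataFrame(
        {"value": values, "min": minimums, "max": maximums},
        index=names,
    )


parameters = make_parameters(["A", "B"], [-5.0, -5.0], [10.0, 10.0])
print(parameters)
```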

Create a medeq.MED object and generate samples (i.e. parameter combinations) to evaluate - the default sampler covers the parameter space as efficiently as possible, taking previous results into account; use the MED.sample(n) method to get n samples to try.

>>> med = medeq.MED(parameters, seed=42)
>>> print(med)
MED(seed=42)
---------------------------------------
parameters =
     value  min   max
  A    2.5 -5.0  10.0
  B    2.5 -5.0  10.0
response_names =
  None
---------------------------------------
sampler =   DVASampler(d=2, seed=42)
samples =   np.ndarray[(0, 2), float64]
responses = NoneType
epochs =    list[0, tuple[int, int]]

>>> med.sample(5)
array([[-3.33602115, -0.45639296],
       [ 5.55496225,  5.554965  ],
       [ 2.72771903, -3.48852585],
       [-0.45639308,  8.33602069],
       [ 8.48852568,  2.27228172]])
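
The returned array has one row per sample and one column per parameter. As a quick sanity check, the sketch below copies the values printed above into NumPy and confirms that every coordinate lies within the bounds from the parameters table. It does not call MED itself.

```python
import numpy as np

# Samples copied from the output above; bounds from the parameters table
samples = np.array([
    [-3.33602115, -0.45639296],
    [5.55496225, 5.554965],
    [2.72771903, -3.48852585],
    [-0.45639308, 8.33602069],
    [8.48852568, 2.27228172],
])
minimums = np.array([-5.0, -5.0])
maximums = np.array([10.0, 10.0])

# One row per sample, one column per parameter (A, B)
n_samples, n_parameters = samples.shape

# Every coordinate should lie inside its parameter's [min, max] interval
within_bounds = bool(np.all((samples >= minimums) & (samples <= maximums)))
print(n_samples, n_parameters, within_bounds)
```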

For a local / offline workflow, these samples can be evaluated in one of two ways:
- Evaluate samples manually, offline - i.e. run experiments, simulations, etc. and feed the responses back to MED.
- Let MED evaluate a simple Python function / model.

>>> # Evaluate samples manually - run experiments, simulations, etc.
>>> to_evaluate = med.queue
>>> responses = [1, 2, 3, 4, 5]
>>> med.evaluate(responses)
>>>
>>> # Or evaluate a simple Python function / model
>>> def instrument(sample):
...     return sample[0] + sample[1]
...
>>> med.evaluate(instrument)
>>> med.results
          A         B  variance   response
0 -3.336021 -0.456393  0.037924  -3.792414
1  5.554962  5.554965  0.111099  11.109927
2  2.727719 -3.488526  0.007608  -0.760807
3 -0.456393  8.336021  0.078796   7.879628
4  8.488526  2.272282  0.107608  10.760807
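
The `response` column above is the `instrument` function applied to each row of samples. A minimal sketch of that per-sample evaluation with plain NumPy, using the sample values printed earlier (this only illustrates the function contract; it is not how MED schedules evaluations internally):

```python
import numpy as np

# Samples copied from the `med.sample(5)` output
samples = np.array([
    [-3.33602115, -0.45639296],
    [5.55496225, 5.554965],
    [2.72771903, -3.48852585],
    [-0.45639308, 8.33602069],
    [8.48852568, 2.27228172],
])


def instrument(sample):
    # `sample` is a single parameter combination: [A, B]
    return sample[0] + sample[1]


# Call the function once per sample (row)
responses = np.array([instrument(sample) for sample in samples])
print(responses)
```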

For a massively parallel workflow, e.g. using a complex simulation, all you need is a standalone Python script that:

- Defines its free parameters between two "# MED PARAMETERS START" / "# MED PARAMETERS END" directives.
- Runs the simulation in any way - define the simulation inline, launch it on a supercomputer and collect results, etc.
- Defines a variable `response` for the simulated output of interest - either as a single number or a list of numbers (multi-response).

Here is a simple example of a MED script:

# In file `simulation_script.py`

# MED PARAMETERS START
import medeq

parameters = medeq.create_parameters(
    ["A", "B", "C"],
    [-5., -5., -5.],
    [10., 10., 10.],
)
# MED PARAMETERS END

# Run simulation in any way, locally, on a supercomputer and collect
# results - then define the variable `response` (float or list[float])
values = parameters["value"]
response = values["A"]**2 + values["B"]**2
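
To make the directive convention concrete, here is a small, hypothetical helper that pulls out the lines between the two directives of a script. The function name `extract_parameter_block` and the simplified script text are assumptions for illustration; MED's own parsing may work differently.

```python
def extract_parameter_block(script_text):
    """Return the lines between the START and END directives (hypothetical helper)."""
    lines = script_text.splitlines()
    # Locate the directive lines; `next` raises StopIteration if one is missing
    start = next(i for i, line in enumerate(lines)
                 if line.strip() == "# MED PARAMETERS START")
    end = next(i for i, line in enumerate(lines)
               if line.strip() == "# MED PARAMETERS END")
    if end <= start:
        raise ValueError("END directive must come after START directive")
    return "\n".join(lines[start + 1:end])


# Simplified stand-in for a user script (plain assignments instead of medeq calls)
script = """\
# MED PARAMETERS START
A = 1.0
B = 2.0
# MED PARAMETERS END
response = A**2 + B**2
"""

block = extract_parameter_block(script)
print(block)
```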

If you have previous, separate experimental data, you can add it to the dataset of responses with MED.augment:

>>> # Augment dataset of responses with historical data
>>> samples = [
>>>     [1, 1],
>>>     [2, 2],
>>>     [1, 2],
>>> ]
>>>
>>> responses = [1, 2, 3]
>>> med.augment(samples, responses)

To discover the underlying equation, you need to install two things on your system: Julia (a beautiful, high-performance programming language) and the PySR library:

Install Julia manually (see https://julialang.org/downloads/).
pip install pysr
python -c 'import pysr; pysr.install()'
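
Before calling the discovery step, it can help to check whether these prerequisites are visible from the current Python environment. The sketch below uses only the standard library. Treat the result as a hint rather than a guarantee: a Julia installation that is not on PATH would not be detected by this check.

```python
import importlib.util
import shutil

# True if the `pysr` package can be found by the current interpreter
pysr_available = importlib.util.find_spec("pysr") is not None

# Path of a `julia` executable on PATH, or None if there is none
julia_path = shutil.which("julia")

print("PySR importable:", pysr_available)
print("Julia executable:", julia_path)
```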

And now discover underlying equations!

>>> med.discover(binary_operators = ["+", "*"])
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score     Equation
1           2.412e+01  5.296e-01  B
3           0.000e+00  1.151e+01  (A + B)
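
The two equations in the table can be checked by hand against the evaluated samples. The sketch below assumes that the Loss column is a mean squared error and that only the five samples from `med.sample(5)` are in the dataset; the exact numbers reported by PySR may differ.

```python
import numpy as np

# Samples copied from the `med.sample(5)` output
samples = np.array([
    [-3.33602115, -0.45639296],
    [5.55496225, 5.554965],
    [2.72771903, -3.48852585],
    [-0.45639308, 8.33602069],
    [8.48852568, 2.27228172],
])
A, B = samples[:, 0], samples[:, 1]

# Responses as produced by the `instrument` function (A + B)
responses = A + B

# Candidate equations from the Hall of Fame, evaluated on the samples
candidates = {"B": B, "(A + B)": A + B}

losses = {}
for equation, prediction in candidates.items():
    # Assumption: loss is the mean squared error between prediction and response
    losses[equation] = float(np.mean((prediction - responses) ** 2))
    print(f"{equation:10s} mse = {losses[equation]:.3e}")
```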

Attributes

parameters : pd.DataFrame
    A
response_names : list[str] or None
    The response names, if given.
sampler : object
    Any Python object defining a method .sample(n, med) returning n samples to evaluate.
scheduler : coexist.schedulers.Scheduler subclass or None
    An object implementing the coexist.schedulers.Scheduler interface, defining a method for scheduling function evaluations in a massively parallel context. Only relevant if parameters is given as a user script.
samples : (M, P) np.ndarray
    The parameter samples generated.
responses : (N, K) np.ndarray or None
    The responses evaluated.
epochs : list[tuple[int, int]]
    The sample generation-evaluation batches, as index ranges within samples and responses.
seed : int
    The random seed used for deterministic results.
verbose : int
    Verbosity level, between 0 and 5.
queue : np.ndarray
    [Generated] The queue of unevaluated samples.
evaluated : (N, P) np.ndarray
    [Generated] The evaluated samples.
results : pd.DataFrame
    [Generated] Neat DataFrame of samples tried and corresponding results.
variances : (N, K) np.ndarray
    [Generated] The uncertainties in the results found.
gp : list[fvgp.gp.GP] or None
    [Internal] List of Gaussian Process objects for each response.
sr : pysr.PySRRegressor or None
    [Internal] PySR symbolic regressor for equation discovery.
paths : medeq.med.MEDPaths or None
    [Internal] Structure containing paths for saving the MED object.
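
The shape annotations relate the attributes to each other: M samples generated, N of them evaluated, P parameters and K responses. The toy sketch below illustrates that bookkeeping with NumPy. It rests on two assumptions not stated in this reference: that each epoch is a half-open `(start, end)` range of row indices, and that evaluated samples come first so the remaining rows form the queue.

```python
import numpy as np

# Toy data, not produced by MED: 8 generated samples of 2 parameters,
# of which the first 5 have been evaluated for a single response.
samples = np.arange(16, dtype=float).reshape(8, 2)      # shape (M, P) = (8, 2)
responses = samples[:5].sum(axis=1, keepdims=True)      # shape (N, K) = (5, 1)

# Assumption: each epoch is a half-open (start, end) range of row indices
epochs = [(0, 5), (5, 8)]

batch_sizes = []
for start, end in epochs:
    batch = samples[start:end]          # samples generated in this epoch
    batch_sizes.append(len(batch))

# Assumption: evaluated samples come first, so the remaining rows are pending
n_evaluated = len(responses)
evaluated = samples[:n_evaluated]       # shape (N, P)
pending = samples[n_evaluated:]         # shape (M - N, P)
print(batch_sizes, evaluated.shape, pending.shape)
```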

__init__(parameters, response_names=None, sampler=<class 'medeq.med.DVASampler'>, scheduler=LocalScheduler(python_executable=['/home/docs/checkouts/readthedocs.org/user_builds/med/envs/latest/bin/python']), seed=None, verbose=3)

Methods

`__init__`(parameters[, response_names, ...])
`augment`(samples, responses)	Augment set of samples with manually-evaluated responses.
`discover`([response, binary_operators, ...])	Discover analytical models of varying complexity for the evaluated samples and responses.
`evaluate`([f])	Evaluate the current samples online or offline.
`load`(dirpath)	Load MED instance from a directory (e.g. "med_seed123").
`plot_gp`([response, resolution, verbose])	Plot interactive 2D slices of the response and uncertainty.
`plot_response`([f, colors])
`plot_samples`([colors, marker_size])
`sample`(n)	Generate and return `n` new parameter samples; they will be added to the `.samples` and `.queue` attributes.
`save`([directory])	Save all data about the MED object to disk in a given directory.
`subset`(select)	Select a subset of the current samples, returning a new `MED` object that can e.g. discover equations for only the selected subset.

Attributes

`evaluated`
`queue`
`results`
`variances`

property queue

property evaluated

property variances

property results

save(directory=None): Save all data about the MED object to disk in a given directory.

static load(dirpath): Load MED instance from a directory (e.g. "med_seed123").

sample(n): Generate and return n new parameter samples; they will be added to the .samples and .queue attributes.

subset(select): Select a subset of the current samples, returning a new MED object that can e.g. discover equations for only the selected subset.

evaluate(f=None)

Evaluate the current samples online or offline.

There are 3 possible workflows:

The user evaluated the MED.queue values separately (e.g. ran experiments) - then simply supply the responses as a list or NumPy array.

>>> med.evaluate([1, 2, 3])

A simple Python function is supplied that will be evaluated for each sample; the function must accept a single NumPy vector.

def instrument(params):
    # `params` is a 1D NumPy array holding a single parameter combination
    return params[0] + params[1]

med.evaluate(instrument)

If a separate Python script was provided when the class was created, nothing else is needed; this function will launch jobs and collect responses from the user script.

augment(samples, responses): Augment set of samples with manually-evaluated responses.

discover(response=None, binary_operators=['+', '-', '*', '/', '^'], unary_operators=[], maxsize=50, maxdepth=None, niterations=100, populations=32, parsimony=0.0032, constraints=None, nested_constraints=None, denoise=False, select_k_features=None, turbo=True, equation_file='equations.csv', progress=False, **kwargs)

Discover analytical models of varying complexity for the evaluated samples and responses.

The most important equation discovery parameters are given to this method; a full reference is here: https://astroautomata.com/PySR/api/

Parameters

response : str or int, optional
    The response name or index to find an equation for; only needs to be specified if there is more than one response.
binary_operators : list[str], default ["+", "-", "*", "/", "^"]
    Operators taking two real numbers as input, using Julia syntax, e.g. +(x, y) === x + y. Custom operators can be defined, e.g. binaryfunc(x, y) = x^2 * y.
unary_operators : list[str], default []
    Operators taking a single real number as input, using Julia syntax, e.g. sin(x) or log(x). Custom operators can be defined, e.g. unaryfunc(x) = x^2.
maxsize : int, default 50
    Maximum size of equation; smaller values correspond to shorter equations and faster searching.
maxdepth : int, optional
    If defined, limit the depth of the equation tree - i.e. stacked operations.
niterations : int, default 100
    Number of equation finding iterations to run.
populations : int, default 32
    Number of equation populations to use; a larger value corresponds to more equations being sampled and a slower search.
parsimony : float, default 0.0032
    How much to punish complexity; a larger value prefers smaller expressions.
constraints : dict[str], optional
    Optional dictionary of complexity constraints on operators, e.g. constraints = {"pow": (-1, 2)} does not allow the exponent of a power operator to have complexity larger than 2; -1 means any complexity.
nested_constraints : dict[str], optional
    Number of times a combination of operators can be nested, e.g. {"sin": {"cos": 0}} specifies that cos cannot be found inside a sin.
denoise : bool, default False
    Symbolic regression should be robust to noise, but the data can be further denoised with a Gaussian Process.
select_k_features : int, optional
    If defined, use at most k parameters from the given samples.
turbo : bool, default True
    Whether to use extra, experimental optimisations internally.
equation_file : str, default "equations.csv"
    Where to save equations.
progress : bool, default False
    Whether to show interactive equation discovery progress.
**kwargs
    Other keyword arguments to be passed to PySR.

Returns

list[Equation]: List of equations found.
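
As an illustration of how these options fit together, the sketch below collects a few of them in a dictionary, reusing the example values from the parameter descriptions above. The particular combination (and the reduced `maxsize`) is an arbitrary choice for demonstration, not a recommended configuration.

```python
# Options for equation discovery, using values quoted in the parameter list
discover_options = {
    # Allow sums, products and powers between terms
    "binary_operators": ["+", "*", "^"],
    # Allow trigonometric functions of a single term
    "unary_operators": ["sin", "cos"],
    # Shorter equations, faster search (default is 50)
    "maxsize": 20,
    # Exponent of a power operator limited to complexity 2; -1 means any
    "constraints": {"pow": (-1, 2)},
    # cos may not appear inside sin
    "nested_constraints": {"sin": {"cos": 0}},
    "niterations": 100,
}

# With an evaluated `med` object and PySR installed, these could be passed as:
# equations = med.discover(**discover_options)
print(sorted(discover_options))
```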

plot_gp(response=0, resolution=(32, 32), verbose=True): Plot interactive 2D slices of the response and uncertainty.

plot_response(f=None, colors=['rgb(55,126,184)', 'rgb(77,175,74)', 'rgb(152,78,163)', 'rgb(255,127,0)', 'rgb(255,255,51)', 'rgb(166,86,40)', 'rgb(247,129,191)', 'rgb(153,153,153)'])

plot_samples(colors=['rgb(55,126,184)', 'rgb(77,175,74)', 'rgb(152,78,163)', 'rgb(255,127,0)', 'rgb(255,255,51)', 'rgb(166,86,40)', 'rgb(247,129,191)', 'rgb(153,153,153)'], marker_size=10, **kwargs)