medeq.MED

class medeq.MED(parameters, response_names=None, sampler=<class 'medeq.med.DVASampler'>, scheduler=LocalScheduler(python_executable=['/home/docs/checkouts/readthedocs.org/user_builds/med/envs/latest/bin/python']), seed=None, verbose=3)

Bases: object

Autonomously explore system responses and discover underlying physical laws or correlations.

System responses can be explored in one of two ways:

  1. Locally / manually: running experiments / simulations, then feeding results back to MED.

  2. Massively parallel: for complex simulations that can be launched in Python, MED can automatically change simulation parameters and run them in parallel on OS processes (locally) or SLURM jobs (distributed clusters).
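As a rough illustration of the second mode, the sketch below launches one OS process per parameter combination and collects the printed responses. It uses only the standard library and is not MED's scheduler; the `SCRIPT` template and the `run_parallel` helper are hypothetical names introduced for this example.

```python
import subprocess
import sys

# Hypothetical stand-in for a simulation script: the parameter values are
# formatted into the source text before each launch.
SCRIPT = "A, B = {a}, {b}\nresponse = A + B\nprint(response)\n"


def run_parallel(samples):
    # Start one Python process per sample without waiting in between
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", SCRIPT.format(a=a, b=b)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for a, b in samples
    ]
    # Collect the printed responses in the same order as the samples
    return [float(proc.communicate()[0]) for proc in procs]


responses = run_parallel([(1.0, 2.0), (3.0, 4.0), (-1.5, 0.5)])
print(responses)
```

On a cluster the same pattern would submit SLURM jobs instead of local processes; that part is left out here.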

A typical local workflow is:

  1. Define free parameters to explore as a pd.DataFrame - you can use the medeq.create_parameters function for this.

>>> import medeq
>>> parameters = medeq.create_parameters(
...     ["A", "B"],
...     minimums = [-5., -5.],
...     maximums = [10., 10.],
... )
>>> print(parameters)
   value  min   max
A    2.5 -5.0  10.0
B    2.5 -5.0  10.0
  2. Create a medeq.MED object and generate samples (i.e. parameter combinations) to evaluate - the default sampler covers the parameter space as efficiently as possible, taking previous results into account; use the MED.sample(n) method to get n samples to try.

>>> med = medeq.MED(parameters)
>>> print(med)
MED(seed=42)
---------------------------------------
parameters =
     value  min   max
  A    2.5 -5.0  10.0
  B    2.5 -5.0  10.0
response_names =
  None
---------------------------------------
sampler =   DVASampler(d=2, seed=42)
samples =   np.ndarray[(0, 2), float64]
responses = NoneType
epochs =    list[0, tuple[int, int]]
>>> med.sample(5)
array([[-3.33602115, -0.45639296],
       [ 5.55496225,  5.554965  ],
       [ 2.72771903, -3.48852585],
       [-0.45639308,  8.33602069],
       [ 8.48852568,  2.27228172]])
  3. For a local / offline workflow, these samples can be evaluated in one of two ways:

    • Evaluate samples manually, offline - i.e. run experiments, simulations, etc. and feed them back to MED.

    • Let MED evaluate a simple Python function / model.

>>> # Evaluate samples manually - run experiments, simulations, etc.
>>> to_evaluate = med.queue
>>> responses = [1, 2, 3, 4, 5]
>>> med.evaluate(responses)
>>>
>>> # Or evaluate a simple Python function / model
>>> def instrument(sample):
...     return sample[0] + sample[1]
...
>>> med.evaluate(instrument)
>>> med.results
          A         B  variance   response
0 -3.336021 -0.456393  0.037924  -3.792414
1  5.554962  5.554965  0.111099  11.109927
2  2.727719 -3.488526  0.007608  -0.760807
3 -0.456393  8.336021  0.078796   7.879628
4  8.488526  2.272282  0.107608  10.760807
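The parameters table from step 1 is an ordinary pd.DataFrame, so it can also be assembled by hand. The sketch below builds a table with the same `value`, `min` and `max` columns as the printed output above; starting each value at the midpoint of its range is an assumption inferred from that output, not a documented rule.

```python
import pandas as pd

names = ["A", "B"]
minimums = [-5.0, -5.0]
maximums = [10.0, 10.0]

parameters = pd.DataFrame(
    {
        # Assumption: each parameter starts at the midpoint of its range
        "value": [(lo + hi) / 2 for lo, hi in zip(minimums, maximums)],
        "min": minimums,
        "max": maximums,
    },
    index=names,
)
print(parameters)
```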

For a massively parallel workflow, e.g. using a complex simulation, all you need is a standalone Python script that:

  • Defines its free parameters between two “# MED PARAMETERS START / END” directives.

  • Runs the simulation in _any_ way - define simulation inline, launch it on a supercomputer and collect results, etc.

  • Defines a variable “response” for the simulated output of interest - either as a single number or a list of numbers (multi-response).

Here is a simple example of a MED script:

# In file `simulation_script.py`

# MED PARAMETERS START
import medeq

parameters = medeq.create_parameters(
    ["A", "B", "C"],
    [-5., -5., -5.],
    [10., 10., 10.],
)
# MED PARAMETERS END

# Run simulation in any way, locally, on a supercomputer and collect
# results - then define the variable `response` (float or list[float])
values = parameters["value"]
response = values["A"]**2 + values["B"]**2
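To make the role of the two directives concrete, here is a minimal sketch of how a script could be split at the markers, have its parameters overridden, and be run to obtain `response`. It illustrates the convention only and is not how MED processes scripts internally; `split_script` and `run_with` are hypothetical helpers, and the parameter block uses plain variables instead of medeq.create_parameters to keep the example self-contained.

```python
START = "# MED PARAMETERS START"
END = "# MED PARAMETERS END"

SCRIPT = """\
# MED PARAMETERS START
A = 1.0
B = 2.0
# MED PARAMETERS END
response = A**2 + B**2
"""


def split_script(source):
    # Return (parameter lines, remaining lines) around the two directives
    lines = [line.rstrip() for line in source.splitlines()]
    start, end = lines.index(START), lines.index(END)
    if end < start:
        raise ValueError("END directive found before START directive")
    return lines[start + 1:end], lines[end + 1:]


def run_with(source, overrides):
    params, body = split_script(source)
    namespace = {}
    exec("\n".join(params), namespace)  # define the default parameters
    namespace.update(overrides)         # swap in the values to try
    exec("\n".join(body), namespace)    # run the rest of the script
    return namespace["response"]


print(run_with(SCRIPT, {"A": 3.0, "B": 4.0}))
```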

If you have previous, separate experimental data, you can add it to the dataset of responses with MED.augment:

>>> # Augment dataset of responses with historical data
>>> samples = [
...     [1, 1],
...     [2, 2],
...     [1, 2],
... ]
>>>
>>> responses = [1, 2, 3]
>>> med.augment(samples, responses)
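In data terms, augmenting amounts to appending rows of historical samples together with their responses. The sketch below shows the consistency checks such an operation plausibly needs (one response per sample, same number of parameters per row); it is a plain-Python stand-in and not the body of MED.augment.

```python
def augment(samples, responses, new_samples, new_responses):
    """Return samples and responses extended with historical data."""
    if len(new_samples) != len(new_responses):
        raise ValueError("Need exactly one response per historical sample")
    if not new_samples:
        return list(samples), list(responses)

    # Every row must have as many values as there are free parameters
    width = len(samples[0]) if samples else len(new_samples[0])
    for sample in new_samples:
        if len(sample) != width:
            raise ValueError(f"Expected {width} values, got {len(sample)}")

    merged_samples = list(samples) + [list(s) for s in new_samples]
    merged_responses = list(responses) + list(new_responses)
    return merged_samples, merged_responses


samples = [[-3.3, -0.5], [5.6, 5.6]]
responses = [-3.8, 11.2]
samples, responses = augment(samples, responses, [[1, 1], [2, 2], [1, 2]], [1, 2, 3])
print(len(samples), len(responses))
```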

To discover the underlying equation, you need Julia (a beautiful, high-performance programming language) installed on your system, along with the PySR library:

  1. Install Julia manually (see https://julialang.org/downloads/).

  2. pip install pysr

  3. python -c 'import pysr; pysr.install()'

And now discover underlying equations!

>>> med.discover(binary_operators = ["+", "*"])
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score     Equation
1           2.412e+01  5.296e-01  B
3           0.000e+00  1.151e+01  (A + B)
Attributes
parameters : pd.DataFrame

A

response_names : list[str] or None

A

sampler : object

Any Python object defining a method .sample(n, med) returning n samples to evaluate.

scheduler : coexist.schedulers.Scheduler subclass or None

An object implementing the coexist.schedulers.Scheduler interface, defining a method for scheduling function evaluations in a massively parallel context. Only relevant if parameters is given as a user script.

samples : (M, P) np.ndarray

The parameter samples generated.

responses : (N, K) np.ndarray or None

The responses evaluated.

response_names : list[str] or None

The response names, if given.

epochs : list[tuple[int, int]]

The sample generation-evaluation batches, as index ranges within samples and responses.

seed : int

The random seed used for deterministic results.

verbose : int

Verbosity level, between 0 and 5.

queue : np.ndarray

[Generated] The queue of unevaluated samples.

evaluated : (N, P) np.ndarray

[Generated] The evaluated samples.

results : pd.DataFrame

[Generated] Neat DataFrame of samples tried and corresponding results.

variances : (N, K) np.ndarray

[Generated] The uncertainties in the results found.

gp : list[fvgp.gp.GP] or None

[Internal] List of Gaussian Process objects for each response.

sr : pysr.PySRRegressor or None

[Internal] PySR symbolic regressor for equation discovery.

paths : medeq.med.MEDPaths or None

[Internal] Structure containing paths for saving the MED object.
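The shapes listed above suggest how the generated attributes relate to the stored ones: with M samples and N responses, `evaluated` has N rows and `queue` holds the rest. The sketch below spells this out with plain lists. Two assumptions are made that the reference does not state: evaluated samples are the first N rows of `samples`, and each epoch is a half-open `(start, end)` range.

```python
samples = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]  # M = 5
responses = [10, 20, 30]                                                # N = 3
epochs = [(0, 2), (2, 3)]  # assumed half-open (start, end) index ranges

# Assumption: responses line up with the first N samples
evaluated = samples[:len(responses)]
queue = samples[len(responses):]

# Pair samples with responses, one batch per epoch
batches = [
    list(zip(samples[start:end], responses[start:end]))
    for start, end in epochs
]

print(len(evaluated), len(queue))
print(batches[1])
```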

__init__(parameters, response_names=None, sampler=<class 'medeq.med.DVASampler'>, scheduler=LocalScheduler(python_executable=['/home/docs/checkouts/readthedocs.org/user_builds/med/envs/latest/bin/python']), seed=None, verbose=3)

Methods

__init__(parameters[, response_names, ...])

augment(samples, responses)

Augment set of samples with manually-evaluated responses.

discover([response, binary_operators, ...])

Discover analytical models of varying complexity for the evaluated samples and responses.

evaluate([f])

Evaluate the current samples online or offline.

load(dirpath)

Load MED instance from a directory (e.g. "med_seed123").

plot_gp([response, resolution, verbose])

Plot interactive 2D slices of the response and uncertainty.

plot_response([f, colors])

plot_samples([colors, marker_size])

sample(n)

Generate and return n new parameter samples; they will be added to the .samples and .queue attributes.

save([directory])

Save all data about the MED object to disk in a given directory.

subset(select)

Select a subset of the current samples, returning a new MED object that can e.g. discover equations for only the selected subset.

Attributes

evaluated

queue

results

variances

property queue
property evaluated
property variances
property results
save(directory=None)

Save all data about the MED object to disk in a given directory.

static load(dirpath)

Load MED instance from a directory (e.g. “med_seed123”).

sample(n)

Generate and return n new parameter samples; they will be added to the .samples and .queue attributes.

subset(select)

Select a subset of the current samples, returning a new MED object that can e.g. discover equations for only the selected subset.

evaluate(f=None)

Evaluate the current samples online or offline.

There are 3 possible workflows:

  1. The user evaluated the MED.queue values separately (e.g. ran experiments) - then simply supply a NumPy array of responses.

>>> med.evaluate([1, 2, 3])
  2. A simple Python function is supplied that will be evaluated for each sample; the function must accept a single NumPy vector.

def instrument(params):
    # `params` is a 1D NumPy array holding one parameter combination
    return params[0] + params[1]

med.evaluate(instrument)
  3. If a separate Python script was provided when the class was created, nothing else is needed; this function will launch jobs and collect responses from the user script.
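The three workflows can be pictured as a dispatch on the type of `f`. The sketch below is a simplified stand-in written with plain lists, not the library's implementation; the free function `evaluate` and its `run_script` argument are hypothetical and only mirror the behaviour described above.

```python
def evaluate(queue, f=None, run_script=None):
    """Return one response per queued sample."""
    if f is None:
        # Workflow 3: responses come from launching the user script
        if run_script is None:
            raise ValueError("No responses, function or user script given")
        return [run_script(sample) for sample in queue]

    if callable(f):
        # Workflow 2: call the function once per sample
        return [f(sample) for sample in queue]

    # Workflow 1: responses were measured separately and are passed in
    responses = list(f)
    if len(responses) != len(queue):
        raise ValueError("Need exactly one response per queued sample")
    return responses


queue = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
offline = evaluate(queue, [1, 2, 3])
online = evaluate(queue, lambda params: params[0] + params[1])
scripted = evaluate(queue, run_script=lambda params: params[0] * params[1])
print(offline, online, scripted)
```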

augment(samples, responses)

Augment set of samples with manually-evaluated responses.

discover(response=None, binary_operators=['+', '-', '*', '/', '^'], unary_operators=[], maxsize=50, maxdepth=None, niterations=100, populations=32, parsimony=0.0032, constraints=None, nested_constraints=None, denoise=False, select_k_features=None, turbo=True, equation_file='equations.csv', progress=False, **kwargs)

Discover analytical models of varying complexity for the evaluated samples and responses.

The most important equation discovery parameters are given to this method; a full reference is here: https://astroautomata.com/PySR/api/

Parameters
response : str or int, optional

The response name or index to find an equation for; only needs to be specified if there is more than one response.

binary_operators : list[str], default ["+", "-", "*", "/", "^"]

Operators taking two real numbers as input, using Julia syntax, e.g. +(x, y) === x + y. Can define custom operators like binaryfunc(x, y) = x^2 * y.

unary_operators : list[str], default []

Operators taking a single real number as input, using Julia syntax, e.g. sin(x) or log(x). Can define custom operators like unaryfunc(x) = x^2.

maxsize : int, default 50

Maximum size of equation; smaller values correspond to shorter equations and faster searching.

maxdepth : int, optional

If defined, limit depth of equation tree - i.e. stacked operations.

niterations : int, default 100

Number of equation finding iterations to run.

populations : int, default 32

Number of equation populations to use; a larger value corresponds to more equations being sampled and slower search.

parsimony : float, default 0.0032

How much to punish complexity; a larger value favours smaller expressions.

constraints : dict[str], optional

Optional dictionary of complexity constraints on operators, e.g. don’t allow the exponent of a power operator to have complexity larger than 2 with constraints = {"pow": (-1, 2)}; -1 means any complexity.

nested_constraints : dict[str], optional

Number of times a combination of operators can be nested, e.g. {"sin": {"cos": 0}} specifies that cos cannot be found inside a sin.

denoise : bool, default False

Symbolic regression should be robust to noise, but the data can be further denoised with a Gaussian Process.

select_k_features : int, optional

If defined, use at most k parameters from the given samples.

turbo : bool, default True

Whether to use extra, experimental optimisations internally.

equation_file : str, default "equations.csv"

Where to save equations.

progress : bool, default False

Whether to show interactive equation discovery progress.

**kwargs : other keyword arguments

Other keyword arguments to be passed to PySR.

Returns
list[Equation]

List of equations found.
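As a usage sketch, the options above can be collected in a dictionary and unpacked into the call. The particular values chosen here (trigonometric unary operators, a reduced `maxsize`, a depth limit) are illustrative assumptions, not recommendations; the call itself is left commented out because it needs Julia, PySR and a MED object with evaluated samples.

```python
# Illustrative option set; every key is a documented `discover` parameter
discover_options = {
    "binary_operators": ["+", "-", "*", "/"],
    "unary_operators": ["sin", "cos"],
    "maxsize": 20,          # shorter equations, faster search
    "maxdepth": 5,          # limit stacked operations
    "niterations": 100,
    # Forbid `cos` from appearing inside a `sin`
    "nested_constraints": {"sin": {"cos": 0}},
    "progress": False,
}

# With Julia and PySR installed, and `med` holding evaluated samples:
# equations = med.discover(**discover_options)
print(sorted(discover_options))
```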

plot_gp(response=0, resolution=(32, 32), verbose=True)

Plot interactive 2D slices of the response and uncertainty.

plot_response(f=None, colors=['rgb(55,126,184)', 'rgb(77,175,74)', 'rgb(152,78,163)', 'rgb(255,127,0)', 'rgb(255,255,51)', 'rgb(166,86,40)', 'rgb(247,129,191)', 'rgb(153,153,153)'])
plot_samples(colors=['rgb(55,126,184)', 'rgb(77,175,74)', 'rgb(152,78,163)', 'rgb(255,127,0)', 'rgb(255,255,51)', 'rgb(166,86,40)', 'rgb(247,129,191)', 'rgb(153,153,153)'], marker_size=10, **kwargs)