medeq.MED
- class medeq.MED(parameters, response_names=None, sampler=<class 'medeq.med.DVASampler'>, scheduler=LocalScheduler(python_executable=['/home/docs/checkouts/readthedocs.org/user_builds/med/envs/latest/bin/python']), seed=None, verbose=3)
Bases: object
Autonomously explore system responses and discover underlying physical laws or correlations.
System responses can be explored in one of two ways:
Locally / manually: running experiments / simulations, then feeding results back to MED.
Massively parallel: for complex simulations that can be launched in Python, MED can automatically change simulation parameters and run them in parallel on OS processes (locally) or SLURM jobs (distributed clusters).
A typical local workflow is:
Define free parameters to explore as a pd.DataFrame - you can use the medeq.create_parameters function for this.
>>> import medeq
>>> parameters = medeq.create_parameters(
>>>     ["A", "B"],
>>>     minimums = [-5., -5.],
>>>     maximums = [10., 10.],
>>> )
>>> print(parameters)
   value  min   max
A    2.5 -5.0  10.0
B    2.5 -5.0  10.0
Create a medeq.MED object and generate samples (i.e. parameter combinations) to evaluate - the default sampler covers the parameter space as efficiently as possible, taking previous results into account; use the MED.sample(n) method to get n samples to try.
>>> med = medeq.MED(parameters)
>>> print(med)
MED(seed=42)
---------------------------------------
parameters =
     value  min   max
  A    2.5 -5.0  10.0
  B    2.5 -5.0  10.0
response_names = None
---------------------------------------
sampler = DVASampler(d=2, seed=42)
samples = np.ndarray[(0, 2), float64]
responses = NoneType
epochs = list[0, tuple[int, int]]

>>> med.sample(5)
array([[-3.33602115, -0.45639296],
       [ 5.55496225,  5.554965  ],
       [ 2.72771903, -3.48852585],
       [-0.45639308,  8.33602069],
       [ 8.48852568,  2.27228172]])
For a local / offline workflow, these samples can be evaluated in one of two ways:
Evaluate samples manually, offline - i.e. run experiments, simulations, etc. and feed them back to MED.
Let MED evaluate a simple Python function / model.
>>> # Evaluate samples manually - run experiments, simulations, etc.
>>> to_evaluate = med.queue
>>> responses = [1, 2, 3, 4, 5]
>>> med.evaluate(responses)
>>>
>>> # Or evaluate simple Python function / model
>>> def instrument(sample):
>>>     return sample[0] + sample[1]
>>>
>>> med.evaluate(instrument)
>>> med.results
          A         B  variance   response
0 -3.336021 -0.456393  0.037924  -3.792414
1  5.554962  5.554965  0.111099  11.109927
2  2.727719 -3.488526  0.007608  -0.760807
3 -0.456393  8.336021  0.078796   7.879628
4  8.488526  2.272282  0.107608  10.760807
For a massively parallel workflow, e.g. using a complex simulation, all you need is a standalone Python script that:
Defines its free parameters between two “# MED PARAMETERS START / END” directives.
Runs the simulation in any way - define the simulation inline, launch it on a supercomputer and collect results, etc.
Defines a variable “response” for the simulated output of interest - either as a single number or a list of numbers (multi-response).
Here is a simple example of a MED script:
# In file `simulation_script.py`

# MED PARAMETERS START
import medeq

parameters = medeq.create_parameters(
    ["A", "B", "C"],
    [-5., -5., -5.],
    [10., 10., 10.],
)
# MED PARAMETERS END

# Run simulation in any way, locally, on a supercomputer and collect
# results - then define the variable `response` (float or list[float])
values = parameters["value"]
response = values["A"]**2 + values["B"]**2
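The role of the two directives can be illustrated with a small, self-contained sketch. The helper `split_med_script` below is hypothetical and is not medeq's actual parser; it only shows how the block between "# MED PARAMETERS START" and "# MED PARAMETERS END" can be separated from the rest of a script. The stand-in script uses plain floats instead of `medeq.create_parameters` so that the sketch runs without medeq installed.

```python
# Illustrative only: a hypothetical helper, not medeq's actual parser.
START = "# MED PARAMETERS START"
END = "# MED PARAMETERS END"


def split_med_script(source):
    """Split a script into (before, parameters, after) around the directives."""
    lines = source.splitlines()
    stripped = [line.strip() for line in lines]
    if START not in stripped or END not in stripped:
        raise ValueError("script must contain both MED PARAMETERS directives")
    start = stripped.index(START)
    end = stripped.index(END)
    if end < start:
        raise ValueError("MED PARAMETERS END appears before START")
    before = "\n".join(lines[:start])
    params = "\n".join(lines[start + 1:end])
    after = "\n".join(lines[end + 1:])
    return before, params, after


# Stand-in script (assumption: plain floats instead of medeq.create_parameters).
script = """# In file `simulation_script.py`
# MED PARAMETERS START
A = 1.0
B = 2.0
# MED PARAMETERS END
response = A**2 + B**2
"""

before, params, after = split_med_script(script)
print(params)
```

Keeping the free parameters in a clearly delimited block is what would let a tool substitute new values there while leaving the simulation code untouched.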
If you have previous, separate experimental data, you can MED.augment the dataset of responses:
>>> # Augment dataset of responses with historical data
>>> samples = [
>>>     [1, 1],
>>>     [2, 2],
>>>     [1, 2],
>>> ]
>>>
>>> responses = [1, 2, 3]
>>> med.augment(samples, responses)
To discover the underlying equation, you need to install Julia (a beautiful, high-performance programming language) on your system and the PySR library:
Install Julia manually (see https://julialang.org/downloads/).
pip install pysr
python -c 'import pysr; pysr.install()'
And now discover underlying equations!
>>> med.discover(binary_operators = ["+", "*"])
Hall of Fame:
-----------------------------------------
Complexity  Loss       Score      Equation
1           2.412e+01  5.296e-01  B
3           0.000e+00  1.151e+01  (A + B)
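One way to sanity-check a discovered equation is to evaluate it against the recorded samples. The sketch below does this in plain Python for the two Hall of Fame candidates, using the rounded values printed in the `med.results` table above. The helper `mean_squared_error` is written here for illustration; because it uses rounded inputs and a plain mean squared error, its numbers need not match the loss that PySR reports.

```python
# Rows copied from the `med.results` table above: (A, B, response).
rows = [
    (-3.336021, -0.456393, -3.792414),
    (5.554962, 5.554965, 11.109927),
    (2.727719, -3.488526, -0.760807),
    (-0.456393, 8.336021, 7.879628),
    (8.488526, 2.272282, 10.760807),
]


def mean_squared_error(equation, rows):
    """Average squared difference between equation(A, B) and the response."""
    errors = [(equation(a, b) - y) ** 2 for a, b, y in rows]
    return sum(errors) / len(errors)


# Candidate equations from the Hall of Fame, written as plain Python.
candidates = {
    "B": lambda a, b: b,
    "(A + B)": lambda a, b: a + b,
}

mse = {name: mean_squared_error(eq, rows) for name, eq in candidates.items()}
for name, value in mse.items():
    print(f"{name:8s} MSE = {value:.3e}")
```

The more complex candidate reproduces the responses up to rounding, which is consistent with the zero loss shown for (A + B).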
- Attributes
  - parameters : pd.DataFrame
    A
  - response_names : list[str] or None
    A
  - sampler : object
    Any Python object defining a method .sample(n, med) returning n samples to evaluate.
  - scheduler : coexist.schedulers.Scheduler subclass or None
    An object implementing the coexist.schedulers.Scheduler interface, defining a method for scheduling function evaluations in a massively parallel context. Only relevant if parameters is given as a user script.
  - samples : (M, P) np.ndarray
    The parameter samples generated.
  - responses : (N, K) np.ndarray or None
    The responses evaluated.
  - response_names : list[str] or None
    The response names, if given.
  - epochs : list[tuple[int, int]]
    The sample generation-evaluation batches, as index ranges within samples and responses.
  - seed : int
    The random seed used for deterministic results.
  - verbose : int
    Verbosity level, between 0 and 5.
  - queue : np.ndarray
    [Generated] The queue of unevaluated samples.
  - evaluated : (N, P) np.ndarray
    [Generated] The evaluated samples.
  - results : pd.DataFrame
    [Generated] Neat DataFrame of samples tried and corresponding results.
  - variances : (N, K) np.ndarray
    [Generated] The uncertainties in the results found.
  - gp : list[fvgp.gp.GP] or None
    [Internal] List of Gaussian Process objects for each response.
  - sr : pysr.PySRRegressor or None
    [Internal] PySR symbolic regressor for equation discovery.
  - paths : medeq.med.MEDPaths or None
    [Internal] Structure containing paths for saving the MED object.
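The sampler attribute only requires an object with a `.sample(n, med)` method. A minimal sketch of such an object is given below. `UniformSampler` and `fake_med` are hypothetical names introduced for illustration, and the sketch assumes that `med.parameters["min"]` and `med.parameters["max"]` yield the per-parameter bounds, as in the printed parameters table. How MED constructs or calls a sampler beyond this method, and whether it expects a NumPy array rather than nested lists, is not covered here.

```python
import random
from types import SimpleNamespace


class UniformSampler:
    """Hypothetical sampler: draws points uniformly within the parameter bounds."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample(self, n, med):
        # Assumption: parameters["min"] / ["max"] iterate over per-parameter bounds.
        bounds = list(zip(med.parameters["min"], med.parameters["max"]))
        return [
            [self.rng.uniform(lo, hi) for lo, hi in bounds]
            for _ in range(n)
        ]


# Stand-in for a MED object, used only so the sketch runs without medeq.
fake_med = SimpleNamespace(
    parameters={"min": [-5.0, -5.0], "max": [10.0, 10.0]},
)

sampler = UniformSampler(seed=0)
samples = sampler.sample(3, fake_med)
print(samples)
```

Unlike the default sampler, this sketch ignores previous results; it is meant only to show the shape of the interface.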
- __init__(parameters, response_names=None, sampler=<class 'medeq.med.DVASampler'>, scheduler=LocalScheduler(python_executable=['/home/docs/checkouts/readthedocs.org/user_builds/med/envs/latest/bin/python']), seed=None, verbose=3)
Methods
- __init__(parameters[, response_names, ...])
- augment(samples, responses): Augment set of samples with manually-evaluated responses.
- discover([response, binary_operators, ...]): Discover analytical models of varying complexity for the evaluated samples and responses.
- evaluate([f]): Evaluate the current samples online or offline.
- load(dirpath): Load MED instance from a directory (e.g. "med_seed123").
- plot_gp([response, resolution, verbose]): Plot interactive 2D slices of the response and uncertainty.
- plot_response([f, colors])
- plot_samples([colors, marker_size])
- sample(n): Generate and return n new parameter samples; they will be added to the .samples and .queue attributes.
- save([directory]): Save all data about the MED object to disk in a given directory.
- subset(select): Select a subset of the current samples, returning a new MED object that can e.g. discover equations for only the selected subset.
Attributes
- property queue
- property evaluated
- property variances
- property results
- sample(n)
Generate and return n new parameter samples; they will be added to the .samples and .queue attributes.
- subset(select)
Select a subset of the current samples, returning a new MED object that can e.g. discover equations for only the selected subset.
- evaluate(f=None)
Evaluate the current samples online or offline.
There are 3 possible workflows:
The user evaluated the MED.queue values separately (e.g. ran experiments) - then simply supply a NumPy array of responses.
>>> med.evaluate([1, 2, 3])
A simple Python function is supplied that will be evaluated for each sample; the function must accept a single NumPy vector.
def instrument(params):
    # `params` is a NumPy vector holding a single parameter combination
    return params[0] + params[1]

med.evaluate(instrument)
If a separate Python script was provided when the class was created, nothing else is needed; this function will launch jobs and collect responses from the user script.
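The second workflow can be pictured as applying the function to each queued sample in turn. The sketch below mimics that with plain Python lists standing in for the NumPy queue. It is not MED's internal code, only an illustration of the one-response-per-sample contract, using the five samples from the class example.

```python
def instrument(params):
    # `params` is one parameter combination, e.g. [A, B]
    return params[0] + params[1]


# The five samples returned by `med.sample(5)` in the class example.
queue = [
    [-3.33602115, -0.45639296],
    [5.55496225, 5.554965],
    [2.72771903, -3.48852585],
    [-0.45639308, 8.33602069],
    [8.48852568, 2.27228172],
]

# One response per queued sample, in the same order as the queue.
responses = [instrument(sample) for sample in queue]

for sample, response in zip(queue, responses):
    print(sample, "->", round(response, 6))
```

The resulting values agree with the response column of the `med.results` table in the class example to the printed precision.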
- discover(response=None, binary_operators=['+', '-', '*', '/', '^'], unary_operators=[], maxsize=50, maxdepth=None, niterations=100, populations=32, parsimony=0.0032, constraints=None, nested_constraints=None, denoise=False, select_k_features=None, turbo=True, equation_file='equations.csv', progress=False, **kwargs)
Discover analytical models of varying complexity for the evaluated samples and responses.
The most important equation discovery parameters are given to this method; a full reference is here: https://astroautomata.com/PySR/api/
- Parameters
  - response : str or int, optional
    The response name or index to find an equation for; only needs to be specified when there is more than one response.
  - binary_operators : list[str], default ["+", "-", "*", "/", "^"]
    Operators taking two real numbers as input, using Julia syntax, e.g. +(x, y) === x + y. Can define custom operators like binaryfunc(x, y) = x^2 * y.
  - unary_operators : list[str], default []
    Operators taking a single real number as input, using Julia syntax, e.g. sin(x) or log(x). Can define custom operators like unaryfunc(x) = x^2.
  - maxsize : int, default 50
    Maximum size of equation; smaller values correspond to shorter equations and faster searching.
  - maxdepth : int, optional
    If defined, limit depth of equation tree - i.e. stacked operations.
  - niterations : int, default 100
    Number of equation finding iterations to run.
  - populations : int, default 32
    Number of equation populations to use; a larger value corresponds to more equations being sampled and a slower search.
  - parsimony : float, default 0.0032
    How much to punish complexity; a larger value prefers smaller expressions.
  - constraints : dict[str], optional
    Optional dictionary of complexity constraints on operators, e.g. don't allow the exponent of a power operator to have complexity larger than 2 with constraints = {"pow": (-1, 2)}; -1 means any complexity.
  - nested_constraints : dict[str], optional
    Number of times a combination of operators can be nested, e.g. {"sin": {"cos": 0}} specifies that cos cannot be found inside a sin.
  - denoise : bool, default False
    Symbolic regression should be robust to noise, but the data can be further denoised with a Gaussian Process.
  - select_k_features : int, optional
    If defined, use at most k parameters from the given samples.
  - turbo : bool, default True
    Whether to use extra, experimental optimisations internally.
  - equation_file : str, default "equations.csv"
    Where to save equations.
  - progress : bool, default False
    Whether to show interactive equation discovery progress.
  - **kwargs : other keyword arguments
    Other keyword arguments to be passed to PySR.
- Returns
  - list[Equation]
    List of equations found.
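The parameters above can be collected into a dictionary and passed to the method as keyword arguments. The sketch below builds such a dictionary using only the parameter names and example values documented above. The variable name `discover_options` and the particular combination of values are illustrative assumptions, not recommended settings.

```python
# Keyword arguments named after the parameters documented above.
discover_options = {
    "binary_operators": ["+", "-", "*", "^"],
    "unary_operators": ["sin", "cos"],
    "maxsize": 20,
    # Exponent of the power operator limited to complexity 2; -1 means any.
    "constraints": {"pow": (-1, 2)},
    # cos may not appear inside sin.
    "nested_constraints": {"sin": {"cos": 0}},
    "niterations": 100,
}

# With a MED object holding evaluated samples, this would be passed as:
#     equations = med.discover(**discover_options)
for name, value in discover_options.items():
    print(f"{name} = {value!r}")
```

Keeping the options in one dictionary makes it easy to rerun the discovery with small changes, for example a different maxsize.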
- plot_gp(response=0, resolution=(32, 32), verbose=True)
Plot interactive 2D slices of the response and uncertainty.