Molecular data — global interaction network — drug response
EviCor supports simple user requests regarding correlation of molecular profiles in large-scale datasets. This may be helpful in a range of contexts, from evaluating agreement between omics platforms to identifying potential markers of drug response.
Data from two major public resources, The Cancer Genome Atlas (TCGA) and The Cancer Cell Line Encyclopedia (CCLE), can be explored as plots of gene, protein, and drug sensitivity profiles, or used to identify correlates of response to anti-cancer drugs. Furthermore, such correlations can be traced to clinical treatment profiles in TCGA. An extra feature of EviCor is network context, which is instrumental both in biomarker discovery and in the evaluation of candidate molecules.
Refer to this page to get answers to your questions and instructions for some typical tasks.
The EviCor website offers a variety of interactive plots that can be used for exploratory data analysis. Plots can be shared via automatically generated URLs.
|Bar||1D||Basic bar plot for categorical data|
|Pie||1D||Alternative representation for categorical data|
|Histogram||1D/2D||1D: basic histogram, 2D: stacked histogram (by categories)|
|Venn||2D||Venn diagrams for categorical data, shows the intersection of 2 categories|
|Box||2D||Interactive box plots with whiskers and outliers|
|Kaplan-Meier plot||2D/3D||Kaplan-Meier survival plots; numeric and categorical data can be used to define groups|
|Scatter||2D/3D||2D plot for numerical data. In a 3D plot, the 3rd dimension is represented via data point color and/or shape; data for the 3rd dimension can be either numeric or categorical.|
|Original||Keep numbers as they are stored in EviCor DB|
|Square root (sqrt)||Apply square root transformation to data|
|Logarithmic (log)||Apply log transformation to data|
|Beta||Convert values into beta-scale (methylation data only)|
|M (M-value)||Convert values into M-values (methylation data only)|
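The relation between the two methylation scales can be sketched in Python as follows. This is a minimal illustration that assumes the commonly used definition M = log2(beta / (1 - beta)); this page does not state which formula EviCor applies internally, and the function names `beta_to_m` and `m_to_beta` are hypothetical.

```python
import math


def beta_to_m(beta):
    """Convert a methylation beta value to an M-value.

    Assumption: M = log2(beta / (1 - beta)), defined for 0 < beta < 1.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse conversion: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)


# A half-methylated site (beta = 0.5) corresponds to M = 0.
print(beta_to_m(0.5))
```

Beta values are bounded between 0 and 1, while M-values are unbounded, which is why the conversion rejects the boundary values 0 and 1.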
all. Use this variable if you want, for example, to test all genes from a certain platform as predictive variables.
|AIC||Akaike information criterion|
|BIC||Bayesian information criterion|
|RSS||Residual sum of squares|
|Accuracy||Proportion of correctly classified entities at 0.5 threshold|
|p||Statistical significance (for survival prediction only)|
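For orientation, the metrics in the table can be written out in a few lines of Python. This is a sketch based on the textbook definitions, not on EviCor's own code: the function names are hypothetical, the AIC and BIC take a model log-likelihood as input, and treating a probability of exactly 0.5 as the positive class is an assumption.

```python
import math


def rss(observed, predicted):
    """Residual sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def accuracy(labels, probabilities, threshold=0.5):
    """Proportion of correctly classified entities at the given threshold.

    Assumption: a probability >= threshold is classified as 1.
    """
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for y, y_hat in zip(labels, predicted) if y == y_hat)
    return correct / len(labels)


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


print(rss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

Lower AIC, BIC, and RSS values indicate a better fit, whereas higher accuracy is better.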
model). Refer to glmnet documentation for model usage tips.
EviCor offers a REST API for all functional tabs. Some functions, such as batch jobs, are available only via the REST API. All scripts are accessible at https://www.evicor.org/cgi/script_name.
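As a minimal sketch of how a request URL might be assembled, the Python helper below joins the base address with a script name and a set of parameters. It assumes that parameters are passed as an ordinary GET query string, which this page does not state explicitly; `script_url` is a hypothetical helper name, and the parameter values are placeholders.

```python
from urllib.parse import urlencode

# Base address of the EviCor scripts, as given on this page.
BASE_URL = "https://www.evicor.org/cgi/"


def script_url(script_name, params=None):
    """Return the URL of an EviCor script, optionally with a query string.

    Assumption: parameters are sent as a GET query string.
    """
    url = BASE_URL + script_name
    if params:
        url += "?" + urlencode(params)
    return url


# Only the URL is built here; no request is sent.
print(script_url("cor_datatables_json.cgi", {"source": "CCLE", "cohort": "all"}))
```

`urlencode` percent-encodes special characters, so comma-separated lists appear with `%2C` in place of commas in the resulting URL.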
Script name: cor_datatables_json.cgi
Returns: correlations in JSON format
source: mandatory, correlation source (TCGA/CCLE);
datatype: mandatory, data type (e.g. GE, COPY...). Can be set to 'all';
cohort: mandatory, data cohort, can be set to 'all' for TCGA, must be set to 'all' for CCLE;
platform: mandatory, correlation platform (e.g. Agilent), can be set to 'all';
screen: mandatory, correlation screen (e.g. GDSC1), can be set to 'all' for CCLE, must be set to 'all' for TCGA;
id: mandatory, gene/protein/pathway/drug name;
fdr: mandatory, FDR threshold, use dot as decimal separator;
mindrug: mandatory, minimal number of samples treated with drug;
data_columns: mandatory, comma-separated list of columns to retrieve from SQL:
for CCLE: gene,feature,ancova_p_1x,ancova_p_2x_cov1,ancova_p_2x_feature,ancova_q_2x_feature,ancova_q_2x_feature
for TCGA: gene,feature,followup_part,interaction,drug,expr,n_patients,n_treated,followup,q
filter_columns: mandatory, columns to use for filtering (e.g. column with FDR values);
concat_operator: mandatory if more than one value specified in filter_columns, comma-separated list of logical operators to use for filtering conditions generation. For n columns in filter_columns this list should either contain n-1 operators (applied in order) or one operator (applied to all).
Usage tips: The concat_operator option may be confusing, so here is a short example. Suppose columns col1, col2 and col3 are chosen for filtering and the FDR threshold is 0.05. If concat_operator is set to OR,AND, the resulting condition will be: (col1<0.05) OR (col2<0.05) AND (col3<0.05). Setting this parameter to AND applies the single operator to all columns and results in the following filtering condition: (col1<0.05) AND (col2<0.05) AND (col3<0.05).
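The expansion rule for concat_operator (either n-1 operators applied in order, or one operator applied to all) can be sketched in Python as follows. The function `build_filter` is a hypothetical illustration of the rule as described on this page, not the code that runs on the server.

```python
def build_filter(columns, operators, fdr):
    """Combine per-column FDR conditions with the given logical operators.

    A single operator is repeated between all columns; otherwise exactly
    len(columns) - 1 operators are expected and applied in order.
    """
    if len(operators) == 1:
        operators = operators * (len(columns) - 1)
    if len(operators) != len(columns) - 1:
        raise ValueError("expected one operator or len(columns) - 1 operators")
    parts = [f"({columns[0]}<{fdr})"]
    for operator, column in zip(operators, columns[1:]):
        parts.append(operator)
        parts.append(f"({column}<{fdr})")
    return " ".join(parts)


cols = ["col1", "col2", "col3"]
print(build_filter(cols, ["OR", "AND"], 0.05))
print(build_filter(cols, ["AND"], 0.05))
```

The sketch adds no grouping parentheses, so the order in which OR and AND are evaluated is left to whatever processes the resulting condition.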
Script name: rplot.cgi
Returns: plot file name
type: mandatory, plot type (e.g. scatter);
source: mandatory, data source (TCGA or CCLE);
cohort: mandatory, used cohort;
datatypes: mandatory, list of datatypes separated by commas;
platforms: mandatory, list of platforms separated by commas;
ids: mandatory, list of gene/antibody/pathway/drug names, must match the length of datatypes/platforms; if a platform has no id, the id should be skipped (e.g. 'tp53,,mdm2');
codes: mandatory, TCGA codes (01, 06...) or metacodes (all, healthy...) for TCGA, 'all' or tissue name for CCLE;
scales: mandatory for most plot types, axis scales (original...).
Usage tips: This script returns the plot filename and plot metadata. A plot can be either an HTML page (plotly plots, used for the majority of plot types) or a PNG image (for Venn diagrams). If only the plot name is returned, an error has occurred. The REST API does not provide error explanations.
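The list parameters of rplot.cgi must stay aligned, with an empty slot where a platform has no id. The Python sketch below shows one way to assemble them. The helper names are hypothetical, `COHORT`, `DATATYPE_WITHOUT_ID` and `PLATFORM_WITHOUT_ID` are placeholders rather than valid EviCor names, and passing parameters as a query string is an assumption.

```python
from urllib.parse import urlencode


def join_ids(ids):
    """Join ids with commas, leaving an empty slot where an id is None."""
    return ",".join("" if item is None else item for item in ids)


def rplot_params(plot_type, source, cohort, datatypes, platforms, ids, codes, scales):
    """Build the rplot.cgi parameter set, checking that the lists are aligned."""
    if not (len(datatypes) == len(platforms) == len(ids)):
        raise ValueError("datatypes, platforms and ids must have the same length")
    return {
        "type": plot_type,
        "source": source,
        "cohort": cohort,
        "datatypes": ",".join(datatypes),
        "platforms": ",".join(platforms),
        "ids": join_ids(ids),
        "codes": codes,
        "scales": scales,
    }


params = rplot_params(
    "scatter", "TCGA", "COHORT",                # COHORT is a placeholder
    ["GE", "DATATYPE_WITHOUT_ID", "GE"],        # middle entry is a placeholder
    ["Agilent", "PLATFORM_WITHOUT_ID", "Agilent"],
    ["tp53", None, "mdm2"],                     # becomes 'tp53,,mdm2'
    "all", "original",
)
print(params["ids"])
print(urlencode(params))  # query string under the GET assumption
```

Checking the list lengths before sending the request is useful here because the API returns no error explanation.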
Script name: model_predict.cgi
Returns: model name (without any file extensions)
method: mandatory, method for creating models, at the moment only 'glmnet' is supported;
source: mandatory, data source for model creation (TCGA/CCLE);
cohort: mandatory, cohort for model creation;
multiopt: mandatory, TCGA codes or metacodes (for TCGA) or CCLE tissues;
xdatatypes: mandatory, comma-separated list of data types for independent variables;
xplatforms: mandatory, comma-separated list of platforms for independent variables;
xids: mandatory, lists of ids (if applicable) for independent variables, empty space if ids do not exist for the chosen platforms (refer to the examples for format);
rdatatype: mandatory, datatype for the dependent (response) variable;
rplatform: mandatory, platform for the dependent (response) variable;
rid: mandatory, response variable id (if applicable, otherwise empty);
family: mandatory for glmnet, glmnet family (gaussian/cox...), refer to glmnet documentation for further explanation;
measure: mandatory for glmnet with cross-validation, which loss should be used to pick the best model, refer to glmnet documentation;
alpha: mandatory for glmnet, mixing parameter describing balance between lasso and ridge regression (1 - lasso, 0 - ridge);
nlambda: mandatory for glmnet, number of lambda values, refer to glmnet documentation;
minlambda: mandatory for glmnet, min lambda value, refer to glmnet documentation;
validation: mandatory, if cross-validation should be used (binary flag);
validation_fraction: mandatory for glmnet with validation flag set to TRUE, numeric, defines share of records to be used for independent validation, use dot as decimal separator;
nfolds: mandatory for glmnet with validation flag set to TRUE, number of folds for N-folds cross-validation (see cv.glmnet in glmnet documentation);
standardize: mandatory for glmnet, if variables should be standardized (binary flag);
stat_file: mandatory, filename to which performance metrics should be saved (use auto to automatically assign the name - see the table below);
extended_output: mandatory, if model creation parameters should be saved to the specified stat_file;
header: mandatory, if specified stat_file should have header; recommended to be set to true. You can collect information on many models in one file, in this case specify this parameter as true only for the first query and make sure that all models belong to the same family and validation option is always the same - so columns are the same (also, extended_output should be set the same for all the models).
Usage tips: Using just the model name, you can access a graphical interpretation of the created model, the RData model file, the coefficients (in JSON format) and the performance metrics (in JSON and CSV format). If you have received the value 'model578045416464' from the script, you can access the following files in https://www.evicor.org/pics/plots/ :
|model578045416464.RData||glmnet model for R environment|
|coeff.model578045416464.json||Model coefficients (with weights) in JSON format|
|model578045416464_model.png||Graphical abstract of the created model|
|model578045416464_training.png||Graphical abstract on training step|
|model578045416464_validation.png||Graphical abstract on validation step (if set)|
|model578045416464.csv||Performance metrics and some additional information in csv format with header|
|perf.model578045416464.json||Performance metrics in JSON format|
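The naming scheme in the table can be turned into a small lookup, sketched here in Python. The file name patterns and the base URL are taken from this page; the function name `model_files` and the dictionary keys are hypothetical.

```python
# Directory in which model files are published, as given on this page.
PLOTS_URL = "https://www.evicor.org/pics/plots/"


def model_files(model_name):
    """Map each output of model_predict.cgi to its URL for a given model name."""
    return {
        "rdata": f"{PLOTS_URL}{model_name}.RData",
        "coefficients": f"{PLOTS_URL}coeff.{model_name}.json",
        "model_plot": f"{PLOTS_URL}{model_name}_model.png",
        "training_plot": f"{PLOTS_URL}{model_name}_training.png",
        # Present only if validation was requested.
        "validation_plot": f"{PLOTS_URL}{model_name}_validation.png",
        "metrics_csv": f"{PLOTS_URL}{model_name}.csv",
        "metrics_json": f"{PLOTS_URL}perf.{model_name}.json",
    }


for label, url in model_files("model578045416464").items():
    print(label, url)
```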
Script name: models_from_correlations_batch.cgi
Returns: an error or nothing. 'done' is returned only after all models are created, so do not call this script synchronously; a link to the results will be sent to the specified email.
source: mandatory, data source (TCGA/CCLE) for both correlation retrieval and model creation;
datatype: mandatory, same as datatype for cor_datatables_json.cgi; however, several comma-separated values are allowed, number should be the same as number of predictor types;
cohort: mandatory, same as cohort for cor_datatables_json.cgi; may be a comma-separated list, length matches with datatype parameter list;
screen: mandatory, same as screen for cor_datatables_json.cgi; similar to datatype and cohort may be a list;
id: mandatory, same as id for cor_datatables_json.cgi; similar to previous parameters;
fdr: mandatory, same as fdr for cor_datatables_json.cgi; similar to previous parameters;
mindrug: mandatory, same as mindrug for cor_datatables_json.cgi; unlike previous parameters, this one is always a single value;
columns: mandatory, same as columns for cor_datatables_json.cgi;
method: mandatory, same as method for model_predict.cgi;
model_cohort: mandatory, same as cohort for model_predict.cgi;
multiopt: mandatory, same as multiopt for model_predict.cgi;
xdatatypes: mandatory, same as xdatatypes for model_predict.cgi;
xplatforms: mandatory, same as xplatforms for model_predict.cgi;
additional_xids: optional (recommended), specifies which ids should be passed to glmnet even if they do not pass the filters; leave blank if there are no additional xids;
rdatatype: mandatory, same as rdatatype for model_predict.cgi;
rplatform: mandatory, same as rplatform for model_predict.cgi;
rid: mandatory, same as rid for model_predict.cgi;
family: mandatory for glmnet, same as family for model_predict.cgi;
measure: mandatory for glmnet, same as measure for model_predict.cgi;
alpha: mandatory for glmnet, same as alpha for model_predict.cgi;
nlambda: mandatory for glmnet, same as nlambda for model_predict.cgi;
minlambda: mandatory for glmnet, same as minlambda for model_predict.cgi;
validation: mandatory, same as validation for model_predict.cgi;
validation_fraction: mandatory for glmnet with validation flag set to TRUE, same as validation_fraction for model_predict.cgi;
nfolds: mandatory for glmnet with validation flag set to TRUE, same as nfolds for model_predict.cgi;
standardize: mandatory, same as standardize for model_predict.cgi;
iter: mandatory, desired number of models;
stat_file: mandatory, name of the file without extension to save your data in (try to give a unique name to your stat_file);
extended_output: mandatory, binary flag, if set to TRUE - some additional info will be written into stat_file (such as source, rdatatype etc.);
mail: mandatory, provide a correct email address; links to the results will be sent to it when your batch job is done;
Usage tips: Always try to specify a unique name for your output file! Files are created in append mode, so if you (or someone else) have used the specified file name before, you will get "mixed" results! When the job is done, the results will be available in the file https://www.evicor.org/pics/plots/your_file.csv
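One way to avoid reusing a stat_file name is to append a random suffix, as in the Python sketch below. The helper names are hypothetical, and the sketch assumes that underscores and hexadecimal characters are acceptable in stat_file names, which this page does not confirm. The result URL follows the pattern given in the usage tips.

```python
import uuid

# Directory in which batch results are published, as given on this page.
PLOTS_URL = "https://www.evicor.org/pics/plots/"


def unique_stat_file(prefix):
    """Return a stat_file name (without extension) that is unlikely to repeat."""
    return f"{prefix}_{uuid.uuid4().hex}"


def result_url(stat_file):
    """Return the URL at which the batch results are expected to appear."""
    return f"{PLOTS_URL}{stat_file}.csv"


stat_file = unique_stat_file("my_batch")
print(stat_file)
print(result_url(stat_file))
```

Because files are written in append mode, a fresh name for every batch job keeps results from different runs apart.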
All the aforementioned functions are available through the API specified in the drugs.js module. Some parameters (like columns) are specified in druggable_config.js.
Below you can find valid names of datatypes, platforms, and variables to use with the REST API.
Marcela Franco, Ashwini Jeggari, Sylvain Peuget, Franziska Böttger, Galina Selivanova, Andrey Alexeyenko. Prediction of response to anti-cancer drugs becomes robust via network integration of molecular data. Sci Rep 9, 2379 (2019). doi: 10.1186/1471-2105-13-226.
EviNet web resource:
Ashwini Jeggari, Zhanna Alekseenko, José Dias, Johan Ericson, and Andrey Alexeyenko. EviNet: a web platform for network enrichment analysis with flexible definition of gene sets. Nucleic Acids Res 2018 Jul 2;46(W1):W163-W170. doi: 10.1002/1878-0261.12350.