To perform a formulae lists analysis, i.e. to visualize the input lists of molecular species (interactive plots: formulae abundances, Van Krevelen diagram), calculate basic statistics (average metrics: O/C, H/C, molecular weight; formulae compositions) and metrics (Euclidean, Cosine, Jaccard, Wasserstein), and run multidimensional scaling (MDS), principal component analysis (PCA), and non-negative matrix factorization (NMF),

or

to perform an analysis of formulae differences, i.e. to calculate the distributions of formulae differences for the input lists of molecular species, visualize them (interactive plot of the first 50000 formulae differences within the distributions), calculate formulae differences compositions and metrics (Euclidean, Cosine, Jaccard, Wasserstein), and run multidimensional scaling (MDS), principal component analysis (PCA), and non-negative matrix factorization (NMF),

the input .csv file should include at least 3 formulae lists. If you need to analyze fewer than 3 formulae lists, see the tips.
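To make the basic statistics more concrete, here is a minimal Python sketch of how O/C and H/C ratios (the coordinates of a Van Krevelen diagram), an average mass, and a Jaccard distance could be computed for formulae lists. It is an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, formulae are assumed to be plain strings such as C10H12O5, averages are unweighted (the service may weight by abundance), and monoisotopic mass stands in for molecular weight.

```python
import re

# Monoisotopic masses of a few common elements (standard values, rounded).
MONO_MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074,
             "O": 15.994915, "S": 31.972071}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn a plain formula string such as 'C10H12O5' into element counts."""
    counts = {}
    for element, number in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def van_krevelen_point(formula):
    """Return (O/C, H/C), the position of one formula on a Van Krevelen diagram."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError(f"no carbon in {formula!r}")
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


def monoisotopic_mass(formula):
    """Sum of monoisotopic masses (assumption: used here as 'molecular weight')."""
    return sum(MONO_MASS[el] * n for el, n in parse_formula(formula).items())


def average_metrics(formulae):
    """Unweighted means of O/C, H/C and mass over one formulae list."""
    points = [van_krevelen_point(f) for f in formulae]
    n = len(formulae)
    return {
        "O/C": sum(p[0] for p in points) / n,
        "H/C": sum(p[1] for p in points) / n,
        "mass": sum(monoisotopic_mass(f) for f in formulae) / n,
    }


def jaccard_distance(list_a, list_b):
    """1 - |A and B| / |A or B|, computed on the sets of formulae."""
    a, b = set(list_a), set(list_b)
    return 1.0 - len(a & b) / len(a | b)


# Toy data, invented for illustration only.
list_a = ["C10H12O5", "C9H10O4"]
list_b = ["C10H12O5", "C8H8O3"]
print(van_krevelen_point("C10H12O5"))
print(average_metrics(list_a))
print(jaccard_distance(list_a, list_b))
```

The Jaccard distance only looks at which formulae are present, so two lists with no shared formulae are at distance 1 regardless of how similar their O/C and H/C distributions are.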

To compare against the toy database (i.e. to calculate the distance, in terms of the FDCEL measure, between the input lists of molecular species and the samples present in the database),

the input .csv file can include up to 10 formulae lists. If you need to compare more than 10 formulae lists, see the first question here.

The maximum size of the input file is 20 MB. The file name is used as the title of each plot.

An example of the proper format for formulae lists input as well as examples of input .csv files can be found here.

After the calculation is completed, the user receives an email notification from formulae.lists.analysis (keep in mind that, depending on your email domain, the notification might be marked as spam; to avoid this, a Gmail address is recommended). The notification also specifies the date and time of the upload that was processed.

After receiving the notification, registered users can access the Results tab, which contains the results of all completed calculations.

formulae abundances; Van Krevelen diagram.

Results of the comparison against the toy database are illustrated in the following examples:

1) DB comparison of formulae lists acquired via high-resolution mass spectrometry with various mild ionization techniques (SRFA-r,s,t,u,v,w), as well as SRFA-O (the formulae list from the dataset used for the DB sample record). Despite clear differences between the formulae lists (formulae abundances; Van Krevelen diagram) and even a lack of formulae overlap between different ionization methods, all of the formulae lists except SRFA-t and SRFA-r were correctly assigned to the corresponding sample (SRFA). The mislabeling of SRFA-t can be attributed to its sheer size: it includes 16471 molecular species, while the formulae lists for the samples average under 6000 formulae. The number of unique formulae differences (FDs) within SRFA-t is more than 2 million, while the average number of FDs for samples in the database was under 300 thousand. Consequently, within the FDCEL space, SRFA-t appears close to all of the samples in the database. As for SRFA-r, its mass spectrum was bimodal, which is not typical for SRFA and other NOM samples (formulae abundances; Van Krevelen diagram).

2) DB comparison of formulae lists acquired for various geochemical samples.
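The size effect described for SRFA-t in example 1 can be illustrated with a short Python sketch. It assumes, as a simplification, that a formula difference is the element-wise difference between the compositions of two formulae in the same list and that every pair of formulae is considered; the service may define and filter FDs differently. The helper names and the sign convention (first non-zero element count made positive) are hypothetical choices made for this example. Under these assumptions, the number of candidate pairs grows quadratically with list length: a list of 16471 formulae has 135,638,685 pairs, while a list of 6000 formulae has 17,997,000.

```python
import re
from itertools import combinations
from math import comb

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn a plain formula string such as 'C10H12O5' into element counts."""
    counts = {}
    for element, number in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def formula_difference(a, b):
    """Element-wise difference of two formulae as a tuple of (element, delta).

    Assumed convention: zero deltas are dropped and the sign is flipped, if
    needed, so that the first element (alphabetically) has a positive delta.
    This makes the difference of (a, b) equal to that of (b, a).
    """
    ca, cb = parse_formula(a), parse_formula(b)
    delta = sorted((el, ca.get(el, 0) - cb.get(el, 0)) for el in set(ca) | set(cb))
    delta = [(el, d) for el, d in delta if d != 0]
    if delta and delta[0][1] < 0:
        delta = [(el, -d) for el, d in delta]
    return tuple(delta)


def unique_differences(formulae):
    """Set of distinct formula differences over all pairs in one list."""
    return {formula_difference(a, b)
            for a, b in combinations(sorted(set(formulae)), 2)}


# Toy list, invented for illustration only.
toy = ["C10H12O5", "C11H14O5", "C12H16O5", "C10H12O6"]
print(comb(len(toy), 2), "pairs,", len(unique_differences(toy)), "unique differences")

# Candidate pair counts for the list sizes mentioned in example 1.
print(comb(16471, 2), comb(6000, 2))
```

In the toy list, the CH2 difference occurs twice, so six pairs yield five unique differences. A longer list covers many more pairs and therefore tends to contain a much larger set of distinct differences, which is consistent with SRFA-t sharing differences with every sample in the database.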