Civil-Comp Proceedings
ISSN 1759-3433
CCP: 73
Edited by: B.H.V. Topping
Paper 104

Geotechnical Parameter Prediction from Large Data Sets

I. Davey-Wilson

School of Computing and Mathematical Sciences, Oxford Brookes University, Oxford, England

Full Bibliographic Reference for this paper
I. Davey-Wilson, "Geotechnical Parameter Prediction from Large Data Sets", in B.H.V. Topping, (Editor), "Proceedings of the Eighth International Conference on Civil and Structural Engineering Computing", Civil-Comp Press, Stirlingshire, UK, Paper 104, 2001. doi:10.4203/ccp.73.104
Keywords: geotechnical, laboratory, testing, parameters, database, computer system, similarity function.

Geotechnical laboratory testing of soils produces two main categories of results: soil identification (classification) parameters and soil behaviour parameters. Although no two soils are the same, there are similarities in behaviour for similar types of soil. From a large database of geotechnical test results it is likely that a knowledge of the soil identification parameters will suggest a certain range of behaviour parameters for similar soils. When presented with a new soil from a site investigation, an engineer would be interested in knowing which other soils have similar parameters. With a large database of results, soils with similar properties to a new soil can be found. Also, preliminary site investigation results (classification parameters) could be used to suggest unknown parameters of the new soil. This paper describes research into a spreadsheet-based parameter analysis system.

Geotechnical laboratory test results, accumulated by testing laboratories, consultants and clients, have been used for database population and analysis. Nineteen parameters have been established in the database as useful indicators for cross-referencing and forecasting. These parameters are the ones commonly used in many site investigations, rather than those found only in finite element, critical state or research work. A large, well-populated database was essential for this work, so the availability of sufficient examples of each parameter was a consideration in choosing them.

The system runs on a standard Excel spreadsheet. Around 800 lines (spreadsheet rows) of data have been amassed from a number of sources. Each line contains data from one specific soil specimen at one depth in a borehole or trial pit. Parameters are grouped in spreadsheet columns. Inevitably there are large gaps in most lines of data, as many specimens will only have been used for a few soil tests. The system has been designed to cope with sparse data. There are three main system functions: database browsing and analysis; finding a 'best' match to a user line of data; and forecasting parameters based on a new input soil.

As the system is mounted on a common spreadsheet, all the standard functions are available. Of particular interest is the ability to select, from a column of data (i.e. one parameter), soils with a particular parameter or range of parameters. The system allows ranges of several parameters to be selected thus enabling properties of soils to be compared with a new soil.
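The range selection described above can be sketched outside the spreadsheet as a simple filter. This is a minimal illustration in Python rather than the Excel mechanism used by the system; the parameter names, the values, and the choice to exclude rows with a missing value for a requested parameter are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]          # None marks a test that was not run
Ranges = Dict[str, Tuple[float, float]]   # parameter -> (low, high), inclusive


def select_rows(rows: List[Row], ranges: Ranges) -> List[Row]:
    """Return the rows whose values lie inside every requested range.

    Assumption: a row with no value for a requested parameter is excluded.
    """
    selected = []
    for row in rows:
        keep = True
        for name, (low, high) in ranges.items():
            value = row.get(name)
            if value is None or not (low <= value <= high):
                keep = False
                break
        if keep:
            selected.append(row)
    return selected


# Hypothetical parameter names and values, for illustration only.
database = [
    {"liquid_limit": 45.0, "plastic_limit": 22.0, "cu_kPa": 80.0},
    {"liquid_limit": 62.0, "plastic_limit": 28.0, "cu_kPa": None},
    {"liquid_limit": 48.0, "plastic_limit": None, "cu_kPa": 95.0},
]
matches = select_rows(
    database,
    {"liquid_limit": (40.0, 50.0), "cu_kPa": (70.0, 100.0)},
)
```

Here the first and third rows are selected; the second is rejected because its liquid limit falls outside the requested range.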

A similarity function is used to find the nearest match of a new input line to lines in the database. Details of a new soil are input, together with a weighting value for each parameter that sets its relative importance, so that matching can be biased towards particular parameters. The similarity function generates a value between 0 and 1.0 that gives the relative similarity between the new input soil data and each line in the database. The similarity value for each database line is constructed from the similarity of each parameter to the same parameter in the input line, factored by its weighting. The similarity values for all lines are ranked and a 'top ten' list of best matches is output.
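The abstract does not give the form of the similarity function, so the following Python sketch shows only one plausible construction. It assumes that each per-parameter similarity falls linearly from 1 (identical values) to 0 (a difference equal to the spread of that parameter in the database), that the line similarity is the weighted mean of these values, and that parameters missing from either line are skipped so that sparse lines can still be compared. All names and data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]  # None marks a missing test result


def parameter_spans(rows: List[Row]) -> Dict[str, float]:
    """Spread (max - min) of each parameter over the rows that report it."""
    values: Dict[str, List[float]] = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:
                values.setdefault(name, []).append(value)
    return {name: max(v) - min(v) for name, v in values.items()}


def similarity(new: Row, old: Row, weights: Dict[str, float],
               spans: Dict[str, float]) -> float:
    """Weighted mean of per-parameter similarities, in the range 0 to 1."""
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        a, b = new.get(name), old.get(name)
        if a is None or b is None or weight <= 0:
            continue  # skip parameters missing from either line
        span = spans.get(name, 0.0)
        if span == 0:
            score = 1.0 if a == b else 0.0
        else:
            # Assumed form: linear fall-off, clipped at zero.
            score = max(0.0, 1.0 - abs(a - b) / span)
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def best_matches(new: Row, rows: List[Row], weights: Dict[str, float],
                 top: int = 10) -> List[Tuple[float, int]]:
    """Return up to `top` (similarity, row index) pairs, best first."""
    spans = parameter_spans(rows)
    scored = [(similarity(new, row, weights, spans), i)
              for i, row in enumerate(rows)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]


# Hypothetical data, for illustration only.
database = [
    {"liquid_limit": 40.0, "moisture": 20.0},
    {"liquid_limit": 60.0, "moisture": 30.0},
    {"liquid_limit": 50.0, "moisture": None},
]
new_soil = {"liquid_limit": 42.0, "moisture": 21.0}
weights = {"liquid_limit": 2.0, "moisture": 1.0}
ranking = best_matches(new_soil, database, weights)
```

With this data the first database line ranks highest, followed by the third (compared on liquid limit alone) and then the second. Averaging over only the parameters present in both lines keeps the result between 0 and 1 however sparse a line is, at the cost of letting a line with a single matching parameter score highly.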

The forecasting mechanism uses a covariance function to derive soil parameter values from the existing database and the input soil. A database covariance is established between each column of parameters and all the other parameters to derive a correlation function matrix. This matrix presents a view of the parameter pairs that show good correlations. When a new soil parameter value is inserted in the input line, the covariance relationships work to forecast new values for all parameters.
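How the covariance relationships are combined into a forecast is not stated in the abstract. As a hedged illustration, the Python sketch below computes pairwise covariance and correlation using only the lines that report both parameters, then forecasts a target parameter by simple linear regression on each known parameter, weighting the individual estimates by the squared correlation coefficient. The regression step, the weighting, the minimum of three complete pairs, and all names and data are assumptions, not details taken from the paper.

```python
from math import sqrt
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]  # None marks a missing test result


def pair_stats(rows: List[Row], x: str,
               y: str) -> Optional[Tuple[float, float, float, float]]:
    """Return (mean_x, mean_y, slope, r) from rows reporting both x and y.

    Returns None when fewer than three complete pairs exist or when
    either parameter does not vary, as no relationship can be fitted.
    """
    pairs = [(r[x], r[y]) for r in rows
             if r.get(x) is not None and r.get(y) is not None]
    n = len(pairs)
    if n < 3:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (n - 1)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs) / (n - 1)
    if var_x == 0 or var_y == 0:
        return None
    return mean_x, mean_y, cov / var_x, cov / sqrt(var_x * var_y)


def correlation_matrix(rows: List[Row],
                       names: List[str]) -> Dict[Tuple[str, str], Optional[float]]:
    """Correlation coefficient for every ordered pair of parameters."""
    matrix: Dict[Tuple[str, str], Optional[float]] = {}
    for x in names:
        for y in names:
            stats = pair_stats(rows, x, y)
            matrix[(x, y)] = None if stats is None else stats[3]
    return matrix


def forecast(rows: List[Row], known: Dict[str, float],
             target: str) -> Optional[float]:
    """Estimate `target` from each known parameter, weighted by r squared."""
    total = 0.0
    weight_sum = 0.0
    for name, value in known.items():
        if name == target:
            continue
        stats = pair_stats(rows, name, target)
        if stats is None:
            continue
        mean_x, mean_y, slope, r = stats
        estimate = mean_y + slope * (value - mean_x)
        total += r * r * estimate
        weight_sum += r * r
    return total / weight_sum if weight_sum else None


# Hypothetical, exactly linear data so the result is easy to check.
database = [
    {"liquid_limit": 40.0, "cu_kPa": 90.0},
    {"liquid_limit": 50.0, "cu_kPa": 110.0},
    {"liquid_limit": 60.0, "cu_kPa": 130.0},
    {"liquid_limit": 70.0, "cu_kPa": 150.0},
    {"liquid_limit": 55.0, "cu_kPa": None},   # sparse line, ignored for this pair
]
matrix = correlation_matrix(database, ["liquid_limit", "cu_kPa"])
predicted = forecast(database, {"liquid_limit": 58.0}, "cu_kPa")
```

Because the example data lie exactly on a straight line, the correlation between the two parameters is 1 and the forecast for a liquid limit of 58 falls on that line. Real laboratory data would give weaker correlations, and the squared-correlation weighting then lets the better-correlated parameters dominate the forecast.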

The system is contained on three sheets within the spreadsheet workbook. The first sheet holds the database itself together with a matrix of similarity function calculations that derive the similarity value for each cell entry. The second sheet holds the main interface. This consists of a line for new data entry (for comparison with the database) and cells for weight value entry. Output blocks consist of matrices of data showing best comparisons with the database and a variety of forecast predictions for soil parameters. The third sheet holds statistical analyses and correlation functions.

Collecting and analysing data has proved to be a difficult and frustrating job. A great deal of data is held by consulting engineers overseeing projects; geotechnical engineers responsible for site investigations; soil testing laboratories undertaking site investigation; civil engineering contractors executing construction; clients financing the projects; and other organisations. However, obtaining copies of this data was difficult, as many custodians deemed the information to be confidential and therefore not for distribution. Only about one in ten of the organisations contacted supplied data. Information came in many forms but was primarily contained in site investigation reports. Occasionally the soil testing data was in a concise tabular appendix enumerating all tests and results for each soil specimen, in SI units. Sadly, the reality was more often unindexed, unreferenced, imprecise data in a variety of non-standard units. However, with ruthless editing, enough data was gleaned to make the project viable.

The quality of the data obtained was also considered. As data quality from testing and reporting was unknown, it was decided to leave all data in its raw form and use the correlation function as a guide to consistency. Graphs were drawn of each parameter against every other so that correlations could be rated visually. A few lines containing far outliers, where errors may have been made in testing or reporting, were removed from the database.
