I'm currently in a lab trying to figure out a new way of detecting/quantifying something but there is a lot going on in my data - too much to model with "normal" maths. I've been told that Partial Least Squares Regression may be a way of making sense of the data but the last proper maths I did was at A-level around half a lifetime ago. PLSR looks to be pretty involved and all the stuff I've found online so far assumes a level of knowledge I'm not sure I have.
Can anyone recommend a primer for PLSR that builds on maths at A-level standard?
Least squares regression is a very common form of analysis.
PLSR - no idea.
In my experience the data is plugged into special software and you press Run. Voila.
(Or the statistician does all this for you in their wee office in Hamburg or wherever statisticians live).
The challenge is working out whether it's a valid analysis or not.
I’m assuming you’ve put it into Excel and tried all the line-of-best-fit options
I think these are least squares regressions but it’s an easy start
It's not really a single line-of-best-fit situation unfortunately. I've got a lot of competing information, too much to filter through manually, and I need to separate the data from the noise and then figure out a correlation from that using datapoints that appear to be unrelated at first glance. When I described the problem to someone he said "PLSR is what you need" but then didn't elaborate.
To get useful advice about statistical analysis, you really need to explain what your data represents (conceptually at least) and its structure. The data should have been collected with a clear idea of how it would be analyzed. Asking random people on the internet for advice after data collection is unlikely to produce good results.
For example, if you have a single measurement of the height and weight of individual persons (i.e. cross-sectional data across a population) that would be different from having repeated measurements for individuals undergoing some treatment (i.e. longitudinal data). The two datasets would be collected for different purposes and would require different analyses.
It's (potentially) quantitative FTIR spectra, looking at ppm concentrations of monoethylene glycol (MEG) in water and taken from a flow cell of known pathlength. We currently get our results by sending samples to an external lab for them to analyse and it can take weeks. It would be hugely to our advantage if we could obtain results within minutes as we could then react to process conditions, rather than just having week-old data to confirm what we thought was happening at the time. We want to try to create a model that uses the FTIR peaks at various wavelengths to predict the MEG concentration. There are more components in our sample than just MEG and water so we have a lot of potentially extraneous data, but those additional components may also contribute to a matrix effect which then changes the transmittance of the peaks we're looking for, meaning a simple single-variable correlation is not feasible.
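For anyone following along, here is a rough sketch of what a model like that could look like. It is not anyone's actual method: the "spectra" are entirely synthetic (two overlapping Gaussian bands plus noise), and the channel positions, band widths, noise level and function names are all made up for illustration. It fits a single-response PLS regression (PLS1) with plain NumPy and compares it with a calibration on one channel only. Libraries such as scikit-learn ship a ready-made `PLSRegression`, so in practice you probably wouldn't hand-roll this.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS regression (PLS1) by successive deflation."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centre spectra and concentrations
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                          # weights: covariance of each channel with y
        w /= np.linalg.norm(w)
        t = Xk @ w                             # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt                      # spectral loadings
        qk = (yk @ t) / tt                     # concentration loading
        Xk = Xk - np.outer(t, p)               # remove what this component explained
        yk = yk - qk * t
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression vector in channel space
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

# --- Synthetic data: every number below is an assumption for illustration ---
rng = np.random.default_rng(0)
channels = np.arange(200)

def band(centre, width):
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)

meg_band = band(80, 8)        # hypothetical MEG absorption band
other_band = band(95, 12)     # hypothetical overlapping interferent band

def simulate(n):
    meg_ppm = rng.uniform(0, 100, n)
    other_ppm = rng.uniform(0, 100, n)
    spectra = 0.001 * (np.outer(meg_ppm, meg_band) + np.outer(other_ppm, other_band))
    spectra += rng.normal(0, 0.0005, spectra.shape)   # measurement noise
    return spectra, meg_ppm

X_train, y_train = simulate(40)
X_test, y_test = simulate(20)

model = pls1_fit(X_train, y_train, n_components=3)
pls_rmse = np.sqrt(np.mean((pls1_predict(model, X_test) - y_test) ** 2))

# Baseline: straight-line calibration using only the channel at the MEG band centre
slope, intercept = np.polyfit(X_train[:, 80], y_train, 1)
single_rmse = np.sqrt(np.mean((slope * X_test[:, 80] + intercept - y_test) ** 2))

print(f"PLS RMSE: {pls_rmse:.2f} ppm, single-channel RMSE: {single_rmse:.2f} ppm")
```

On real spectra the number of components would normally be chosen by cross-validation, and the predictions checked against reference results from the external lab before trusting them.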
With specialized analyses like that, it's best to go through the published literature and see what other people have done. It's often cheaper to pay to take a course or workshop than to spend weeks or months trying to learn it by yourself. For example, there may be R packages that will do it for you, but you will need to learn how to manage the dataflow and how to interpret the results.
This is chemometrics. Tons of literature on use with vibrational spectra and in process control.
It's not my thing but drop me a PM if you need to get hold of some journal papers
See this paper for something similar.
Blimey. The knowledge in this place astounds me sometimes!
Thinking out loud, because I don’t know your process, what kit you have or how valuable this may be…
Would you be better off using HPLC with a UV-Vis detector? With a few known standards you’d then be able to work out the concentrations of everything you have using the Beer-Lambert law.
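To show what "a few known standards" means in practice, here is a minimal sketch of a Beer-Lambert style calibration: absorbance is assumed to be linear in concentration at fixed pathlength, so you fit a straight line through the standards and invert it for the unknown. The standard concentrations and absorbances below are invented numbers, and `fit_line` and `concentration` are just illustrative helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up calibration standards (concentration in ppm, measured absorbance)
standards_ppm = [0.0, 25.0, 50.0, 100.0]
standards_abs = [0.010, 0.060, 0.110, 0.210]

slope, intercept = fit_line(standards_ppm, standards_abs)

def concentration(absorbance):
    """Invert the calibration line to get concentration from absorbance."""
    return (absorbance - intercept) / slope

unknown_ppm = concentration(0.150)   # about 70 ppm for these made-up numbers
print(f"Estimated concentration: {unknown_ppm:.1f} ppm")
```

This only works when the peak you measure belongs to one component, which is what the chromatographic separation is meant to give you.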
I'm guessing that the off-site analyses they have to wait for are done by something like HPLC so they get a definitive but delayed answer. However, they have on site FTIR so they'd like to use that but lots of components of the sample contribute to the FTIR spectrum so it's complicated and they need some sort of multivariate approach to separate out the signal they want.
I must admit, if I owned their lab, I'd be asking the boss for HPLC although you can do FTIR online so maybe they are just looking into the operating process plant, whereas you might have to tweak things to get samples for HPLC. But they do that anyway.....
Blimey. The knowledge in this place astounds me sometimes!
Same, that's why I asked the question here.
Offsite analysis is GC-FID (iirc) rather than HPLC, but we have our own issues with the lab that performs this. Very few places in the country cater to our needs so they've pretty much got a monopoly and as a consequence are a bit shit. I'll ask about purchasing an HPLC for our lab though, would be very useful for more than just this.
And thank you both for pointing me in the direction of chemometrics. Extremely helpful indeed.