Erroneous Mode value
I am teaching students to use JASP and had them calculate descriptive stats for a simple data set.
Five of the students came up with the same, incorrect mode of 42.23, which is not even in the data set at all. If I run a frequency distribution, I see the correct value of 45. My own use of JASP derived the correct answer.
That this was replicated five times is very confusing, since the majority of the class got the correct result.
The variable was set to Scale.
Please advise.
Comments
We did make some changes to the mode calculation recently. When you say: "the correct value of 45", I assume that the value 45 occurs the most? This makes sense if the data is discrete. But if the data is continuous (scale), the exact same value cannot occur multiple times (other than through rounding), so we need to fit a distribution and determine the highest point. I will alert our expert to this issue, maybe he has some words of wisdom as well.
Hi cedmonds,
As EJ said,
The mode for a discrete variable (Nominal/Ordinal) is the most frequent value.
The mode for a continuous variable (Scale) is the x-coordinate at which the density is highest.
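To make the two definitions concrete, here is a minimal Python sketch. It is not JASP's implementation: the Gaussian kernel, the rule-of-thumb bandwidth, the grid size, and the scores themselves are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def discrete_mode(values):
    """Most frequent value (on ties, the value seen first wins)."""
    return Counter(values).most_common(1)[0][0]


def kde_mode(values, grid_points=512):
    """x-coordinate at which a Gaussian kernel density estimate peaks."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Normal-reference rule-of-thumb bandwidth (an assumption, not JASP's choice).
    h = 1.06 * sd * n ** (-1 / 5)
    lo, hi = min(values) - 3 * h, max(values) + 3 * h
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]

    def density(x):
        kernel_sum = sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        return kernel_sum / (n * h * math.sqrt(2 * math.pi))

    # Evaluate the density on the grid and keep the x with the highest value.
    return max(grid, key=density)


# Made-up whole-number scores, NOT the data set from this thread.
scores = [38, 40, 41, 42, 43, 44, 45, 45, 45, 47, 52]
print(discrete_mode(scores))  # 45
print(kde_mode(scores))       # a point on the density curve, usually not a data value
```

The second number depends on the bandwidth, so a different smoothing choice can move it. That is why the density-based mode need not equal any value in the data.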
Now, in previous versions of JASP, we always used the discrete definition, but this was clearly incorrect for continuous variables (it would show nonsense). In the latest version, we use the type of variable to determine which definition of the mode to use. So if the students used different versions of JASP, they may have gotten different answers here. In descriptives, it's possible to change the scale of the variable by clicking the icon next to the variable.
I hope that resolves the issue, please let us know if anything is still unclear.
Best,
Don
Wow nice I didn't know you could change the variable type on the fly
EJ & Vandenman,
I never considered that there would be a different mode for continuous versus discrete data. The most frequent value is the most frequent, whatever the case. All of the data were whole numbers, and constructed by me to illustrate some descriptive stats principles.
Students were less forthright in reporting the presumed Mode error than I had thought them to be. It turns out that everyone had the mode of 42.23, not just 5 of them. They are running JASP 0.95.1 and the Column Type was set to Scale.
I asked them to switch the Column type from Scale to Ordinal, and the correct Mode of 45 was reported.
As an odd aside, I appear to be running JASP 0.19.1 and was able to calculate what I believed to be the correct Mode (45) in the data set, Column Type set to Scale.
I am puzzled how my version of JASP and the students' version could be so far apart when I believed they were downloaded only a few months apart.
In any case, many thanks for your swift and helpful insight.
I have much to learn, "and miles to go before I sleep"
Chris
In a continuous distribution, all values are unique, so there is no single value that is the most frequent. The reason why they may appear to be the same is rounding. On the other hand, modes in continuous distributions are well defined. From Wikipedia:
"For a sample from a continuous distribution, such as [0.935..., 1.211..., 2.430..., 3.668..., 3.874...], the concept is unusable in its raw form, since no two values will be exactly the same, so each value will occur precisely once. In order to estimate the mode of the underlying distribution, the usual practice is to discretize the data by assigning frequency values to intervals of equal distance, as for making a histogram, effectively replacing the values by the midpoints of the intervals they are assigned to. The mode is then the value where the histogram reaches its peak. For small or middle-sized samples the outcome of this procedure is sensitive to the choice of interval width if chosen too narrow or too wide; typically one should have a sizable fraction of the data concentrated in a relatively small number of intervals (5 to 10), while the fraction of the data falling outside these intervals is also sizable. An alternate approach is kernel density estimation, which essentially blurs point samples to produce a continuous estimate of the probability density function which can provide an estimate of the mode."
We use the density estimation approach.
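For contrast, the histogram procedure described in the quote can be sketched in a few lines of Python. This is only an illustration: the function name, the tie-breaking rule, and the bin widths are assumptions, and the sample is the one from the quote with its elided digits dropped.

```python
def histogram_mode(values, bin_width):
    """Midpoint of the fullest equal-width bin (on ties, the lowest bin wins)."""
    start = min(values)
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    # Highest count first; among equal counts, prefer the smallest bin index.
    best = min(counts, key=lambda i: (-counts[i], i))
    return start + (best + 0.5) * bin_width


# The five-value sample from the quote, cut off at the digits shown there.
sample = [0.935, 1.211, 2.430, 3.668, 3.874]
for width in (0.5, 1.0, 2.0):
    print(width, histogram_mode(sample, width))
```

Each bin width gives a different estimate for this small sample, which is the sensitivity to interval width that the quote warns about.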
EJ
And it is a recent change, so that explains the difference. :-)
EJ
I was unfamiliar with the density estimation approach, and presumed that even a continuous distribution could (could!) have repeated values in it. My bad.
"and miles to go before I sleep"
Many thanks!
I think I see a difficulty here in that JASP does not have a way to specify a discrete, ratio scale.
Take, for example, a set of counts in the zero-to-ten range: [8, 3, 5, 3, 3, 2, 4, 2, 7, 7, 3, 6]. Because they are counts, they are discrete and not continuous. The mode is 3. However, as I understand it, JASP permits computing that mode (3) only if the variable is coded, incorrectly, as nominal or ordinal.
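As a quick check outside JASP, the most-frequent-value mode of these counts can be confirmed with Python's standard library. This is just a sketch; the variable name is arbitrary.

```python
from collections import Counter
from statistics import mode, multimode

counts = [8, 3, 5, 3, 3, 2, 4, 2, 7, 7, 3, 6]

print(mode(counts))        # 3
print(multimode(counts))   # [3], so there is a single most frequent value
print(Counter(counts)[3])  # 4, the number of times the value 3 occurs
```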
R
True, the measurement scale classification is incomplete. This was a conscious choice at the time, to keep things simple (and more in line with SPSS, probably). We could possibly change it -- I am not sure whether adding a scale is easy or whether it would be a ton of work. Maybe a good suggestion for our GitHub page.