Motivation:

The overall aim of this work is to automatically provide users with suggestions about similar papers and about connections between papers, and to present these similarities and connections in ways that are both meaningful and searchable.

In order to achieve this, we integrated three different approaches to linking and analysing the LAK dataset. These were:

  1. network analysis 
  2. rhetorical analysis 
  3. visualization of the results 

To improve the precision of information about the content of the connections among the papers, we carried out advanced semantic and rhetorical analysis of the LAK dataset. On the one hand, we extracted similar concepts to provide topical similarity indicators; on the other hand, we extracted salient sentences that point out the main research topics of the papers. We applied the same statistical analysis to the reduced data (the list of concepts) and to the list of salient sentences. The reduced data yielded partially different results.

Addendum

XIP DASHBOARD

Cohere SQL dump

Analysing the network of Papers

network edm lak

 

Rhetorical Analysis

 First tests on the validity of similarity detection

distributions

 

 

| Criterion | Similar articles according to SNA analysis | Full text similarity measure | XIP sentence similarity measure | XIP concept similarity measure | Title similarity: number of common words[1] | Author similarity: number of common authors | Keyword similarity: number of common keywords[1] | Reference similarity: number of common references |
|---|---|---|---|---|---|---|---|---|
| The highest similarity for full text | Si_2 - Lak_12_15 | 0.76 | 0.34 | 0.49 | 1 | 2/2, 2/2 | 3 | 39/101, 39/56; Mutual: - |
| The highest similarity for XIP sentences | Si_11 - Lak_11_II_04 | - | 0.60 | 0.51 | 1 | - | 2.5 | 3/43, 3/22; Mutual: - |
| The highest similarity for XIP concepts | Si_4 - Lak_12_07 | - | 0.58 | 0.64 | 1 | 2/2, 2/2 | 1 | 10/42, 10/25; Mutual: Lak_12_07 refers to Si_4 |
| The highest similarity for full text without XIP sentence or concept similarity | Si_10 - Lak_11_11 | 0.68 | - | - | 1.5 | 4/4, 4/6 | 1 | 3/38, 3/34; Mutual: Si_10 refers to Lak_11_11 |
| The highest similarity for XIP concept without full text or XIP sentence similarity | Si_2 - Lak_12_80 | - | - | 0.37 | 0.5 | - | 2.5 | 101/3, 23/3; Mutual: Lak_12_80 refers to Si_2 |
| The highest similarity for XIP sentence without full text or XIP concept similarity | Lak_12_36 - Lak_11_07 | - | 0.50 | - | 1 | 3/6, 3/4 | 2 | 9/30, 9/22; Mutual: - |

Table 1. Independent similarity indicators for the LAK collection

 

| Criterion | Similar articles according to SNA analysis | Full text similarity measure | XIP sentence similarity measure | XIP concept similarity measure | Title similarity: number of common words | Author similarity: number of common authors | Keyword similarity: number of common keywords[1] | Reference similarity: number of common references |
|---|---|---|---|---|---|---|---|---|
| The highest similarity for full text | edm11_17 - edm11_10p | 0.97809 | 0.86029 | 0.87906 | 3 | 2/2, 2/2 | 3 | 7/16, 7/8; Mutual: both papers refer to the other |
| The highest similarity for XIP sentences | edm08_23 - edm09_abbas | 0.91749 | 0.89270 | 0.62412 | 1/2 | 2/2, 2/2 | | 8/13, 8/18; Mutual: - |
| The highest similarity for XIP concepts | edm11_17 - edm11_10p | 0.97809 | 0.86029 | 0.87906 | 3 | 2/2, 2/2 | 3 | 7/16, 7/8; Mutual: both papers refer to the other |
| The highest similarity for full text without XIP sentence or concept similarity | edm10_47 - edm10_53 | 0.90043 | | | - | 5/5, 5/5 | | 14/20, 14/17; Mutual: - |
| The highest similarity for XIP concept without full text or XIP sentence similarity | edm09_prata - edm12_15 | | | 0.5628 | 0 | 0/6, 0/4 | | 0/26, 0/23; Mutual: - |
| The highest similarity for XIP sentence without full text or XIP concept similarity | edm08_18 - edm11_19 | - | 0.5457 | | 0 | 0/3, 0/2 | | 1/9, 1/20; Mutual: - |

Table 2. Independent similarity indicators for the EDM collection

 

| Criterion | Similar articles according to SNA analysis | Full text similarity measure | XIP sentence similarity measure | XIP concept similarity measure | Title similarity: number of common words[2] | Author similarity: number of common authors | Keyword similarity: number of common keywords[1] | Reference similarity: number of common references |
|---|---|---|---|---|---|---|---|---|
| The highest similarity for full text | edm11_08p - Lak_11_02 | 0.83618 | 0.79762 | 0.62395 | 3 | 1/1, 1/1 | 4 | 3/4, 2/24; Mutual: edm11_08p refers to Lak_11_02 |
| The highest similarity for XIP sentences in different collections | Si_5 - edm11_19p | - | 0.84330 | 0.82693 | 1 | 4/5, 4/4 | 3 | 47/0, 1/0; Mutual: - |
| The highest similarity for XIP concepts in different collections | Si_5 - edm11_19p | - | 0.84330 | 0.82693 | | | | |
| High similarity for XIP sentences in different collections | edm11_25 - Lak_11_02 | 0.69611 | 0.74107 | 0.52578 | 1 | 0/2, 0/1 | 1/2 | 1/7, 1/24; Mutual: - |

Table 3. Independent similarity indicators for the LAK + EDM collections

In Tables 1, 2 and 3 we compare the similarities detected by our method with some independent traditional similarity indicators: title, author, keyword and reference similarities. We also carried out some preliminary content analysis in cases where our method detects similarities but the traditional methods do not. Since we do not yet have tools for a systematic evaluation at our disposal, we chose some special cases and checked them manually.

In the three tables we highlighted the cases where the pairs of papers detected with our method show evident similarities according to the independent similarity indicators: they have the same authors, they have a significant overlap between their references or they mutually refer to one another.

We observe that in every table the pairs of papers with the highest similarity (full text, XIP concept or XIP sentence) show highlighted cases, which supports the similarity detected.
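The independent indicators in Tables 1 to 3 can be illustrated with a minimal Python sketch. It rests on assumptions not confirmed by the text: the author and reference cells are read as "common items / total items" for each paper of the pair, and the title and keyword scores follow the counting rule given in the footnotes (a shared expression counts as 1, a word shared across different expressions counts as 0.5). The function names `overlap_counts` and `expression_similarity` are hypothetical, and how titles are split into expressions is left to the caller.

```python
def overlap_counts(items_a, items_b):
    """Return ("common/total_a", "common/total_b") for two collections.

    Assumed reading of the author and reference cells in the tables.
    """
    set_a, set_b = set(items_a), set(items_b)
    common = len(set_a & set_b)
    return f"{common}/{len(set_a)}", f"{common}/{len(set_b)}"


def expression_similarity(expressions_a, expressions_b):
    """Score two lists of title or keyword expressions.

    Assumed interpretation of the footnoted rule: an expression present in
    both lists counts as 1; a word that occurs in different, unmatched
    expressions on both sides counts as 0.5.
    """
    norm_a = {e.lower().strip() for e in expressions_a}
    norm_b = {e.lower().strip() for e in expressions_b}
    shared = norm_a & norm_b
    score = float(len(shared))
    # Words from expressions that were not matched as a whole.
    words_a = {w for e in norm_a - shared for w in e.split()}
    words_b = {w for e in norm_b - shared for w in e.split()}
    score += 0.5 * len(words_a & words_b)
    return score


if __name__ == "__main__":
    # Illustrative inputs only, not taken from the dataset.
    print(overlap_counts(["a", "b", "c"], ["b", "c", "d", "e"]))
    print(expression_similarity(
        ["learning analytics", "social network analysis"],
        ["learning analytics", "network visualization"],
    ))
```

In the second call, "learning analytics" is matched as a whole expression and "network" is matched only as a word inside two different expressions.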

For the cases of XIP concept similarity alone, without independent similarity indicators (edm09_prata-edm12_15), we do not currently have a method for comparison.

For the cases of XIP sentence similarity without independent similarity indicators, we carried out a content comparison of some pairs of papers. To see whether there are deeper underlying similarities between the two papers even when they have no highlighted cases, we compared the XIP sentences (i.e. the rhetorically salient sentences) of the two articles that contain the words responsible for the similarity measures. A few of these sentences reveal interesting related ideas in the two papers.
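The manual comparison described above could be prepared with a small helper that lists candidate sentence pairs for a human reader. The following is only a sketch under assumptions: the words responsible for the similarity are taken as given input (the text does not say how they are identified), tokenization is a simple lowercase split on letters, and the name `candidate_sentence_pairs` is hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a sentence and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def candidate_sentence_pairs(sentences_a, sentences_b, key_words):
    """Pair salient sentences of two papers that share a key word.

    Returns (sentence_a, sentence_b, shared_key_words) tuples, with the
    pairs sharing the most key words first.
    """
    keys = {w.lower() for w in key_words}
    pairs = []
    for sent_a in sentences_a:
        hits_a = tokenize(sent_a) & keys
        if not hits_a:
            continue
        for sent_b in sentences_b:
            shared = hits_a & tokenize(sent_b)
            if shared:
                pairs.append((sent_a, sent_b, sorted(shared)))
    pairs.sort(key=lambda p: len(p[2]), reverse=True)
    return pairs


if __name__ == "__main__":
    # Shortened, illustrative sentences; not the actual XIP output.
    paper_a = ["The LMS is central to learning.", "Budgets do not predict quality."]
    paper_b = ["The LMS offers poor coverage.", "Interactions predict performance."]
    for sent_a, sent_b, shared in candidate_sentence_pairs(paper_a, paper_b, ["LMS", "predict"]):
        print(shared, "|", sent_a, "|", sent_b)
```

The helper only narrows down which sentences to read side by side; judging whether a pair expresses a related idea remains a manual step.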

The following tables show the result of our analysis: sentence pairs that show interesting similar relevant ideas. We highlighted in bold the relevant comparable parts of the sentences.

| Si_11 | Lak_11_II_04 | Comment |
|---|---|---|
| We were invited to undertake a current state analysis of enterprise LMS use in a large research-intensive university, to provide data to inform and guide an LMS review and strategic planning process. | This paper shows an approach using Virtual Machines by which a set of events occurring outside of the LMS are recorded and sent to a central server in a scalable and unobtrusive manner. | Si_11 studies LMS use, whereas Lak_11_II_04 studies events outside the LMS. This contrast between the two articles potentially allows a comparison between events within and outside the LMS. |
| Although the data gathered in this case study analysis suggest that the potential use of the enterprise LMS is yet to be realized at this institution, it nevertheless confirms that the LMS is central to the student learning experience - a reality that should highlight the importance of careful planning for future learning technology uptake. | In this paper a context has been described in which in order to assess the degree of interaction that students are having with a previously detected set of tools and among themselves, the LMS offers a very poor coverage. | Indeed, the two articles provide contrasting observations concerning the role of the LMS, which are conveyed by the bold expressions. |
| We know from meta-analytic studies of decades of available data, however, that the quality of education offered by an institution is not predicted by the size of institutional budgets, numbers of or dollar values of research awards, or even by measures such as student: Citing a major 2004 study (Gansemer-Topf, Saunders, Schuh, & Shelley, 2004), Gibbs (2010) argues that the feature that distinguishes effective institutions from less effective schools is their strategic use of available funding to support a campus ethos devoted to student success (p. 14). At least a decade of research and writing has demonstrated that learning technologies, when used appropriately, can help educators adopt the seven principles of good practice in undergraduate education (Chickering & Gamson, 1987) and improve the overall quality of an institution's educational delivery (Chickering & Ehrmann, 2002). | As a consequence, having a detailed account of the interactions that occur in a learning experience will likely offer a good predictor of academic performance, which itself is one of the most important aspects of an educational institution. | Another interesting common topic in the two articles is the evidence of predictors of the quality of education. The two articles recognize different predictors. |

Table 4. Content analysis of 2 papers in the LAK collection

 

| edm11_25 | Lak_11_02 | Comment |
|---|---|---|
| CONTRAST(However, besides distinguishing variables as potential causes and effects, the domain knowledge also restricts the set of variables to be considered as confounders and mediators. SURPRISE_CONTRAST(Interestingly, adding domain knowledge can also address the problem of multicollinearity. CONTRAST(Even if researchers are skeptical of the domain knowledge we have brought to bear and are dubious of the causal modeling assumptions, it is still possible to consider Figure 1 without the assumption that the edges represent causality. | CONTRAST(Working from expert domain knowledge, we might seek ways in which data-driven methods can fruitfully address this problem, providing specifications useful for causal discovery and for the learning analytics and educational research community at large. | Both papers mention relationships between expert domain knowledge and causal modeling. |
| CONTRAST(But those inferences cannot be claimed as accurate since causal modeling cannot identify spurious association caused by unobserved confounders and there are multiple Markov equivalent causal models that can be generated from the same data. | SUMMARY(If we relax this assumption and allow that there may be unmeasured or latent confounders of the variables we consider, we can deploy causal discovery algorithms that detect where such confounding may be present. | Both papers mention relationships between confounding and causal modeling. |

Table 5. Content analysis of 2 papers in different collections

 

In the case of the pair edm08_18-edm11_19 we did not find any similar content in the two papers. In this case, however, the similarity measure is relatively low, so we might set up a threshold after more advanced testing.
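A threshold of this kind could be applied as in the sketch below. The XIP sentence similarity scores are taken from Table 2; the cut-off value of 0.6 is purely illustrative and is not proposed by the analysis, which leaves the choice to further testing. The function name `filter_by_threshold` is hypothetical.

```python
def filter_by_threshold(scores, threshold):
    """Keep only the paper pairs whose similarity reaches the threshold."""
    return {pair: score for pair, score in scores.items() if score >= threshold}


# XIP sentence similarity scores as reported in Table 2.
xip_sentence_scores = {
    ("edm11_17", "edm11_10p"): 0.86029,
    ("edm08_23", "edm09_abbas"): 0.89270,
    ("edm08_18", "edm11_19"): 0.5457,
}

# Assumption: an arbitrary cut-off chosen only for demonstration.
ILLUSTRATIVE_THRESHOLD = 0.6

kept = filter_by_threshold(xip_sentence_scores, ILLUSTRATIVE_THRESHOLD)

if __name__ == "__main__":
    for pair, score in kept.items():
        print(pair, score)
```

With this illustrative value, the pair edm08_18-edm11_19 is dropped while the other two pairs are kept. Any real cut-off would have to be checked against pairs such as Si_11 and Lak_11_II_04 (0.60 in Table 1), for which the content analysis did find related ideas.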

These preliminary tests are a first attempt to evaluate the validity of our method. They have demonstrated that some papers related by our method do show similarity. Moreover, we have provided a case in which a very precise idea-level connection was identified between two papers that would not have been related by traditional similarity indicators.



[1], [2] One expression (e.g. learning analytics) counts as one word. If a word is present in different expressions in the two titles/keywords, it counts as 0.5.

Last updated (Wednesday, 3 April 2013, 09:37)