LLM Probabilities, Training Size, and Perturbation Thresholds in Entity Recognition


Too Long; Didn't Read

This section presents the paper's appendices: the Wikidata properties behind the DEM and MISC gazetteers, the training parameters of the entity recognizer, frequently confused label pairs, the base models behind the LLM-probability predictor, and the effect of training size and perturbation thresholds on performance.


Authors:

(1) Anthi Papadopoulou, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway and Corresponding author (anthip@ifi.uio.no);

(2) Pierre Lison, Norwegian Computing Center, Gaustadalleen 23A, 0373 Oslo, Norway;

(3) Mark Anderson, Norwegian Computing Center, Gaustadalleen 23A, 0373 Oslo, Norway;

(4) Lilja Øvrelid, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway;

(5) Ildiko Pilan, Language Technology Group, University of Oslo, Gaustadalleen 23B, 0373 Oslo, Norway.

Abstract and 1 Introduction

2 Background

2.1 Definitions

2.2 NLP Approaches

2.3 Privacy-Preserving Data Publishing

2.4 Differential Privacy

3 Datasets and 3.1 Text Anonymization Benchmark (TAB)

3.2 Wikipedia Biographies

4 Privacy-oriented Entity Recognizer

4.1 Wikidata Properties

4.2 Silver Corpus and Model Fine-tuning

4.3 Evaluation

4.4 Label Disagreement

4.5 MISC Semantic Type

5 Privacy Risk Indicators

5.1 LLM Probabilities

5.2 Span Classification

5.3 Perturbations

5.4 Sequence Labelling and 5.5 Web Search

6 Analysis of Privacy Risk Indicators and 6.1 Evaluation Metrics

6.2 Experimental Results and 6.3 Discussion

6.4 Combination of Risk Indicators

7 Conclusions and Future Work

Declarations

References

Appendices

A. Human properties from Wikidata

B. Training parameters of entity recognizer

C. Label Agreement

D. LLM probabilities: base models

E. Training size and performance

F. Perturbation thresholds

A Human properties from Wikidata

Tables 8 and 9 below list the Wikidata properties, selected as described in Section 4.1, that constitute the DEM and MISC gazetteers.

Table 8: Selected Wikidata human properties (full listing in the arXiv version).


Table 9: Selected Wikidata human properties (full listing in the arXiv version).
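Since the property tables are not reproduced above, a minimal sketch of how a gazetteer can be built from such Wikidata properties may help. The SPARQL endpoint and the P31/Q5 ("instance of" / "human") pattern are standard Wikidata; the example property P106 (occupation) and the helper function are illustrative, not the authors' actual pipeline.

```python
import requests

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def property_value_labels(prop_id: str, limit: int = 1000) -> set:
    """Collect English labels of values that humans (Q5) hold for prop_id."""
    query = f"""
    SELECT DISTINCT ?valueLabel WHERE {{
      ?person wdt:P31 wd:Q5 ;         # instance of human
              wdt:{prop_id} ?value .  # holds the target property
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }} LIMIT {limit}
    """
    resp = requests.get(
        WIKIDATA_SPARQL,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "gazetteer-sketch/0.1 (example)"},
    )
    resp.raise_for_status()
    return {b["valueLabel"]["value"]
            for b in resp.json()["results"]["bindings"]}

# Example: occupation labels (P106) as one slice of a gazetteer.
# gazetteer = property_value_labels("P106")
```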


B Training parameters of entity recognizer


Table 10 details the parameters employed to train the privacy-oriented entity recognition model from Section 4.
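Table 10 itself is an image in the source; as orientation, here is a minimal sketch of the kind of fine-tuning setup it parameterizes, using the Hugging Face Trainer API. The checkpoint, label set, and every hyper-parameter value below are illustrative placeholders; the actual values are those listed in Table 10.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Illustrative label set; the real inventory is the paper's semantic types.
LABELS = ["O", "B-PER", "I-PER", "B-DEM", "I-DEM", "B-MISC", "I-MISC"]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # placeholder checkpoint
model = AutoModelForTokenClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)

args = TrainingArguments(
    output_dir="entity-recognizer",
    learning_rate=2e-5,              # placeholder; see Table 10
    per_device_train_batch_size=16,  # placeholder; see Table 10
    num_train_epochs=3,              # placeholder; see Table 10
    weight_decay=0.01,               # placeholder; see Table 10
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=silver_train,  # tokenized silver corpus
#                   eval_dataset=silver_dev)
# trainer.train()
```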


C Label Agreement

Frequently confused label pairs (see Section 4.4) are shown in Figure 4.

D LLM probabilities: base models

Table 11 lists, in order, the base models that the AutoGluon tabular predictor employs for the LLM-probability-based approach of Section 5.1.
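As a concrete anchor, below is a minimal sketch of training such a predictor with AutoGluon. TabularPredictor(label=...).fit(...) is the library's actual high-level API, and fit() trains its base models in a fixed order before ensembling; the feature columns and synthetic data are hypothetical stand-ins for the paper's LLM-probability features.

```python
import numpy as np
import pandas as pd
from autogluon.tabular import TabularPredictor

# Hypothetical feature frame: one row per text span, LLM probability
# statistics as features, and a binary risk label as the target.
rng = np.random.default_rng(0)
n = 500
train_df = pd.DataFrame({
    "mean_token_logprob": rng.normal(-2.5, 1.0, n),
    "min_token_logprob": rng.normal(-6.0, 2.0, n),
    "span_length": rng.integers(1, 8, n),
})
train_df["is_risky"] = (train_df["mean_token_logprob"] < -2.5).astype(int)

# fit() trains the base models of Table 11 in order, then ensembles them.
predictor = TabularPredictor(label="is_risky").fit(train_df)
risk_scores = predictor.predict_proba(train_df.drop(columns=["is_risky"]))
```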

E Training size and performance

Figure 5 shows the F1 score of both the tabular and the multimodal AutoGluon predictors (used for the LLM-probability and span-classification indicators of Sections 5.1 and 5.2, respectively) at different training sizes for both datasets. We train on random samples ranging from 1% to 100% of each training split.
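The sweep itself follows a simple pattern, sketched below with scikit-learn's LogisticRegression standing in for the AutoGluon predictors to keep the example fast. The data, fractions, and sample-size floor are illustrative; the paper samples 1% to 100% of each training split and reports F1 on the fixed test sets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))                 # stand-in feature matrix
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in risk labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

f1_by_fraction = {}
for frac in (0.01, 0.05, 0.10, 0.25, 0.50, 1.00):
    n = max(10, int(frac * len(X_tr)))         # floor so tiny samples still fit
    idx = rng.choice(len(X_tr), size=n, replace=False)
    clf = LogisticRegression().fit(X_tr[idx], y_tr[idx])
    f1_by_fraction[frac] = f1_score(y_te, clf.predict(X_te))

print(f1_by_fraction)  # F1 typically rises and then saturates with more data
```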

F Perturbation thresholds

Figure 6 shows the performance of different perturbation thresholds on the training split of both datasets, with the black line indicating the threshold used for evaluation in Section 5.3.
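As orientation, a minimal sketch of such a threshold sweep follows: a span is flagged as high-risk when the probability difference between the original and perturbed text exceeds the threshold. The synthetic data and the F1-based selection criterion are illustrative stand-ins; the paper selects the threshold that maximizes its own cost function (roughly 3.5 for Wikipedia and 10 for TAB).

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

rng = np.random.default_rng(0)
# Stand-in probability differences between original and perturbed text,
# and stand-in gold labels loosely correlated with them.
prob_diff = rng.gamma(shape=2.0, scale=3.0, size=500)
gold = (prob_diff + rng.normal(0.0, 2.0, 500) > 6.0).astype(int)

best = None
for thr in np.linspace(0.0, 20.0, 81):
    pred = (prob_diff > thr).astype(int)
    score = f1_score(gold, pred, zero_division=0)  # stand-in cost function
    if best is None or score > best[0]:
        p = precision_score(gold, pred, zero_division=0)
        r = recall_score(gold, pred, zero_division=0)
        best = (score, thr, p, r)

print(f"best threshold {best[1]:.2f}: F1={best[0]:.3f}, "
      f"P={best[2]:.3f}, R={best[3]:.3f}")
```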



Fig. 4: Most common label confusion pairs in the test sets of the annotated Wikipedia biographies and the TAB corpus. The first element of each pair is the gold-standard label and the second is the output of the entity recognizer.




Table 11: Base models of the tabular predictor in the order they are trained when using the AutoGluon library. This order is based on training time and reliability, to ensure efficient training (Erickson et al., 2020).




Fig. 5: Performance of the tabular and multimodal predictors at different training sizes. We report the F1 score on both the annotated Wikipedia test set and the TAB test set.




Fig. 6: Precision and recall for direct and quasi-identifiers at different thresholds of probability difference, for the Wikipedia and TAB training sets. The black line indicates the threshold at which the cost function is maximized: approximately 3.5 for Wikipedia and 10 for TAB.




This paper is available on arXiv under a CC BY 4.0 DEED license.

