Low Dimensional Embeddings in Language Processing

Abhilash Majumder (~abhilash97)



Description:

Abstract

The presentation and the talk are based on the mathematical and computational world of embeddings. In generic language processing, embeddings are the most important tool for modelling intents and references and for attaching "meaning" to the words of a corpus. The SOTA language models, from transformer-based GPT/BERT to graphical counterparts such as Graph2Vec, provide an insight into this vast Riemannian space of manifold learning. The talk covers the important attributes that go into designing an embedding algorithm and into reducing dimensions in generic machine learning. It focuses on the computations, mathematics and diversity of embedding algorithms, with visualizations in Python through the use of deep manifold learning.

Outline

Introductory Statistical Metrics The presentation starts with a brief mathematical model of graph topology, centrality, node metrics and higher-order ranking algorithms, since a large portion of the talk is devoted to non-linear graph embeddings.
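As a taste of the ranking algorithms mentioned above, a minimal PageRank by power iteration can be sketched as follows. The toy graph and all names are purely illustrative, not part of the talk's code base:

```python
import numpy as np

# Hypothetical toy example: PageRank by power iteration on a small
# directed graph. Entry adjacency[i, j] = 1 means node i links to node j.
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def pagerank(adj, damping=0.85, iters=100):
    n = adj.shape[0]
    # Column-stochastic transition matrix: each node splits its
    # probability mass evenly among its out-links.
    transition = (adj / adj.sum(axis=1, keepdims=True)).T
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - damping) / n + damping * transition @ rank
    return rank

ranks = pagerank(adjacency)
# Node 2 receives the most in-links, so it ends up with the highest rank.
```

The same power-iteration pattern underlies many of the node-importance metrics used later for graph embeddings.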

Vectorized Word Embeddings Embeddings intrinsically capture the semantic meaning of a word in a corpus; they are used extensively in most language processing tasks and are effective for feature representation and coreference resolution. This part of the presentation covers vectorized embeddings, both static and dynamic pretrained ones, including models such as Word2vec, GloVe and ELMo. Lowering dimensions while retaining the important information is a hard mathematical problem for higher-order convex polynomials: when data is one-hot encoded, it becomes infeasible to process tensors with sparse matrices and still converge quickly without risking local curvature optima. Vectorized models emerge from matrix decomposition/transformation techniques whose main aim is to lower the dimensionality. Word2vec variants such as doc2vec and node2vec, along with many other algorithmic paradigms, are also covered.
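The skip-gram half of Word2vec starts from (center, context) training pairs drawn from a sliding window. A minimal, purely illustrative sketch of how such pairs are generated (the corpus and window size are made up for the example):

```python
# Hypothetical sketch: generating (center, context) pairs, the raw
# training material of Word2vec's skip-gram objective.
corpus = "the quick brown fox jumps over the lazy dog".split()

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(corpus)
```

Libraries such as gensim perform this pair generation internally before training the embedding matrix with negative sampling.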

Graph Embeddings The next section covers graph embeddings, which are non-linear and use node/subgraph importance for feature representations. This is a very detailed section containing SOTA graph embedding models: node2vec, graph2vec, SDNE, LINE, HARP and many more. The structure of graphs makes them a natural setting for exploring metrics based on weight normalization and semantic weight matching. Non-linear log-likelihood models over graph distributions are explored, with diverse results in network representation and feature mapping.
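A DeepWalk-style uniform random walk is the simplest form of what node2vec does (node2vec additionally biases each step with return and in-out parameters p and q). A toy sketch with a hypothetical four-node graph; the walks then play the role of "sentences" for a skip-gram model:

```python
import random

# Hypothetical adjacency-list graph; names are illustrative only.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

def random_walk(graph, start, length, rng):
    # Uniform (DeepWalk-style) walk: each step picks a random neighbour.
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk

rng = random.Random(0)  # seeded for reproducibility
walks = [random_walk(graph, node, 5, rng) for node in graph for _ in range(3)]
```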

Transformer (DL) based Embeddings A further section covers attention-based graph models that use Bi-LSTMs with skip-gram-based log-likelihood objective functions. The section then turns to the static and dynamic embeddings of BERT, OpenAI GPT/GPT-2, ULMFiT and Transformer-XL, which rely heavily on transformer architectures and attention layers to generate semantic meaning for sequence tagging, question answering and many other downstream tasks. It presents the model architectures of transformers, along with the self-attention and bidirectional cross-attention mechanisms that provide a further boost when generating embeddings.
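The core operation behind all of these transformer embeddings is scaled dot-product self-attention. A minimal NumPy sketch; real models add multiple heads, masking and per-layer learned projections, and all shapes and names here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Project the sequence into queries, keys and values.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq, seq) similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))      # toy token representations
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = self_attention(x, wq, wk, wv)
```

Each output row is a context-weighted mixture of the value vectors, which is exactly why transformer embeddings are "dynamic": the representation of a token depends on the whole sequence.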

Statistical Embeddings The fourth section is statistical, involving exponential families and how embeddings can arise from both context and individual words passed into the Bernoulli family of distributions. These embeddings apply statistical distributions and patterns to produce embedding vectors and context vectors conditioned on a probability distribution. Based on the kernel/basis or family, the embedding can be Gaussian, Poisson, non-Gaussian or Bernoulli.
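In the Bernoulli case, the probability that a word appears in a given context is the sigmoid of the dot product between its embedding vector and the sum of the context vectors (the natural parameter of the Bernoulli). A toy sketch with made-up vocabulary sizes, assuming the exponential-family-embeddings formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 5, 3
word_vecs = rng.normal(size=(vocab, dim))     # embedding vectors
context_vecs = rng.normal(size=(vocab, dim))  # context vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def word_probability(word, context):
    # Natural parameter eta = embedding . (sum of context vectors);
    # the Bernoulli mean is then sigmoid(eta).
    eta = word_vecs[word] @ context_vecs[context].sum(axis=0)
    return sigmoid(eta)

p = word_probability(0, [1, 2])  # P(word 0 appears | context {1, 2})
```

Swapping the sigmoid/Bernoulli pair for a Poisson or Gaussian link gives the other family members mentioned above.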

Generic Topic Modelling The concluding section is about topic modelling, emphasizing LSA and LDA as the major algorithmic paradigms. LSA builds a co-occurrence frequency matrix and captures polysemy and synonymy by examining positional frequencies; the matrix is later decomposed into submatrices by SVD. LDA, on the other hand, is a mixture model that allocates topics using the Dirichlet multivariate continuous distribution, combined with a negative-sampled skip-gram technique.
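The LSA step can be sketched with a tiny made-up term-document count matrix reduced via truncated SVD; the terms and counts are invented for illustration:

```python
import numpy as np

# Rows: terms ["cat", "dog", "python"]; columns: 4 toy documents.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
], dtype=float)

# Full SVD of the term-document matrix.
u, s, vt = np.linalg.svd(counts, full_matrices=False)

# Keep the top k singular directions: the "latent semantic" space.
k = 2
term_vecs = u[:, :k] * s[:k]      # low-dimensional term embeddings
doc_vecs = vt[:k].T * s[:k]       # low-dimensional document embeddings
```

Terms that co-occur in the same documents ("cat" and "dog" here) end up close together in the truncated space, which is how LSA surfaces synonymy.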

Additional Mathematical Concepts Many mathematical terms are involved, and these will be explained in due course. The non-linear embeddings that rely on harder concepts such as Laplacian eigenmaps and isomaps, as well as sophisticated tools such as the WL (Weisfeiler-Lehman) kernel and KL divergence, are explained as well. Non-Euclidean geometry in embeddings, such as the hyperbolic skip-gram, is also covered, which helps in understanding part of the graph models.

Aim of the Presentation and Takeaways The major aim of the presentation is to give a general overview of how embeddings shape up in real-life language modelling and to compare the different techniques. For simplicity, GCN/GNN architectures have been avoided, as they fall into another paradigm. The material will be open sourced, with resources such as the code base, papers and mathematical models, as the event approaches.

Prerequisites:

Sufficiently strong in higher-order calculus and topology; well versed in graph theory.

Has a strong grasp of Python programming and the ability to design networks using either framework (PyTorch or Keras/TF).

Since this talk is heavily mathematical, familiarity with concepts such as non-linear learning and exponential distributions is good to have.

Understands the basics of language processing pipeline.

Passionate about mathematical modelling, graph theory, statistics and networks.

Passionate about core neural network architecture modelling in Python.

Basic familiarity with the TensorFlow/PyTorch source code is great to have.

Should have a passion for general machine learning.

Video URL:

https://youtu.be/FPdg7HRjoB4

Content URLs:

Slide Presentation: https://drive.google.com/file/d/1HqaQEwI7nSBfmlPz2VLOPOxPWp6dsraF/view?usp=drive_open

Source Code: A private repository will be shared at the event (contains papers, research and handwritten custom code).

Speaker Info:

The speaker, Abhilash Majumder, is an NLP research engineer at HSBC (UK/India), a technical mentor for Udacity (NLP, DL), a former high performance graphics intern at Unity Technologies (SF, USA) and a mentor for Upgrad (India).

The speaker is a contributor to Google Research for language models and to TensorFlow, and is a maintainer of ALBERT. He has also implemented other language models and papers with support from Google Research. Prior to that, he was a contributor to the Khronos Group (OpenGL pipeline) and has worked on game engines, accelerated computing and CUDA architecture. He was previously an intern at Singapore Airlines working on VR systems, and has been a technical speaker at Unite '19 and '18 (graphics, RL) and at several dev fests on NLP. He is the creator of PyPI libraries related to semantic processing, convex optimization and custom generative networks, and has worked closely with Unity's Deep Reinforcement Learning team (OpenAI Gym framework).

The speaker is a graduate of National Institute of Technology, Durgapur (NITD) with majors in NLP, theoretical ML, applied mathematics, and analysis and algorithms. He was a second-preference candidate in GSoC '19 for Blender (Embree BVH GPU for Intel; contributions post-GSoC).

Speaker Links:

LinkedIn: https://www.linkedin.com/in/abhilash-majumder-1aa7b9138/

Twitter: https://twitter.com/abhilash1396

Github: https://github.com/abhilash1910

Pypi Library: https://pypi.org/user/abhilash1910/

Section: Data Science, Machine Learning and AI
Type: Talks
Target Audience: Advanced
Last Updated: