### Benedek Rozemberczki (@benitorosenberg)

PhD student at the University of Edinburgh studying machine learning

**Karate Club** is an unsupervised machine learning extension library for the NetworkX Python package. See the documentation **here**.

Karate Club consists of state-of-the-art methods for unsupervised learning on graph-structured data. To put it simply, it is a Swiss Army knife for small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping community detection methods. The implemented methods cover a wide range of network science (NetSci, Complenet), data mining (ICDM, CIKM, KDD), artificial intelligence (AAAI, IJCAI) and machine learning (NeurIPS, ICML, ICLR) conferences, workshops, and pieces from prominent journals.

**A simple example**

Karate Club makes the use of modern community detection techniques quite easy (see here for the accompanying tutorial). The following snippet uses an overlapping community detection algorithm on a synthetic graph.

```python
import networkx as nx
from karateclub import EgoNetSplitter

# Synthetic small-world graph: 1000 nodes, each joined to 20 nearest neighbors
g = nx.newman_watts_strogatz_graph(1000, 20, 0.05)

splitter = EgoNetSplitter(1.0)
splitter.fit(g)

print(splitter.get_memberships())
```

**Design principles**

When Karate Club was created, we took an API-oriented machine learning system design point of view in order to make an end-user-friendly machine learning tool. This API-oriented design principle entails a few simple ideas. In this section, we discuss these ideas and their obvious advantages in detail with appropriate illustrative examples.

**Encapsulated model hyperparameters and inspection**

An unsupervised Karate Club model instance is created using the constructor of the appropriate Python object. This constructor has a default hyperparameter setting that allows for sensible out-of-the-box model usage. In simple terms, this means that the end user does not need to understand the inner model mechanics in detail to use the methods implemented in our framework.

We set these default hyperparameters to give reasonable learning and runtime performance. If needed, the hyperparameters can be modified at model creation time by passing the appropriate arguments to the constructor. The hyperparameters are stored as public attributes, which allows the model settings to be inspected.
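The constructor pattern can be sketched in plain Python. The class below is a hypothetical stand-in written for illustration, not Karate Club source code, and its default values are illustrative: every hyperparameter has a default in the constructor signature and is stored as a public attribute, so the settings can be read back at any time.

```python
class ToyWalkEmbedder:
    """Hypothetical model used only to illustrate the constructor pattern."""

    def __init__(self, walk_number=10, walk_length=80, dimensions=128,
                 window_size=5, seed=42):
        # Every hyperparameter has a default, so ToyWalkEmbedder() works as-is,
        # and each one is kept as a public attribute for later inspection.
        self.walk_number = walk_number
        self.walk_length = walk_length
        self.dimensions = dimensions
        self.window_size = window_size
        self.seed = seed


default_model = ToyWalkEmbedder()
print(default_model.dimensions)  # 128

custom_model = ToyWalkEmbedder(dimensions=64)
print(custom_model.dimensions)   # 64
print(vars(custom_model))        # every setting at a glance
```

Keeping the settings as plain attributes (rather than hiding them in a private config object) is what makes `vars(model)` or a simple attribute lookup enough to audit a model.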

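```python
import networkx as nx
from karateclub import DeepWalk

# Erdos-Renyi style random graph with 100 nodes and 1000 edges
graph = nx.gnm_random_graph(100, 1000)

# Default hyperparameters
model = DeepWalk()
print(model.dimensions)

# Custom number of embedding dimensions
model = DeepWalk(dimensions=64)
print(model.dimensions)
```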

We demonstrate the encapsulation of hyperparameters with the code snippet above. First, we want to embed an Erdős-Rényi graph generated by NetworkX using the default hyperparameter settings.

When the model is constructed, we do not change these defaults, and we can print the default value of the dimensions hyperparameter. Second, we decide to set a different number of dimensions, so we create a new model and can still access the dimensions hyperparameter publicly.

**Consistency and non-proliferation of classes**

Each unsupervised machine learning model in Karate Club is implemented as a separate class that inherits from the **Estimator class**. Algorithms implemented in our framework have a limited number of public methods, as we do not assume that the end user is particularly interested in the algorithmic details of a specific technique.

All models have a **fit()** method that takes the inputs (graphs, node features) and calls the appropriate private methods to learn an embedding or a clustering. Node and graph embeddings are returned by the public **get_embedding()** method, and cluster memberships are retrieved by calling **get_memberships()**.
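A small public surface backed by private helpers can be sketched with an abstract base class. The names `SketchEstimator` and `DegreeClusterer` are hypothetical and only mimic the shape of the interface described above; this is not Karate Club's actual `Estimator` code. A plain dict of neighbor sets stands in for a NetworkX graph to keep the sketch dependency-free.

```python
from abc import ABC, abstractmethod


class SketchEstimator(ABC):
    """Hypothetical base class with the small public interface described above."""

    @abstractmethod
    def fit(self, graph):
        """Learn an embedding or a clustering from the input."""

    def get_embedding(self):
        raise NotImplementedError("this model does not produce an embedding")

    def get_memberships(self):
        raise NotImplementedError("this model does not produce memberships")


class DegreeClusterer(SketchEstimator):
    """Toy model: nodes with the same degree end up in the same cluster."""

    def fit(self, graph):
        # graph: dict mapping node index -> set of neighbor indices
        self._memberships = self._assign(graph)  # private helper does the work

    def _assign(self, graph):
        return {node: len(neighbors) for node, neighbors in graph.items()}

    def get_memberships(self):
        return self._memberships


graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
model = DegreeClusterer()
model.fit(graph)
print(model.get_memberships())  # {0: 2, 1: 2, 2: 3, 3: 1}
```

The end user only ever touches `fit()` and the getter; `_assign` can change freely without breaking calling code.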

```python
import networkx as nx
from karateclub import DeepWalk

graph = nx.gnm_random_graph(100, 1000)

model = DeepWalk()
model.fit(graph)                    # learn the node embedding
embedding = model.get_embedding()   # NumPy array, one row per node
```

In the snippet above, we create a random graph and a **DeepWalk** model with the default hyperparameters, fit this model using the public **fit()** method, and return the embedding by calling the public **get_embedding()** method.

This example can be modified to create a **Walklets** embedding with minimal effort by changing the model import and the constructor; these changes result in the following snippet.

```python
import networkx as nx
from karateclub import Walklets

graph = nx.gnm_random_graph(100, 1000)

model = Walklets()
model.fit(graph)                    # same public method as DeepWalk
embedding = model.get_embedding()   # same output type as DeepWalk
```

Looking at these two snippets, the advantage of the **API-driven design** is evident, as we only had to make a few modifications. First, the import of the embedding model had to be changed. Second, we had to change the model construction, while the default hyperparameters were already set.

Third, the public methods of the **DeepWalk** and **Walklets** classes behave the same way: the embedding is learned with **fit()** and returned by **get_embedding()**. This allows for quick, minimal changes to the code when an unsupervised upstream model used for feature extraction performs poorly.
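Because every model exposes the same two methods, downstream code can be written against that contract alone. A minimal sketch, assuming two hypothetical toy models (`DegreeEmbedder`, `DegreeSumEmbedder`) in place of DeepWalk and Walklets and a dict of neighbor sets in place of a NetworkX graph: the `features_for` function never names a concrete class, so swapping models is a one-line change.

```python
from typing import Protocol

import numpy as np


class Embedder(Protocol):
    """The only contract the downstream code relies on (hypothetical sketch)."""

    def fit(self, graph) -> None: ...
    def get_embedding(self) -> np.ndarray: ...


class DegreeEmbedder:
    """Toy model: a single feature per node, its degree."""

    def fit(self, graph):
        self._embedding = np.array([[len(graph[n])] for n in range(len(graph))], dtype=float)

    def get_embedding(self):
        return self._embedding


class DegreeSumEmbedder:
    """Toy model: the degree plus the summed degree of the node's neighbors."""

    def fit(self, graph):
        rows = []
        for n in range(len(graph)):
            neighbor_degrees = sum(len(graph[m]) for m in graph[n])
            rows.append([len(graph[n]), neighbor_degrees])
        self._embedding = np.array(rows, dtype=float)

    def get_embedding(self):
        return self._embedding


def features_for(model: Embedder, graph) -> np.ndarray:
    # Downstream code only calls the shared public methods.
    model.fit(graph)
    return model.get_embedding()


graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for model in (DegreeEmbedder(), DegreeSumEmbedder()):
    print(type(model).__name__, features_for(model, graph).shape)
# DegreeEmbedder (4, 1)
# DegreeSumEmbedder (4, 2)
```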

**Standardized dataset ingestion**

We designed Karate Club so that a standardized dataset ingestion procedure is used when a model is fitted. In practical terms, this means that algorithms with the same purpose use the same data types for model training. In detail:

- Neighborhood-based and structural node embedding techniques take a single **NetworkX graph** as input for the fit method.
- Attributed node embedding procedures take a **NetworkX graph** as input, and the features are represented as a **NumPy array** or as a **SciPy sparse matrix**. In these matrices, rows correspond to nodes and columns correspond to features.
- Graph-level embedding methods and statistical graph fingerprinting techniques require a **list of NetworkX graphs** as input.
- Community detection methods use a **NetworkX graph** as input.
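The row-per-node convention for attributed inputs is easy to check before fitting. The helper below is a hypothetical sketch, not part of Karate Club; it only relies on `.shape`, which both NumPy arrays and SciPy sparse matrices provide, so the same check covers both feature formats.

```python
import numpy as np


def check_attributed_input(num_nodes, features):
    """Hypothetical check that a feature matrix lines up with the graph.

    Rows must correspond to nodes and columns to features. Works for NumPy
    arrays and SciPy sparse matrices alike, since both expose .shape.
    Returns the number of feature columns.
    """
    if len(features.shape) != 2:
        raise ValueError("features must be a 2-D matrix (nodes x features)")
    rows, cols = features.shape
    if rows != num_nodes:
        raise ValueError(f"expected {num_nodes} rows (one per node), got {rows}")
    return cols


X = np.random.default_rng(0).random((4, 3))   # 4 nodes, 3 features each
print(check_attributed_input(4, X))           # 3
```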

**High performance model mechanics**

The underlying mechanics of the graph mining algorithms were implemented using widely used Python libraries that do not depend on the operating system and do not require other external libraries such as **TensorFlow** or **PyTorch**. The internal graph representations in Karate Club use **NetworkX**.

Dense linear algebra operations are done with **NumPy**, and their sparse counterparts use **SciPy**. Implicit matrix factorization techniques use the **Gensim** package, and methods that rely on graph signal processing use **PyGSP**.
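As a flavor of the dense NumPy side, the sketch below (our own illustration, not Karate Club code) builds the random-walk transition matrix `P = D^-1 A` that walk-based methods are built around. It assumes the graph is connected with nodes indexed 0 to n-1, as the library does, so no row has degree zero.

```python
import numpy as np


def transition_matrix(edges, num_nodes):
    """Dense random-walk transition matrix P = D^-1 A of an undirected graph.

    Assumes a connected graph with nodes 0..num_nodes-1 (no zero degrees).
    """
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0          # undirected, unit edge weight
    degrees = A.sum(axis=1)
    return A / degrees[:, None]          # divide row i by the degree of node i


P = transition_matrix([(0, 1), (1, 2)], num_nodes=3)
print(P[1])  # [0.5 0.  0.5]
```

For large graphs the same computation would normally be done with `scipy.sparse` matrices instead of a dense array, which is the dense/sparse split described above.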

**Standardized output generation and interface**

The standardized output generation of **Karate Club** ensures that unsupervised learning algorithms that serve the same purpose always return the same type of output with a consistent data point ordering.

This design principle has a very important consequence. When an algorithm is replaced with another algorithm of the same type, the downstream code that uses the output of the upstream unsupervised model does not have to be changed. Specifically, the outputs generated with our framework use the following data structures:

- **Node embedding algorithms** (neighborhood preserving, attributed, and structural) always return a **NumPy float array** when the **get_embedding()** method is called. The number of rows in the array equals the number of vertices, and the row index always corresponds to the vertex index. The number of columns is the number of embedding dimensions.
- **Whole-graph embedding methods** (spectral fingerprints, implicit matrix factorization techniques) return a **NumPy float array** when the **get_embedding()** method is called. The row index corresponds to the position of a single graph in the list of input graphs. Columns represent the embedding dimensions in the same way.
- **Community detection procedures** return a **dictionary** when the **get_memberships()** method is called. Node indices are the keys, and the values corresponding to the keys are the community memberships of the vertices. Certain graph clustering techniques create a node embedding in order to find vertex clusters. These return a **NumPy float array** when the **get_embedding()** method is called, structured the same way as the arrays returned by node embedding algorithms.
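Because these output formats are fixed, small adapters can be written once and reused with any model. The two helpers below are hypothetical sketches: one regroups a `{node: community}` dictionary into a list of node sets (the format that NetworkX's `networkx.algorithms.community.modularity` expects), and the other validates the node-embedding contract of one float row per node.

```python
from collections import defaultdict

import numpy as np


def memberships_to_communities(memberships):
    """Turn {node: community_id} into a list of node sets, one per community."""
    groups = defaultdict(set)
    for node, community_id in memberships.items():
        groups[community_id].add(node)
    # Sort by community id so the output order is deterministic.
    return [groups[c] for c in sorted(groups)]


def embedding_dimensions(embedding, num_nodes):
    """Validate the node-embedding contract and return the number of dimensions."""
    if not (isinstance(embedding, np.ndarray) and np.issubdtype(embedding.dtype, np.floating)):
        raise TypeError("expected a NumPy float array")
    if embedding.ndim != 2 or embedding.shape[0] != num_nodes:
        raise ValueError("expected one row per node, with row i describing node i")
    return embedding.shape[1]


print(memberships_to_communities({0: 1, 1: 0, 2: 1, 3: 0}))  # [{1, 3}, {0, 2}]
print(embedding_dimensions(np.zeros((4, 16)), num_nodes=4))  # 16
```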

We demonstrate the standardized output generation and interface with the following code snippet. We create clusterings of a random graph and return dictionaries containing the cluster memberships. Using the external community library, we can then calculate the modularity of these clusterings.

This shows that the standardized output generation makes it easy to interface with external graph mining and machine learning libraries.

```python
import community  # provided by the python-louvain package
import networkx as nx
from karateclub import LabelPropagation, SCD

graph = nx.gnm_random_graph(100, 1000)

model = SCD()
model.fit(graph)
scd_memberships = model.get_memberships()

model = LabelPropagation()
model.fit(graph)
lp_memberships = model.get_memberships()

# Both models return {node: community} dictionaries, so they plug in directly
print(community.modularity(scd_memberships, graph))
print(community.modularity(lp_memberships, graph))
```
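The number that `community.modularity` prints is Newman's modularity. For an unweighted, undirected graph with `m` edges it can be written as `Q = sum over communities c of (L_c / m - (d_c / (2m))^2)`, where `L_c` is the number of edges inside community `c` and `d_c` is the total degree of its nodes. The sketch below is our own stdlib implementation of that formula (not the python-louvain code); it consumes a membership dictionary in exactly the format `get_memberships()` returns.

```python
from collections import defaultdict


def modularity(edges, memberships):
    """Newman modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    memberships: dict node -> community id (the get_memberships() format)
    """
    m = len(edges)
    internal = defaultdict(int)     # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)   # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = memberships[u], memberships[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
memberships = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, memberships), 4))  # 0.3571
```

Here each triangle has `L_c = 3` and `d_c = 7` with `m = 7`, so `Q = 2 * (3/7 - 1/4) = 5/14`.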

**Limitations**

The current design of Karate Club has certain limitations, and we make assumptions about the input. We assume that the **NetworkX** graph is undirected and consists of a single **strongly connected component**. All algorithms assume that the nodes are **indexed contiguously with integers** and that the first node index is 0. We also assume that the graph is not multipartite, the nodes are homogeneous, and the edges are unweighted (each edge has unit weight).
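Two of these assumptions, contiguous integer labels starting at 0 and a single connected component, can be checked up front. The function below is a hypothetical stdlib sketch operating on an edge list; with NetworkX itself, `nx.is_connected(G)` and `nx.convert_node_labels_to_integers(G, first_label=0)` cover the same ground.

```python
from collections import defaultdict, deque


def check_input_assumptions(edges):
    """Hypothetical pre-flight check; returns a list of violated assumptions.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    Isolated nodes cannot appear in an edge list; they would break
    connectivity anyway.
    """
    nodes = {n for edge in edges for n in edge}
    if not nodes:
        return ["graph has no edges"]
    problems = []
    if nodes != set(range(len(nodes))):
        problems.append("node labels are not the integers 0..n-1")
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    # Breadth-first search from any node must reach every node.
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    if len(seen) != len(nodes):
        problems.append("graph is not connected")
    return problems


print(check_input_assumptions([(0, 1), (1, 2)]))  # []
print(check_input_assumptions([(0, 1), (2, 3)]))  # ['graph is not connected']
```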

In the case of the whole-graph embedding algorithms, every graph in the input set must satisfy the input requirements listed earlier. The **Weisfeiler-Lehman** feature-based embedding techniques allow nodes to have a single string feature that can be accessed with the **feature** key. Without this key, these algorithms use degree centrality as the node feature by default.
