banner
JackYoung

JackYoung

生活、摄影、写作、代码。
github
bilibili
email

Detect Duplicate IP

This is an old problem. In the company's products, it is often necessary to check whether newly added IPs already exist, so an algorithm for detecting duplicate IPs is needed. Here is just a transfer of algorithm ideas.

Question: Given an IP or a network segment, how to determine if the IP or the network segment already exists in the database?

  1. The simplest and most brute-force algorithm is to list all network segments, convert them to individual IPs, and then compare them one by one. This solution is the most common method and does not need further explanation. The advantage is that it is simple and easy to understand, but the disadvantage is that it consumes more resources.
  2. Slightly optimized, each IP can be treated as a numerical value (convert binary to decimal), and only the starting and ending values need to be recorded at the line segment. Then, compare the numerical values of the IPs that need to be checked for duplicates. Each line segment is a group, and they can be compared one by one. Advantages: Optimized storage resources compared to Solution 1 (O(n)), Disadvantages: Multiple traversals and comparisons (T(n)).
    ip-1
  3. Further optimization, if we only record the starting and ending points for a segment, and put all network segments on the same line segment. The starting point is marked as 1 and the ending point is marked as -1. Starting from the beginning of this line segment, add up the values each time by taking the first two values,
  • For all non-overlapping network segments, the sum is 0.
  • For overlapping network segments, the sum is not 0.
    ip-2

Here is the code for the key steps:

def get_first_last_ip(ip_cidr:str) -> list:
    try:
        ip, cidr = ip_cidr.split('/')
        octets = ip.split('.')
        subnet_bits = int(cidr)

        # Convert the IP address to a 32-bit binary string
        ip_binary = ''.join([bin(int(octet))[2:].zfill(8) for octet in octets])

        # Calculate the binary strings of the network address and broadcast address based on the subnet bits
        network_binary = ip_binary[:subnet_bits] + '0' * (32 - subnet_bits)
        broadcast_binary = ip_binary[:subnet_bits] + '1' * (32 - subnet_bits)

        # Convert the binary strings back to IP address format
        start_ip = '.'.join([str(int(network_binary[i:i+8], 2)) for i in range(0, 32, 8)])
        end_ip = '.'.join([str(int(broadcast_binary[i:i+8], 2)) for i in range(0, 32, 8)])

        return [start_ip, end_ip]
    except ValueError:
	    # logging.error("Invalid IP format")
        return [None, None]

def check_ip_cover(ip:str) -> bool:
    dic = build_ip_dic()
    try:
        if check_ip(ip) and valid_ip_netmask(ip):
            [first_ip, last_ip] = get_first_last_ip(ip)
            first_ip_num = ip_2_int(first_ip)
            last_ip_num = ip_2_int(last_ip)
            if first_ip_num not in dic:
                dic[first_ip_num] = 1
            else:
                logging.info("IP {} is repeated.".format(first_ip))
                return False
            if last_ip_num not in dic:
                dic[last_ip_num] = -1
            else:
                logging.info("IP {} is repeated.".format(last_ip))
                return False
            res = 0
            ## Sort the dictionary
            sorted_dic = sorted(dic.keys())
            ## Perform operations on two values from the dictionary each time
            ## If the sum is not 0, exit
            ip_key_iter = iter(sorted_dic.keys())
            while True:
                try:
                    ip_1 = next(ip_key_iter)
                    ip_2 = next(ip_key_iter)
                    value_1 = sorted_dic[ip_1]
                    value_2 = sorted_dic[ip_2]
                    if value_1 + value_2 != 0:
                        return False
                except StopIteration:
                    if ip_1:
                        return False
            return True
    except Exception as e:
        logging.error("Some wrong in here: {}".format(e))

This algorithm is quite clever. It only needs to look at the starting and ending IPs to check for duplicates among a large number of IPs.

7/13 Update:
There were some logical issues with the previous content, so I made some slight modifications.

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.