Number of Distinct Substrings in a String

Problem Statement

Given a string s, determine the number of distinct substrings (including the empty substring) of the given string.

A string B is a substring of a string A if B can be obtained by deleting zero or more characters from the start of A and zero or more characters from the end of A. Two strings X and Y are considered different if their lengths differ or if there is at least one index i at which their characters differ (X[i] != Y[i]).
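
The definition above can be sketched directly in code. The helper below is hypothetical (the name allSubstrings and its signature are assumptions for illustration, not part of the problem): it drops `front` characters from the start and `back` characters from the end in every possible combination, and keeps duplicates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lists every substring of `a`, duplicates included,
// by deleting `front` characters from the start and `back` from the end.
std::vector<std::string> allSubstrings(const std::string& a) {
    std::vector<std::string> result;
    const std::size_t n = a.size();
    for (std::size_t front = 0; front <= n; ++front) {
        // front + back may not exceed n, otherwise nothing is left to delete.
        for (std::size_t back = 0; front + back <= n; ++back) {
            result.push_back(a.substr(front, n - front - back));
        }
    }
    return result;
}
```

For "aba" this yields 10 entries: the 6 non-empty substrings (with "a" appearing twice) and the empty string 4 times, once for each way of deleting all three characters. Removing duplicates from that list is exactly what the problem asks for.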

Examples

Example 1:

Input: s = "aba"
Output: 6

Explanation: The distinct substrings are:
"a",
"ab",
"ba",
"b",
"aba",
""

Example 2:

Input: s = "abc"
Output: 7

Explanation: The distinct substrings are:
"a",
"ab",
"abc",
"b",
"bc",
"c",
""

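Both outputs can be checked by hand. A non-empty substring is fixed by its start and end positions, so a string of length N has N(N+1)/2 of them before duplicates are removed. A rough worked count for the two examples:

```latex
\begin{align*}
\text{non-empty (start, end) pairs for } N = 3 &: \frac{N(N+1)}{2} = \frac{3 \cdot 4}{2} = 6 \\
\texttt{"aba"} &: 6 - 1 \ (\texttt{"a"} \text{ occurs twice}) + 1 \ (\text{empty}) = 6 \\
\texttt{"abc"} &: 6 - 0 + 1 \ (\text{empty}) = 7
\end{align*}
```
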
Different Approaches

1️⃣ Brute Force Approach

Intuition:

We can find all distinct substrings of a string by starting at each character and extending to the end, creating every possible substring. Storing these substrings in a set filters out duplicates, ensuring unique substrings are counted.

Approach:

Initialize an empty set to store distinct substrings.
Use an outer loop to iterate through each character of the string. This character will serve as the starting point for substrings.
Construct Substrings:
1. For each starting character, use a nested loop to construct all possible substrings beginning from that character.
2. The outer loop sets the starting point, while the inner loop iterates over subsequent characters to form the substring.
3. Add each constructed substring to the set. Since sets only store unique values, duplicates will be automatically filtered out.
After iterating through the string and constructing substrings from each starting character, return the size of the set. This will give the number of distinct substrings of the given string.

Code:

#include <iostream>
#include <set>
#include <string>
using namespace std;

class Solution {
public:
    // To Calculate Distinct Substrings
    int countDistinctSubstring(string s) {
        set<string> substrings;
        
        // Iterate through each character of string
        for (int i = 0; i < s.length(); ++i) {
            string substring = "";
            /* Construct all possible substrings
               starting from the current character */
            for (int j = i; j < s.length(); ++j) {
                substring += s[j];
                substrings.insert(substring);
            }
        }
        
        // Include empty substring
        substrings.insert("");
        
        // Return number of distinct substrings
        return substrings.size();
    }
};

int main() {
    Solution solution;
    string input = "abc";
    cout << "Number of distinct substrings: " << solution.countDistinctSubstring(input) << endl;
    return 0;
}

Complexity Analysis:

Time Complexity: O(N^2 * log M) string comparisons, where N is the length of the string and M is the number of elements in the set. The nested loops construct O(N^2) substrings, and each insertion into the set takes O(log M) comparisons because the set is a balanced tree. Each comparison can itself inspect up to N characters, so the worst-case character-level cost is O(N^3 * log M). Since M is at most N(N+1)/2 + 1, log M is O(log N).
Space Complexity: O(N^3), where N is the length of the string.
- The set stores all distinct substrings. In the worst case (for example, when all characters in the string are distinct), there are O(N^2) of them.
- Each substring can be up to N characters long.
- Storing O(N^2) substrings of O(N) characters each therefore takes O(N^3) space in the worst case.
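
The cubic bound can be made concrete with a short calculation. Assuming all N characters are distinct (so every substring is unique), there are N - l + 1 substrings of each length l, and the total number of characters stored in the set is roughly:

```latex
\sum_{\ell=1}^{N} \ell \,(N - \ell + 1) \;=\; \frac{N(N+1)(N+2)}{6} \;=\; O(N^3)
```

For N = 3 (for example "abc") this gives 1*3 + 2*2 + 3*1 = 10 characters, which matches 3*4*5/6 = 10.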

2️⃣ Optimal Approach

Intuition:

The brute-force approach stores every substring as a separate string, so common prefixes are duplicated and each set insertion compares whole strings. A Trie avoids both: substrings that share a prefix share the same path from the root, and extending a substring by one character is a single child lookup. Every node other than the root corresponds to exactly one distinct non-empty substring, so counting the nodes created counts the distinct substrings. This is particularly advantageous for strings with many repeated substrings, because repeated substrings reuse existing nodes.

Approach:

Start by initializing a root node for the Trie and a counter for the number of distinct substrings.
Iterate through all possible starting positions in the string. For each starting position:
1. Begin from the root node.
2. For each character from the starting position to the end of the string:
  1. Check if the current node has a child node for that character.
  2. If not, insert a new child node and increment the counter, since a new distinct substring has been found.
  3. Move to the child node corresponding to the current character.
Finally, return the counter plus one to account for the empty substring.

Code:

#include <iostream>
#include <string>
using namespace std;

class Solution {
public:
    // Definition of TrieNode class for the Trie structure.
    // Each node has an array of 26 pointers (one for each lowercase letter).
    class TrieNode {
    public:
        TrieNode* node[26]; // Pointers to child nodes representing 'a' to 'z'
        
        // Constructor initializes all child pointers to nullptr.
        TrieNode() {
            for (int i = 0; i < 26; i++) {
                node[i] = nullptr;
            }
        }

        // Destructor releases the whole subtree (deleting nullptr is a no-op).
        ~TrieNode() {
            for (int i = 0; i < 26; i++) {
                delete node[i];
            }
        }
    };

    // Function to count distinct substrings of a string (including the empty substring).
    // Assumes s contains only lowercase letters 'a' to 'z'.
    int countDistinctSubstring(string s) {
        int count = 0; // This will store the number of distinct substrings

        // Create a root node for the Trie.
        TrieNode* root = new TrieNode();

        // Loop over each starting position in the string.
        for (size_t i = 0; i < s.length(); i++) {
            TrieNode* node = root; // Start each substring from the root of the Trie
            
            // For each starting position, extend the substring character by character.
            for (size_t j = i; j < s.length(); j++) {
                int index = s[j] - 'a'; // Convert character to index (0 for 'a', 1 for 'b', etc.)
                
                // If the path for the current character doesn't exist, it's a new substring.
                if (node->node[index] == nullptr) {
                    count++; // Increment distinct substring count
                    node->node[index] = new TrieNode(); // Create a new Trie node for this character
                }
                // Move to the next node in the Trie (for the next character in the substring)
                node = node->node[index];
            }
        }
        // Free the Trie; the destructor deletes all child nodes recursively.
        delete root;

        // Include the empty substring in the count (+1).
        return count + 1;
    }
};

int main() {
    // Example usage:
    Solution solution;
    string input = "abc";
    cout << "Number of distinct substrings: " << solution.countDistinctSubstring(input) << endl;
    return 0;
}

Complexity Analysis:

Time Complexity: O(N^2), where N is the length of the string. For each starting position we walk the remaining characters, and each step is a constant-time child lookup because the alphabet is fixed at 26 letters, so the nested loops perform O(N^2) operations.
Space Complexity: O(N^2), where N is the length of the string. The trie holds one node per distinct non-empty substring, which is at most N(N+1)/2 nodes in the worst case, and each node stores a fixed array of 26 child pointers.
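
The trie above indexes children with s[j] - 'a', so it only works for lowercase letters. As a rough sketch of how the same idea might be extended to arbitrary characters, the variant below keeps children in a std::map and owns them through std::unique_ptr. The names MapTrieNode and countDistinctSubstringsAnyAlphabet are hypothetical and not part of the original solution, and the code assumes C++14 or later for std::make_unique.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical variant of the trie node: children are stored in a std::map
// keyed by character, so any char value is accepted, not only 'a' to 'z'.
struct MapTrieNode {
    std::map<char, std::unique_ptr<MapTrieNode>> children;
};

// Counts distinct substrings of s, including the empty substring.
std::size_t countDistinctSubstringsAnyAlphabet(const std::string& s) {
    MapTrieNode root;       // Owned on the stack; children are freed automatically
    std::size_t count = 0;  // Number of trie nodes created so far

    for (std::size_t i = 0; i < s.size(); ++i) {
        MapTrieNode* node = &root;
        for (std::size_t j = i; j < s.size(); ++j) {
            // operator[] inserts an empty unique_ptr if the key is missing.
            std::unique_ptr<MapTrieNode>& child = node->children[s[j]];
            if (!child) {
                child = std::make_unique<MapTrieNode>();
                ++count;  // A new node means a new distinct substring
            }
            node = child.get();
        }
    }
    return count + 1;  // +1 for the empty substring
}
```

With this sketch, "aba" gives 6 and "abc" gives 7, matching the examples. The trade-off is that each child lookup costs O(log k) for a node with k children instead of constant time, in exchange for dropping the 26-pointer array and the lowercase-only restriction.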
