Problem Statement
Given a string s, determine the number of distinct substrings (including the empty substring) of the given string.
A string B is a substring of a string A if B can be obtained by deleting several characters (possibly none) from the start of A and several characters (possibly none) from the end of A. Two strings X and Y are considered different if they differ in length or if there is at least one index i such that the character of X at index i is different from the character of Y at index i (X[i] != Y[i]).
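The substring definition above maps directly onto std::string::substr. The following is a minimal sketch; trimEnds is a hypothetical helper name introduced only for illustration and is not part of the solutions that follow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): returns the substring of `a`
// that remains after deleting `front` characters from the start and
// `back` characters from the end.
std::string trimEnds(const std::string& a, std::size_t front, std::size_t back) {
    assert(front + back <= a.size());  // cannot delete more characters than `a` holds
    return a.substr(front, a.size() - front - back);
}
```

Under this sketch, trimEnds("aba", 1, 0) yields "ba", and deleting all three characters, as in trimEnds("aba", 3, 0), yields the empty substring.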
Examples
Example 1:
Input: s = "aba"
Output: 6
Explanation: The distinct substrings are:
"a",
"ab",
"ba",
"b",
"aba",
""
Example 2:
Input: s = "abc"
Output: 7
Explanation: The distinct substrings are:
"a",
"ab",
"abc",
"b",
"bc",
"c",
""
Different Approaches
1️⃣ Brute Force Approach
Intuition:
We can find all distinct substrings of a string by starting at each character and extending to the end, creating every possible substring. Storing these substrings in a set filters out duplicates, ensuring unique substrings are counted.
Approach:
- Initialize an empty set to store distinct substrings.
- Use an outer loop to iterate through each index of the string. This index serves as the starting point for substrings.
- For each starting index, use an inner loop that appends one subsequent character at a time, forming every substring that begins at that index.
- Add each constructed substring to the set. Since sets only store unique values, duplicates are filtered out automatically.
- After both loops finish, add the empty substring to the set and return the size of the set. This gives the number of distinct substrings of the given string.
Code:
#include <iostream>
#include <set>
#include <string>
using namespace std;

class Solution {
public:
    // Count the distinct substrings of s, including the empty substring
    int countDistinctSubstring(const string& s) {
        set<string> substrings;
        // Iterate through each starting index of the string
        for (size_t i = 0; i < s.length(); ++i) {
            string substring;
            /* Construct all possible substrings
               starting from the current index */
            for (size_t j = i; j < s.length(); ++j) {
                substring += s[j];
                substrings.insert(substring);
            }
        }
        // Include the empty substring
        substrings.insert("");
        // Return the number of distinct substrings
        return static_cast<int>(substrings.size());
    }
};

int main() {
    Solution solution;
    string input = "abc";
    cout << "Number of distinct substrings: " << solution.countDistinctSubstring(input) << endl;
    return 0;
}
Complexity Analysis:
- Time Complexity: O(N^2 * log M) set comparisons, where N is the size of the string and M is the number of elements in the set. Constructing all possible substrings takes O(N^2) insertions, and each insertion performs O(log M) comparisons because the set is a balanced tree. Each comparison is between two strings and can itself examine up to N characters, so counted in character operations the worst case is O(N^3 * log M).
- Space Complexity: O(N^3), where N is the length of the string. In the worst case (when all substrings are unique, such as when all characters in the string are distinct), the set holds O(N^2) distinct substrings, and each substring can be up to N characters long. Storing O(N^2) substrings of length up to O(N) requires O(N^3) space.
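The quadratic bound on the number of substrings can be made concrete: a string of length N has N(N+1)/2 non-empty (start, end) pairs, so it has at most N(N+1)/2 + 1 distinct substrings once the empty one is included. The sketch below compares that bound with a brute-force count; maxDistinctSubstrings and bruteForceCount are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Upper bound on the number of distinct substrings (including the empty
// one) of a string of length n: n(n+1)/2 non-empty ranges plus one.
long long maxDistinctSubstrings(long long n) {
    return n * (n + 1) / 2 + 1;
}

// Brute-force count using a set, the same idea as the solution above.
long long bruteForceCount(const std::string& s) {
    std::set<std::string> seen{""};  // start with the empty substring
    for (std::size_t i = 0; i < s.size(); ++i) {
        for (std::size_t len = 1; i + len <= s.size(); ++len) {
            seen.insert(s.substr(i, len));
        }
    }
    return static_cast<long long>(seen.size());
}
```

For "abc", where all characters are distinct, the count reaches the bound of 7. For "aba" it is 6, one less than the bound, because "a" occurs twice.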
2️⃣ Optimal Approach
Intuition:
Instead of storing every substring separately in a set as in the previous approach, use a Trie, which avoids whole-string comparisons: each character of each substring costs a single child lookup. The idea is to store only essential information and reduce redundancy by sharing common prefixes among substrings. This approach is particularly advantageous for long strings with many repeated substrings, because a repeated substring reuses existing nodes instead of creating new ones.
Approach:
- Start by initializing a root node for the Trie. The root represents the empty substring.
- Initialize a counter to keep track of the number of distinct non-empty substrings.
- Iterate through all possible starting positions in the string. For each starting position:
  - Begin from the root node.
  - For each character from the starting position to the end of the string:
    - Check if the current node has a child node for that character.
    - If not, insert a new child node and increment the counter, since the path from the root to the new node spells a substring that has not been seen before.
    - Move to the child node corresponding to the current character.
- Finally, return the total count of distinct substrings, adding one to account for the empty string.
Code:
#include <iostream>
#include <string>
using namespace std;

class Solution {
public:
    // Definition of TrieNode class for the Trie structure.
    // Each node has an array of 26 pointers (one for each lowercase letter).
    class TrieNode {
    public:
        TrieNode* node[26]; // Pointers to child nodes representing 'a' to 'z'

        // Constructor initializes all child pointers to nullptr.
        TrieNode() {
            for (int i = 0; i < 26; i++) {
                node[i] = nullptr;
            }
        }

        // Destructor frees the whole subtree below this node.
        ~TrieNode() {
            for (int i = 0; i < 26; i++) {
                delete node[i]; // Deleting nullptr is a no-op
            }
        }
    };

    // Function to count distinct substrings of a string (including the empty substring).
    // Assumes s contains only lowercase letters 'a' to 'z'.
    int countDistinctSubstring(const string& s) {
        int count = 0; // This will store the number of distinct non-empty substrings

        // Create a root node for the Trie.
        TrieNode* root = new TrieNode();

        // Loop over each starting position in the string.
        for (size_t i = 0; i < s.length(); i++) {
            TrieNode* node = root; // Start each substring from the root of the Trie

            // For each starting position, extend the substring character by character.
            for (size_t j = i; j < s.length(); j++) {
                int index = s[j] - 'a'; // Convert character to index (0 for 'a', 1 for 'b', etc.)

                // If the path for the current character doesn't exist, it's a new substring.
                if (node->node[index] == nullptr) {
                    count++; // Increment distinct substring count
                    node->node[index] = new TrieNode(); // Create a new Trie node for this character
                }

                // Move to the next node in the Trie (for the next character in the substring)
                node = node->node[index];
            }
        }

        delete root; // Release the Trie before returning

        // Include the empty substring in the count (+1).
        return count + 1;
    }
};

int main() {
    // Example usage:
    Solution solution;
    string input = "abc";
    cout << "Number of distinct substrings: " << solution.countDistinctSubstring(input) << endl;
    return 0;
}
Complexity Analysis:
- Time Complexity: O(N^2), where N is the size of the string. For each starting position in the string, we iterate through the remaining characters to form substrings, and each step is a constant-time child lookup in the Trie. The nested loops result in a quadratic number of operations.
- Space Complexity: O(N^2), where N is the size of the string. The Trie holds one node per distinct non-empty substring. In the worst case, the number of distinct substrings is quadratic relative to the size of the string, leading to O(N^2) nodes, each holding 26 child pointers.
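The Trie above assumes lowercase input and allocates nodes with new. As a hedged alternative, the sketch below keeps the same counting idea but stores nodes in a std::vector and looks up children through a std::map, so any char value is accepted and no manual memory management is needed. The function name countDistinctSubstringsPooled and the pooled layout are assumptions made for this illustration, not part of the original solution. The map lookup adds a logarithmic factor in the alphabet size to each step.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical variant of the Trie approach: nodes live in a vector and
// refer to their children by index into that vector.
int countDistinctSubstringsPooled(const std::string& s) {
    // children[k] maps a character to the index of the child of node k.
    std::vector<std::map<char, int>> children(1);  // index 0 is the root

    for (std::size_t i = 0; i < s.size(); ++i) {
        int node = 0;  // restart from the root for each starting position
        for (std::size_t j = i; j < s.size(); ++j) {
            auto it = children[node].find(s[j]);
            if (it == children[node].end()) {
                // No edge for this character yet: this is a new substring.
                int next = static_cast<int>(children.size());
                children[node][s[j]] = next;
                children.emplace_back();  // may reallocate; `it` is not used afterwards
                node = next;
            } else {
                node = it->second;
            }
        }
    }

    // Each non-root node is one distinct non-empty substring and the root
    // stands for the empty substring, so the answer is the node count.
    return static_cast<int>(children.size());
}
```

With this layout the separate counter disappears, because the number of nodes in the pool already equals the number of distinct substrings including the empty one.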