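Author: CarlR696 Time: 2023-11-4 01:04 Title: Looking for software to manage 10TB of files on HDD/NAS
Notice: the author has been banned or deleted; content automatically hidden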
Author: bongbong3481 Time: 2023-11-4 09:56
Leaving a reply so I can follow along and learn.
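Author: eo38cl Time: 2023-11-4 14:53
Last edited by eo38cl on 2023-11-4 14:54
On Windows, the best one is Whereisit, but it is no longer updated. The last version dates from 2014 (build 220) and can no longer be bought from the official site.
After that come WinCatalog, Everything, abeMeda (formerly CDWinder), and so on.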
Author: CarlR696 Time: 2023-11-4 20:57
Notice: the author has been banned or deleted; content automatically hidden
Author: ericauky Time: 2023-11-4 22:51
Everyone, while we are on the topic, may I ask: is there a program that can check whether an HDD contains duplicate files?
Author: CarlR696 Time: 2023-11-5 12:07
Notice: the author has been banned or deleted; content automatically hidden
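Author: SuperElephant Time: 2023-11-13 16:31
Last edited by SuperElephant on 2023-11-13 19:46
Reply to 5# ericauky
Comparing large files takes time. The answer below compares the full content of every file (by hash), not the file names.
Open cmd, type "python --version" and press Enter. If no version is shown, install Python.
Create the file C:\dedup.py, copy and paste the code below into it, then save.
Open cmd again and run: python "C:\dedup.py" "D:\target_directory"
Change "D:\target_directory" to the path you want to search for duplicates.
Code: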
    import os
    import hashlib
    import argparse


    def calculate_sha256(file_path):
        # Note: this reads the whole file into memory before hashing it.
        with open(file_path, 'rb') as file:
            data = file.read()
        readable_hash = hashlib.sha256(data).hexdigest()
        return readable_hash


    def find_duplicates(target_directory):
        # Map each hash to the list of file paths that share it.
        file_hashes = {}
        for dirpath, dirnames, filenames in os.walk(target_directory):
            for filename in filenames:
                file_path = os.path.join(dirpath, filename)
                file_hash = calculate_sha256(file_path)
                if file_hash in file_hashes:
                    file_hashes[file_hash].append(file_path)
                else:
                    file_hashes[file_hash] = [file_path]
        # Keep only the hashes that belong to more than one file.
        duplicates = {k: v for k, v in file_hashes.items() if len(v) > 1}
        return duplicates


    if __name__ == '__main__':
        # Parse command-line arguments
        parser = argparse.ArgumentParser(description='Find duplicate files in a directory.')
        parser.add_argument('directory', type=str, help='The target directory.')
        args = parser.parse_args()
        target_directory = args.directory
        duplicates = find_duplicates(target_directory)
        for file_hash, file_paths in duplicates.items():
            print(f"Duplicate files for hash {file_hash}:")
            for file_path in file_paths:
                print(f"\t{file_path}")
        print(f"Scan completed. Found {len(duplicates)} duplicate hashes.")
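The script above hashes every file it finds, which is the slow part on a multi-terabyte drive. A common shortcut is to group files by size first and hash only the files that share a size with at least one other file, since files of different sizes cannot be identical. The following is only a rough sketch of that idea, not part of the original answer; the function names `sha256_of` and `find_duplicates_by_size_then_hash` and the 1 MiB chunk size are my own assumptions.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates_by_size_then_hash(target_directory):
    """Return {sha256: [paths]} for content that appears more than once."""
    # Pass 1: group by size (metadata only, no file content is read).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(target_directory):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[sha256_of(path)].append(path)
            except OSError:
                continue
    return {h: p for h, p in by_hash.items() if len(p) > 1}
```

Calling `find_duplicates_by_size_then_hash(r"D:\target_directory")` should return the same kind of dictionary as `find_duplicates`, so the printing loop from the original script can be reused unchanged.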
Author: hoho1986 Time: 2023-11-17 09:57
WizTree
It can export CSV, and it also works with external HDDs.
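For anyone who prefers a script, a CSV listing of a drive can also be produced with a few lines of Python, so the list stays searchable while the external HDD is unplugged. This is only a sketch of the general idea and does not reproduce WizTree's export format; the function name `write_catalog` and the column names are hypothetical.

```python
import csv
import os
from datetime import datetime


def write_catalog(root, csv_path):
    """Write one CSV row per file under root; return the number of files listed."""
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        # Hypothetical column names, chosen for this example only.
        writer.writerow(["path", "size_bytes", "modified"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                path = os.path.join(dirpath, filename)
                try:
                    info = os.stat(path)
                except OSError:
                    continue  # skip files that cannot be read
                modified = datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds")
                writer.writerow([path, info.st_size, modified])
                count += 1
    return count
```

Save the CSV somewhere outside the folder being scanned, otherwise the catalog file ends up listing itself.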
Author: Jip仔 Time: 2023-11-17 16:24
Duplicate Cleaner Free
https://www.duplicatecleaner.com/
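Author: CarlR696 Time: 2023-11-17 17:16
Notice: the author has been banned or deleted; content automatically hidden
Author: CarlR696 Time: 2023-11-17 17:17
Notice: the author has been banned or deleted; content automatically hidden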
Author: sonichkhk Time: 2023-11-17 22:07
Reply to 5# ericauky
Use fclones or ddh.
Author: Jip仔 Time: 2023-11-17 23:01
OP, may I ask what you meant by your reply to me?
I was replying to post #5 (about finding duplicate files), not to you.
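Author: s20012797 Time: 2023-11-17 23:42
Reply to ericauky
Comparing large files takes time. The answer below compares the full content of every file (by hash), not the file names.
Open cmd, type "pyth ...
Posted by SuperElephant on 2023/11/13 16:31
If it were me, I would write it like this:
DupFileFind.py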
    # This script is more robust when dealing with large files and exceptions, and presents the results in a more user-friendly way.
    import os
    import hashlib
    import argparse


    # This function is used to calculate the SHA256 hash of a file
    def calculate_sha256(file_path):
        try:
            # Create a new hashlib.sha256 object
            hash_sha256 = hashlib.sha256()
            # Open the file in binary mode
            with open(file_path, 'rb') as file:
                # Read the file content in 4096-byte chunks and update the hash
                for chunk in iter(lambda: file.read(4096), b""):
                    hash_sha256.update(chunk)
            # Return the hash in hexadecimal format
            return hash_sha256.hexdigest()
        except OSError as e:
            # If an error occurs while reading the file, print the error message and return None
            print(f"Error reading file {file_path}: {e}")
            return None


    # This function is used to find duplicate files in the target directory
    def find_duplicates(target_directory):
        # Create a dictionary to store the file hashes and their corresponding file paths
        file_hashes = {}
        # Traverse the target directory and all its subdirectories
        for dirpath, dirnames, filenames in os.walk(target_directory):
            # Traverse each file
            for filename in filenames:
                # Get the full path of the file
                file_path = os.path.join(dirpath, filename)
                # Calculate the hash of the file (None means the file could not be read)
                file_hash = calculate_sha256(file_path)
                if file_hash:
                    # If the file hash is already in the dictionary, add the file path to the corresponding list
                    # Otherwise, create a new list and add the file path to it
                    if file_hash in file_hashes:
                        file_hashes[file_hash].append(file_path)
                    else:
                        file_hashes[file_hash] = [file_path]
        # Select the hashes that have more than one file path from the dictionary, these are the duplicate files
        duplicates = {k: v for k, v in file_hashes.items() if len(v) > 1}
        return duplicates


    if __name__ == '__main__':
        # Parse command-line arguments
        parser = argparse.ArgumentParser(description='Find duplicate files in a directory.')
        parser.add_argument('directory', type=str, help='The target directory.')
        args = parser.parse_args()
        # Get the target directory
        target_directory = args.directory
        # Find duplicate files
        duplicates = find_duplicates(target_directory)
        # Print the information of duplicate files
        for file_hash, file_paths in duplicates.items():
            print(f"Duplicate files for hash {file_hash}:")
            for file_path in file_paths:
                print(f"\t{file_path}")
        # Print the completion message
        print(f"Scan completed. Found {len(duplicates)} duplicate hashes.")
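The `duplicates` dictionary returned by `find_duplicates` above maps each hash to a list of paths, so it can also be used to estimate how much space the extra copies take up. A minimal sketch, assuming one copy of each group is kept; the helper name `reclaimable_bytes` is made up for this example, and nothing is deleted.

```python
import os


def reclaimable_bytes(duplicates):
    """Sum the bytes freed if all but one copy in each duplicate group were removed."""
    total = 0
    for paths in duplicates.values():
        try:
            # Files in one group have identical content, hence identical size.
            size = os.path.getsize(paths[0])
        except OSError:
            continue  # the file disappeared or is unreadable: skip the group
        total += size * (len(paths) - 1)
    return total
```

For example, `print(reclaimable_bytes(duplicates) / 1024**3, "GiB")` after the scan gives a rough idea of whether cleaning up is worth the effort.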
Author: SuperElephant Time: 2023-11-18 15:00
All of us know what replying is for, and we all reply to posts in the same spirit.

Author: CarlR696 Time: 2023-11-18 15:43
Notice: the author has been banned or deleted; content automatically hidden
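Author: SuperElephant Time: 2023-11-18 16:29
Reply to 16# CarlR696
Look at the bigger picture.
Our pleasure lies in watching people whose ability is far below ours,
and answering them with the loftiest attitude, the most in-depth information, and the most detailed replies.
The contrast makes for good entertainment,
with a touch of insult in the sarcasm.
Those who get it will get it; whether epc is rubbish or not is beyond our control.
Replying to posts like these is not only about answering the question.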

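Author: rabbit82047 Time: 2023-11-18 17:09
If you want to ask, ask; if you want to reply, reply. There is no ranking of skill levels here.
If the answer is always "go search Google", then nobody should ask anything; you might as well bring out ChatGPT too.
Might as well shut the forum down altogether, since everyone is such an expert that nobody needs to ask.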

